High Performance Git

Section I · Foundations

Chapter 3

Refs, HEAD, Reflogs, Index


Blob, tree, and commit describe stored repository state. Refs, HEAD, reflogs, and the index describe how Git is navigating that state right now.

Those are the pieces behind everyday questions: what does main mean, what is checked out, what will the next commit contain, and where did this branch point before I moved it?

The switch, commit, reset, and recovery commands keep coming back to the same structures, so once these structures are visible, a lot of command behavior reads more plainly.


Refs Are the Names You Actually Work With

Ref
Durable Git name that stores an object ID, usually the ID of a commit.

Most of the time, you are not working with raw object IDs; you are using names like main, feature-x, or origin/main. Behind the scenes, those names resolve to refs, and the refs resolve to object IDs.

That indirection is useful, but it can blur what a branch really is. A branch is a movable name pointing at one commit. The rest of the history is whatever you can reach by following parent links backward.

So creating a branch is cheap (write one pointer). Moving a branch is cheap (update one pointer). But something like force-pushing is a big deal: the underlying objects may still exist, but the names you rely on have moved somewhere else.
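The cheapness of branch creation can be checked directly. The following is a minimal sketch in a throwaway repository; the temporary directory, the file name note.txt, and the branch name feature-x are illustrative assumptions, not anything Git requires.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name Example
git config user.email example@example.com

printf 'one\n' > note.txt
git add note.txt
git commit -q -m 'one'

# Count loose objects before touching any branch names.
objects_before=$(git count-objects -v | sed -n 's/^count: //p')

# Creating a branch writes one new ref; no commit, tree, or blob is created.
git branch feature-x
main_id=$(git rev-parse refs/heads/main)
feature_id=$(git rev-parse refs/heads/feature-x)

objects_after=$(git count-objects -v | sed -n 's/^count: //p')

# Both names resolve to the same commit ID.
git for-each-ref --format='%(refname) %(objectname)' refs/heads
echo "objects before: $objects_before, after: $objects_after"
```

The two refs print the same object ID, and the object count does not change, which is the sense in which a new branch is only a new name.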

HEAD Is Your Current Attachment Point

HEAD
Special ref that usually names the current branch, or directly names a commit in detached state.

HEAD is the name that answers "where am I?"

In the usual case, HEAD points to a branch ref such as refs/heads/main. In that state, new commits update the branch, and HEAD stays attached to it. You experience that as "I am on main."

In detached HEAD state, HEAD points directly to a commit ID instead. You can still inspect history, run builds, test old revisions, or even make new commits; the difference is that those new commits are not anchored by a branch name unless you create one.

Operationally, the distinction matters. Detached HEAD is easier to work with if you think of it as being "attached to a commit rather than a branch" instead of just some dreaded, mysterious failure state.
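As a rough illustration of "attached to a commit rather than a branch," the sketch below detaches HEAD, makes a commit, and then anchors that commit with a branch. The repository, file, and the branch name experiment are assumptions made up for the example.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name Example
git config user.email example@example.com

printf 'one\n' > note.txt
git add note.txt
git commit -q -m 'one'

# Detach: HEAD now names the commit itself, not refs/heads/main.
git switch -q --detach HEAD

# A commit made here advances HEAD only; main stays where it was.
printf 'experiment\n' > note.txt
git commit -q -am 'experiment'
detached_id=$(git rev-parse HEAD)
main_id=$(git rev-parse main)

# Anchor the new commit with a branch name so it stays easy to reach.
git branch experiment "$detached_id"
experiment_id=$(git rev-parse experiment)

echo "main:       $main_id"
echo "experiment: $experiment_id"
```

Nothing about the commit itself is unusual. The only thing missing before the last step was a durable name pointing at it.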

Symbolic Refs and Direct Refs

This difference between attached and detached states exists because not all refs behave the same way. Some refs directly store an object ID, while others symbolically point to another ref.

HEAD is usually symbolic, which is why it can say "the current branch is main" without storing the commit ID itself. Branch refs, by contrast, usually store object IDs directly. That small distinction is enough to explain a lot of Git's day-to-day behavior, especially when you switch branches, check out specific commits, or inspect repository state with plumbing commands.

It also explains why some Git operations feel like they move "you," while others feel like they move "the branch." In reality, both may be happening, but through different layers of naming.
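The two kinds of ref can be told apart with plumbing commands. This is a small sketch under the assumption of a fresh throwaway repository with a single commit on main; the variable names are only for illustration.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name Example
git config user.email example@example.com

printf 'one\n' > note.txt
git add note.txt
git commit -q -m 'one'

# Attached: HEAD is a symbolic ref whose value is another ref name.
attached_target=$(git symbolic-ref HEAD)

# The branch ref is a direct ref: it resolves straight to a commit ID.
branch_id=$(git rev-parse refs/heads/main)

# Detached: HEAD is no longer symbolic, so symbolic-ref -q prints nothing
# and exits non-zero, while rev-parse still resolves HEAD to a commit.
git switch -q --detach HEAD
detached_target=$(git symbolic-ref -q HEAD || echo 'not symbolic')
detached_id=$(git rev-parse HEAD)

echo "attached HEAD -> $attached_target"
echo "main          -> $branch_id"
echo "detached HEAD -> $detached_target ($detached_id)"
```

In both states HEAD ultimately resolves to the same commit here; what changes is whether it gets there through a branch name.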

The Index Is Not Just a Staging Shelf

Index
Serialized staging and path-state structure that defines the next snapshot.

The index gets little attention, and when it does, it tends to seem a little mystical. It is usually taught through a physical metaphor: a place where you put changes before commit picks them up. That metaphor is helpful at first, but it hides the more important fact that the index is a structured representation of the next snapshot.

In the normal case, the "index" is literally one binary file, usually .git/index. That file stores structured path entries, cached metadata, and stage information. Git knows what to do with this when you run commit.

First, though, let's step back a little bit. When you run git add, you are not tossing files onto a pile. You are updating index entries so that the index reflects the content, mode, and path state you want the next commit to record. When you run git commit, Git reads the index, writes whatever blobs and trees are needed to represent that snapshot, and then writes a commit object that points to the resulting root tree.
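The add-then-commit flow described above can be traced by following one blob ID through the index and into the committed tree. This is a sketch in a throwaway repository; the file name and variable names are assumptions for illustration.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name Example
git config user.email example@example.com

printf 'one\n' > note.txt

# git add hashes the file content into a blob and records that blob ID,
# together with mode and path, in the index.
git add note.txt
staged_id=$(git ls-files --stage note.txt | awk '{print $2}')
content_id=$(git hash-object note.txt)

# git commit turns the index into tree objects and writes a commit
# that points at the resulting root tree.
git commit -q -m 'one'
root_tree=$(git rev-parse 'HEAD^{tree}')
committed_id=$(git ls-tree HEAD note.txt | awk '{print $3}')

echo "staged blob:    $staged_id"
echo "committed blob: $committed_id"
echo "root tree:      $root_tree"
```

The staged blob ID and the committed blob ID match, because the commit recorded exactly what the index described.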

Mechanically, Git reads the index file into memory at the start of many commands, updates its entries, and writes the file back out when needed.

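Each ordinary index entry answers a specific question: if this path were part of the next commit, what blob would it point to, what mode would it have, and what does Git currently believe about the matching working-tree file? In practice that means an entry typically contains:

the path, relative to the repository root
the file mode
the blob ID of the staged content
cached stat data for the working-tree file, such as size and timestamps
the stage number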

That last field matters during merges. In a conflict, Git can keep multiple index entries for the same path at different stages: stage 1 for the merge base, stage 2 for "ours," and stage 3 for "theirs." The index is more than a staging checklist. It is a structured data model for the next snapshot, including unresolved states that cannot yet be written as a normal commit.
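To see the conflict stages, the sketch below provokes a small conflict on purpose. The branch name theirs-side and the file contents are hypothetical; the merge is expected to fail, so its exit status is deliberately ignored.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name Example
git config user.email example@example.com

printf 'base\n' > note.txt
git add note.txt
git commit -q -m 'base'

# Two branches change the same line in different ways.
git switch -q -c theirs-side
printf 'theirs\n' > note.txt
git commit -q -am 'theirs'

git switch -q main
printf 'ours\n' > note.txt
git commit -q -am 'ours'

# The merge stops with a conflict; "|| true" keeps the script going.
git merge theirs-side >/dev/null 2>&1 || true

# One path, three entries: stage 1 (base), stage 2 (ours), stage 3 (theirs).
git ls-files --stage note.txt
stages=$(git ls-files --stage note.txt | awk '{print $3}' | xargs)

# Each stage can be read back with the :<stage>:<path> syntax.
base_content=$(git show :1:note.txt)
ours_content=$(git show :2:note.txt)
theirs_content=$(git show :3:note.txt)
echo "stages present: $stages"
```

Until the conflict is resolved and the path is added again, there is no stage 0 entry for note.txt, which is why Git cannot yet write this index as a normal commit.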

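It also helps to be explicit about what the index does not contain. It does not store the full file bytes inline; it stores blob IDs that point to object content elsewhere in the repository. It also does not have to match either HEAD or the working tree exactly. That three-way separation is the source of many ordinary Git states:

Staged change: the index differs from HEAD.
Unstaged change: the working tree differs from the index.
Untracked file: a path exists in the working tree but has no index entry.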

Once those three layers are distinct, common situations become easier to parse. A file can be unchanged in HEAD, staged differently in the index, and edited again in the working tree all at the same time. That is part of the normal model.
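That three-versions-at-once situation is easy to reproduce. A minimal sketch, assuming a throwaway repository and a single file whose contents are simply "one", "two", and "three":

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name Example
git config user.email example@example.com

printf 'one\n' > note.txt
git add note.txt
git commit -q -m 'one'

printf 'two\n' > note.txt
git add note.txt              # the index now holds "two"
printf 'three\n' > note.txt   # the working tree moves on to "three"

# Read the same path from each layer.
head_content=$(git show HEAD:note.txt)
index_content=$(git show :note.txt)
worktree_content=$(cat note.txt)

# In short status, the first column is index vs HEAD and the second
# column is working tree vs index.
status_line=$(git status --short)

echo "HEAD: $head_content, index: $index_content, working tree: $worktree_content"
echo "$status_line"
```

The status line shows a modification in both columns for the same path, one for each boundary between the three layers.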

If you want to see the index more literally, git ls-files --stage is a good starting point. It prints paths together with modes, blob IDs, and stage numbers, which is often the fastest way to replace the vague idea of "the staging area" with the actual shape of the data structure.

A Few Commands Make the Control Plane Visible

These pieces get much easier to read once you ask Git to print them directly:

git symbolic-ref -q HEAD || git rev-parse HEAD
git rev-parse HEAD main
git ls-files --stage
git reflog --date=iso

The first command prints the attached branch name when HEAD is symbolic and falls back to the current commit when it is detached. rev-parse resolves names to object IDs. ls-files --stage shows the index entries Git is working from. reflog shows recent name movement.

Once you see it that way, a lot of familiar Git behavior becomes easier to explain. A commit records staged changes rather than every local edit because the index, not the working tree, defines the next snapshot. git diff without arguments compares working tree to index, while git diff --cached compares index to HEAD. Each mode is a different comparison across the same small set of layers.
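The two diff comparisons, and the fact that a commit takes only what the index holds, can be checked in a few lines. This is a sketch with made-up file contents in a throwaway repository.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name Example
git config user.email example@example.com

printf 'one\n' > note.txt
git add note.txt
git commit -q -m 'one'

printf 'two\n' > note.txt
git add note.txt              # staged: "two"
printf 'three\n' > note.txt   # edited again, not staged: "three"

staged_paths=$(git diff --cached --name-only)   # index vs HEAD
unstaged_paths=$(git diff --name-only)          # working tree vs index

# The commit records the index, so it captures "two", not "three".
git commit -q -m 'two'

committed=$(git show HEAD:note.txt)
still_staged=$(git diff --cached --name-only)   # empty: index matches HEAD
still_unstaged=$(git diff --name-only)          # the later edit remains

echo "committed content: $committed"
echo "left in working tree: $(cat note.txt)"
```

After the commit, the index and HEAD agree again, while the later edit is still sitting in the working tree as an unstaged change.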

Why the Index Matters for Performance

The index matters for correctness, and it matters just as much for speed. Git uses it as a cache of path metadata so that commands do not have to rediscover the state of every tracked file from scratch each time.

In large repositories, the index becomes one of the main characters. The same file that defines the next snapshot also doubles as a cache over the working tree, so when git status is slow, Git is usually doing too much work reconciling index entries with the filesystem. Untracked-file discovery, stat comparisons, ignore evaluation, fsmonitor integration, split-index behavior, and sparse-index behavior all sit close to this part of Git.
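As a starting point for experiments, the sketch below enables one of those features in a throwaway repository and reads the setting back. Treat it as something to measure on your own repository rather than a recommendation; whether it helps depends on repository size, filesystem, and Git version. The other features are mentioned only in comments because their availability varies by platform.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name Example
git config user.email example@example.com

printf 'one\n' > note.txt
git add note.txt
git commit -q -m 'one'

# How many path entries does the index hold? In a large repository this
# number is a rough proxy for how much reconciliation status may have to do.
entries=$(git ls-files | wc -l | tr -d ' ')

# Ask Git to cache untracked-file discovery results inside the index.
git config core.untrackedCache true
untracked_cache=$(git config --get core.untrackedCache)

# Related settings exist for the other features named above, for example
# core.fsmonitor, core.splitIndex, and index.sparse. They are not enabled
# here; check the documentation for your Git version and platform first.

# Status still behaves the same; only the caching behind it changes.
git status --short
echo "index entries: $entries, core.untrackedCache: $untracked_cache"
```

On a one-file repository the setting makes no visible difference. The point is only where the knob lives: in configuration that changes how the index is used as a cache.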

The index is where staging happens, but it is also where a great deal of Git's local responsiveness is either preserved or lost.

Reflogs Remember Movement

Reflog
Local record of how a ref moved over time on your machine.

Refs move, so reflogs matter. A reflog records where a ref pointed on your machine, which means it also records a local history of resets, rebases, recoveries, experiments, and mistakes.

Git remains recoverable even when names move in ways you did not intend. Commands such as reset --hard or an ill-timed rebase can certainly move names in surprising ways, but the old objects do not necessarily disappear at once, and the reflog often tells you exactly where those names used to be. In practice, "I lost a commit" often turns into "I can get it back from the reflog."

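There are limits. Reflogs are local, not shared, and they expire eventually: by default, entries are pruned after 90 days, or after 30 days if the commit is no longer reachable from the ref's current tip. Even so, they show Git's bias toward recoverability: although names are allowed to move freely, Git keeps a local memory of that movement.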

You can watch that memory grow in a throwaway repo:

tmp=$(mktemp -d)
cd "$tmp"

git init -b main
git config user.name Example
git config user.email example@example.com

printf 'one\n' > note.txt
git add note.txt
git commit -m 'one'

printf 'two\n' > note.txt
git commit -am 'two'

git reflog --date=iso
sed -n '1,5p' .git/logs/HEAD

git reset --hard HEAD^

git reflog --date=iso
sed -n '1,8p' .git/logs/HEAD

The reflog is not a separate history graph. It is a local log of ref movement. The reset --hard appends another entry; it does not erase the earlier one. The old commit is often still easy to find even after the branch name has moved away from it.
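One way to act on that is to give the old commit a name again. The sketch below repeats the same kind of setup in a fresh throwaway repository and then uses the reflog syntax main@{1}, meaning "where main pointed one move ago." The branch name rescue is a hypothetical choice.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name Example
git config user.email example@example.com

printf 'one\n' > note.txt
git add note.txt
git commit -q -m 'one'

printf 'two\n' > note.txt
git commit -q -am 'two'
two_id=$(git rev-parse HEAD)   # saved only so the result can be checked

# Move main back one commit; the "two" commit loses its branch name.
git reset -q --hard HEAD^

# The reflog for main still records the previous position.
previous_id=$(git rev-parse 'main@{1}')

# Anchor the old commit under a new name instead of moving main again.
git branch rescue 'main@{1}'
rescue_id=$(git rev-parse rescue)

echo "main now:  $(git rev-parse main)"
echo "rescued:   $rescue_id"
```

Creating a separate branch is only one option; running git reset --hard 'main@{1}' would move main itself back instead.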

Reading switch, commit, and reset More Literally

These structures become especially useful once you start reading common commands as combinations of a few simpler actions.

When you run git switch feature-x, Git resolves the target ref, checks that the switch would not overwrite local changes, updates the index and working tree to match the target snapshot, and points HEAD at that branch. If the switch cannot be done safely, Git refuses and changes nothing.

When you run git commit, Git reads the index, writes new objects as needed, advances the current branch ref, and appends reflog entries that remember the move.
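The steps that git commit performs can be approximated by hand with plumbing commands. This is a sketch of the same sequence, not a description of how git commit is implemented internally; it skips hooks, editors, and other conveniences, and the reflog message passed to update-ref is an arbitrary illustrative string.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name Example
git config user.email example@example.com

printf 'one\n' > note.txt
git add note.txt
git commit -q -m 'one'

printf 'two\n' > note.txt
git add note.txt
parent_id=$(git rev-parse HEAD)

# 1. Turn the index into tree objects and get the root tree ID.
tree_id=$(git write-tree)

# 2. Write a commit object that points at that tree and at the old tip.
commit_id=$(git commit-tree "$tree_id" -p "$parent_id" -m 'two')

# 3. Advance the branch ref, but only if it still points at the old tip.
#    The -m text becomes the reflog message for this move.
git update-ref -m 'commit: two' refs/heads/main "$commit_id" "$parent_id"

git log --oneline
git reflog show main
```

Read this way, a commit is three small actions: snapshot the index, write one object, move one name.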

When you run git reset, the behavior depends on mode, but the core idea is always the same: some combination of ref movement, index rewrite, and working-tree update. That is also why reset feels like several commands under one name; in a sense, it is.
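The three common reset modes can be compared side by side by checking the index and the working tree after each one. A minimal sketch, assuming a throwaway repository with two commits whose file contents are "one" and "two":

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name Example
git config user.email example@example.com

printf 'one\n' > note.txt
git add note.txt
git commit -q -m 'one'
printf 'two\n' > note.txt
git commit -q -am 'two'
two_id=$(git rev-parse HEAD)

# --soft: move the branch ref only.
git reset -q --soft HEAD^
soft_index=$(git show :note.txt)
soft_worktree=$(cat note.txt)

git reset -q --hard "$two_id"   # back to the starting point

# --mixed (the default): move the ref and rewrite the index.
git reset -q --mixed HEAD^
mixed_index=$(git show :note.txt)
mixed_worktree=$(cat note.txt)

git reset -q --hard "$two_id"   # back to the starting point

# --hard: move the ref, rewrite the index, and update the working tree.
git reset -q --hard HEAD^
hard_index=$(git show :note.txt)
hard_worktree=$(cat note.txt)

echo "soft:  index=$soft_index worktree=$soft_worktree"
echo "mixed: index=$mixed_index worktree=$mixed_worktree"
echo "hard:  index=$hard_index worktree=$hard_worktree"
```

Each mode moves the ref the same way; they differ only in how many of the remaining layers are rewritten to follow it.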

Once you reduce those operations to "move a name," "rewrite the index," and "refresh the working tree," they stop feeling magical and start feeling compositional.

Why Recovery So Often Starts with Names

When users say they "lost work" in Git, they often mean one of two things. Either the underlying objects were never written, or the objects still exist but the names that used to point to them no longer do.

More often, the commit is still there and the easy name for it is gone. A commit is an immutable object in the object database, and moving or deleting a name does not delete the object it pointed to, so the data is likely still available. A rebase, reset, amend, checkout, or branch deletion can make a commit feel lost even when it is still present in the object database. In those cases, recovery usually starts with refs and reflogs, not low-level object forensics.
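Branch deletion is a useful case to try, because the branch's own reflog is removed along with the branch while HEAD's reflog still remembers the commits made on it. The sketch below uses a hypothetical branch named topic; in a real recovery you would read the git reflog output to find the right entry rather than assume it is HEAD@{1}.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name Example
git config user.email example@example.com

printf 'base\n' > note.txt
git add note.txt
git commit -q -m 'base'

git switch -q -c topic
printf 'topic work\n' > note.txt
git commit -q -am 'topic work'
topic_id=$(git rev-parse HEAD)   # saved only so the result can be checked

git switch -q main
git branch -q -D topic           # the name is gone; the commit is not

# HEAD's reflog: entry 0 is the switch back to main, entry 1 is the
# commit made while topic was checked out.
git reflog
found_id=$(git rev-parse 'HEAD@{1}')

# Give the commit a name again.
git branch topic "$found_id"
echo "recovered topic at $(git rev-parse topic)"
```

No object-level inspection was needed. The recovery consisted of finding the old position in a reflog and attaching a name to it.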

Understanding the control plane pays off immediately. If you know that branches are names, that HEAD is a special name, that reflogs remember name movement, and that the index is separate from both history and the working tree, a large class of Git mishaps becomes much easier to unwind.

The Control Plane and the Data Plane

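Keep one big distinction clear:

Data plane: the stored objects (blobs, trees, and commits) that record repository state.
Control plane: refs, HEAD, reflogs, and the index, which describe how Git is navigating that state right now.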

The two layers constantly interact, but they are not the same thing. A commit may stay untouched while a branch ref moves. The index may change even though no commit has been written yet. A reflog may remember a previous location even though the current branch points somewhere else entirely.

That separation is one reason Git can feel confusing at first, but it is also one reason Git is powerful. Once you see the control plane and the data plane separately, many commands stop looking like one giant operation and start looking like a few smaller operations stacked together.

Why This Chapter Matters for Speed

This matters for performance too. Commands that touch refs are usually cheap. Commands that only move HEAD are usually cheaper than commands that also rewrite the index and working tree. Commands that force a full working-tree refresh are more expensive than commands that only walk history. Recovery is often fast when the reflog is enough, and slower when you have to drop into object-level inspection.