Blobs, trees, commits, and tags are the model. But loose objects, packfiles, indexes, and deltas are the machinery Git actually uses to write, compress, and move that model around.
We will look at:
- why loose objects exist
- why repositories do not stay loose for long
- how packfiles and indexes work
- what delta compression is actually doing
- why none of that changes Git's logical history model
Loose Objects, Packs, and Why Repositories Stop Staying Loose
- Loose Object
- Individual object stored as its own compressed file in the object database.
When Git writes a new object, the simplest thing it can do is store that object by itself. That is the loose-object path, and it is one reason Git's write model is so direct. Hash the content, determine the object ID, compress it, and place it in the object store.
In a traditional SHA-1 repository, a loose object for an object ID such as ab1234... lives under .git/objects/ab/1234.... Git spreads loose objects across 256 directories using the first byte of the object name so one directory does not become unmanageably large.
Loose objects exist because they are easy to write safely and incrementally. When you create a commit, stage new content, or otherwise materialize new objects, Git does not need to rebuild a large archive first. It can write the new objects one by one.
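That write path is simple enough to sketch directly. Here is a minimal, illustrative version for a SHA-1 repository; `write_loose_blob` and the throwaway `objects_dir` are inventions for this sketch, not Git's own code:

```python
import hashlib
import tempfile
import zlib
from pathlib import Path

def write_loose_blob(objects_dir: Path, data: bytes) -> str:
    # Canonical object form: "<type> <size>\0" followed by the content.
    store = b"blob %d\x00" % len(data) + data
    # The object ID is the hash of that canonical form (SHA-1 here).
    oid = hashlib.sha1(store).hexdigest()
    # Fan out across 256 directories keyed on the first byte of the name.
    path = objects_dir / oid[:2] / oid[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():  # objects are immutable, so never overwrite
        path.write_bytes(zlib.compress(store))
    return oid

oid = write_loose_blob(Path(tempfile.mkdtemp()), b"hello\n")
print(oid)  # ce013625030ba8dba906f756967f9e9ca394464a, matching git hash-object
```

You can confirm the hash against `echo hello | git hash-object --stdin`, which reports the same object ID for the same content.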
Loose objects are excellent for incremental creation, and they are easy to model and understand. In a tiny repository, they would be all you need, and performance would never be a concern.
At scale, however, the loose layout breaks down, and real repositories need another approach.
Disk space is only part of the problem. A large loose-object store also drives directory traversal, file-open overhead, metadata churn, and poor locality. It turns one logical database into a huge number of tiny files, which is rarely the best layout for repeated reads, bulk transfer, or compact storage.
- Packfile
- File that stores many Git objects together in compressed form, usually with an accompanying index for random access.
A packfile is the format Git uses to store most repository data over time. Instead of keeping every object as a separate file, Git writes many objects into one .pack file and pairs it with an .idx file so objects can still be found quickly by object ID.
Git is separating its logical model from physical layout very cleanly here. This point is critical: a blob is still the same blob. A commit is still the same commit. But the repository no longer needs to keep each one in its own standalone file. The user never sees the packfiles any more than a user of a relational database sees B-tree pages.
A packfile is a purpose-built object container with per-object headers, optional deltified representations, and companion index structures designed for lookup, traversal, repacking, and transport. You can think of it as the underpinnings of a database rather than a generic zip archive for the repository.
A pack is an organized collection of Git objects. The .idx file beside it is not a table of filenames. It is a lookup structure mapping object names to pack offsets.
If you want a better comparison, think of a pack plus its index as an immutable bulk-written data file with a sidecar lookup structure. That is closer to an SSTable or another storage file produced by compaction than to a hand-made archive. The comparison is not perfect, but it gives you the right instincts: bulk write, immutable file, separate index, periodic rewrite into a better layout.
How Packfiles and Pack Indexes Work
A loose object includes the canonical object header: the type, a space, the size, a NUL byte, and then the object data. In a pack, Git stores type and size in the pack entry itself instead of repeating that loose-object prefix inside the compressed payload.
That means packed storage is more than "the loose object moved into a bigger file." The representation is optimized for the pack format. Even so, the object ID still refers to the same logical object, because Git reconstructs the canonical form when it needs to reason about object identity.
Git's pack-format documentation describes a very small fixed header at the start of every pack:
- the four-byte signature PACK
- a four-byte version number
- a four-byte object count
After that come the object entries, and the file ends with a trailer checksum for the whole pack. Git currently writes version 2 packs.
That is already a useful clue about what a pack is. It is a binary container format with its own header, entry encoding rules, and trailer.
Inside the pack, each object entry starts with a compact header that encodes the object type and the unpacked size. Only then do you get to the stored bytes for that object.
For an ordinary packed commit, tree, blob, or tag, those bytes are the object's packed representation. For a deltified object, the entry also has to encode how to find the base object before the delta instructions can be applied.
That detail matters because it explains why Git can answer questions like "what type of object is this?" or "where in the pack does it live?" without treating the pack as one giant opaque compression stream.
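The entry header scheme can be made concrete with a small decoder. This follows the pack-format varint layout (type in bits 6-4 of the first byte, size continued seven bits at a time); `decode_entry_header` is a name made up for this sketch:

```python
OBJ_TYPES = {1: "commit", 2: "tree", 3: "blob", 4: "tag",
             6: "ofs-delta", 7: "ref-delta"}

def decode_entry_header(buf: bytes) -> tuple[str, int, int]:
    """Return (type, uncompressed size, header length) for a pack entry."""
    byte = buf[0]
    obj_type = (byte >> 4) & 0x7      # bits 6-4 carry the object type
    size = byte & 0x0F                # bits 3-0 are the low size bits
    shift, i = 4, 1
    while byte & 0x80:                # high bit set: more size bytes follow
        byte = buf[i]
        size |= (byte & 0x7F) << shift
        shift += 7
        i += 1
    return OBJ_TYPES[obj_type], size, i

# A blob of unpacked size 1000 = 0b11'1110'1000: low 4 bits go in the
# first byte (with the continuation bit and type 3), the remaining
# 0b111110 = 62 goes in the second byte.
print(decode_entry_header(bytes([0xB8, 0x3E])))  # ('blob', 1000, 2)
```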
- Pack Index
- Companion file that maps object IDs to locations inside a packfile.
If a .pack file were all Git had, object lookup would be far less practical. Git would need to scan or partially scan the pack to find one object. The .idx file is what turns a dense object container into something Git can query efficiently.
The pack index keeps object names in sorted order and records where their packed representations live. That is how Git can answer "do I have object X?" without inflating the whole pack.
Version 2 pack indexes add more structure than a flat "hash to offset" list:
- a 256-entry fanout table keyed by the first byte of the object name
- the sorted object names themselves
- CRC32 values for packed data
- 32-bit offsets, plus a large-offset table for very large packs
The fanout table is the first demystifier. If an object ID starts with byte 9c, Git does not binary-search every object in the pack. It uses the fanout table to jump straight to the slice of sorted object names whose first byte is 9c, and then searches only that range.
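That fanout-then-binary-search behavior is easy to model with synthetic data. This is a toy lookup over an in-memory name table, not a real .idx parser; the names and fanout below are fabricated for illustration:

```python
import bisect

def fanout_range(fanout, first_byte):
    # fanout[b] = number of objects whose first name byte is <= b
    lo = fanout[first_byte - 1] if first_byte else 0
    return lo, fanout[first_byte]

def lookup(names, fanout, oid):
    lo, hi = fanout_range(fanout, int(oid[:2], 16))
    i = bisect.bisect_left(names, oid, lo, hi)   # search only that slice
    return i if i < hi and names[i] == oid else None

# Synthetic sorted name table plus its cumulative fanout.
names = sorted(["0a" + "0" * 38, "9c" + "1" * 38, "9c" + "2" * 38, "ff" * 20])
fanout = [sum(1 for n in names if int(n[:2], 16) <= b) for b in range(256)]
print(lookup(names, fanout, "9c" + "2" * 38))  # 2: found in the 0x9c slice
```

The binary search never touches names outside the 0x9c slice, which is exactly the work the fanout table saves.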
Modern repositories often also carry adjacent files such as .rev, which provide pack-offset order, and later chapters will get into multi-pack indexes and related metadata. The main idea for now is simpler: packs make storage compact, and indexes make compact storage usable.
You do not need to guess what shape the object store is in:
git gc
git count-objects -vH
ls .git/objects/pack
git verify-pack -v .git/objects/pack/pack-*.idx | sed -n '1,12p'
If you want to look at the pack header directly, a few lines of Python are enough:
from pathlib import Path
import struct

pack = next(Path(".git/objects/pack").glob("pack-*.pack"))
with pack.open("rb") as f:
    signature = f.read(4).decode()
    version, count = struct.unpack("!II", f.read(8))
print(
    {
        "pack": pack.name,
        "signature": signature,
        "version": version,
        "objects": count,
    }
)
git gc makes sure you actually have a pack to inspect. count-objects tells you how much is loose and how much is packed. Listing .git/objects/pack shows which sidecar files exist. The Python snippet makes the fixed pack header concrete. verify-pack is the best quick x-ray: in verbose mode its non-delta lines are object-name type size size-in-packfile offset-in-packfile, and delta lines add depth plus the base object name. On a tiny repository you may see only non-delta entries. That is fine. The format is still visible.
Delta Compression Is About Storage, Not History
- Delta Object
- Packed representation stored as a set of instructions relative to another base object.
A common mistake is to hear that Git uses deltas and conclude that it stores history as "changes from one version to the next." Git still thinks in full objects and full snapshots. Delta storage is an opaque internal detail, unrelated to the first-class, user-facing concept of a diff.
Delta compression is a packing technique used to store one object more cheaply relative to another object.
That difference matters because it explains a lot of common confusion:
- delta compression is not the same thing as commit history
- delta compression does not mean Git abandoned snapshots
- delta compression does not imply that one object is the semantic "previous version" of another
Inside a pack, an object may be stored as an undeltified object, an offset delta, or a reference delta. In either delta case, Git stores instructions for reconstructing the target object from a base object.
One reason packfiles feel mysterious (and Git compression generally confusing) is that two different ideas get collapsed into one sentence. Let's get clear on all this.
Git uses ordinary zlib-style compression on stored bytes. Loose objects do this too. The objects are simply stored compressed; there is nothing delta-specific about that.
Packs may also use delta encoding across objects. That is a separate optimization. Git chooses a base object, stores an instruction stream for reconstructing the target from that base, and then compresses that stored representation like other packed data. Again, this is not the same as a diff; it is just an internal compression mechanism. Diffs are generated at a different, higher level of logical abstraction.
A delta payload starts with the size of the base object and the size of the reconstructed object. After that comes a sequence of instructions.
Those instructions are simple:
- copy a byte range from the base object
- insert literal bytes directly into the output
That is much closer to a compact binary patch program than to a line-oriented diff. Pack deltas operate on bytes, not source-code lines.
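A simplified model of delta application makes the copy/insert idea concrete. Real pack deltas encode these instructions as compact binary opcodes; here they are plain tuples for readability, and `apply_delta` is a name invented for this sketch:

```python
def apply_delta(base: bytes, instructions) -> bytes:
    """Simplified model of delta application: copy ranges from the
    base, insert literal bytes. Not the on-disk opcode encoding."""
    out = bytearray()
    for op, *args in instructions:
        if op == "copy":
            offset, size = args
            out += base[offset:offset + size]   # copy a byte range from base
        elif op == "insert":
            (literal,) = args
            out += literal                      # insert literal bytes
        else:
            raise ValueError(f"unknown delta instruction: {op}")
    return bytes(out)

base = b"int main(void) { return 0; }\n"
target = apply_delta(base, [
    ("copy", 0, 17),                 # reuse "int main(void) { " from base
    ("insert", b"return 1; }\n"),    # then splice in the changed tail
])
print(target)  # b'int main(void) { return 1; }\n'
```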
Another easy mistake is to imagine that a packed blob for src/main.c must delta against the previous version of src/main.c. Sometimes something like that happens, but it is not the governing rule.
Git's pack heuristics are trying to save space while keeping access practical. That means delta relationships are chosen for storage reasons, usually among objects that look similar enough to compress well together. The fact that two objects are related in path history may help, but the pack format itself does not encode "this file evolved from that file" as a first-class storage fact.
This is the same kind of distinction we saw with rename detection. Users often want semantic history. The storage engine wants efficient representation. Those are related, but they are not the same layer.
Delta Encodings and Delta Chains
Git uses two delta encodings in packs.
OBJ_OFS_DELTA points to the base object by a negative relative offset within the same pack. OBJ_REF_DELTA names the base by object ID instead.
If the base object is in the same on-disk pack, offset deltas are usually the tighter representation. Reference deltas are more general. They can refer to a base object outside the current pack, which is exactly what makes thin packs possible during transfer.
This also explains a subtle layout fact: offset deltas point backward in the pack, so the base object has to appear earlier in the file than the delta that depends on it.
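The backward distance at the start of an ofs-delta entry uses a small variable-length encoding of its own. The decoder below follows the scheme described in Git's pack-format documentation; note the +1 folded into each continuation step, which keeps multi-byte encodings unambiguous. `decode_ofs_delta_distance` is a name invented here:

```python
def decode_ofs_delta_distance(buf: bytes) -> tuple[int, int]:
    """Decode the backward distance stored at the start of an
    ofs-delta entry. Returns (distance, bytes consumed)."""
    c = buf[0]
    value = c & 0x7F                  # low 7 bits of the first byte
    i = 1
    while c & 0x80:                   # high bit set: continuation follows
        c = buf[i]
        value = ((value + 1) << 7) | (c & 0x7F)
        i += 1
    return value, i

print(decode_ofs_delta_distance(bytes([0x80, 0x48])))  # (200, 2)
```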
Delta compression reduces storage, but it also creates reconstruction work. To read a delta-compressed object, Git has to locate the base object, reconstruct it if necessary, and apply delta instructions until it reaches the requested object.
Pack design is a balancing act. Very aggressive delta chaining can save space while making reads more expensive. Git's packing heuristics therefore have to balance compression ratio against access cost.
Not every object is a good delta candidate for the same reason. Large binary files and other poor matches may compress badly as deltas or become too expensive to reconstruct repeatedly. Git's packing behavior reflects that tradeoff rather than assuming "more deltas is always better."
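The read-cost side of that tradeoff can be modeled in a few lines. The store layout below is invented purely for illustration; the point is that every extra hop in a delta chain adds reconstruction work before the requested object exists:

```python
def reconstruct(store, name):
    """Walk a delta chain: resolve the base first, apply the patch.
    store maps names to ("full", bytes) or ("delta", base, patch_fn)."""
    entry = store[name]
    if entry[0] == "full":
        return entry[1], 0            # undeltified: no chain to walk
    _, base_name, patch = entry
    base, depth = reconstruct(store, base_name)
    return patch(base), depth + 1     # one more hop in the delta chain

store = {
    "v1": ("full", b"alpha"),
    "v2": ("delta", "v1", lambda b: b + b" beta"),
    "v3": ("delta", "v2", lambda b: b + b" gamma"),
}
print(reconstruct(store, "v3"))  # (b'alpha beta gamma', 2): two hops to read
```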
Packs on Disk and Over the Wire
Packfiles also matter beyond local disk format. The same general pack representation is how Git moves object data over the network.
That unifies several parts of Git that otherwise seem separate:
- local repository storage
- fetch and push object transfer
- bundle files and related transport containers
The pack format therefore sits at an unusually important boundary. It is both a local storage mechanism and a transfer mechanism.
This also explains why so much Git performance work eventually touches packs. Clone, fetch, push, repack, bitmap traversal, multi-pack indexing, and maintenance all meet here.
Over the wire, Git can use a thin pack whose deltas refer to base objects the receiver already has. That can make transfer smaller. On disk, though, Git wants packs to be self-contained enough to avoid unresolved dependencies.
That distinction is subtle but useful. It is another example of Git separating transfer efficiency from long-term local storage hygiene.
Commands such as git gc and git repack reorganize object storage. They consolidate loose objects, combine smaller packs, and often rewrite pack contents into a more efficient layout.
Git is not rewriting history there. It is rewriting the storage layout around the same logical objects.
This is much closer to compaction in a log-structured storage system than to in-place filesystem defragmentation. Git writes new immutable pack files, often with better delta choices and better grouping, and then removes old redundant files once the new layout is ready.
If a repository becomes faster after repacking, that usually means object locality, delta choices, pack count, or associated metadata improved. It does not mean the repository now "contains different history."
Physical Layout Is a Performance Layer
A healthy repository does not always want one giant pack and nothing else.
Sometimes that happens. Real repositories may also have several packs for good operational reasons: incremental fetches, separate promisor packs, maintenance strategy, kept packs, or geometric repacking approaches that avoid rewriting the entire universe every time.
So the right model is not "loose objects become one final monolithic pack forever." The right model is that Git manages an object store whose physical layout may include loose objects, one or more packs, and auxiliary metadata files that help Git navigate that layout efficiently.
You can inspect that layout directly:
ls -lh .git/objects/pack
git verify-pack -v .git/objects/pack/pack-*.idx | tail
The directory listing shows whether the repository really has one pack or several, plus sidecar files such as .idx, .bitmap, .rev, or .mtimes. The verify-pack tail gives you one more quick slice of per-pack layout and delta summary.
Large packfiles are often a sign that Git has done useful consolidation work.
They can mean:
- better compression
- fewer files to open and scan
- less loose-object clutter
- simpler object locality
That is one reason repacking helps so often.
But one enormous pack is not automatically the ideal steady state.
Very large packfiles can become awkward when:
- full repacks take too long
- one rewrite has to touch too much data
- backup and mirroring workflows dislike giant monolithic files
- maintenance would be healthier with geometric or incremental repacking instead
- one damaged or stale large file becomes a bigger operational event
So the practical goal is not "the fewest packfiles possible." The goal is a storage layout that gives good compression and good read behavior without making maintenance brutal.
That is why Git leans on:
- incremental repack
- geometric repacking strategies
- multi-pack-index
- bitmaps
- background maintenance
Small fragmented packs are usually bad. Moderately large consolidated packs are usually good. Giant all-in-one packs can become an operations problem even when they are storage-efficient.
If you look in .git/objects/pack, a modern repository may have more than the classic pair of .pack and .idx files. You may also see files such as:
- .rev for pack reverse indexes
- .mtimes for per-object mtimes associated with certain packs
- multi-pack-index for lookup across multiple packs
And in maintenance-heavy repositories, you may also encounter cruft packs for unreachable objects instead of falling back to a sea of loose unreachable files.
Those structures do not replace packfiles. They are supporting structures around them. Later chapters will take them apart in more detail.
Once this layer is visible, several practical facts become easier to understand:
- new objects appear quickly because loose-object writes are simple
- repositories shrink and speed up after maintenance because packs improve layout
- reading one object from a pack is different from scanning many loose files
- network transfer performance depends heavily on pack generation and reuse
- storage deltas do not tell you what happened in human history
Here the storage-engine side of Git is easiest to see.
Git's logical model is still snapshot-oriented. It explains commits, trees, and most user-facing behavior. Git's physical storage engine also works hard to avoid storing every object in the most naive full-copy form. That separation is one of Git's strongest design choices. It lets the history model stay stable while the storage layer keeps getting more sophisticated. (It may even let you change the storage layer into something else; as software development speeds up and repos get larger, that may soon be needed.)