Modern Git gets a lot of its speed from metadata it writes beside the repository it already has: commit-graph files, changed-path Bloom filters, multi-pack-indexes, and reachability bitmaps. Git is taking a familiar tack here: spend some extra disk space and do some extra work ahead of time so that later queries run faster.
Git's baseline approach is to read the underlying data at the moment a command needs it:
- commits and parent links for history
- trees and blobs for content
- packs and pack indexes for object storage
The problem is that raw object access is not always the cheapest way to answer large questions. Parsing commit objects one by one, scanning many pack indexes independently, or recomputing reachability sets from scratch can become expensive at scale, and a few workloads are pathologically slow without help. So Git builds acceleration structures around the same underlying repository.
Four of the most important ones are:
- commit-graph files
- changed-path Bloom filters
- multi-pack indexes
- reachability bitmaps
These Structures Accelerate Existing Data
These structures do not replace commits, packs, trees, or refs. They summarize or index information that is already present so Git can answer certain questions faster. Whatever they store is derived data, so it can always be rebuilt from those primary objects.
When one of them is missing, Git should still work. It just has to do the slower version of the job.
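A minimal sketch of that fallback, assuming a POSIX shell and Git 2.34 or newer. It builds a throwaway repository in a temporary directory (the REPO variable, file name, and commit loop are illustrative), writes a commit-graph, deletes it, and rebuilds it, checking that git log reports the same history at every step.

```shell
#!/bin/sh
# Sketch: the commit-graph is derived data. Deleting it changes speed, not answers.
set -eu

REPO=$(mktemp -d)            # throwaway repository for the demo
git init -q "$REPO"
cd "$REPO"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

for i in 1 2 3 4 5; do
  echo "change $i" >> notes.txt
  git add notes.txt
  git commit -q -m "commit $i"
done

git commit-graph write --reachable
with_graph=$(git log --format=%H)

# Remove the accelerator; Git falls back to parsing commit objects directly.
rm -f .git/objects/info/commit-graph
without_graph=$(git log --format=%H)

# Rebuild it purely from the commits that are still in the object store.
git commit-graph write --reachable
rebuilt=$(git log --format=%H)

if [ "$with_graph" = "$without_graph" ] && [ "$without_graph" = "$rebuilt" ]; then
  echo "same history with, without, and after rebuilding the commit-graph"
fi
```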
The accelerator chapter gets much easier to follow once you connect each idea to a place on disk:
- commit-graph usually lives at .git/objects/info/commit-graph
- split commit-graph chains live under .git/objects/info/commit-graphs/
- the multi-pack-index lives at .git/objects/pack/multi-pack-index
- pack bitmaps and related pack sidecar files live under .git/objects/pack/
That means you can often answer "do I even have this feature here?" with ordinary shell inspection before you reach for deeper Git commands:
test -f .git/objects/info/commit-graph && echo commit-graph present
test -d .git/objects/info/commit-graphs && find .git/objects/info/commit-graphs -maxdepth 1 -type f | sed -n '1,20p'
test -f .git/objects/pack/multi-pack-index && echo midx present
find .git/objects/pack -maxdepth 1 \( -name 'pack-*.idx' -o -name 'pack-*.pack' -o -name 'pack-*.bitmap' -o -name 'multi-pack-index*' \) | sort
Commit-Graph Serializes Commit Metadata for Fast Walks
- Commit-Graph
- Serialized file that stores commit metadata and parent relationships in a traversal-friendly form.
The motivation for commit-graph is direct: walking commit history in very large repositories can be too slow when Git has to parse commit objects one by one from packfiles. The commit-graph file precomputes and serializes the graph structure so that ancestry questions become cheaper lookups instead of repeated object parsing.
At the object level, Git can learn everything it needs about a commit by reading and parsing the commit object itself. But when a repository has a very large history, doing that repeatedly becomes expensive. The commit-graph file exists to make that cheaper. It stores commit OIDs together with associated metadata including generation numbers, root tree OIDs, commit dates, parent positions, and optionally changed-path Bloom filters.
Many history operations do not need the full text of every commit object up front. They need fast access to the graph shape and a few key facts about each node. That makes the commit-graph especially useful for:
- ancestry walks
- merge-base style reasoning
- reachability checks
- ordered history traversals over large commit sets
On disk, the simplest form is one file at .git/objects/info/commit-graph. Repositories using split commit-graph chains instead keep a commit-graph-chain file and one or more graph-*.graph files under .git/objects/info/commit-graphs/.
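A sketch of the split layout, under the same assumptions (POSIX shell, Git 2.34 or newer, a throwaway repository; the add_commits helper is hypothetical). Passing --split=no-merge forces each write to add a new layer instead of merging it into the previous one, so two writes leave a chain file that lists two graph files. Dropping --split entirely gives the single-file form at .git/objects/info/commit-graph.

```shell
#!/bin/sh
# Sketch: build a two-layer split commit-graph chain and inspect it.
set -eu

REPO=$(mktemp -d)
git init -q "$REPO"
cd "$REPO"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: append a line to log.txt and commit it, $1 times.
add_commits() {
  n=0
  while [ "$n" -lt "$1" ]; do
    n=$((n + 1))
    echo "entry" >> log.txt
    git add log.txt
    git commit -q -m "entry $n"
  done
}

add_commits 3
git commit-graph write --reachable --split=no-merge   # base layer
add_commits 3
git commit-graph write --reachable --split=no-merge   # new layer holds only the new commits

chain=.git/objects/info/commit-graphs/commit-graph-chain
cat "$chain"                                 # one graph-file hash per line, lowest layer first
ls .git/objects/info/commit-graphs/          # commit-graph-chain plus graph-*.graph files
git commit-graph verify                      # checks every layer in the chain
```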
The commit-graph stores parent references by position within the graph file, not by repeating full object IDs for every edge. Once the commits are laid out in one dense table, parent navigation becomes much cheaper than repeatedly inflating scattered commit objects from packs just to chase edges.
Git is turning a graph problem into a more cache-friendly local data problem here. That is a very Git move: keep the meaning, cheat on the lookup path.
The commit-graph also records generation data. Every commit's generation value is strictly greater than the values of its parents, so if commit A's value is not greater than commit B's, B cannot be an ancestor of A (unless A is B), and Git can drop such candidates without walking their history. Commit-graph files improve commands that walk a lot of history for exactly that reason. Git is not guessing less. It is crossing useless candidates off the list sooner.
Modern Git also supports corrected commit-date style generation data, with generation version 2 as the default for writing and reading commit-graph files. Even inside one accelerator, Git keeps refining the quality of its precomputed metadata.
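The generation version is a writer and reader setting, not something stored in the commits themselves. A sketch under the same assumptions (POSIX shell, Git 2.34 or newer, throwaway repository) writes the file once with commitGraph.generationVersion=1, which leaves out corrected commit dates, and once with the default; both files should pass verification. The printed sizes are informational only.

```shell
#!/bin/sh
# Sketch: write a commit-graph with generation version 1, then with the default (2).
set -eu

REPO=$(mktemp -d)
git init -q "$REPO"
cd "$REPO"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

for i in 1 2 3 4; do
  echo "change $i" >> notes.txt
  git add notes.txt
  git commit -q -m "commit $i"
done

# Version 1: topological levels only, no corrected commit dates.
git -c commitGraph.generationVersion=1 commit-graph write --reachable
git -c commitGraph.generationVersion=1 commit-graph verify
v1_size=$(wc -c < .git/objects/info/commit-graph)

# Default (version 2): corrected commit-date generation data is written too.
git commit-graph write --reachable
git commit-graph verify
v2_size=$(wc -c < .git/objects/info/commit-graph)

echo "v1 file: $v1_size bytes, v2 file: $v2_size bytes"
```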
Changed-Path Bloom Filters Speed Up Path-Limited History
- Changed-Path Bloom Filter
- Probabilistic filter attached to commit-graph data that helps Git skip commits unlikely to affect a requested path.
Changed-path Bloom filters extend the commit-graph for a specific problem: path-limited history queries like git log -- path can still be expensive even with a commit-graph because Git may need to inspect trees at each commit to determine whether the path changed. Bloom filters let Git skip that tree inspection for commits where the path almost certainly did not change.
The commit-graph becomes even more valuable when it carries changed-path Bloom filters. git log -- path is a graph walk plus repeated path reasoning. A cheap negative test can save a great deal of tree inspection there.
These filters are narrowly scoped. When the commit-graph is written with --changed-paths, it stores, for each commit, a Bloom filter of the paths changed between that commit and its first parent, and that can provide significant gains for git log -- <path> style queries.
In practice:
- they are path-query accelerators
- they sit on top of the commit graph
- they are especially useful for history limited by file or directory
Bloom filters are not miniature path histories. They are probabilistic membership tests. The property is asymmetrical:
- a negative result is strong enough for Git to skip extra work
- a positive result only means "maybe, check further"
They help without changing correctness. Git can use them to avoid some unnecessary inspections, but it still falls back to the real underlying data when needed. It is a very Git-like optimization: cheap to query, safe to ignore when absent, and able to narrow a much larger search. A negative is gold. A positive is just more work.
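A sketch of that safety property, assuming a POSIX shell, Git 2.34 or newer, and a throwaway repository where only every third commit touches docs/guide.txt (file names are illustrative). It runs the same path-limited git log with the commit-graph and its Bloom filters in use, and again with core.commitGraph=false, and expects identical output. The optional trace step assumes that recent Git versions write a bloom "statistics" entry (with counts such as definitely_not and maybe) to a GIT_TRACE2_PERF trace; the exact format is version-dependent, so the grep is allowed to find nothing.

```shell
#!/bin/sh
# Sketch: Bloom filters change how much work git log -- <path> does, not its answer.
set -eu

REPO=$(mktemp -d)
git init -q "$REPO"
cd "$REPO"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

mkdir -p src docs
for i in 1 2 3 4 5 6 7 8 9; do
  if [ $((i % 3)) -eq 0 ]; then f=docs/guide.txt; else f=src/main.txt; fi
  echo "edit $i" >> "$f"
  git add "$f"
  git commit -q -m "edit $i"
done

# Write the commit-graph together with changed-path Bloom filters.
git commit-graph write --reachable --changed-paths

with_filters=$(git log --format=%H -- docs/guide.txt)
# core.commitGraph=false makes Git ignore the commit-graph and compare trees itself.
without_filters=$(git -c core.commitGraph=false log --format=%H -- docs/guide.txt)

if [ "$with_filters" = "$without_filters" ]; then
  echo "same path-limited history either way"
fi

# Optional: ask trace2 how often the filters said "definitely not" versus "maybe".
GIT_TRACE2_PERF="$REPO/trace.perf" git log --format=%H -- docs/guide.txt > /dev/null
grep -o 'statistics:{[^}]*}' "$REPO/trace.perf" || true
```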
It is also important not to ask too much of them. Changed-path Bloom filters are based on paths changed against the first parent, not on a full semantic story of renames, copies, or every possible merge interpretation. So they help path-limited history, but they do not give Git psychic powers about renames or turn the query into a free operation. The hard parts of history simplification, rename heuristics, and merge-heavy path reasoning still exist.
Multi-Pack-Index Makes Many Packs Behave More Like One Index
- Multi-Pack-Index
- Repository-level index that lets Git look up objects across multiple packfiles without consulting each pack index independently.
In slide decks, a repository has one tidy pack. Real repositories are messier. They accumulate multiple packs for ordinary reasons: incremental fetches, maintenance strategy, kept packs, promisor packs, geometric repacking, and more.
Without an extra index, object lookup across many packs can become needlessly repetitive. Git may need to consider pack after pack and its separate .idx file just to answer basic lookup questions.
The multi-pack-index, or MIDX, exists to collapse that lookup work into one repository-level index. Git can write or verify that MIDX as its own maintained artifact. That modest description hides an important effect. MIDX lets Git keep multiple packs around without paying the full lookup overhead of treating them as unrelated islands.
The literal file is .git/objects/pack/multi-pack-index. It sits beside the .pack, .idx, .rev, .mtimes, and bitmap sidecar files for the packs it indexes.
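A minimal sketch of a MIDX over several packs, assuming a POSIX shell, Git 2.34 or newer, and a throwaway repository. Each git repack -d without -a packs only the objects that are still loose, so three rounds leave three separate packs, each with its own .idx; git multi-pack-index write then indexes all of them at once.

```shell
#!/bin/sh
# Sketch: accumulate several packs, then cover them with one multi-pack-index.
set -eu

REPO=$(mktemp -d)
git init -q "$REPO"
cd "$REPO"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

for i in 1 2 3; do
  echo "batch $i" >> data.txt
  git add data.txt
  git commit -q -m "batch $i"
  git repack -q -d          # incremental: pack only the new loose objects
done

ls .git/objects/pack/pack-*.pack     # three packs, each with its own .idx

git multi-pack-index write           # one index over all of them
git multi-pack-index verify
ls -l .git/objects/pack/multi-pack-index
```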
MIDX matters most when keeping multiple packs is better than rewriting everything into one. One answer to "too many packs" is "rewrite everything into one giant pack." Sometimes that is right. Sometimes it is just an expensive way to feel organized.
Modern Git has better options. Git can:
- keep multiple packs
- write a MIDX over them
- repack selected packs
- expire packs whose objects are no longer referenced by the MIDX
- maintain geometric pack structure instead of flattening the whole repository every time
Together, these let Git keep lookups cheap across many packs while rewriting only part of the repository at a time.
Bitmaps Accelerate Reachability Enumeration
- Reachability Bitmap
- Precomputed set representation that speeds up answering which objects are reachable from selected commits.
Bitmaps solve a different class of problem from commit-graph and Bloom filters. Commit-graph helps with commit traversal. Bloom filters help with path-limited history. Bitmaps help when Git needs to enumerate lots of reachable objects quickly, especially for pack generation and transport.
Git exposes this in two places: git rev-list supports --use-bitmap-index for bitmap-assisted traversal, and git repack supports --write-bitmap-index for writing reachability bitmaps during repack. This ends up being one of the strongest performance features in large repositories because clone, fetch, and some object-counting tasks are really reachability-enumeration problems in disguise.
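A minimal sketch of both halves, assuming a POSIX shell, Git 2.34 or newer, and a throwaway repository. git repack -a -d --write-bitmap-index folds everything into one pack with a pack-*.bitmap beside it, and git rev-list should then report the same counts with and without --use-bitmap-index. On the Git versions assumed here, --count reports commits by default and every reachable object when --objects is added.

```shell
#!/bin/sh
# Sketch: write a single-pack reachability bitmap and compare counts with and without it.
set -eu

REPO=$(mktemp -d)
git init -q "$REPO"
cd "$REPO"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

for i in 1 2 3 4 5; do
  echo "change $i" >> notes.txt
  git add notes.txt
  git commit -q -m "commit $i"
done

# -a puts every reachable object into one pack, which a single-pack bitmap requires.
git repack -q -a -d --write-bitmap-index
ls .git/objects/pack/pack-*.bitmap

commits_plain=$(git rev-list --count --all)
commits_bitmap=$(git rev-list --use-bitmap-index --count --all)
objects_plain=$(git rev-list --objects --count --all)
objects_bitmap=$(git rev-list --use-bitmap-index --objects --count --all)
echo "commits: $commits_plain / $commits_bitmap, objects: $objects_plain / $objects_bitmap"
```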
The reason bitmaps feel different from the other structures in this chapter is that they sit so close to transfer. When a server is preparing to satisfy a fetch or clone, it often needs to answer a question like:
"Which objects are reachable from these tips, but not already known on the other side?"
That question gets huge very quickly in a large repository. Bitmaps help because they precompute large reachability sets instead of forcing Git to rediscover them object by object every time. So bitmap discussions show up in server performance, clone performance, and fetch negotiation conversations more often than in ordinary local-command tutorials.
Historically, bitmaps were often associated with a repository having one dominant pack. That is still an important case. A bitmap must be able to refer to every reachable object, which is why git repack --write-bitmap-index only makes sense together with -a, -A, or -m (--write-midx).
But modern Git also supports multi-pack bitmaps. git multi-pack-index write accepts --bitmap, and the git repack documentation notes that --write-bitmap-index, which otherwise has no effect when multiple packs are produced, creates a multi-pack bitmap when a MIDX is written at the same time. Git can preserve a more flexible pack layout while still getting bitmap-style acceleration.
These files also live under .git/objects/pack/. In a single-pack case you will usually see a pack-*.bitmap file next to the corresponding pack and index. In a multi-pack case, the bitmap sits beside the MIDX in the same directory as a multi-pack-index-*.bitmap file.
Different Accelerators Help Different Commands
Git performance advice often goes wrong when these structures get thrown into one vague "make Git faster" bucket.
They are not interchangeable.
Roughly speaking:
- commit-graph helps commit walks and ancestry reasoning
- changed-path Bloom filters help path-limited history queries
- multi-pack-index helps object lookup across many packs
- bitmaps help reachability enumeration, especially for transfer
Each structure maps to a class of expensive work.
Two quick probes make that concrete:
git multi-pack-index verify
git rev-list --use-bitmap-index --count --all
The first confirms that Git can treat many packs through one higher-level index. The second counts the commits reachable from every ref, using bitmap assistance when a bitmap is available; adding --objects makes it count every reachable object instead.
A Few Commands Make These Files Concrete
These structures stop feeling abstract once you ask Git to write or verify them directly:
ls .git/objects/info
ls .git/objects/pack
git commit-graph write --reachable --changed-paths
git commit-graph verify
git multi-pack-index write
git multi-pack-index verify
git rev-list --use-bitmap-index --count --all
Those commands are a good way to see the structures in the chapter become actual files and actual query behavior. ls shows where the files live, the write commands create or refresh them, and the verify commands confirm that Git can read them coherently.
These files age. Maintenance is what keeps them aligned with the current repository state. Git will usually use these accelerators automatically if they are present. Writing them is more mixed. Commit-graph data, MIDX files, and related metadata often come from maintenance or explicit writes, while bitmap writing is more common in server-oriented or repack-heavy setups than in an ordinary local clone.
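A sketch of the maintenance side, assuming a POSIX shell, Git 2.34 or newer, and a throwaway repository. git maintenance run --task=commit-graph runs just that one job on demand; because it writes incrementally, expect the split layout under .git/objects/info/commit-graphs/ rather than a single file. The fetch.writeCommitGraph setting is shown only as an example of opting into refreshes after fetches; it has no visible effect here because nothing is fetched.

```shell
#!/bin/sh
# Sketch: let maintenance produce the commit-graph instead of writing it by hand.
set -eu

REPO=$(mktemp -d)
git init -q "$REPO"
cd "$REPO"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

for i in 1 2 3 4 5; do
  echo "change $i" >> notes.txt
  git add notes.txt
  git commit -q -m "commit $i"
done

# Run only the commit-graph maintenance job, outside any schedule.
git maintenance run --task=commit-graph
ls .git/objects/info/commit-graphs/ 2>/dev/null || ls .git/objects/info/
git commit-graph verify

# Opt in to refreshing the commit-graph after fetches that download a pack.
git config fetch.writeCommitGraph true
git config --get fetch.writeCommitGraph
```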
If you want to force the structures into existence instead of waiting for maintenance, pack any loose objects first, because a bitmap has to cover every reachable object:
git commit-graph write --reachable --changed-paths
git repack -d
git multi-pack-index write --bitmap
Nobody needs to handcraft these files by candlelight (although that sounds lovely). When they are present and well maintained, Git can answer large questions with much less repeated work.
The pattern across the chapter is the same. Git keeps the core repository model simple:
- immutable objects
- refs
- parent links
- packs
Then it layers accelerators around that core:
- serialized commit metadata
- probabilistic path hints
- unified indexes across multiple packs
- precomputed reachability sets
Those structures do not make Git less Git-like. They make the same model usable at larger scales and lower latencies. They are some of the clearest places where Git turns repeated expensive computation into reusable metadata.