High Performance Git

Section IV · Large-Repo Operations, Transport, and Scale

Chapter 17

Large Ref Sets: Files, Packed-Refs, Reftable, and git refs


Refs are usually so cheap that they fade into the background. A massive repository, or one with wayward automation, may accumulate enough refs that the ref layer becomes its own storage and lookup problem. Refs grow under:


Large Ref Counts, packed-refs, and the Older Backend Story

Ref scale issues can show up in fetch, ls-remote, show-ref, for-each-ref, and branch and tag views in tools, because large ref counts affect ordinary network and local inspection paths. Git's oldest ref model is straightforward: many refs exist as files under .git/refs, where each ref is a name and the file stores the object ID the name currently points to. You can see this directly:

find .git/refs -type f | head
cat .git/refs/heads/main

The first command lists every loose ref file, and the second prints the object ID a branch ref points at: one line of 40 hex characters (64 in a SHA-256 repository) followed by a newline. The cat fails if main has already been packed, because a packed ref has no loose file. For small and medium repositories the loose layout remains perfectly workable. Correctness is not the issue; the problem at scale is layout, because many tiny ref files are a poor shape for large enumeration or dense repository metadata.
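To make the file-per-ref model concrete, the following sketch builds a throwaway repository and compares the content of the loose ref file with what Git resolves for HEAD. The temporary directory, the Example identity, and the variable names are illustrative assumptions, and the sketch assumes the default files backend.

```shell
#!/bin/sh
# Sketch: a loose ref is a small file holding an object ID.
# Assumes the default "files" ref backend; all names are illustrative.
set -eu

demo=$(mktemp -d)
git init -q "$demo"
cd "$demo"
git -c user.name=Example -c user.email=example@example.com \
    commit -q --allow-empty -m 'one'

# The default branch name varies (main, master, ...), so ask Git for it.
branch=$(git symbolic-ref --short HEAD)

# Read the loose file directly, then ask Git to resolve the same ref.
loose_value=$(cat ".git/refs/heads/$branch")
resolved=$(git rev-parse HEAD)

printf 'loose file: %s\nrev-parse:  %s\n' "$loose_value" "$resolved"
```

Both lines should print the same object ID, which is all a loose ref is: a name on disk and the ID it currently points to.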

The older large-ref answer, packed-refs, follows the same pattern Git uses for objects: pack many small things into one file. Instead of keeping every ref as its own loose file, Git can store many refs together in a more compact flat file, which makes large read-heavy ref sets more manageable, though it is still only a partial answer. Loose refs and packed-refs together create a hybrid model where some refs stay loose, many refs may be packed, and updates and enumeration have to account for both. The hybrid works, but it shows its age once repositories have very large ref populations or need better compaction and lookup properties.

git pack-refs behaves differently under different ref backends, so ref compaction is no longer one uniform story. For the traditional files backend, pack-refs --auto uses a heuristic around loose refs and the current packed-refs size; for reftable, auto-compaction works in terms of geometric table relationships.

You can watch the older compaction path directly:

git pack-refs --all
sed -n '1,20p' "$(git rev-parse --git-path packed-refs)"

That turns many refs into one flat file and lets you inspect the result immediately. Reftable changes the backend, but many repositories still use the older packed-refs layout.
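The hybrid of loose and packed refs can be watched end to end in a scratch repository. The sketch below is a minimal illustration under stated assumptions: the default files backend, 50 lightweight tags with made-up names, and hypothetical helper functions count_loose and count_packed.

```shell
#!/bin/sh
# Sketch: loose refs, then packed refs, then a hybrid of both.
# Assumes the default "files" backend; names and counts are illustrative.
set -eu

demo=$(mktemp -d)
git init -q "$demo"
cd "$demo"
git -c user.name=Example -c user.email=example@example.com \
    commit -q --allow-empty -m 'one'
oid=$(git rev-parse HEAD)

# Create 50 lightweight tags in a single update-ref transaction.
i=1
while [ "$i" -le 50 ]; do
  echo "create refs/tags/demo-$i $oid"
  i=$((i + 1))
done | git update-ref --stdin

# Hypothetical helpers: count loose ref files and packed-refs entries.
count_loose()  { find .git/refs -type f | wc -l | tr -d ' '; }
count_packed() { grep -c ' refs/' "$(git rev-parse --git-path packed-refs)"; }

loose_before=$(count_loose)      # 50 tags plus the default branch
git pack-refs --all              # pack everything and prune the loose files
loose_after=$(count_loose)
packed_after=$(count_packed)

# A ref created after packing starts out loose again: the hybrid state.
git tag late-arrival
loose_hybrid=$(count_loose)
total=$(git for-each-ref | wc -l | tr -d ' ')

echo "loose before pack: $loose_before"
echo "loose after pack:  $loose_after, packed: $packed_after"
echo "after one new tag: loose=$loose_hybrid, total refs=$total"
```

The final line is the point: enumeration has to merge one loose file with the packed entries to report the full set.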

Reftable Is the Newer Backend Direction

Reftable
Newer Git ref-storage backend that stores refs and reflogs in sorted tables rather than loose files plus packed-refs.

Reftable gives refs a real storage backend. It applies a more structured storage model: sorted tables, compaction, efficient lookup, and reflog storage alongside refs.

git init now accepts --ref-format=<format>, with reftable as an option alongside the traditional files format. Reftable is still marked experimental.
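Before changing anything, it helps to know which backend a repository already uses. The sketch below is one way to ask, assuming a recent Git that supports git rev-parse --show-ref-format; the ref_format helper is a hypothetical name, and the fallback to the extensions.refStorage setting is an assumption for older builds, where an absent setting means the files backend.

```shell
#!/bin/sh
# Sketch: report the ref storage format of a repository.
# ref_format is a hypothetical helper, not a Git command.
set -eu

ref_format() {
  repo=$1
  # Recent Git answers directly. Older builds echo the unknown flag back,
  # so only accept the two known format names.
  fmt=$(git -C "$repo" rev-parse --show-ref-format 2>/dev/null || true)
  case "$fmt" in
    files|reftable)
      echo "$fmt" ;;
    *)
      # Fallback: no extensions.refStorage setting means the files backend.
      git -C "$repo" config --get extensions.refStorage || echo files ;;
  esac
}

demo=$(mktemp -d)
git init -q "$demo"
detected=$(ref_format "$demo")
echo "ref format: $detected"
```

The case statement matters because git rev-parse prints options it does not recognize instead of failing, so a plain exit-status check would misreport on older builds.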

Check it out:

git init --ref-format=reftable reftable-demo
cd reftable-demo
git config user.name Example
git config user.email example@example.com
git commit --allow-empty -m 'one'
find .git/reftable -maxdepth 1 | sort
git refs verify --strict

The reftable/ directory makes the backend choice concrete, and git refs verify --strict confirms Git can read the ref database coherently. Reftable aims at several problems at once:

I would run a production repo on reftable only when packed-refs pressure is clearly happening: ref counts in the tens of thousands, slow ls-remote, or friction during compaction. For a few hundred branches and a few thousand tags, files plus packed-refs is just fine.

git refs Makes Backend Operations Explicit

git refs is a newer command in this part of Git. It collects backend operations in one place instead of scattering them across lower-level plumbing. If git refs -h reports that refs is not a git command on your machine, stop there and treat the rest of this section as newer-build territory. The command family is still fairly small: the backend-oriented subcommands that matter most here are migrate and verify, alongside list, exists, and optimize.
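The capability check can be scripted without relying on the exit status of -h, which is nonzero for usage output even when a command exists. The sketch below assumes the git --list-cmds=builtins option, which Git documents as internal and experimental; has_git_refs is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: detect whether this Git build ships the "refs" builtin.
# Assumes --list-cmds=builtins (internal/experimental) is available.
set -eu

has_git_refs() {
  git --list-cmds=builtins 2>/dev/null | grep -qx 'refs'
}

if has_git_refs; then
  refs_support=available
else
  refs_support=missing
fi

echo "git refs: $refs_support ($(git --version))"
```

A result of missing means the later git refs commands in this section should be skipped on that machine.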

git refs puts a few practical questions in one place:

Two limits matter immediately: git refs migrate does not work in repositories that have worktrees, and Git cannot block concurrent writes while the migration is running. If scheduled maintenance is active, unregister it first, and treat ref-backend migration like storage maintenance rather than a harmless foreground convenience command. A safe rehearsal should start with capability and concurrency checks:

git --version
git refs -h
git worktree list
git maintenance unregister --force  # if background maintenance was configured (--force added in recent Git)
git refs verify --strict
git refs migrate --ref-format=reftable --dry-run
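The worktree check in that rehearsal can be turned into a number a script can act on. This is a minimal sketch, assuming git worktree list --porcelain prints one "worktree" line per checkout including the main one; count_linked_worktrees, the scratch branch, and the directory names are illustrative.

```shell
#!/bin/sh
# Sketch: count linked worktrees before attempting a ref-backend migration.
# count_linked_worktrees is a hypothetical helper; paths are illustrative.
set -eu

count_linked_worktrees() {
  # One "worktree <path>" line per checkout; subtract the main worktree.
  total=$(git -C "$1" worktree list --porcelain | grep -c '^worktree ')
  echo $((total - 1))
}

demo=$(mktemp -d)
git init -q "$demo/main-repo"
git -C "$demo/main-repo" -c user.name=Example -c user.email=example@example.com \
    commit -q --allow-empty -m 'one'

before=$(count_linked_worktrees "$demo/main-repo")

# Add one linked worktree on a new branch to show the blocking case.
git -C "$demo/main-repo" worktree add -b scratch "$demo/linked" >/dev/null 2>&1
after=$(count_linked_worktrees "$demo/main-repo")

echo "linked worktrees before: $before, after: $after"
if [ "$after" -gt 0 ]; then
  echo "stop: this repository has linked worktrees"
fi
```

Any count above zero is the stop condition for the migration rehearsal.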

If git worktree list shows linked worktrees beyond the current checkout, stop there. If the repository was registered for scheduled maintenance, unregister it before the dry run. git refs verify --strict and git refs migrate --ref-format=reftable --dry-run are the minimum safe rehearsal before a real backend change. Inspect the older and newer forms directly:

find .git/refs -type f | sed -n '1,20p'
git for-each-ref --count=10 --format='%(refname)'
test -f "$(git rev-parse --git-path packed-refs)" && sed -n '1,10p' "$(git rev-parse --git-path packed-refs)"

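The loose-ref listing shows the old file-per-ref layout directly. for-each-ref lists ref names through whichever backend is active, so it reports the same names whether refs are loose, packed, or stored in reftable. The packed-refs check shows whether refs have been compacted into the flat file of the files backend.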
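Because for-each-ref works the same way under every backend, it is also a convenient base for a quick census of where refs are accumulating. The sketch below groups ref names by their top-level namespace; ref_census and the demo branch and tag names are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Sketch: count refs per top-level namespace (refs/heads, refs/tags, ...).
# ref_census is a hypothetical helper; it reads refs through Git, so it
# does not depend on the storage backend.
set -eu

ref_census() {
  git -C "$1" for-each-ref --format='%(refname)' |
    cut -d/ -f1,2 |          # keep "refs/<namespace>"
    sort | uniq -c |         # count names per namespace
    awk '{print $2, $1}'     # print "namespace count"
}

demo=$(mktemp -d)
git init -q "$demo"
git -C "$demo" -c user.name=Example -c user.email=example@example.com \
    commit -q --allow-empty -m 'one'

# Two extra branches and three lightweight tags for the demonstration.
git -C "$demo" branch b1
git -C "$demo" branch b2
for t in t1 t2 t3; do
  git -C "$demo" tag "$t"
done

census=$(ref_census "$demo")
echo "$census"
```

On a real repository, a namespace whose count dwarfs the others is usually where the automation or retention question lives.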