High Performance Git

Section IV · Large-Repo Operations, Transport, and Scale

Chapter 17

Large Ref Sets: Files, Packed-Refs, Reftable, and git refs


Refs are, by design, usually cheap enough to fade into the background. That stops being true once there are enough of them to become a storage and lookup problem in their own right.

Refs grow under:


Large Ref Counts, packed-refs, and the Older Backend Story

Ref scale issues can show up in:

Large ref counts can affect ordinary network and local inspection paths.

Git's oldest ref model is straightforward: many refs exist as files under .git/refs.

The model is easy to understand and easy to update incrementally:

For small and medium repositories, this remains perfectly workable.
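The file-per-ref model can be seen in a scratch repository. The following is a minimal sketch, not a recommended workflow: the temp directory and the branch name topic are illustrative assumptions, and GIT_DEFAULT_REF_FORMAT is set only to keep newer builds on the files backend (older builds ignore it).

```shell
# Assumption: scratch repository in a temp directory; "topic" is an illustrative branch name.
export GIT_DEFAULT_REF_FORMAT=files   # keeps newer Git on the files backend; older Git ignores it
demo=$(mktemp -d)
git init -q "$demo"
cd "$demo" || exit 1
git -c user.name=Example -c user.email=example@example.com \
    commit -q --allow-empty -m 'one'
git branch topic

# A loose ref is a small file whose content is the object ID it names.
loose_value=$(cat .git/refs/heads/topic)
resolved=$(git rev-parse refs/heads/topic)
echo "file content: $loose_value"
echo "rev-parse:    $resolved"
```

Both lines should show the same object ID, which is the whole model: one name, one file, one ID.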

The scaling issue is not correctness. It is that many tiny ref files are not always the best layout for large enumeration or dense repository metadata.

The older answer to large ref counts follows a familiar Git pattern, the same one behind packfiles: packed-refs.

Instead of keeping every ref as its own loose file, Git can store many refs together in a more compact flat file. This makes large read-heavy ref sets more manageable.

That is useful, but it is still only a partial answer.

Loose refs and packed-refs together create a hybrid model:

This works. It also shows its age once repositories have very large ref populations or need better compaction and lookup properties.
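The hybrid behavior can be observed directly. This is a sketch under assumptions: a scratch repository on the files backend, an illustrative branch named topic, and a hypothetical helper empty_commit. It packs all refs, updates one branch, and then compares what packed-refs says with what the new loose file says.

```shell
# Assumption: scratch repository on the files backend; names are illustrative.
export GIT_DEFAULT_REF_FORMAT=files
demo=$(mktemp -d)
git init -q "$demo"
cd "$demo" || exit 1

# Hypothetical helper: create an empty commit with a throwaway identity.
empty_commit() {
    git -c user.name=Example -c user.email=example@example.com \
        commit -q --allow-empty -m "$1"
}

empty_commit one
git branch topic
git pack-refs --all

# After packing, the loose file is pruned and the name lives only in packed-refs.
if [ -e .git/refs/heads/topic ]; then packed_only=no; else packed_only=yes; fi

# Updating the branch writes a new loose file; the packed entry is left as it was.
empty_commit two
git branch -f topic HEAD

packed_oid=$(awk '$2 == "refs/heads/topic" { print $1 }' .git/packed-refs)
loose_oid=$(cat .git/refs/heads/topic)
resolved=$(git rev-parse refs/heads/topic)
echo "packed:   $packed_oid"
echo "loose:    $loose_oid"
echo "resolved: $resolved"
```

The loose value is the one Git resolves, so a reader of this backend has to consult both places and let the loose file win.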

git pack-refs behaves differently under different ref backends.

That difference shows Git already has backend-aware behavior here. Ref compaction is no longer one uniform story.

For the traditional files backend, pack-refs --auto uses a heuristic around loose refs and the current packed-refs size. For reftable, auto-compaction works in terms of geometric table relationships.
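To see the files-backend heuristic make its own decision, a sketch like the following can help. It assumes a scratch repository with 50 illustrative branches named b1 through b50; the helpers count_loose and count_packed are hypothetical. Whether --auto actually packs depends on the heuristic and the Git build, so the script only reports counts, and it falls back gracefully where pack-refs does not accept --auto.

```shell
# Assumption: scratch repository on the files backend; branch names b1..b50 are illustrative.
export GIT_DEFAULT_REF_FORMAT=files
demo=$(mktemp -d)
git init -q "$demo"
cd "$demo" || exit 1
git -c user.name=Example -c user.email=example@example.com \
    commit -q --allow-empty -m 'one'
oid=$(git rev-parse HEAD)

# Create 50 branches in one update-ref transaction.
i=1
while [ "$i" -le 50 ]; do
    echo "create refs/heads/b$i $oid"
    i=$((i + 1))
done | git update-ref --stdin

# Hypothetical helpers: count loose ref files and packed-refs entries.
count_loose() { find .git/refs -type f | wc -l | tr -d ' '; }
count_packed() {
    if [ -f .git/packed-refs ]; then
        grep -c '^[0-9a-f]' .git/packed-refs   # skips the header and peeled (^) lines
    else
        echo 0
    fi
}

loose_before=$(count_loose)
if git pack-refs --auto 2>/dev/null; then
    auto_supported=yes
else
    auto_supported=no   # older builds reject --auto
fi
loose_after=$(count_loose)
packed_after=$(count_packed)
total=$(git for-each-ref | wc -l | tr -d ' ')

echo "--auto supported: $auto_supported"
echo "loose before: $loose_before, loose after: $loose_after, packed after: $packed_after"
echo "visible refs: $total"
```

The visible ref count stays the same either way; only where the refs are stored may change.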

You can watch the older compaction path directly:

git pack-refs --all
sed -n '1,20p' "$(git rev-parse --git-path packed-refs)"

That turns many refs into one flat file and lets you inspect the result immediately. Reftable changes the backend, but the older packed-refs story is still worth touching directly once.

The ref layer is becoming more explicitly engineered.

Reftable Is the Newer Backend Direction

Reftable
Newer Git ref-storage backend that stores refs and reflogs in sorted tables rather than loose files plus packed-refs.

Reftable applies a more structured storage model to refs: sorted tables, compaction, efficient lookup, and reflog storage alongside refs. It gives the ref layer a backend more like Git's other modern data structures.

git init now includes --ref-format=<format> and names reftable as an option alongside the traditional files format. Reftable is also still marked experimental.

You can make that backend visible directly:

git init --ref-format=reftable reftable-demo
cd reftable-demo
git config user.name Example
git config user.email example@example.com
git commit --allow-empty -m 'one'
find .git/reftable -maxdepth 1 | sort
git refs verify --strict

The reftable/ directory makes the backend choice concrete, and git refs verify --strict confirms Git can read the ref database coherently.

Reftable matters because it aims at a cleaner answer to several problems at once:

The ref model stays the same. Refs are still names for object IDs. The storage backend changes.
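One way to check that claim is to build the same refs under both backends and compare what Git reports. This is a sketch, assuming a Git build that accepts --ref-format and rev-parse --show-ref-format; the helper make_repo and the names topic and v1 are illustrative. On builds without reftable support it reports that and does nothing else.

```shell
# Hypothetical helper: create a repository with the given ref format ($1) in directory $2,
# then add one commit, one branch, and one tag.
make_repo() {
    git init -q --ref-format="$1" "$2" 2>/dev/null || return 1
    git -C "$2" -c user.name=Example -c user.email=example@example.com \
        commit -q --allow-empty -m 'one' &&
    git -C "$2" branch topic &&
    git -C "$2" tag v1
}

base=$(mktemp -d)
if make_repo files "$base/files" && make_repo reftable "$base/reftable"; then
    both_formats=yes
    # Same visible names under both backends...
    names_files=$(git -C "$base/files" for-each-ref --format='%(refname)')
    names_reftable=$(git -C "$base/reftable" for-each-ref --format='%(refname)')
    # ...but a different storage format underneath.
    format_files=$(git -C "$base/files" rev-parse --show-ref-format)
    format_reftable=$(git -C "$base/reftable" rev-parse --show-ref-format)
    echo "files backend:    $format_files"
    echo "reftable backend: $format_reftable"
    echo "$names_reftable"
else
    both_formats=no
    echo "this Git build lacks --ref-format or reftable support"
fi
```

If both repositories are created, the two name lists should match line for line while the reported formats differ.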

Reftable is still marked experimental.

The question is usually not "why is everyone not using this already?" It is whether the current files plus packed-refs backend has become painful enough that testing a new backend is worth the effort.

Git now has better commands for verification, migration, and optimization, but this is still a part of the system where evidence matters more than enthusiasm.

At small scale, loose refs plus packed-refs usually work fine.

At larger scale, several pressure points appear:

Git has a habit of leaving the model alone and replacing the storage layer when a scaling problem becomes real.

Reftable fits that pattern.

git refs Makes Backend Operations Explicit

git refs is another newer command in this part of Git.

If git refs -h reports that refs is not a git command, stop there and treat the rest of this section as newer-build territory.
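That capability check can be scripted. The sketch below uses a hypothetical helper, has_git_command, which looks for the usage banner instead of trusting the exit status, since a usage request generally exits nonzero even when the command exists. The banner pattern is an assumption about Git's untranslated output, which is why LC_ALL=C is set.

```shell
# Hypothetical helper: succeeds when "git <name> -h" prints a usage banner for that command.
# LC_ALL=C keeps the banner untranslated so the grep pattern stays valid.
has_git_command() {
    LC_ALL=C git "$1" -h 2>&1 | grep -q "^usage: git $1"
}

if has_git_command refs; then
    refs_available=yes
else
    refs_available=no
fi
echo "git refs available: $refs_available"
```

A script that depends on git refs can branch on refs_available and skip the newer steps on older builds.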

Instead of treating ref-backend management as scattered implementation detail, Git now has a command dedicated to repository refs.

The command family is still fairly small. The backend-oriented subcommands that matter most here are migrate and verify.

It also includes list, exists, and optimize.

A dedicated command means ref storage and migration are becoming a proper administrative layer rather than staying buried in lower-level plumbing or backend-specific lore.

The most important part of git refs is not the existence of a new command name. It is the fact that Git is acknowledging several practical questions directly:

Those are scale questions, not beginner questions.

Migration needs a quiet repository.

Two limits matter immediately: git refs migrate does not work in repositories that have worktrees, and Git cannot block concurrent writes while the migration is running. If scheduled maintenance is active, unregister it first. Treat ref-backend migration like storage maintenance, not like a harmless foreground convenience command.

A safe rehearsal starts with capability and concurrency checks:

git --version
git refs -h
git worktree list
git maintenance unregister --force                # if background maintenance was configured (--force added in recent Git)
git refs verify --strict
git refs migrate --ref-format=reftable --dry-run

If git worktree list shows linked worktrees beyond the current checkout, stop there. If the repository was registered for scheduled maintenance, unregister it before the dry run. The last two commands are the minimum safe rehearsal before a real backend change.
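The worktree check is easy to automate before any migration attempt. This is a minimal sketch: migration_preflight is a hypothetical helper, the scratch repository and the side branch are illustrative, and the helper covers only the worktree limit, not concurrent writers or scheduled maintenance.

```shell
# Hypothetical helper: succeeds only when the repository has no linked worktrees.
migration_preflight() {
    repo=$1
    # Each worktree, including the main one, starts a "worktree <path>" record.
    count=$(git -C "$repo" worktree list --porcelain | grep -c '^worktree ')
    if [ "$count" -gt 1 ]; then
        echo "stop: $((count - 1)) linked worktree(s) present"
        return 1
    fi
    echo "ok: no linked worktrees"
}

# Demonstration in a scratch repository.
demo=$(mktemp -d)
git init -q "$demo"
git -C "$demo" -c user.name=Example -c user.email=example@example.com \
    commit -q --allow-empty -m 'one'

migration_preflight "$demo"
before=$?

# Adding a linked worktree should flip the result.
git -C "$demo" worktree add -b side "${demo}-wt" >/dev/null 2>&1
migration_preflight "$demo"
after=$?
```

A wrapper like this belongs in front of the dry run, so the stop condition is enforced by the script and not left to memory.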

You can inspect the older and newer forms directly:

find "$(git rev-parse --git-path refs)" -type f | sed -n '1,20p'
git for-each-ref --count=10 --format='%(refname)'
test -f "$(git rev-parse --git-path packed-refs)" && sed -n '1,10p' "$(git rev-parse --git-path packed-refs)"

The loose-ref listing shows the old file-per-ref backend directly. for-each-ref shows the visible name set. packed-refs shows whether refs have been compacted into the older flat-file backend.

Ref Growth Gets Real in Agent and Queue Workloads

This is one place where later material in the book meets a very concrete Git subsystem.

If a repository starts producing:

then ref storage becomes a practical scaling topic much earlier than older Git folklore might suggest.

Reftable and git refs matter here because Git needs better handling for higher ref churn and larger ref populations.
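A quick way to find out whether a repository is heading this way is to count refs per namespace. The sketch below simulates churn in a scratch repository; the refs/agents/ namespace and the run-N names are purely hypothetical stand-ins for whatever automation creates refs in practice.

```shell
# Assumption: scratch repository; refs/agents/run-N is a hypothetical naming scheme.
demo=$(mktemp -d)
git init -q "$demo"
cd "$demo" || exit 1
git -c user.name=Example -c user.email=example@example.com \
    commit -q --allow-empty -m 'one'
oid=$(git rev-parse HEAD)

# Simulate automated ref creation: 200 refs in one transaction.
i=1
while [ "$i" -le 200 ]; do
    echo "create refs/agents/run-$i $oid"
    i=$((i + 1))
done | git update-ref --stdin

# Count refs per top-level namespace, largest first.
git for-each-ref --format='%(refname)' | cut -d/ -f1-2 | sort | uniq -c | sort -rn

agent_refs=$(git for-each-ref refs/agents | wc -l | tr -d ' ')
echo "refs under refs/agents: $agent_refs"
```

Run against a real repository, the per-namespace counts show which producer dominates the ref population, which is the evidence worth having before changing backends.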

It is easy to talk about ref storage and forget reflogs, but reftable is interesting partly because reflogs are part of the same operational layer.

Chapters 2 and 3 treated reflogs as local memory for recovery and safety. At scale, reflogs are also part of the storage and maintenance problem. A backend that thinks about refs and reflogs together is therefore a more coherent direction than treating one as primary data and the other as incidental text files forever.

Refs and reflogs are one operational layer. They belong in the same discussion.
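Under the files backend, that second half of the layer is visible as plain text files. The following sketch assumes a scratch repository on the files backend with default reflog settings; the branch name topic and the helper empty_commit are illustrative. It compares the on-disk log for one branch with what git reflog reports.

```shell
# Assumption: scratch repository on the files backend with default reflog settings.
export GIT_DEFAULT_REF_FORMAT=files
demo=$(mktemp -d)
git init -q "$demo"
cd "$demo" || exit 1

# Hypothetical helper: create an empty commit with a throwaway identity.
empty_commit() {
    git -c user.name=Example -c user.email=example@example.com \
        commit -q --allow-empty -m "$1"
}

empty_commit one
git branch topic            # first reflog entry: branch created
empty_commit two
git branch -f topic HEAD    # second reflog entry: branch moved

# The files backend keeps each reflog as a text file, one line per update.
log_file=$(git rev-parse --git-path logs/refs/heads/topic)
file_entries=$(wc -l < "$log_file" | tr -d ' ')
cmd_entries=$(git reflog show topic | wc -l | tr -d ' ')
echo "log file:        $log_file"
echo "lines in file:   $file_entries"
echo "reflog entries:  $cmd_entries"
```

Every ref with a reflog carries a file like this, so reflog volume grows with ref count and update rate, which is why a backend that stores both together is attractive.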

The most important conceptual point is the same one that showed up with objects, packs, and indexes.

Git's meaning stays stable:

What changes is how those names are stored, compacted, enumerated, and verified.

Reftable does not replace the ref model. It replaces the older storage layout around that model.

In large repositories, refs form a scaling layer with their own storage backends, compaction behavior, verification needs, and migration paths. packed-refs was the older answer. Reftable is the newer one. git refs makes that administrative layer more explicit.