Chapter 9: Git GC and Maintenance | High Performance Git


Repository maintenance has two layers. git gc is Git's cleanup command for the local repository; git maintenance runs and schedules tasks around that cleanup. This chapter keeps that split explicit and covers:

git gc as the umbrella cleanup command
git maintenance as the orchestration layer around that work
repacking as a family of strategies, not one monolithic event
cruft packs as the modern unreachable-object path
why repository hygiene is now more continuous and more explicit

Foreground vs. Upkeep

Foreground commands such as git add and git fetch are optimized for a responsive user experience. They do not stop to fully optimize repository data, because the cost of those optimizations scales with the size of the whole repository while the foreground command is usually doing something much smaller. Foreground Git is trying to finish your immediate action quickly; maintenance is trying to reshape repository data so future actions get cheaper.

git gc: Git housekeeping command that repacks data, prunes stale state, and updates related repository metadata.

git gc runs a number of housekeeping tasks within the current repository, including compressing file revisions, removing unreachable objects, packing refs, pruning reflogs, cleaning rerere metadata, pruning stale worktrees, and sometimes updating ancillary indexes such as the commit-graph. Common porcelain commands that create objects may run git gc automatically when the repository has grown enough since the last maintenance.
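As a rough illustration of that housekeeping, the sketch below builds a throwaway repository, runs git gc once, and compares git count-objects before and after. The temporary directory, the notes.txt file, and the commit identity are placeholders chosen for the example, not part of any real workflow.

```shell
#!/bin/sh
# Sketch: observe what a manual `git gc` does to loose objects in a scratch repo.
set -eu

repo=$(mktemp -d)            # throwaway directory (placeholder)
git init -q "$repo"
cd "$repo"

# Placeholder identity so commits work without any global config.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

# Three small commits leave blobs, trees, and commits behind as loose objects.
for i in 1 2 3; do
  echo "revision $i" > notes.txt
  git add notes.txt
  git commit -q -m "revision $i"
done

loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')
git gc --quiet
loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packs_after=$(git count-objects -v | awk '/^packs:/ {print $2}')

echo "loose objects: $loose_before -> $loose_after, packs: $packs_after"
```

On a repository this small the pass is instant; the point is only that the loose objects end up in a pack.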

git maintenance: Command that runs and schedules repository optimization tasks such as commit-graph updates, repacking, loose-object cleanup, and related upkeep.

git maintenance runs and schedules maintenance tasks. It can register a repository for background maintenance, start or stop that scheduling, and run individual tasks directly. Within that system, gc is one task among several, and repository upkeep no longer has to happen as one large cleanup pass: different tasks can run on different schedules with different costs. The five subcommands are run, register, start, stop, and unregister. register can set maintenance.strategy=incremental and configure a recommended background schedule, while start wires that into the host scheduler for hourly, daily, and weekly execution.

The split between gc, scheduled maintenance, and repack strategy gets clearer once you run a few of the verbs directly:

git maintenance start
git maintenance run --task=commit-graph
git maintenance run --task=incremental-repack

git gc
git repack -ad --cruft --write-midx

That is enough to see the difference between:

keeping metadata fresh
repacking incrementally
doing a full cleanup pass

You can also inspect the configured posture directly:

git config --get maintenance.strategy
git config --show-origin --get-regexp '^maintenance\.' || true

Incremental Maintenance Is a Posture

The git maintenance task list includes things such as:

commit-graph
prefetch
loose-objects
incremental-repack
pack-refs
reflog expiration
rerere cleanup
worktree pruning
gc

Those are not all doing the same kind of work. Some write metadata. Some fetch ahead of time. Some consolidate loose objects. Some restructure packs. Some clean up old local state.

A lot of that breadth used to get collapsed into "run gc," but Git now lets you run, schedule, and tune parts of that upkeep separately. The gc task still matters: it cleans up unnecessary files and optimizes the local repository. It is also expensive on large repositories, because it can rewrite large parts of the object store and delete stale data. gc is the default strategy for manual maintenance, while scheduled maintenance often uses smaller and safer tasks. That is part of why a manual git gc still feels more like a one-off cleanup event than ordinary background maintenance.

One of the more useful modern additions is maintenance.strategy, but the exact strategy names are version-sensitive enough that they should be verified in the Git build you actually deploy. The two most portable values today are:

none when you are not using scheduled maintenance
incremental when you want small recurring tasks instead of disruptive cleanup passes

incremental optimizes for small, data-preserving maintenance activities. It schedules prefetch and commit-graph hourly, loose-objects and incremental-repack daily, and pack-refs weekly.
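If you would rather not depend on the strategy name, the same schedule can be spelled out with the per-task maintenance.<task>.enabled and maintenance.<task>.schedule keys. The following is a minimal sketch in a scratch repository; set_task is a hypothetical helper introduced only for this example.

```shell
#!/bin/sh
# Sketch: spell out the incremental schedule with per-task config in a scratch repo.
set -eu

repo=$(mktemp -d)   # throwaway repository (placeholder)
git init -q "$repo"
cd "$repo"

# set_task TASK FREQUENCY: hypothetical helper that enables one task and
# assigns its schedule in the repository-local config.
set_task() {
  git config "maintenance.$1.enabled" true
  git config "maintenance.$1.schedule" "$2"
}

set_task prefetch hourly
set_task commit-graph hourly
set_task loose-objects daily
set_task incremental-repack daily
set_task pack-refs weekly

# gc stays out of the schedule, matching the data-preserving intent.
git config maintenance.gc.enabled false

# Show the resulting maintenance.* settings.
git config --get-regexp '^maintenance\.' | sort
```

These keys only describe what should run; something still has to invoke git maintenance run --schedule=hourly (and daily, weekly), which is the part git maintenance start arranges.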

If your Git build documents additional strategy names, treat them as build-specific operating modes rather than assuming they exist everywhere your team runs Git. Large-repository guidance is much safer when it says "verify the documented strategy set on this machine" than when it hardcodes one vendor-specific answer into policy.

For large repositories, the question is not really "gc or no gc?" It is "which parts of upkeep should happen continuously, and when is a deliberate gc-style cleanup actually warranted?"

For shared repos and CI mirrors, I register background maintenance with incremental and never call git gc by hand. The older pattern of running git gc manually every so often was fine when it was the only tool available, but it tends to turn upkeep into a periodic disruptive event rather than steady background work. A deliberate git gc is still useful for one-off repository cleanup; it is the wrong default for continuous operations.

For a large long-lived developer clone or CI mirror, the first question is whether background maintenance will actually run on that machine.

If the answer is yes, a concrete first posture looks like this:

git maintenance start
git config --get maintenance.strategy
git config --show-origin --get-regexp '^(maintenance\.|gc\.)' || true

Why those settings?

git maintenance start is the important switch because it both registers the repository and installs the scheduler for the current user. If your environment already runs git maintenance run --schedule=<frequency> some other way, then git maintenance register is the right lighter-weight command instead.
Checking maintenance.strategy is safer than blindly forcing a value. In current Git, start or register will usually choose incremental when the setting was previously unset.
Inspecting the resulting maintenance.* and gc.* config makes the operating mode explicit instead of assuming background upkeep is configured the way you think it is.
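One way to make that check concrete is to ask whether the repository appears in maintenance.repo, the multi-valued global key that register and start write. The sketch below assumes the repository was registered under its worktree top-level path; is_registered is a hypothetical helper, and being listed says nothing about whether a scheduler is actually installed.

```shell
#!/bin/sh
# Sketch: report whether a repository path is listed in maintenance.repo.
set -eu

# is_registered PATH: hypothetical helper; succeeds if PATH appears as an
# exact value of the global maintenance.repo key.
is_registered() {
  git config --global --get-all maintenance.repo 2>/dev/null | grep -Fxq -- "$1"
}

# Use the worktree top level when inside a repository, else the current directory.
repo=$(git rev-parse --show-toplevel 2>/dev/null || pwd)

if is_registered "$repo"; then
  echo "registered for background maintenance: $repo"
else
  echo "not registered: $repo"
fi
```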

If background maintenance is not actually available on the machine, do not disable foreground upkeep just because large-repository advice says "move work into the background." In that case, either keep the default foreground maintenance behavior or schedule explicit git maintenance run --task=... jobs some other way. The important part is that the operating mode is chosen on purpose rather than inherited accidentally.

Instead of waiting for one large cleanup event, Git can:

update commit-graph data incrementally
prefetch remote objects ahead of time
batch loose objects into packs
repack the object store gradually using MIDX-aware strategies

That is a different operating posture: less about occasional dramatic cleanup and more about keeping the repository from drifting into a bad state in the first place. Maintenance increasingly looks like operations work rather than after-the-fact repair, and the individual tasks divide the work in useful ways.

The loose-objects task first removes loose objects that already exist in packs and then writes a batch of the remaining loose objects into a new loose-* pack. The default batch size is fifty thousand objects, configurable via maintenance.loose-objects.batchSize. The incremental-repack task is different: it uses the multi-pack-index and a two-step process, first expiring packs no longer referenced by the MIDX and then repacking object data incrementally. Loose-object cleanup and pack-structure optimization are related but solve different problems.
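The two-step behavior of the loose-objects task is easy to observe in a scratch repository. This is a sketch under the assumption that your Git is recent enough to ship git maintenance; stat_of is a hypothetical helper that pulls one field out of git count-objects -v.

```shell
#!/bin/sh
# Sketch: run the loose-objects task twice in a scratch repo and watch the counts.
set -eu

repo=$(mktemp -d)   # throwaway repository (placeholder)
git init -q "$repo"
cd "$repo"

# Write a handful of loose blobs directly; reachability does not matter to this task.
for i in 1 2 3 4 5; do
  echo "payload $i" | git hash-object -w --stdin >/dev/null
done

# stat_of FIELD: hypothetical helper returning one value from count-objects.
stat_of() { git count-objects -v | awk -v key="$1:" '$1 == key {print $2}'; }

loose_start=$(stat_of count)
git maintenance run --task=loose-objects   # batches the loose objects into a loose-* pack
packed_mid=$(stat_of in-pack)
git maintenance run --task=loose-objects   # next run deletes loose copies that are now packed
loose_end=$(stat_of count)

ls .git/objects/pack
echo "loose at start: $loose_start, in-pack after first run: $packed_mid, loose at end: $loose_end"
```

The loose files are only deleted once a pack already holds them, which is why the cleanup shows up on the run after the one that wrote the pack.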

Git explicitly warns against enabling both the loose-objects and gc tasks at the same time, because gc can write unreachable objects as loose objects to be cleaned up later, which conflicts with the assumptions of the loose-objects task. Maintenance is a system, and individual commands may make sense in isolation while still composing badly if scheduled together without thought.
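A small guard can catch that combination before it reaches a scheduler. The sketch below only looks at explicit maintenance.<task>.enabled settings and ignores whatever a strategy enables implicitly; task_enabled and tasks_conflict are hypothetical helpers, not Git commands.

```shell
#!/bin/sh
# Sketch: flag a config that explicitly enables both loose-objects and gc.
set -eu

# task_enabled TASK: hypothetical helper; true only when
# maintenance.<task>.enabled is explicitly set to a true value.
task_enabled() {
  [ "$(git config --type=bool --get "maintenance.$1.enabled" 2>/dev/null || echo false)" = "true" ]
}

# tasks_conflict: succeeds when both tasks are explicitly enabled.
tasks_conflict() {
  task_enabled loose-objects && task_enabled gc
}

if tasks_conflict; then
  echo "warning: loose-objects and gc are both enabled for this repository" >&2
fi
```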

Repacking, Cruft Packs, and Retention Policy

git repack changes how objects are laid out on disk, not what the repository contains, and it exposes more strategy choices than many engineers realize. git repack supports:

-d: deletes redundant old packs after new ones are written
--cruft: with -d, writes unreachable objects into a separate cruft pack
--cruft-expiration=<approxidate>: with --cruft -d, expires unreachable objects older than the given date during that repack instead of leaving them for a later git gc
--expire-to=<dir>: writes pruned unreachable objects to another directory as a backup
--geometric=<factor>: preserves a geometric pack progression instead of flattening everything
--write-midx: writes a multi-pack index for the surviving packs

Together those flags give you a toolkit for shaping storage under different operational goals.

A quick before-and-after inspection often helps:

git count-objects -vH
ls -lh .git/objects/pack
git multi-pack-index verify

Cruft Pack: Separate pack containing unreachable objects so they can be retained and expired more efficiently than as many loose files.

Cruft packs are one of the most important recent changes in Git maintenance. The older picture had unreachable objects sitting around as loose files until they eventually disappeared. With --cruft, git gc packs them separately instead, and in current Git that behavior is on by default (governed by gc.cruftPacks), so unreachable objects no longer accumulate as loose clutter. git repack extends that model with options like --max-cruft-size, --combine-cruft-below-size, and --expire-to, which make cruft management more incremental and more deliberate.

You can usually see the effect immediately after a cruft-aware repack:

git gc --cruft
ls -lh .git/objects/pack

One of the practical advantages of cruft packs is that they separate retention from immediate deletion.

Git often wants a grace period for unreachable data because users recover from mistakes, interrupted rewrites, bad rebases, and deleted branch names all the time. gc.pruneExpire controls that retention horizon. Its default is 2.weeks.ago, so an unreachable object newer than two weeks survives an ordinary git gc.
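The grace period can be demonstrated in a scratch repository. The sketch below assumes gc.pruneExpire is left at its default; the payload, file name, and commit identity are placeholders. It writes one unreachable blob, runs an ordinary git gc, and then runs git gc --prune=now, which removes the grace period for that single invocation.

```shell
#!/bin/sh
# Sketch: a fresh unreachable object survives `git gc` but not `git gc --prune=now`.
set -eu

repo=$(mktemp -d)   # throwaway repository (placeholder)
git init -q "$repo"
cd "$repo"

export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

echo "kept" > kept.txt
git add kept.txt
git commit -q -m "reachable commit"

# Unreachable blob, created just now.
orphan=$(echo "recently orphaned" | git hash-object -w --stdin)

git config --get gc.pruneExpire || echo "gc.pruneExpire is unset; the built-in default applies"

git gc --quiet                 # default grace period applies
if git cat-file -e "$orphan" 2>/dev/null; then survived_default=yes; else survived_default=no; fi

git gc --quiet --prune=now     # no grace period for this run
if git cat-file -e "$orphan" 2>/dev/null; then survived_now=yes; else survived_now=no; fi

echo "after git gc: $survived_default, after git gc --prune=now: $survived_now"
```

Reserve --prune=now for cases where you are sure nothing else is writing to the repository, since it discards exactly the safety margin the grace period exists to provide.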

Think of modern cleanup this way:

unreachable objects are not necessarily kept forever
they are also not assumed to be garbage the moment they lose a ref
maintenance needs a retention policy instead of a simple delete switch