High Performance Git

Section III · Storage and Local Scale

Chapter 10

Git GC and Maintenance

Pencil sketch of a shopkeeper sweeping the boardwalk outside a general store.

Repository maintenance gets easier to read once two ideas are separated.

git gc is Git's cleanup command for the local repository. git maintenance is the layer for running and scheduling maintenance tasks around that cleanup.

Keep that split explicit throughout this chapter.


git gc, git maintenance, and the Split Between Foreground Work and Upkeep

Foreground commands such as git add and git fetch are optimized for responsiveness. They do not stop to fully optimize repository data, because that work scales with the size of the whole repository while the foreground command is usually doing something much smaller.

Foreground Git is trying to finish your immediate action quickly. Maintenance is trying to reshape repository data so future actions get cheaper. Those are related goals, but they are not the same goal.

git gc
Git housekeeping command that repacks data, prunes stale state, and updates related repository metadata.

git gc runs a number of housekeeping tasks within the current repository, including compressing file revisions, removing unreachable objects, packing refs, pruning reflogs, cleaning rerere metadata, pruning stale worktrees, and sometimes updating ancillary indexes such as the commit-graph.

Common porcelain commands that create objects may run git gc automatically when the repository has grown enough since the last maintenance.
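A minimal sketch of that automatic trigger, assuming a throwaway repository created with mktemp and no gc.auto override in your configuration. The variable names are illustrative only. With gc.auto unset, Git falls back to its built-in loose-object threshold, so a handful of loose objects should leave git gc --auto with nothing to do.

```shell
# Throwaway repository so nothing real is touched.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1

# gc.auto is the loose-object threshold for automatic gc.
gc_auto=$(git config --get gc.auto || echo "unset, built-in default applies")
echo "gc.auto: $gc_auto"

# Create five distinct loose objects.
for i in 1 2 3 4 5; do
  echo "blob $i" | git hash-object -w --stdin >/dev/null
done

# Far below the threshold, so --auto returns without repacking.
git gc --auto
loose_after_auto=$(git count-objects -v | awk '/^count:/ {print $2}')
echo "loose objects after gc --auto: $loose_after_auto"
```

Lowering gc.auto in a test repository is one way to watch the trigger fire sooner, but the right threshold for a real repository is a judgment call.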

gc repacks data, prunes stale state, and updates related metadata across the repository.

git maintenance
Command that runs and schedules repository optimization tasks such as commit-graph updates, repacking, loose-object cleanup, and related upkeep.

git maintenance runs and schedules maintenance tasks. It can register a repository for background maintenance, start or stop that scheduling, and run individual tasks directly.

Within that system, gc is one task among several. Repository upkeep no longer has to happen as one large cleanup pass. Different tasks can run on different schedules with different costs.

git maintenance has five subcommands: run, register, start, stop, and unregister. register can set maintenance.strategy=incremental and configure a recommended background schedule. start wires that into the host scheduler for hourly, daily, and weekly execution.

The split between gc, scheduled maintenance, and repack strategy gets clearer once you run a few of the verbs directly:

git maintenance start
git maintenance run --task=commit-graph
git maintenance run --task=incremental-repack

git gc
git repack -ad --cruft --write-midx

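That is enough to see the difference between scheduling background maintenance, running a single task on demand, running the umbrella gc cleanup, and choosing an explicit repack strategy.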

You can also inspect the configured posture directly:

git config --get maintenance.strategy
git config --show-origin --get-regexp '^maintenance\.' || true

Incremental Maintenance Is a Posture, Not One Big Event

The git maintenance task list includes things such as commit-graph, prefetch, gc, loose-objects, incremental-repack, and pack-refs.

Those are not all doing the same kind of work. Some write metadata. Some fetch ahead of time. Some consolidate loose objects. Some restructure packs. Some clean up old local state.

A lot of that breadth used to get collapsed into "run gc." Git now lets you run, schedule, and tune parts of that upkeep separately.

The gc task still matters. It cleans up unnecessary files and optimizes the local repository, while also being expensive on large repositories because it can rewrite large parts of the object store and delete stale data.

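gc is the default task for manual maintenance: unless other tasks are enabled or a strategy says otherwise, git maintenance run has traditionally run only gc. Scheduled maintenance often uses smaller and safer tasks.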

That difference is part of why a manual git gc still feels more like a one-off cleanup event than ordinary background maintenance.

One of the more useful modern additions is maintenance.strategy, but the exact strategy names are version-sensitive enough that they should be verified in the Git build you actually deploy.

The most portable current posture is the incremental strategy (maintenance.strategy=incremental).

incremental optimizes for small maintenance activities that do not delete data. It schedules prefetch and commit-graph hourly, loose-objects and incremental-repack daily, and pack-refs weekly.
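One way to try the incremental strategy without touching a real repository or the host scheduler is to set it locally and invoke the hourly schedule by hand. This is a sketch under assumptions: the repository is a throwaway, the variable names are illustrative, and the repository has no remotes, so the prefetch task has nothing to fetch.

```shell
# Throwaway repository with one commit.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1
git -c user.name=Example -c user.email=example@example.invalid \
  commit -q --allow-empty -m "initial commit"

# Repository-local setting; register and start normally choose this for you.
git config maintenance.strategy incremental
strategy=$(git config --get maintenance.strategy)
echo "strategy: $strategy"

# Run only the tasks whose schedule matches the given frequency.
git maintenance run --schedule=hourly
hourly_status=$?
echo "hourly run exit: $hourly_status"
```

Running the scheduled form by hand is a cheap check that the configuration does what you expect before a scheduler starts invoking it unattended.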

If your Git build documents additional strategy names, treat them as build-specific operating modes rather than assuming they exist everywhere your team runs Git. Large-repository guidance is much safer when it says "verify the documented strategy set on this machine" than when it hardcodes one vendor-specific answer into policy.

For large repositories, the question is not really "gc or no gc?" It is "which parts of upkeep should happen continuously, and when is a deliberate gc-style cleanup actually warranted?"

For a large long-lived developer clone or CI mirror, the first question is whether background maintenance will actually run on that machine.

If the answer is yes, a concrete first posture looks like this:

git maintenance start
git config --get maintenance.strategy
git config --show-origin --get-regexp '^(maintenance\.|gc\.)' || true

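Why those commands? git maintenance start applies the same configuration as git maintenance register, which disables foreground auto-maintenance (maintenance.auto=false) for the repository and selects the incremental strategy when none is set, and then installs the scheduler entries. The two git config queries confirm what is actually in effect and which file each value comes from.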

If background maintenance is not actually available on the machine, do not disable foreground upkeep just because large-repository advice says "move work into the background." In that case, either keep the default foreground maintenance behavior or schedule explicit git maintenance run --task=... jobs some other way. The important part is that the operating mode is chosen on purpose rather than inherited accidentally.

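Instead of waiting for one large cleanup event, Git can prefetch from remotes and update the commit-graph hourly, pack loose objects and repack incrementally each day, and pack refs weekly.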

That is a different operating posture. It is less about occasional dramatic cleanup and more about keeping the repository from drifting into a bad state in the first place.

Maintenance increasingly looks like operations work rather than after-the-fact repair.

The maintenance docs make a useful distinction here.

The loose-objects task removes loose objects that already exist in packs and then writes batches of loose objects into new loose-* packs. The default batch size is fifty thousand objects, configurable via maintenance.loose-objects.batchSize.

The incremental-repack task is different. It uses the multi-pack-index and a two-step process to expire packs no longer referenced by the MIDX and then repack object data incrementally.
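The two-step behavior of the loose-objects task is easy to observe in a throwaway repository. The sketch below assumes nothing beyond a Git build with git maintenance; the helper count_loose and the variable names are made up for the example. The first run writes a loose-* pack without deleting anything, and the second run removes the loose copies that are now packed.

```shell
# Throwaway repository.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1

# Create three distinct loose objects.
for i in 1 2 3; do
  echo "content $i" | git hash-object -w --stdin >/dev/null
done

# Hypothetical helper: number of loose objects right now.
count_loose() { git count-objects -v | awk '/^count:/ {print $2}'; }

loose_before=$(count_loose)

# First run: nothing is packed yet, so nothing is deleted; a loose-* pack is written.
git maintenance run --task=loose-objects
loose_mid=$(count_loose)

# Second run: the loose copies already exist in a pack and are removed.
git maintenance run --task=loose-objects
loose_after=$(count_loose)

echo "loose objects: before=$loose_before mid=$loose_mid after=$loose_after"
ls .git/objects/pack
```

The delay between packing and deleting is deliberate: concurrent Git processes can switch to the packed copy before the loose one disappears.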

Loose-object cleanup and pack-structure optimization are related, but they solve different problems.

Git explicitly warns that it is not advisable to enable both the loose-objects and gc tasks at the same time. The reason is that gc can write unreachable objects as loose objects to be cleaned up later, which conflicts with the assumptions of the loose-objects task.
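One way to act on that warning is to make the choice explicit in configuration. The sketch below uses the maintenance.<task>.enabled keys in a throwaway repository; which task to prefer is an assumption made for the example, not a universal recommendation.

```shell
# Throwaway repository.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1

# Prefer the incremental task and switch the gc task off for git maintenance run.
git config maintenance.loose-objects.enabled true
git config maintenance.gc.enabled false

# Show the resulting task switches.
git config --get-regexp '^maintenance\..*\.enabled$'
```

These keys govern which tasks git maintenance run selects; they do not stop someone from invoking git gc directly.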

Maintenance is a system. Individual commands may make sense in isolation while still composing badly if scheduled together without thought.

Repacking, Cruft Packs, and Retention Policy

git repack changes how objects are laid out on disk, not what the repository means: history, refs, and object IDs stay the same.

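git repack exposes more strategy choices than many engineers realize, including -a and -d for all-into-one repacks that delete redundant packs, --geometric for incremental pack consolidation, --cruft for packing unreachable objects separately, --write-midx for a multi-pack-index, and --write-bitmap-index for reachability bitmaps.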

Together those flags give you a toolkit for shaping storage under different operational goals.

A quick before-and-after inspection often helps:

git count-objects -vH
ls -lh .git/objects/pack
git multi-pack-index verify

Cruft Pack
Separate pack containing unreachable objects so they can be retained and expired more efficiently than as many loose files.

Cruft packs are one of the most important modern changes in Git maintenance.

A common older picture is unreachable objects sitting around as a sea of loose files until they eventually disappear. --cruft packs unreachable objects into a separate cruft pack instead of storing them as loose objects, and recent versions of git gc enable it by default (gc.cruftPacks).

With --cruft on by default, current Git already leans away from the older "unreachable becomes loose clutter" story.

git repack extends that model with options like --max-cruft-size, --combine-cruft-below-size, and --expire-to, which make cruft management more incremental and more deliberate.

You can usually see the effect immediately after a cruft-aware repack:

git gc --cruft
ls -lh .git/objects/pack
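To see a cruft pack appear without touching a real repository, the sketch below creates one unreachable blob in a throwaway repository and repacks with --cruft. It assumes a Git build with cruft-pack support; the variable names are illustrative. A cruft pack is recognizable by the .mtimes file written next to it.

```shell
# Throwaway repository with one reachable commit.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1
git -c user.name=Example -c user.email=example@example.invalid \
  commit -q --allow-empty -m "initial commit"

# An object that no ref points at.
orphan=$(echo "orphaned content" | git hash-object -w --stdin)

# Reachable objects go into one pack, unreachable ones into a cruft pack.
git repack -q -a -d --cruft

# Count the .mtimes files that accompany cruft packs.
ls .git/objects/pack
cruft_count=$(ls .git/objects/pack/*.mtimes 2>/dev/null | wc -l | tr -d ' ')
echo "cruft packs: $cruft_count"

# The unreachable object is still readable.
git cat-file -e "$orphan" && echo "orphan still present"
```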

One of the practical advantages of cruft packs is that they separate retention from immediate deletion.

Git often wants a grace period for unreachable data because users recover from mistakes, interrupted rewrites, bad rebases, and deleted branch names all the time. gc.pruneExpire controls that retention horizon, and it defaults to 2.weeks.ago, so unreachable objects newer than two weeks are kept.
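The retention horizon can be observed directly. The sketch below, run in a throwaway repository with illustrative variable names, creates a fresh unreachable blob, shows that a default git gc keeps it, and then shows that overriding the horizon with --prune=now removes it. Treat --prune=now as a demonstration tool here, since it removes exactly the safety margin this section describes.

```shell
# Throwaway repository with one reachable commit.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1
git -c user.name=Example -c user.email=example@example.invalid \
  commit -q --allow-empty -m "initial commit"

# A fresh object that no ref points at.
orphan=$(echo "recoverable for now" | git hash-object -w --stdin)

# Effective horizon: unset means the built-in default applies.
git config --get gc.pruneExpire || echo "gc.pruneExpire unset (default 2.weeks.ago)"

# Default gc: the object is younger than the horizon, so it is retained.
git gc -q
if git cat-file -e "$orphan" 2>/dev/null; then kept_by_default=yes; else kept_by_default=no; fi

# One-off override: the horizon is now, so the object is dropped.
git gc -q --prune=now
if git cat-file -e "$orphan" 2>/dev/null; then kept_after_now=yes; else kept_after_now=no; fi

echo "kept by default gc: $kept_by_default, kept after --prune=now: $kept_after_now"
```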

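Read modern cleanup this way: an object first becomes unreachable, then it is retained through the grace period, and only after that horizon passes is it eligible for deletion.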

Cruft packs make that retention policy more manageable.

Old advice lingers here long after the surrounding system changed.

git gc --aggressive throws away existing deltas and recomputes them with much larger search windows, which costs much more time.

It is a special-purpose choice, not a general recommendation whenever a repository feels slow.

If the real problem is too many loose objects, stale commit-graph data, many packs without a MIDX, missing bitmaps, or poor fetch maintenance, then aggressive delta recomputation may be solving the wrong problem expensively.
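Before reaching for --aggressive, a quick look at the cheaper symptoms can help. The helper below is a hypothetical sketch, not a Git feature: the name repo_symptoms and its output format are invented here, and it assumes a standard object directory layout with no alternates.

```shell
# Hypothetical helper: summarize cheap-to-check symptoms for one repository.
repo_symptoms() {
  gitdir=$(git -C "$1" rev-parse --absolute-git-dir) || return 1
  stats=$(git -C "$1" count-objects -v)

  echo "loose objects: $(echo "$stats" | awk '/^count:/ {print $2}')"
  echo "packs: $(echo "$stats" | awk '/^packs:/ {print $2}')"

  # A commit-graph is either a single file or a chain directory.
  if [ -e "$gitdir/objects/info/commit-graph" ] ||
     [ -d "$gitdir/objects/info/commit-graphs" ]; then
    echo "commit-graph: present"
  else
    echo "commit-graph: missing"
  fi

  if [ -e "$gitdir/objects/pack/multi-pack-index" ]; then
    echo "multi-pack-index: present"
  else
    echo "multi-pack-index: missing"
  fi

  bitmaps=$(ls "$gitdir"/objects/pack/*.bitmap 2>/dev/null | wc -l | tr -d ' ')
  echo "bitmaps: $bitmaps"
}

# Usage against a throwaway repository with one commit.
demo=$(mktemp -d)
git init -q "$demo"
git -C "$demo" -c user.name=Example -c user.email=example@example.invalid \
  commit -q --allow-empty -m "initial commit"
report=$(repo_symptoms "$demo")
echo "$report"
```

If the report shows many loose objects, many packs with no multi-pack-index, or a missing commit-graph, a targeted maintenance task is likely a cheaper first step than recomputing every delta.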

Keep Repositories Out of Crisis

One practical change is that scheduled background work is now a first-class feature.

The git maintenance start command integrates with platform schedulers such as cron, systemd timers, launchd, or Windows Task Scheduler, depending on the environment.

Repository health degrades gradually: loose objects accumulate, packs multiply, the commit-graph goes stale, and bitmaps fall behind.

If maintenance only happens when a human gets annoyed enough to intervene, repositories spend too much of their time in mediocre states. Background maintenance shifts the default.

The right maintenance posture depends on what kind of repository you have.

For a small personal repo, the older manual model may be fine. For a large shared repository, or for many clones of the same large repository across engineers and CI, the maintenance system should be treated more like part of normal operations.

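So large-repository guidance increasingly talks about background schedules, incremental repacking, prefetching, and retention windows rather than one-off cleanup commands.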

These details shape whether Git feels continuously responsive or periodically unhealthy.

git gc is still the umbrella cleanup command.

git maintenance is the newer layer that decides which maintenance tasks run, and when.

For large repositories and machine-heavy environments, that split helps avoid one big disruptive cleanup event.

Packfiles and cruft packs determine physical object layout. Commit-graph, MIDX, and bitmaps depend on upkeep to stay useful. Large-repository workflows benefit when fetch and local storage stay ahead of demand. Recovery depends on retention choices as well as logical history.