High Performance Git

Section IV · Large-Repo Operations, Transport, and Scale

Chapter 15

Bundles and Bundle URIs


Does every clone need to pull the same base data directly from origin? In many environments the answer is no, and preferably no by a wide margin. Bundles and bundle URIs offer another way to approach this question.

A bundle is a portable pack plus ref metadata. A bundle URI lets Git seed a clone from that precomputed data before normal fetch negotiation continues. Bundles were originally used to ship code into places such as air-gapped environments. But there are performance possibilities too: a faster, cheaper first transfer path.


Bundles Are Portable Transfer Units, and They Matter Most at Bootstrap

Bundle
Single file containing Git pack data plus ref tips and prerequisite commit IDs. It may be self-contained (no prerequisites) or incremental (requires prerequisite commits in the destination repository).

A git bundle moves objects and refs between repositories without requiring the sender and receiver to be connected directly.

That is the basic idea. But as we'll see, a bundle is not merely an offline convenience. It is also a way to precompute and redistribute expensive transfer work.

The basic workflow is small:

git bundle create repo.bundle --all
git bundle verify repo.bundle
git clone repo.bundle repo-from-bundle

That sequence shows the important point: a bundle is a Git-native transfer artifact that can be created, verified, and used as clone input.

If the bundle is only the bootstrap source and not the long-term remote, the receiver usually repoints origin and then continues with normal fetch:

git clone repo.bundle repo-from-bundle
git -C repo-from-bundle remote set-url origin https://git.example.com/repo.git
git -C repo-from-bundle fetch origin

That is a useful concrete mental model for the rest of the chapter: seed from the bundle, then reconcile with the canonical remote.
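The seed-then-reconcile flow can be tried end to end without a network. The following is a minimal sketch under stated assumptions: every path is a throwaway directory, canonical.git is a local stand-in for the real remote, and the commit helper is hypothetical scaffolding that exists only to produce some history.

```shell
#!/bin/sh
# Sketch: seed from a bundle, repoint origin, then fetch the remainder.
# canonical.git is a local stand-in for the real remote (demo assumption).
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Hypothetical helper: append a line to a tracked file and commit it.
commit() {
    echo "$2" >> "$1/notes.txt"
    git -C "$1" add notes.txt
    git -C "$1" -c user.name=Demo -c user.email=demo@example.com commit -q -m "$2"
}

git init -q -b main src
commit src "base"
git -C src bundle create "$work/repo.bundle" --all

# The canonical remote moves on after the bundle was cut.
commit src "newer work"
git clone -q --bare src canonical.git

# Receiver: seed from the bundle, repoint origin, reconcile with a normal fetch.
git clone -q repo.bundle repo-from-bundle
git -C repo-from-bundle remote set-url origin "$work/canonical.git"
git -C repo-from-bundle fetch -q origin

seeded=$(git -C repo-from-bundle rev-parse main)
current=$(git -C repo-from-bundle rev-parse origin/main)
echo "seeded from bundle at:   $seeded"
echo "origin/main after fetch: $current"
```

The two printed commit IDs differ: the local branch still sits where the bundle left it, while origin/main reflects what the follow-on fetch brought in.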

One of the most expensive moments in a repository's life is the first clone, and it is repeated for every new machine and every disposable CI worker.

If origin has to generate or serve the same heavy initial transfer repeatedly, that is wasted effort. A bundle turns that repeated work into something more cacheable and more distributable.

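A bundle contains more than just a packfile.

It also carries the ref tips it provides and, for an incremental bundle, the prerequisite commit IDs the receiver must already have.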

A bundle is a structured transfer artifact Git can reason about.
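That structure is visible in the file itself, because the bundle header is plain text that ends at the first empty line. The sketch below is illustrative only: it builds a throwaway repository, cuts one full and one incremental bundle, and prints both headers. The repository name, the file names, and the commit helper are assumptions made for the demo.

```shell
#!/bin/sh
# Sketch: look at the text header that precedes the pack data in a bundle.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Hypothetical helper: append a line to a tracked file and commit it.
commit() {
    echo "$2" >> "$1/notes.txt"
    git -C "$1" add notes.txt
    git -C "$1" -c user.name=Demo -c user.email=demo@example.com commit -q -m "$2"
}

git init -q -b main demo
commit demo "first"
commit demo "second"
base=$(git -C demo rev-parse main~1)

# Full bundle: no prerequisites. Incremental bundle: requires $base.
git -C demo bundle create "$work/full.bundle" main
git -C demo bundle create "$work/incr.bundle" main "^$base"

# Print everything up to the first empty line; binary pack data follows it.
full_header=$(LC_ALL=C sed '/^$/q' full.bundle)
incr_header=$(LC_ALL=C sed '/^$/q' incr.bundle)
echo "--- full.bundle header ---"; printf '%s\n' "$full_header"
echo "--- incr.bundle header ---"; printf '%s\n' "$incr_header"

# Git's own view of the refs a bundle advertises.
git -C demo bundle list-heads "$work/incr.bundle"
```

In the incremental header, the line that starts with a minus sign is the prerequisite commit; the full bundle has no such line.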

So bundles can be created, verified, and used as clone or fetch input.

Bundles sit somewhere between plain pack transport and a more fully managed repository mirror.

Full Bundles, Incrementals, and Bundle URIs

There are two main bundle uses.

A full bundle is great for bootstrap, while an incremental bundle is better for follow-on synchronization.

Bundle strategy gets interesting once you have to decide what different clients should start from. Different environments may want different seeding layers rather than one giant universal bundle.

An incremental bundle uses the same mechanism with a narrower range:

git fetch origin main:refs/remotes/origin/main
git bundle create update.bundle main ^origin/main
git bundle verify update.bundle

That is useful when the receiving side already has an older base and only needs a bounded slice of newer history. The fetch makes the prerequisite explicit by ensuring origin/main names the already-shared base before the bundle is created.

On the receiving side, applying that incremental bundle can look like this:

git fetch /path/to/update.bundle main:refs/remotes/bundle/main
git merge --ff-only refs/remotes/bundle/main

That keeps the example honest: an incremental bundle is only useful when sender and receiver already agree on the prerequisite base.
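The whole round trip can be rehearsed locally. This is a sketch, not a prescribed workflow: sender and receiver are two throwaway repositories, the shared base is simply the commit the receiver cloned, and the commit helper is hypothetical.

```shell
#!/bin/sh
# Sketch: cut an incremental bundle on the sender and apply it on the receiver.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Hypothetical helper: append a line to a tracked file and commit it.
commit() {
    echo "$2" >> "$1/notes.txt"
    git -C "$1" add notes.txt
    git -C "$1" -c user.name=Demo -c user.email=demo@example.com commit -q -m "$2"
}

git init -q -b main sender
commit sender "shared base"
git clone -q sender receiver          # receiver now holds the shared base

commit sender "new work 1"
commit sender "new work 2"

# The prerequisite is the commit both sides already have.
base=$(git -C receiver rev-parse main)
git -C sender bundle create "$work/update.bundle" main "^$base"

# Receiver: confirm the prerequisite is present, fetch, then fast-forward.
git -C receiver bundle verify "$work/update.bundle"
git -C receiver fetch -q "$work/update.bundle" main:refs/remotes/bundle/main
git -C receiver merge -q --ff-only refs/remotes/bundle/main

git -C receiver log --oneline
```

After the fast-forward, the receiver's main matches the sender's main even though the two repositories never talked to each other directly.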

Bundle URI
Clone-time mechanism for seeding repository data from a bundle location before normal fetch negotiation continues.

git clone supports --bundle-uri=<uri>, which tells clone to fetch a bundle from that URI and unbundle the data before talking to the remote in the ordinary way.

This changes the shape of clone.

It means clone can start from a precomputed bundle and then use the origin only for whatever remains.

This changes clone from "always ask origin first" to "bootstrap from wherever distribution is cheapest, then reconcile."

That clone shape is explicit at the command line:

git clone --bundle-uri=https://example.com/repo.bundle <url>

The bundle provides the starting object set. Ordinary clone negotiation with the origin still happens afterward for whatever the bundle did not already provide.
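The same shape can be exercised on one machine by pointing the bundle URI at a local file. This sketch assumes a Git new enough to support --bundle-uri (it checks the clone usage text and skips otherwise), and it uses a local bare repository, origin.git, as a stand-in for the real remote. All names are illustrative.

```shell
#!/bin/sh
# Sketch: clone seeded from a local bundle file, then completed from "origin".
set -eu
if ! git clone -h 2>&1 | grep -q 'bundle-uri'; then
    echo "this Git does not support --bundle-uri; skipping demo"
    exit 0
fi
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Hypothetical helper: append a line to a tracked file and commit it.
commit() {
    echo "$2" >> "$1/notes.txt"
    git -C "$1" add notes.txt
    git -C "$1" -c user.name=Demo -c user.email=demo@example.com commit -q -m "$2"
}

git init -q -b main src
commit src "base 1"
commit src "base 2"
git -C src bundle create "$work/base.bundle" main
bundle_tip=$(git -C src rev-parse main)

# Origin moves on after the bundle snapshot.
commit src "after bundle"
git clone -q --bare src origin.git

# Seed from the bundle, then let ordinary negotiation fetch the rest.
git clone -q --bundle-uri="$work/base.bundle" "file://$work/origin.git" seeded

git -C seeded log --oneline
# Refs created from the bundle, if this Git version keeps them around.
git -C seeded for-each-ref refs/bundles refs/bundle
```

The resulting clone ends at the current origin tip, while the commits up to the bundle snapshot arrived from the bundle file rather than from origin.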

If many clients need roughly the same base object set, then a static bundle is often a much better distribution artifact than asking origin to regenerate or restream equivalent data over and over again.

It is especially attractive when:

A bundle only helps if it is close enough to the state clients actually need.

If the bundle is too old, the follow-on fetch can still be large. At that point you may have moved the work without reducing it much.
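One rough way to put a number on "too old" is to count the commits that the current branch has and the bundle tip lacks. The sketch below is one possible measurement, not an established metric; bundle_lag is a hypothetical helper, and it only looks at the first ref the bundle advertises.

```shell
#!/bin/sh
# Sketch: measure how many commits a published bundle is behind a branch.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Hypothetical helper: append a line to a tracked file and commit it.
commit() {
    echo "$2" >> "$1/notes.txt"
    git -C "$1" add notes.txt
    git -C "$1" -c user.name=Demo -c user.email=demo@example.com commit -q -m "$2"
}

# bundle_lag <repo> <bundle-abs-path> <ref> (hypothetical helper):
# prints how many commits <ref> has beyond the bundle's first advertised tip.
bundle_lag() {
    tip=$(git -C "$1" bundle list-heads "$2" | awk 'NR == 1 { print $1 }')
    git -C "$1" rev-list --count "$tip..$3"
}

git init -q -b main mirror
commit mirror "base 1"
commit mirror "base 2"
git -C mirror bundle create "$work/base.bundle" main

# History continues after the bundle was published.
commit mirror "later 1"
commit mirror "later 2"
commit mirror "later 3"

lag=$(bundle_lag "$work/mirror" "$work/base.bundle" main)
echo "bundle is $lag commits behind main"
```

Commit count is only a proxy for transfer size, but a steadily growing number is a reasonable signal that the base bundle needs to be recut.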

In practice, a bundle strategy needs answers to a few plain questions:

So bundles are a distribution and maintenance question as well as a format question.

Refresh is not something clients do to an existing clone in place. In practice, the publisher or hosting side cuts a newer bundle from current refs on some schedule, pushes that bundle to static storage, and updates the bundle URI or bundle list that future clones will consult. Existing clones still use ordinary fetch for whatever changed after their starting point.

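That means "refresh" usually looks like an operational loop: cut a newer bundle from current refs, verify it, publish it to static storage, and update the bundle URI or bundle list that future clones consult.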

If the delta after seeding keeps growing, the answer is usually to publish a newer base bundle, not to expect origin fetch to somehow become cheaper on its own.

A plain publisher-side refresh might look like this:

git -C /srv/mirrors/repo.git fetch origin --prune --tags
git -C /srv/mirrors/repo.git bundle create /srv/bundles/repo-main.bundle --branches --tags
git -C /srv/mirrors/repo.git bundle verify /srv/bundles/repo-main.bundle

If /srv/bundles/repo-main.bundle is what the advertised bundle URI points at, future clones automatically start from the newer snapshot.
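Because future clones pick up whatever sits at the advertised path, a publisher may prefer to verify a candidate bundle before it ever appears there. The following sketch shows one way to do that: write to a temporary name, verify, then rename into place. publish_bundle is a hypothetical helper, and the mirror and output paths are throwaway stand-ins for /srv/mirrors and /srv/bundles.

```shell
#!/bin/sh
# Sketch: only expose a bundle at its published path after it verifies.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Hypothetical helper: append a line to a tracked file and commit it.
commit() {
    echo "$2" >> "$1/notes.txt"
    git -C "$1" add notes.txt
    git -C "$1" -c user.name=Demo -c user.email=demo@example.com commit -q -m "$2"
}

# publish_bundle <mirror-dir> <final-path> <rev-args...> (hypothetical helper)
publish_bundle() {
    mirror=$1; final=$2; shift 2
    tmp="$final.tmp.$$"
    if git -C "$mirror" bundle create "$tmp" "$@" &&
       git -C "$mirror" bundle verify "$tmp"; then
        mv -f "$tmp" "$final"        # rename within one directory
    else
        rm -f "$tmp"                 # never leave a bad candidate behind
        return 1
    fi
}

git init -q -b main src
commit src "base"
git clone -q --bare src mirror.git
mkdir bundles

publish_bundle "$work/mirror.git" "$work/bundles/repo-main.bundle" --branches --tags
ls -l bundles
```

Running verify with -C against the mirror matters here: the check needs a repository in which to look up any prerequisite commits.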

Modern bundle distribution is not limited to one file.

The bundle-format documentation and related bundle-URI support allow lists of bundles and more flexible selection logic. Repositories may want:

At that point, bundle distribution looks less like a one-off export and more like a managed bootstrap strategy.

Bundles Work Best When Many Clients Need Similar Data

Bundles are strongest when many clients need roughly the same base history.

They fit best in clone-heavy environments such as new-machine onboarding and disposable CI workers.

They help less when every client needs a very different history slice or filter. In that case, origin still has to do more custom work per client.

Suppose a monorepo is large, most engineers start from main, and new-machine setup happens often enough to hurt.

One reasonable pattern is a publisher job that rolls a fresh base bundle on a schedule:

git -C /srv/mirrors/monorepo.git fetch origin --prune --tags
git -C /srv/mirrors/monorepo.git bundle create /srv/bundles/monorepo-main.bundle main --tags
git -C /srv/mirrors/monorepo.git bundle verify /srv/bundles/monorepo-main.bundle

Then the onboarding path can start from the CDN copy of that bundle instead of asking origin for the whole base history every time:

git clone \
  --bundle-uri=https://cdn.example.com/monorepo-main.bundle \
  https://git.example.com/monorepo.git

That does not remove origin from the picture. It changes what origin has to do. The shared base history comes from the CDN, while origin mostly serves the delta from the bundle snapshot to the current repository state.

If onboarding clones still spend a long time fetching from origin, the bundle is either too old or not the right base for the clients you actually have.

Disposable CI workers are one of the cleanest fits for bundles because they repeatedly need nearly the same starting point and then disappear.

A worker can combine bundle seeding with partial clone:

/usr/bin/time git clone \
  --bundle-uri=https://cache.example.com/repo-base.bundle \
  --filter=blob:none \
  https://git.example.com/repo.git \
  "$WORKDIR/repo"

git -C "$WORKDIR/repo" checkout "$GIT_COMMIT"

That is a practical shape for clone-heavy CI. The worker gets the common history from the bundle, avoids pulling every blob up front, and lets origin focus on the newer commits and any later on-demand object fetches.

The operational question is not just "can CI use bundles?" It is "how often do we have to refresh the base so that worker startup stays predictably cheap?"

If a base bundle is refreshed every few hours and workers still spend too much time in the follow-on network phase, the cadence is wrong. At that point the next move is usually to publish more often or add incrementals, not to keep blaming clone in the abstract.

Bundles, Partial Clone, and the Limits of Seeding

Bundle seeding combined with partial clone is one of the better modern combinations in practice.

Earlier chapters showed that partial clone changes how much object data must arrive up front. Bundle distribution changes where that initial data comes from.

Those are complementary ideas.

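A repository can seed its common history from a bundle and use partial clone to defer blob transfer until the blobs are actually needed.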

Large-scale Git increasingly needs this kind of composition.

In command form, that can be as direct as:

git clone \
  --bundle-uri=https://cdn.example.com/repo-main.bundle \
  --filter=blob:none \
  https://git.example.com/repo.git

That shape says: bootstrap the common base from the bundle, then let promisor fetch handle later blob materialization.
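The partial-clone half of that shape is easy to observe locally. The sketch below deliberately leaves out the bundle URI and shows only what --filter=blob:none defers: after a no-checkout clone, the blobs are listed as missing and would be fetched on demand later. The local origin.git, the uploadpack.allowFilter setting on it, and the commit helper are assumptions made for the demo.

```shell
#!/bin/sh
# Sketch: count the blobs a blob:none partial clone has not downloaded yet.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Hypothetical helper: append a line to a tracked file and commit it.
commit() {
    echo "$2" >> "$1/notes.txt"
    git -C "$1" add notes.txt
    git -C "$1" -c user.name=Demo -c user.email=demo@example.com commit -q -m "$2"
}

git init -q -b main src
commit src "one"
commit src "two"
commit src "three"

git clone -q --bare src origin.git
# The serving side has to allow object filters for partial clone to apply.
git -C origin.git config uploadpack.allowFilter true

git clone -q --no-checkout --filter=blob:none "file://$work/origin.git" partial

# Objects reachable from HEAD that are not present locally are printed with "?".
missing=$(git -C partial rev-list --objects --missing=print HEAD | grep -c '^?' || true)
commits=$(git -C partial rev-list --count HEAD)
echo "commits present: $commits, blobs still missing: $missing"
```

The commit and tree history is complete, but the file contents have not been transferred. That deferred portion is what promisor fetch fills in later.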

The older bundle use cases still matter too: air-gapped environments, and transfers between repositories that are not directly connected.

Bundles are no longer only the offline story. Bundle URI support makes them part of the mainstream online bootstrap story as well.

Keep expectations realistic.

A bundle usually seeds state. It does not replace ordinary synchronization forever. After the seed, the clone still needs normal fetch negotiation for whatever the bundle did not already provide.

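It makes more sense to read bundle performance as a cheaper, more cacheable bootstrap followed by ordinary fetch, not as a replacement for synchronizing with origin.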

The bundle is the opening move, not the whole game.

Bundles fit Git well because they remain Git-native artifacts.

The git bundle tooling includes verification paths because a bundle should not be treated as an opaque blob from nowhere. It is still part of the repository transport story, so correctness and compatibility checks remain important.

As bundle distribution moves into more automated bootstrap workflows, verification matters more, not less.
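A small experiment shows what verification protects against. In this sketch, a full bundle and an incremental bundle are both checked inside an empty repository: the full one verifies, while the incremental one is rejected because its prerequisite commit is absent. The check helper and all repository names are hypothetical.

```shell
#!/bin/sh
# Sketch: bundle verify accepts a full bundle anywhere, but rejects an
# incremental bundle in a repository that lacks its prerequisite commits.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Hypothetical helper: append a line to a tracked file and commit it.
commit() {
    echo "$2" >> "$1/notes.txt"
    git -C "$1" add notes.txt
    git -C "$1" -c user.name=Demo -c user.email=demo@example.com commit -q -m "$2"
}

# check <repo> <bundle> (hypothetical helper): print "ok" or "rejected".
check() {
    if git -C "$1" bundle verify "$2" >/dev/null 2>&1; then
        echo ok
    else
        echo rejected
    fi
}

git init -q -b main src
commit src "first"
commit src "second"
base=$(git -C src rev-parse main~1)

git -C src bundle create "$work/full.bundle" main
git -C src bundle create "$work/incr.bundle" main "^$base"

git init -q -b main empty             # a receiver with no history at all

full_status=$(check empty "$work/full.bundle")
incr_status=$(check empty "$work/incr.bundle")
echo "full.bundle in empty repo: $full_status"
echo "incr.bundle in empty repo: $incr_status"
```

An automated bootstrap job can run the same check as a gate and fall back to an ordinary clone from origin when a bundle is rejected.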

Bundles turn Git bootstrap into something that can be precomputed, cached, distributed, and reused. Bundle URIs take that further by letting clone start from a static or CDN-like source before continuing with ordinary fetch.

The important split is simple: bootstrap can come from a cached distribution layer, while later synchronization still comes from ordinary fetch.