Does every clone need to pull the same base data directly from origin? In many environments the answer is no, and ideally no by a wide margin. Bundles and bundle URIs offer another way to approach this question.
A bundle is a portable pack plus ref metadata. A bundle URI lets Git seed a clone from that precomputed data before normal fetch negotiation continues. Bundles were originally a way to ship code into places like air-gapped environments, but they also open performance possibilities: a faster, cheaper first transfer path.
Bundles Are Portable Transfer Units, and They Matter Most at Bootstrap
- Bundle
- Single file containing Git pack data plus ref tips and prerequisite commit IDs. It may be self-contained (no prerequisites) or incremental (requires prerequisite commits in the destination repository).
A git bundle moves objects and refs between repositories without requiring the sender and receiver to be connected directly.
That is the basic look of it. But as we'll see, a bundle is not merely an offline convenience. It is also a way to precompute and redistribute expensive transfer work.
The basic workflow is small:
git bundle create repo.bundle --all
git bundle verify repo.bundle
git clone repo.bundle repo-from-bundle
That sequence shows the important point: a bundle is a Git-native transfer artifact that can be created, verified, and used as clone input.
If the bundle is only the bootstrap source and not the long-term remote, the receiver usually repoints origin and then continues with normal fetch:
git clone repo.bundle repo-from-bundle
git -C repo-from-bundle remote set-url origin https://git.example.com/repo.git
git -C repo-from-bundle fetch origin
That is a useful concrete mental model for the rest of the chapter: seed from the bundle, then reconcile with the canonical remote.
One of the most expensive moments in a repository's life is the first clone. It is repeated:
- by every new developer machine
- by disposable CI or sandbox environments
- by fleet-style ephemeral workers
If origin has to generate or serve the same heavy initial transfer repeatedly, that is wasted effort. A bundle turns that repeated work into something more cacheable and more distributable.
A bundle contains more than just a packfile.
It also carries things such as:
- advertised refs
- prerequisites
- capabilities like object format
A bundle is a structured transfer artifact Git can reason about.
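Because that metadata is structured, a bundle can be inspected before it is trusted or applied. A minimal sketch, with a throwaway local repository standing in for the real one (all names here are illustrative):

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Build a tiny repository and bundle it.
git init --quiet demo
git -C demo commit --quiet --allow-empty -m "seed"
git -C demo bundle create ../demo.bundle --all

# list-heads prints the refs the bundle advertises without unpacking
# anything; verify checks integrity and prerequisites against a repository.
git bundle list-heads demo.bundle
git -C demo bundle verify ../demo.bundle
```

Both commands read only the bundle header and object data, which is what makes a bundle a transfer artifact Git can reason about rather than an opaque file.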
So bundles can be:
- full bootstrap snapshots
- incremental updates
- filtered or scoped to match newer clone shapes
Bundles sit somewhere between plain pack transport and a more fully managed repository mirror.
Full Bundles, Incrementals, and Bundle URIs
There are two main bundle uses.
A full bundle is great for bootstrap:
- seed a new clone quickly
- reduce origin load
- populate a cache layer or static mirror
An incremental bundle is better for follow-on synchronization:
- update an already seeded clone
- move a bounded range of history
- stage data ahead of a later fetch
Bundle strategy gets interesting once you have to decide what different clients should start from. Different environments may want different seeding layers rather than one giant universal bundle.
An incremental bundle uses the same mechanism with a narrower range:
git fetch origin main:refs/remotes/origin/main
git bundle create update.bundle main ^origin/main
git bundle verify update.bundle
That is useful when the receiving side already has an older base and only needs a bounded slice of newer history. The fetch makes the prerequisite explicit by ensuring origin/main names the already-shared base before the bundle is created.
On the receiving side, applying that incremental bundle can look like this:
git fetch /path/to/update.bundle main:refs/remotes/bundle/main
git merge --ff-only refs/remotes/bundle/main
That keeps the example honest: an incremental bundle is only useful when sender and receiver already agree on the prerequisite base.
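The whole round trip can be sketched end to end with two local repositories standing in for sender and receiver (all names are illustrative):

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Sender and receiver start from a shared base commit.
git init --quiet sender
git -C sender commit --quiet --allow-empty -m "shared base"
git clone --quiet sender receiver

# The sender advances, then bundles only the commits past the shared base.
git -C sender commit --quiet --allow-empty -m "new work"
git -C sender bundle create ../update.bundle HEAD ^HEAD~1

# verify succeeds only because the receiver already has the prerequisite;
# the fetch then lands the new tip under a bundle-specific remote ref.
git -C receiver bundle verify ../update.bundle
git -C receiver fetch --quiet ../update.bundle HEAD:refs/remotes/bundle/main
git -C receiver merge --ff-only --quiet refs/remotes/bundle/main
```

If the receiver lacked the prerequisite commit, the verify step would fail before any objects were unbundled, which is exactly the failure mode you want.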
- Bundle URI
- Clone-time mechanism for seeding repository data from a bundle location before normal fetch negotiation continues.
git clone supports --bundle-uri=<uri>, which tells clone to fetch a bundle from that URI and unbundle the data before talking to the remote in the ordinary way.
This changes the shape of clone.
It means clone can start from:
- a CDN
- a cache server
- object storage
- some other static distribution point
and then use the origin only for whatever remains.
This changes clone from "always ask origin first" to "bootstrap from wherever distribution is cheapest, then reconcile."
That clone shape is explicit at the command line:
git clone --bundle-uri=https://example.com/repo.bundle <url>
The bundle provides the starting object set. Ordinary clone negotiation with the origin still happens afterward for whatever the bundle did not already provide.
If many clients need roughly the same base object set, then a static bundle is often a much better distribution artifact than asking origin to regenerate or restream equivalent data over and over again.
It is especially attractive when:
- repositories are large
- new clones are common
- the base history changes more slowly than the latest tips
- cache and CDN infrastructure already exists
A bundle only helps if it is close enough to the state clients actually need.
If the bundle is too old, the follow-on fetch can still be large. At that point you may have moved the work without reducing it much.
In practice, a bundle strategy needs answers to a few plain questions:
- how often is the base bundle refreshed?
- how much fetch work is left after seeding?
- do most clients need the same base history?
- is origin actually serving less data?
So bundles are a distribution and maintenance question as well as a format question.
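One way to answer the second question is a throwaway probe clone: seed from the bundle alone, repoint at the real remote, and see how much history a follow-on fetch still brings in. A local sketch, with a repository and bundle standing in for origin and the published artifact (names are illustrative):

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Origin advances after the bundle snapshot is cut.
git init --quiet origin-repo
git -C origin-repo commit --quiet --allow-empty -m "in the bundle"
git -C origin-repo bundle create ../snapshot.bundle HEAD --all
git -C origin-repo commit --quiet --allow-empty -m "after the bundle"

# Seed from the bundle, repoint at origin, and measure the leftover delta.
git clone --quiet snapshot.bundle probe
git -C probe remote set-url origin "$PWD/origin-repo"
git -C probe fetch --quiet origin
git -C probe rev-list --count HEAD..FETCH_HEAD   # commits the bundle was missing
```

If that commit count keeps growing between bundle refreshes, the bundle is drifting away from what clients need and the refresh cadence should tighten.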
Refresh is not something clients do to an existing clone in place. In practice, the publisher or hosting side cuts a newer bundle from current refs on some schedule, pushes that bundle to static storage, and updates the bundle URI or bundle list that future clones will consult. Existing clones still use ordinary fetch for whatever changed after their starting point.
That means "refresh" usually looks like an operational loop:
- choose a cutoff point such as a periodic snapshot or a known branch tip
- generate a new base bundle, or add a new incremental bundle to the list
- publish the artifact to object storage, a CDN, or another static distribution layer
- update the advertised bundle URI or bundle list
- measure how much follow-on fetch remains, then decide when to roll the next base bundle
If the delta after seeding keeps growing, the answer is usually to publish a newer base bundle, not to expect origin fetch to somehow become cheaper on its own.
A plain publisher-side refresh might look like this:
git -C /srv/mirrors/repo.git fetch origin --prune --tags
git -C /srv/mirrors/repo.git bundle create /srv/bundles/repo-main.bundle --branches --tags
git bundle verify /srv/bundles/repo-main.bundle
If /srv/bundles/repo-main.bundle is what the advertised bundle URI points at, future clones automatically start from the newer snapshot.
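One operational detail worth borrowing from other static-artifact pipelines: never overwrite the published bundle in place. A hedged sketch of the same refresh, with local paths standing in for the /srv locations above:

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A local mirror stands in for /srv/mirrors/repo.git.
git init --quiet mirror
git -C mirror commit --quiet --allow-empty -m "snapshot"

# Write to a temporary name, verify, then rename into place, so a client
# downloading mid-publish never sees a truncated bundle.
git -C mirror bundle create ../repo-main.bundle.tmp --branches --tags
git bundle verify repo-main.bundle.tmp
mv repo-main.bundle.tmp repo-main.bundle
```

The rename is atomic on a single filesystem; for object storage or a CDN, the analogous move is upload under a new key, then switch the advertised URI.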
Modern bundle distribution is not limited to one file.
The bundle-format documentation and related bundle-URI support allow lists of bundles and more flexible selection logic. Repositories may want:
- one full base bundle
- several incremental bundles
- filtered or targeted bundles for different clone shapes
At that point, bundle distribution looks less like a one-off export and more like a managed bootstrap strategy.
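For orientation, the advertised list is itself a small config-style document, per Git's bundle-uri documentation. A sketch with illustrative IDs and URIs: mode `all` tells the client to download every listed bundle, so a base plus its incrementals compose into one starting state.

```ini
[bundle]
	version = 1
	mode = all

[bundle "base"]
	uri = https://cdn.example.com/repo-base.bundle

[bundle "inc-1"]
	uri = https://cdn.example.com/repo-inc-1.bundle
```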
Bundles Work Best When Many Clients Need Similar Data
Bundles are strongest when many clients need roughly the same base history.
They fit best in:
- developer onboarding
- disposable CI workers
- sandbox environments
- repeated clone-heavy automation
They help less when every client needs a very different history slice or filter. In that case, origin still has to do more custom work per client.
Suppose a monorepo is large, most engineers start from main, and new-machine setup happens often enough to hurt.
One reasonable pattern is a publisher job that rolls a fresh base bundle on a schedule:
git -C /srv/mirrors/monorepo.git fetch origin --prune --tags
git -C /srv/mirrors/monorepo.git bundle create /srv/bundles/monorepo-main.bundle main --tags
git bundle verify /srv/bundles/monorepo-main.bundle
Then the onboarding path can start from the CDN copy of that bundle instead of asking origin for the whole base history every time:
git clone \
--bundle-uri=https://cdn.example.com/monorepo-main.bundle \
https://git.example.com/monorepo.git
That does not remove origin from the picture. It changes what origin has to do. The shared base history comes from the CDN, while origin mostly serves the delta from the bundle snapshot to the current repository state.
If onboarding clones still spend a long time fetching from origin, the bundle is either too old or not the right base for the clients you actually have.
Disposable CI workers are one of the cleanest fits for bundles because they repeatedly need nearly the same starting point and then disappear.
A worker can combine bundle seeding with partial clone:
/usr/bin/time git clone \
--bundle-uri=https://cache.example.com/repo-base.bundle \
--filter=blob:none \
https://git.example.com/repo.git \
"$WORKDIR/repo"
git -C "$WORKDIR/repo" checkout "$GIT_COMMIT"
That is a practical shape for clone-heavy CI. The worker gets the common history from the bundle, avoids pulling every blob up front, and lets origin focus on the newer commits and any later on-demand object fetches.
The operational question is not just "can CI use bundles?" It is "how often do we have to refresh the base so that worker startup stays predictably cheap?"
If a base bundle is refreshed every few hours and workers still spend too much time in the follow-on network phase, the cadence is wrong. At that point the next move is usually to publish more often or add incrementals, not to keep blaming clone in the abstract.
Bundles, Partial Clone, and the Limits of Seeding
Combining bundle seeding with partial clone is one of the better modern pairings in practice.
Earlier chapters showed that partial clone changes how much object data must arrive up front. Bundle distribution changes where that initial data comes from.
Those are complementary ideas.
A repository can:
- seed a clone with a bundle
- use partial clone filters for later object policy
- keep origin focused on the delta between the seeded state and the current need
Large-scale Git increasingly needs this kind of composition.
In command form, that can be as direct as:
git clone \
--bundle-uri=https://cdn.example.com/repo-main.bundle \
--filter=blob:none \
https://git.example.com/repo.git
That shape says: bootstrap the common base from the bundle, then let promisor fetch handle later blob materialization.
The older bundle use cases still matter too:
- air-gapped transfer
- intermittent connectivity
- controlled export and import paths
- offline bootstrap
Bundles are no longer only the offline story. Bundle URI support makes them part of the mainstream online bootstrap story as well.
Keep expectations realistic.
A bundle usually seeds state. It does not replace ordinary synchronization forever. After the seed, the clone still needs normal fetch negotiation for:
- new refs
- newer objects
- repository movement beyond the bundle snapshot
It makes more sense to read bundle performance as:
- cheaper bootstrap
- less repeated base transfer
- less direct origin pressure
not as:
- no more fetches
- no more negotiation
The bundle is the opening move, not the whole game.
Bundles fit Git well because they remain Git-native artifacts.
The git bundle tooling includes verification paths because a bundle should not be treated as an opaque blob from nowhere. It is still part of the repository transport story, so correctness and compatibility checks remain important.
As bundle distribution moves into more automated bootstrap workflows, verification matters more, not less.
Bundles turn Git bootstrap into something that can be precomputed, cached, distributed, and reused. Bundle URIs take that further by letting clone start from a static or CDN-like source before continuing with ordinary fetch.
The important split is simple: bootstrap can come from a cached distribution layer, while later synchronization still comes from ordinary fetch.