High Performance Git

Section IV · Large-Repo Operations, Transport, and Scale

Chapter 14

Clone, Fetch, Push, Protocol v2, and Bundle Seeding


A slow clone, slow fetch, and slow push all involve the same data (refs, reachability, pack generation, pack transfer), but they do not go wrong the same way.


Clone Is Initial Setup, and Checkout Is Its Own Cost

git clone creates a local repository from a remote source. A typical clone includes repository initialization, negotiation and transfer of objects, ref setup, and a working-tree checkout unless you asked otherwise. Clone latency comes from several independent costs that can each dominate: network round trips, server-side pack generation or bundle seeding, object transfer, local pack registration, and initial checkout cost. The network gets blamed first, but the initial checkout is often its own cost, because after the pack arrives Git still has to populate the working tree: inflate every blob in the checked-out commit, write each file to disk, and build the index.

So a clone can feel slow even when the transfer itself was reasonable. Several earlier features change different parts of that cost, and they should not be blurred together: --no-checkout changes when files are written, sparse checkout changes which files are written, and partial clone changes which objects arrive at all.
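The last of those distinctions, which objects arrive at all, is easy to observe on a throwaway repository. The following is a minimal sketch, not a benchmark: the temporary paths, file names, and demo identity are invented for illustration, and the local source repository has to opt in to filters through uploadpack.allowFilter before a file:// clone will honor --filter.

```shell
# Throwaway demo; paths and identity are made up for illustration.
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Build a tiny source repository with two distinct blobs.
git init -q "$work/src"
echo alpha > "$work/src/a.txt"
echo beta > "$work/src/b.txt"
git -C "$work/src" add a.txt b.txt
git -C "$work/src" commit -q -m 'seed'

# The local "server" must opt in to object filters.
git -C "$work/src" config uploadpack.allowFilter true

# Same history, two clones: one blob-less, one complete. Neither checks out.
git clone -q --no-checkout --filter=blob:none "file://$work/src" "$work/partial"
git clone -q --no-checkout "file://$work/src" "$work/full"

# Objects reachable from HEAD but absent locally are printed with a leading "?".
missing_partial=$(git -C "$work/partial" rev-list --objects --missing=print HEAD | grep -c '^?')
missing_full=$(git -C "$work/full" rev-list --objects --missing=print HEAD | grep -c '^?')
echo "missing in partial clone: $missing_partial, in full clone: $missing_full"
```

Both clones skip the checkout, so the only difference between them is what crossed the wire: the blob-less clone holds commits and trees but neither file's contents.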

You can separate transfer cost from checkout cost with two traced clones:

GIT_TRACE2_PERF=/tmp/clone-full.perf git clone URL repo
GIT_TRACE2_PERF=/tmp/clone-nocheckout.perf git clone --no-checkout URL repo-nc

The second run still pays for negotiation and object transfer, but it defers working-tree materialization. The gap between those traces is often local checkout work, with wire time mostly subtracted out.
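If you want to rehearse that comparison before pointing it at a real remote, a local stand-in works. This is a sketch under stated assumptions: the source repository, file names, and trace paths are invented, and a three-file repository is far too small to produce meaningful timings. It only shows that both clones produce a trace while just one of them writes files.

```shell
# Throwaway demo; paths and identity are made up for illustration.
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A small source repository standing in for URL.
git init -q "$work/src"
for i in 1 2 3; do echo "file $i" > "$work/src/f$i.txt"; done
git -C "$work/src" add .
git -C "$work/src" commit -q -m 'seed'

# Trace a full clone and a clone that skips the checkout.
GIT_TRACE2_PERF="$work/full.perf" git clone -q "file://$work/src" "$work/full"
GIT_TRACE2_PERF="$work/nc.perf" git clone -q --no-checkout "file://$work/src" "$work/nc"

# Count the files each clone wrote into its working tree (.git is not listed).
full_files=$(ls "$work/full" | wc -l | tr -d ' ')
nc_files=$(ls "$work/nc" | wc -l | tr -d ' ')
echo "full=$full_files no-checkout=$nc_files"
```

Against a real repository, the two .perf files are what you would diff: the transfer events appear in both, and the checkout work appears only in the first.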

Bundle Seeding Makes Repeated Clones Cheaper

Bundle
Single file containing Git pack data plus ref tips and prerequisite commit IDs. It may be self-contained (no prerequisites) or incremental (requires prerequisite commits in the destination repository).

A git bundle is a Git-native transfer artifact: pack data plus enough ref metadata for Git to verify and apply it. Many clone-heavy environments, such as CI fleets whose workers clone and then disappear, keep transferring the same base history.

If origin keeps generating or restreaming that same base, you are doing the same work over and over, and a bundle lets you precompute it once and distribute it from somewhere cheaper. The oldest use case was offline transfer:

git bundle create repo.bundle --all
git bundle verify repo.bundle
git clone repo.bundle repo-from-bundle
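The difference between a self-contained and an incremental bundle is easiest to see side by side. The sketch below is illustrative only: the repository layout, the tag name base, and the file names are assumptions made for the demo. It creates a full bundle, clones from it, then creates an incremental bundle whose prerequisite is the tagged commit.

```shell
# Throwaway demo; paths, tag name, and identity are made up for illustration.
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/src"
echo one > "$work/src/a.txt"
git -C "$work/src" add a.txt
git -C "$work/src" commit -q -m 'first'
git -C "$work/src" tag base
branch=$(git -C "$work/src" symbolic-ref --short HEAD)

# Self-contained bundle: no prerequisites, so it can be cloned directly.
git -C "$work/src" bundle create -q "$work/full.bundle" --all
git clone -q "$work/full.bundle" "$work/dst"

# Incremental bundle: only history after the tag; the tagged commit is a prerequisite.
echo two >> "$work/src/a.txt"
git -C "$work/src" commit -q -am 'second'
git -C "$work/src" bundle create -q "$work/inc.bundle" "base..$branch"

# The earlier clone has the prerequisite, so verify passes and fetch applies.
git -C "$work/dst" bundle verify -q "$work/inc.bundle"
git -C "$work/dst" fetch -q "$work/inc.bundle" "$branch"

# An empty repository lacks the prerequisite, so verify refuses the bundle.
git init -q "$work/empty"
if git -C "$work/empty" bundle verify -q "$work/inc.bundle" >/dev/null 2>&1; then
  empty_ok=yes
else
  empty_ok=no
fi
echo "incremental bundle usable in empty repo: $empty_ok"
```

The same verify step is what protects a destination from applying an incremental bundle on top of history it does not have.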

The more interesting performance use case is clone-time seeding.

Bundle URI
Clone-time mechanism for seeding repository data from a bundle location before normal fetch negotiation continues.

git clone supports --bundle-uri=<uri>, which tells clone to fetch a bundle first and then continue ordinary negotiation with the remote:

git clone \
  --bundle-uri=https://cdn.example.com/repo-main.bundle \
  https://git.example.com/repo.git

The shared base can come from a CDN, object storage, or some other static distribution layer, while origin only serves what is newer than the bundle snapshot.

One practical CI shape is to combine bundle seeding with partial clone:

git clone \
  --bundle-uri=https://cache.example.com/repo-base.bundle \
  --filter=blob:none \
  https://git.example.com/repo.git \
  "$WORKDIR/repo"

That is a good fit when workers repeatedly need almost the same starting point and then disappear.

Bundles help bootstrap, but they do not replace fetch forever. If the bundle is stale, the follow-on fetch can still be large; if clients all need very different history slices, origin still has to do custom work; and if you adopt bundle seeding operationally, someone has to refresh the published base often enough that the remaining fetch stays small.

A plain publisher-side refresh can be as simple as:

git -C /srv/mirrors/repo.git fetch origin --prune --tags
git -C /srv/mirrors/repo.git bundle create /srv/bundles/repo-main.bundle --branches --tags
git bundle verify /srv/bundles/repo-main.bundle

That is the win: cheaper bootstrap, less repeated base transfer, and less direct pressure on origin. After the seed, normal synchronization still happens through fetch.
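One way to judge whether a published bundle is getting stale is to count how many commits origin has added since the bundle's tip. This is a rough sketch, assuming a single branch and a hypothetical base.bundle path; commit count is only a proxy for transfer size, and a real publisher would run the check against its mirror rather than a scratch repository.

```shell
# Throwaway demo; paths and identity are made up for illustration.
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/src"
git -C "$work/src" commit -q --allow-empty -m 'base'
branch=$(git -C "$work/src" symbolic-ref --short HEAD)

# Publish a base bundle at this point in history.
git -C "$work/src" bundle create -q "$work/base.bundle" --branches

# Origin moves on by two commits after the bundle was published.
git -C "$work/src" commit -q --allow-empty -m 'newer 1'
git -C "$work/src" commit -q --allow-empty -m 'newer 2'

# Read the branch tip recorded in the bundle, then count what origin has beyond it.
bundle_tip=$(git -C "$work/src" bundle list-heads "$work/base.bundle" |
  awk -v ref="refs/heads/$branch" '$2 == ref { print $1 }')
lag=$(git -C "$work/src" rev-list --count "$bundle_tip..refs/heads/$branch")
echo "commits origin must still serve after seeding: $lag"
```

A refresh job could compare that number against a threshold of its own choosing and republish the bundle when it grows too large.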

Fetch Depends More on the Server Than the Client

Fetch is a narrower question than clone. The local repository already exists; the server and client now need to determine which refs the server has, which objects the client already has, and which objects need to be transferred. Fetch performance can be dominated by negotiation and reachability work even before the first byte of new pack data matters much.

Several earlier chapters come back into view. Bitmaps make reachability enumeration cheaper. Commit-graph helps certain graph questions. Pack layout and maintenance affect what the server can reuse. Partial clone changes what it means for the client to "have enough."

Fetch is where repository layout and network behavior meet. Transport performance depends on server maintenance as much as the client machine or network link, because server maintenance changes what the server can answer cheaply.

A server with fresh bitmaps, a current commit-graph, and well-maintained packs can serve clone and fetch much more efficiently than one with many stale packs and weak acceleration data, which is why transport tuning often turns into repository-layout work on the server side rather than client-side flag chasing. Client-side transport config (protocol.version, fetch.negotiationAlgorithm) can help when negotiation round trips dominate, but no client setting rescues stale server bitmaps or poor pack layout. The Configuration Playbook chapter covers these settings with tradeoff guidance.

When a fetch is mysteriously slow, I check the server side first. Client flags get all the attention because they are what engineers can change, but a server with stale bitmaps or a sprawling pack layout will swamp any client tuning. If you do not control the server, at least confirm that the slow layer is actually transport before you spend time on client config.

Push Is Not Just Fetch in Reverse

Push shares pack transfer machinery with fetch, but the surrounding control flow is different. A push asks the server to accept new objects and then move refs, often under policy such as fast-forward checks, branch protections, update hooks, and server-side integration rules. A slow push can come from object generation and upload, receive-pack validation, ref-update policy, or hooks and integration automation. Push is asymmetric: even with small object volume, ref-update validation can be expensive or blocked.

You can inspect the client half of that path without moving any remote refs:

git push --dry-run --porcelain origin HEAD
GIT_TRACE_PACKET=1 git push --dry-run origin HEAD
GIT_TRACE2_PERF=/tmp/push.perf git push --dry-run origin HEAD

That exposes the push-side conversation and local timing without actually updating the branch. A dry run sends no pack, so receive-pack validation and server hooks never run, which makes the asymmetry plain: push has a validation path fetch does not.
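To confirm that a dry run really leaves the remote untouched, you can rehearse it against a local bare repository. The sketch below is an illustration under assumptions: the bare remote, the paths, and the demo identity are invented, and a local remote has no server policy or hooks to exercise.

```shell
# Throwaway demo; paths and identity are made up for illustration.
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A bare repository stands in for the server.
git init -q --bare "$work/remote.git"
git init -q "$work/local"
git -C "$work/local" remote add origin "$work/remote.git"
git -C "$work/local" commit -q --allow-empty -m 'first'
branch=$(git -C "$work/local" symbolic-ref --short HEAD)
git -C "$work/local" push -q origin HEAD

# Create a commit the remote does not have yet, then rehearse the push.
git -C "$work/local" commit -q --allow-empty -m 'second'
before=$(git -C "$work/remote.git" rev-parse "refs/heads/$branch")
git -C "$work/local" push --dry-run --porcelain origin HEAD > "$work/dry.out"
after=$(git -C "$work/remote.git" rev-parse "refs/heads/$branch")
local_tip=$(git -C "$work/local" rev-parse HEAD)

# The porcelain report names the ref that would move; the remote ref itself is unchanged.
cat "$work/dry.out"
```

The porcelain output is stable enough to parse in scripts, which makes it a reasonable pre-flight check before a push that is expected to hit slow server-side policy.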

Git hooks are scripts that run at defined points during Git operations. They sit outside the object model and transport machinery, but they can dominate command latency in ways that look like "Git is slow" rather than "our hook is slow."

On the server side, the hooks that matter most are pre-receive, update, and post-receive.

A slow pre-receive hook makes every push feel sluggish. A slow post-receive hook can block the client from returning if the hook runs synchronously. Server-side hook latency is often invisible to the pusher unless they trace the conversation.

On the client side, several hooks can affect everyday command speed, pre-commit and post-checkout among them.

The performance angle is simple: hooks run user-defined code at Git's expense. A pre-commit hook that runs a full lint pass on every tracked file will dominate commit time in a large checkout. A post-checkout hook that rebuilds a dependency cache will dominate branch-switch time. Those costs show up as Git latency, but the fix lives in the hook itself.

When a command feels slow and traces do not explain the time, check whether a hook is running:

ls .git/hooks/
GIT_TRACE2_PERF=/tmp/commit.perf git commit --allow-empty -m 'test hook cost'

Trace2 records each hook as a child process, with start and exit events that carry the elapsed time. That is often enough to separate "Git is slow" from "our hook is slow."
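A deliberately slow hook makes the effect easy to reproduce. This is a minimal sketch, assuming a scratch repository and a made-up pre-commit hook that only sleeps; it traces one commit with the hook and one with --no-verify, then checks which trace mentions the hook.

```shell
# Throwaway demo; paths, hook body, and identity are made up for illustration.
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/repo"

# Install a pre-commit hook that does nothing except waste one second.
mkdir -p "$work/repo/.git/hooks"
hook="$work/repo/.git/hooks/pre-commit"
printf '#!/bin/sh\nsleep 1\n' > "$hook"
chmod +x "$hook"

# Trace one commit that runs the hook and one that bypasses it.
GIT_TRACE2_PERF="$work/hook.perf" \
  git -C "$work/repo" commit -q --allow-empty -m 'with hook'
GIT_TRACE2_PERF="$work/nohook.perf" \
  git -C "$work/repo" commit -q --no-verify --allow-empty -m 'without hook'

# Count the trace lines that mention the hook in each run.
with_hook=$(grep -c 'pre-commit' "$work/hook.perf")
without_hook=$(grep -c 'pre-commit' "$work/nohook.perf")
echo "trace lines mentioning pre-commit: with=$with_hook without=$without_hook"
```

In a real checkout, the lines that match in the first trace are the ones to read for timing; --no-verify is a diagnostic here, not a recommendation to skip hooks.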

Pack Transfer, Protocol v2, and Negotiation

The common layer across clone, fetch, and push is pack transport. The server is usually trying to answer a reachability question of the form "which objects must cross the wire?", and the answer is usually a packfile or bundle-like transfer unit. Many of the earlier performance features (bitmaps, commit-graph, pack layout and maintenance, partial clone) show up again here.

Transport performance is one of the places where Git's storage engine surfaces to everyday users.

Protocol v2
Newer Git wire protocol that separates capabilities and commands into a more structured request-response flow.

Protocol v2 breaks one long conversation into more distinct jobs, giving Git separate commands such as ls-refs and fetch instead of one monolithic exchange. Protocol design affects latency as much as raw bandwidth does, and protocol v2 gives Git a command-oriented shape, more structured capability exchange, better support for request-specific behavior, and a design that fits smarter servers and intermediaries more comfortably. It does not make every fetch fast on its own; the clearest way to understand the value is through the separation between two different jobs: listing refs (ls-refs) and negotiating and transferring objects (fetch).

Separating them gives Git more room to optimize how much information is exchanged and when, which pays off most in large repositories and hosting environments where ref sets are large and the client may not need every detail on every round trip.

You can see the split directly:

git -c protocol.version=2 ls-remote --heads origin
git -c protocol.version=2 fetch --negotiate-only --negotiation-tip=HEAD origin
git -c protocol.version=2 fetch --negotiation-tip=HEAD origin

The first command asks only for names, the second asks what common history Git can prove from your current HEAD, and the third performs the real fetch with a narrower negotiation frontier.
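The command-oriented shape is visible in a packet trace even against a local repository. The sketch below is illustrative, with invented paths and a scratch source repository; it selects the protocol through the protocol.version config key and compares a version 2 ref listing with a version 0 one.

```shell
# Throwaway demo; paths and identity are made up for illustration.
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/src"
git -C "$work/src" commit -q --allow-empty -m 'seed'

# List refs once per protocol version, capturing the wire conversation each time.
GIT_TRACE_PACKET="$work/v2.packet" \
  git -c protocol.version=2 ls-remote "file://$work/src" > /dev/null
GIT_TRACE_PACKET="$work/v0.packet" \
  git -c protocol.version=0 ls-remote "file://$work/src" > /dev/null

# Under v2 the client sends an explicit ls-refs command; under v0 the
# server advertises refs unprompted, so no such command appears.
v2_lsrefs=$(grep -c 'command=ls-refs' "$work/v2.packet")
v0_lsrefs=$(grep -c 'command=ls-refs' "$work/v0.packet")
echo "ls-refs commands seen: v2=$v2_lsrefs v0=$v0_lsrefs"
```

Reading the two trace files side by side shows the design difference directly: the older protocol opens with a ref advertisement, while version 2 opens with capabilities and waits to be asked.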

Fetch has a simple user-facing form, but the expensive part is still a reachability question underneath: the client and server are trying to avoid sending objects the client already has while still producing a correct pack for what the client needs next.

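Fetch gets expensive when ref sets are large, when negotiation takes many round trips, or when the server has to enumerate reachable objects with a raw object walk.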

Bitmaps matter so much to transport for the same reason: they let the server enumerate large reachable object sets much more cheaply than a raw object walk.

To diagnose a slow transfer, split transport into three layers (ref advertisement, packet exchange, and local timing):

git ls-remote origin

GIT_TRACE_PACKET=/tmp/fetch.packet git fetch origin
GIT_TRACE2_PERF=/tmp/fetch.perf git fetch origin

GIT_TRACE2_PERF=/tmp/clone.perf git clone --no-checkout URL repo

ls-remote gives you the advertised name set, packet tracing shows the wire conversation, and Trace2 shows where the local time went before and after the network.

If you want to contrast packet chatter more directly:

GIT_TRACE_PACKET=1 git ls-remote origin
GIT_TRACE_PACKET=1 git fetch origin

And if you want to force the newer protocol framing explicitly:

git -c protocol.version=2 ls-remote origin
git -c protocol.version=2 fetch --negotiation-tip=HEAD origin

Clone, Fetch, and Push Usually Fail for Different Reasons

To review: network problems are easy to lump together, but clone, fetch, and push usually bottleneck in different places.

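A slow clone may be about network round trips, server-side pack generation, object transfer, local pack registration, or the initial checkout.

A slow fetch may be about negotiation round trips, server-side reachability work, stale bitmaps, or poor pack layout.

A slow push may be about object generation and upload, receive-pack validation, ref-update policy, or hooks and integration automation.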

Keep them distinct.