High Performance Git

Section IV · Large-Repo Operations, Transport, and Scale

Chapter 14

Clone, Fetch, Push, Protocol v2


A slow clone, slow fetch, and slow push all involve some of the same data:

But they rarely fail for exactly the same reasons.


Clone Is Initial Setup, and Checkout Is Its Own Bill

git clone is how a local repository is created from a remote source.

Break the operation apart.

A typical clone includes:

Clone latency can feel large because it includes several kinds of work:

If any one of those steps is slow, clone slows with it.

It is easy to treat all clone time as network time, but the initial checkout is often a separate bill.

After the pack arrives, Git may still need to:

That means a clone can feel slow even when the transfer was reasonable.

Several earlier features matter here in different ways:

They solve different parts of the problem. One changes when files are written, one changes which files are written, and one changes which objects arrive at all.
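As a rough sketch of that distinction, the three clone variants below line up with those three levers. The mapping to `--no-checkout`, `--sparse`, and `--filter=blob:none` is an assumption about which features the text has in mind, and the scratch source repository, paths, and file names are made up for illustration.

```shell
set -eu
# Hypothetical scratch area and source repository, for illustration only.
work=$(mktemp -d)
src="$work/src"
git init -q "$src"
git -C "$src" config user.email "dev@example.com"
git -C "$src" config user.name "Example Dev"
mkdir "$src/docs"
echo "top level" > "$src/README"
echo "nested" > "$src/docs/guide.txt"
git -C "$src" add .
git -C "$src" commit -q -m "initial"
# Partial clone needs the serving side to allow object filters.
git -C "$src" config uploadpack.allowFilter true
url="file://$src"

# When files are written: transfer objects now, populate the tree later.
git clone -q --no-checkout "$url" "$work/later"

# Which files are written: start with only top-level files checked out.
git clone -q --sparse "$url" "$work/sparse"

# Which objects arrive: leave blobs behind until something needs them.
git clone -q --filter=blob:none --no-checkout "$url" "$work/partial"
```

Against a real remote, the same three flags apply unchanged; only the scratch setup above is specific to this sketch.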

You can separate transfer cost from checkout cost with two traced clones:

GIT_TRACE2_PERF=/tmp/clone-full.perf git clone URL repo
GIT_TRACE2_PERF=/tmp/clone-nocheckout.perf git clone --no-checkout URL repo-nc

The second run still pays for negotiation and object transfer, but it defers working-tree materialization. The gap between those traces is often local checkout work, not wire time.

Fetch Is Synchronization, and Server State Usually Matters More Than Client Flags

Fetch is a narrower question than clone.

The local repository already exists. The server and client now need to determine:

Fetch performance can be dominated by negotiation and reachability work even before the first byte of new pack data matters much.

Several earlier chapters come back into view here too:

Fetch is where repository layout and network behavior meet.

Transport performance depends on server maintenance as much as on the client machine or the network link, because maintenance changes what the server can answer cheaply.

A server with:

can serve clone and fetch much more efficiently than one with many stale packs and weak acceleration data.

So transport tuning often turns into repository-layout work on the server side, with client flags playing only part of the role.
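A minimal sketch of that server-side work, assuming the maintenance in question means a consolidated pack with a reachability bitmap plus a commit-graph file. The scratch repositories and paths are hypothetical, and a hosted service will normally run its own equivalent of these steps.

```shell
set -eu
# Hypothetical scratch repository standing in for a server-side repo.
work=$(mktemp -d)
git init -q "$work/seed"
git -C "$work/seed" config user.email "dev@example.com"
git -C "$work/seed" config user.name "Example Dev"
echo "one" > "$work/seed/file.txt"
git -C "$work/seed" add file.txt
git -C "$work/seed" commit -q -m "first"
git clone -q --bare "$work/seed" "$work/server.git"

# Consolidate into one pack and write a reachability bitmap next to it.
git -C "$work/server.git" repack -a -d -q --write-bitmap-index

# Write a commit-graph so history walks avoid parsing every commit object.
git -C "$work/server.git" commit-graph write --reachable

# The pack directory should now hold a .pack, an .idx, and a .bitmap file.
ls "$work/server.git/objects/pack"
```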

Client-side transport config (protocol.version, fetch.negotiationAlgorithm) can help when negotiation round trips dominate, but no client setting rescues stale server bitmaps or poor pack layout. The Configuration Playbook chapter covers these settings with tradeoff guidance.
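For reference, setting those two client options looks roughly like this. The scratch repository is hypothetical, and `skipping` is only one of the values `fetch.negotiationAlgorithm` accepts, chosen here as an example rather than a recommendation.

```shell
set -eu
# Hypothetical scratch repository; in practice run these inside a real clone.
repo=$(mktemp -d)
git init -q "$repo"

# Ask for protocol v2 explicitly (recent Git releases already default to it).
git -C "$repo" config protocol.version 2

# Skip ahead through history during negotiation instead of listing commits
# one after another.
git -C "$repo" config fetch.negotiationAlgorithm skipping

# Read the values back to confirm what this repository will use.
git -C "$repo" config --get protocol.version
git -C "$repo" config --get fetch.negotiationAlgorithm
```

The skipping algorithm trades precision for fewer round trips: negotiation can converge sooner, but the resulting pack may be larger than strictly necessary.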

Push Is Not Just Fetch in Reverse

Push shares pack transfer machinery with fetch, but the surrounding control flow is different.

A push asks the server to accept new objects and then move refs, often under policy:

A slow push may come from:

This is why push often feels more asymmetric than expected. Even if the object volume is small, the ref-update path may still be expensive or blocked.

You can inspect the client side of that path without moving any remote refs:

git push --dry-run --porcelain origin HEAD
GIT_TRACE_PACKET=1 git push --dry-run origin HEAD
GIT_TRACE2_PERF=/tmp/push.perf git push --dry-run origin HEAD

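That exposes the ref advertisement and local timing without actually updating the branch. A dry run stops before any pack is sent, so server-side hooks and policy checks do not run; measuring those takes a real push, ideally to a scratch ref. The asymmetry is still plain: push has a validation path fetch does not.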

Git hooks are scripts that run at defined points during Git operations. They sit outside the object model and transport machinery, but they can dominate command latency in ways that look like "Git is slow" rather than "our hook is slow."

On the server side, the hooks that matter most are:

A slow pre-receive hook makes every push feel sluggish. A slow post-receive hook can block the client from returning if the hook runs synchronously. Server-side hook latency is often invisible to the pusher unless they trace the conversation.
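One way to make that latency visible is to have the hook record its own elapsed time. The sketch below is an illustration only: the scratch server, the log file location, and the `scratch` branch name are all hypothetical, and the hook does no real policy work.

```shell
set -eu
# Hypothetical scratch server and client, for illustration only.
work=$(mktemp -d)
git init -q --bare "$work/server.git"
mkdir -p "$work/server.git/hooks"

# A pre-receive hook reads "<old> <new> <ref>" lines on stdin. This one only
# logs each ref and its own run time; a real policy check would go in the loop.
cat > "$work/server.git/hooks/pre-receive" <<EOF
#!/bin/sh
start=\$(date +%s)
while read -r old new ref; do
    echo "\$ref" >> "$work/hook.log"
done
echo "pre-receive took \$(( \$(date +%s) - start ))s" >> "$work/hook.log"
EOF
chmod +x "$work/server.git/hooks/pre-receive"

git init -q "$work/client"
git -C "$work/client" config user.email "dev@example.com"
git -C "$work/client" config user.name "Example Dev"
echo "hello" > "$work/client/file.txt"
git -C "$work/client" add file.txt
git -C "$work/client" commit -q -m "first"

# A real push (not a dry run) is what triggers the server-side hook.
git -C "$work/client" push -q "$work/server.git" HEAD:refs/heads/scratch
cat "$work/hook.log"
```

Because the hook runs before any ref moves, every second it spends is added to the time the pusher waits.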

On the client side, several hooks can affect everyday command speed:

The performance angle is simple: hooks run user-defined code at Git's expense. A pre-commit hook that runs a full lint pass on every tracked file will dominate commit time in a large checkout. A post-checkout hook that rebuilds a dependency cache will dominate branch-switch time. Those costs show up as Git latency, but the fix is in the hook, not in Git.
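A common way to apply that fix is to have the hook look only at what is staged for the commit. The sketch below assumes a hypothetical scratch repository and a hypothetical `checked-files` output file; the lint step itself is left out, since it depends on the project.

```shell
set -eu
# Hypothetical scratch repository, for illustration only.
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" config user.email "dev@example.com"
git -C "$repo" config user.name "Example Dev"
echo "old" > "$repo/a.txt"
echo "old" > "$repo/b.txt"
git -C "$repo" add .
git -C "$repo" commit -q -m "baseline"

mkdir -p "$repo/.git/hooks"
cat > "$repo/.git/hooks/pre-commit" <<'EOF'
#!/bin/sh
# Collect only files added, copied, or modified in this commit,
# not every tracked file in the checkout.
git diff --cached --name-only --diff-filter=ACM > .git/checked-files
# A real hook would hand that list to a linter here (hypothetical step).
EOF
chmod +x "$repo/.git/hooks/pre-commit"

echo "new" > "$repo/a.txt"
git -C "$repo" add a.txt
git -C "$repo" commit -q -m "touch one file"

# Shows which files the hook would have linted for that commit.
cat "$repo/.git/checked-files"
```

The hook's cost now scales with the size of the commit rather than the size of the checkout.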

When a command feels slow and traces do not explain the time, check whether a hook is running:

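# Lists the active hooks directory, including one set through core.hooksPath
ls "$(git rev-parse --git-path hooks)"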
GIT_TRACE2_PERF=/tmp/commit.perf git commit --allow-empty -m 'test hook cost'

Trace2 records each hook as a child process, with its own start and exit events and elapsed time. That is often enough to separate "Git is slow" from "our hook is slow."

Pack Transfer, Protocol v2, and Negotiation

The common layer across clone, fetch, and push is pack transport.

The server is usually trying to answer a reachability question of the form:

The answer is usually a packfile or bundle-like transfer unit. A lot of the earlier performance features show up again here:

Transport performance is one of the places where Git's storage engine becomes visible to everyday users.

Protocol v2
Newer Git wire protocol that separates capabilities and commands into a more structured request-response flow.

Protocol v2 gives Git separate commands such as ls-refs and fetch instead of one long monolithic conversation. That makes the exchange easier to extend and reason about.

Protocol design affects latency as much as raw bandwidth does.

Protocol v2 gives Git a cleaner request structure:

Protocol v2 does not make every fetch fast on its own, but it does make the exchange easier to reason about.

One good way to understand protocol v2 is through the separation between ls-refs and fetch.

Those are different jobs:

Separating them gives Git more room to optimize how much information is exchanged and when. That is especially useful in large repositories and hosting environments where ref sets are large, servers are smart, and the client may not need every possible detail on every round trip.

Protocol v2 keeps the same repository model but gives the conversation a cleaner structure.

You can see the split directly:

git -c protocol.version=2 ls-remote --heads origin
git -c protocol.version=2 fetch --negotiate-only --negotiation-tip=HEAD origin
git -c protocol.version=2 fetch --negotiation-tip=HEAD origin

The first command asks only for names. The second asks what common history Git can prove from your current HEAD. The third performs the real fetch with a narrower negotiation frontier.
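To watch the two commands appear on the wire, a packet trace of a single fetch is usually enough. This sketch builds a hypothetical scratch source and clone so it can run anywhere; the exact trace line layout varies between Git versions, so it only searches for the command names.

```shell
set -eu
# Hypothetical scratch source and clone, for illustration only.
work=$(mktemp -d)
git init -q "$work/src"
git -C "$work/src" config user.email "dev@example.com"
git -C "$work/src" config user.name "Example Dev"
echo "one" > "$work/src/file.txt"
git -C "$work/src" add file.txt
git -C "$work/src" commit -q -m "first"
git clone -q "file://$work/src" "$work/clone"

# Give the source something new so the fetch has real work to do.
echo "two" >> "$work/src/file.txt"
git -C "$work/src" commit -q -am "second"

# Record the wire conversation for one fetch under protocol v2.
GIT_TRACE_PACKET="$work/fetch.packet" \
    git -C "$work/clone" -c protocol.version=2 fetch -q origin

# ls-refs and fetch show up as separate commands in the trace.
grep -c 'command=ls-refs' "$work/fetch.packet"
grep -c 'command=fetch' "$work/fetch.packet"
```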

Fetch has a simple user-facing form, but the expensive part is still a reachability question underneath.

The client and server are trying to avoid sending objects the client already has while still producing a correct pack for what the client needs next.

Fetch gets expensive when:

That is why bitmaps matter so much to transport: they let the server enumerate large reachable object sets much more cheaply than a raw object walk.

Transport gets easier to reason about once you split the conversation into ref advertisement, packet exchange, and local timing:

git ls-remote origin

GIT_TRACE_PACKET=/tmp/fetch.packet git fetch origin
GIT_TRACE2_PERF=/tmp/fetch.perf git fetch origin

GIT_TRACE2_PERF=/tmp/clone.perf git clone --no-checkout URL repo

ls-remote gives you the visible name set. Packet tracing shows the wire conversation. Trace2 shows where the local time went before and after the network.

If you want to contrast packet chatter more directly:

GIT_TRACE_PACKET=1 git ls-remote origin
GIT_TRACE_PACKET=1 git fetch origin

And if you want to force the newer protocol framing explicitly:

git -c protocol.version=2 ls-remote origin
git -c protocol.version=2 fetch --negotiation-tip=HEAD origin

Clone, Fetch, and Push Usually Fail for Different Reasons

Network problems are easy to lump together, but clone, fetch, and push usually bottleneck in different places.

A slow clone may be about:

A slow fetch may be about:

A slow push may be about:

Those are related problems, but they are not interchangeable.

Bundles and bundle URIs change how clones can be seeded so origin servers do less repeated work. Large ref sets and ref backends matter here too, because ref count and ref lookup are part of the transport path.

Clone, fetch, and push all move Git data over the network, but they are not one generic "network performance" problem. Clone is initial setup. Fetch is synchronization. Push is object transfer plus ref-update policy. Protocol v2 gives those conversations a cleaner structure, while pack reuse, bitmaps, filters, and maintenance determine how expensive the underlying object movement becomes.

Once you keep those layers separate, transport performance becomes much easier to reason about.