High Performance Git

Section IV · Large-Repo Operations, Transport, and Scale

Chapter 12

Partial Clone and Promisor Remotes

[Chapter art: pencil sketch of a lighthouse rising from a rocky point above rough water.]

Sparse-checkout narrows the local checkout. Partial clone narrows the object data Git downloads in the first place.

Those two ideas fit neatly together because they solve different costs. One reduces materialization. The other reduces transfer.

Treat partial clone as partial object transfer, separate from shallow clone's way of shortening visible history.


Partial Clone, Shallow Clone, and Promisor Remotes

Shallow clone shortens how much history you have locally. Partial clone keeps the history model intact while deferring some object payloads.

Many large-repository users want full commit reachability, normal merge-base behavior, and a complete sense of the graph, while still avoiding the cost of downloading every historical blob up front.

So the mental split is: shallow clone trims how far back history goes, while partial clone keeps the full commit graph but defers some object payloads.

They may both make a clone smaller, but they do so in very different ways.

Partial Clone
Repository mode where Git intentionally omits some objects during transfer and can fetch them later on demand.

Partial clone exists for repositories where a full up-front transfer is too expensive. Shallow clone was not enough for many of those cases because it changes history depth. Partial clone preserves the commit graph while allowing some object payloads to arrive later, so commands can fetch missing data on demand.

In an ordinary clone, the default assumption is simple: if an object is reachable and should be present locally, then a missing object is a problem.

Partial clone introduces a more nuanced state. Some objects may be omitted intentionally at clone or fetch time, with the expectation that Git can retrieve them later if a command actually needs them.

That changes the meaning of "missing" in an important way. Missing objects are no longer always a sign of corruption. In a partial clone, some missing objects are expected.

Promisor Remote
Remote that promises to provide omitted objects later for a partial clone when Git asks for them.

The mechanism that makes this safe is explicit. Filtered packfiles are marked as promisor packfiles with a .promisor sidecar file. Objects referenced by promisor objects can be absent locally without immediately being treated as corruption, because Git knows they can be requested from a promisor remote later.

That bookkeeping detail is what turns partial clone from a fragile hack into a real repository mode. Git is not merely hoping the missing data exists somewhere. It is tracking that some object absence is intentional and recoverable.
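
Because that bookkeeping is explicit, you can ask Git directly which absences are expected. A small sketch using standard plumbing, run inside any partial clone:

```shell
# List reachable objects, printing intentionally missing ones with a
# leading "?" instead of failing:
git rev-list --objects --missing=print HEAD | grep '^?'

# fsck understands promisor packs too: expected absences are not
# reported as corruption.
git fsck
```

If the first command prints nothing, every reachable object is already local and the clone is effectively fully hydrated.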

Blobless Clone Is the Common Starting Point

For most source repositories, partial clone usually means blobless clone.

You ask git clone for this with --filter=<filter-spec>, which applies the same object filtering language used by git rev-list. The most important filter for everyday engineering work is blob:none.

That tells Git to transfer commits and trees, but defer blob payloads until they are actually needed.

This tends to be the most practical trade because commits and trees carry most of the repository's structure while blobs carry most of its bytes: history walks, merge bases, and path listings stay local, and the bulky file contents arrive only when a command actually reads them.

So blobless clone preserves a great deal of repository meaning while cutting a large amount of initial transfer.

You can see the setup directly:

git clone --filter=blob:none <url> repo
cd repo

git config --get-regexp '^remote\..*\.(promisor|partialclonefilter)$'
find .git/objects/pack -name '*.promisor'

The remote config shows which remotes are marked as promisors and which filter shaped the clone. The .promisor sidecar files mark packs whose missing referenced objects are allowed to be fetched later.

The git rev-list documentation defines several filter types, even if blob:none remains the default recommendation for most teams.

Useful examples include blob:none (defer all blobs), blob:limit=<n> (defer only blobs larger than n bytes), and tree:<depth> (defer trees below a given depth).

At the aggressive end, tree:0 can produce what is often called a treeless clone. That can make sense in highly scripted or ephemeral environments where the next steps are predictable. It is a much less comfortable default for exploratory human use because Git may need to fetch both tree and blob data before it can answer ordinary path-oriented questions.
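
As a concrete sketch of those filter specs; the URLs and directory names are placeholders:

```shell
# Defer every blob; commits and trees arrive up front:
git clone --filter=blob:none <url> repo-blobless

# Defer only blobs larger than 1 MiB; small files are transferred
# immediately, big assets stay remote until needed:
git clone --filter=blob:limit=1m <url> repo-limited

# Defer trees as well: the "treeless" clone discussed above:
git clone --filter=tree:0 <url> repo-treeless
```

The blob:limit form is a middle ground worth remembering for repositories whose size problem is a handful of large binary assets rather than file count.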

Partial clone is not one setting. It is a transfer policy with several choices.

Lazy Fetch, Chatty Workloads, and git backfill

The core runtime behavior is simple: when Git needs a missing object, it can perform a dynamic fetch from the promisor remote.

Partial clone makes the first transfer smaller. It also moves some transfer cost to later commands.

The convenience is obvious: commands do not fail just because an object was omitted; Git quietly fetches what the command needs and carries on.

The tradeoff is that transfer cost does not disappear. It moves.

A blobless clone may feel excellent until a command starts touching historical file content at scale. Then the cost can reappear as many on-demand fetches.

Once you see partial clone as an object-availability feature, command behavior gets easier to predict.

Commands that mostly need commits and trees often work well immediately: git log without -p, git merge-base, and git rev-list are typical examples.

Commands that need file payloads can trigger more fetching: git blame, git show of historical content, and checkouts that hydrate paths for the first time.

Partial clone often feels excellent for some workflows and unexpectedly chatty for others for exactly this reason. The repository has not become incomplete in a broken way. It has become demand-driven.

You can feel that difference with a few ordinary commands:

git log -- README.md
git show <older-commit>:path/to/file
git blame path/to/file

git log -- <path> may stay mostly in commit and tree data. git show against older content and git blame are much more likely to need blob payloads, especially when they reach into history that was never hydrated locally.
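
One way to watch that lazy fetching happen is to count promisor packs around a history-reading command. A sketch, assuming a blobless clone; the path is a placeholder:

```shell
# Promisor packs before touching historical content:
find .git/objects/pack -name '*.promisor' | wc -l

# Reading an old blob in a blobless clone triggers a dynamic fetch
# from the promisor remote:
git show HEAD~10:path/to/file > /dev/null

# The fetched objects arrive in a new promisor pack:
find .git/objects/pack -name '*.promisor' | wc -l
```

A growing pack count after ordinary read commands is the clearest sign that a workload is paying for object access on demand.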

git backfill
Command that batch-downloads missing blobs for a partial clone ahead of time instead of faulting them in one by one.

git backfill downloads missing blobs in a partial clone and is still marked experimental.

It sits in the middle ground between a fully hydrated clone that downloads everything up front and pure lazy fetch that pays one round trip at a time.

It is useful when you know a task is about to touch a wider slice of historical content than ordinary lazy fetch would handle gracefully, and you would rather batch those reads than wait for a trail of small fetches.

If the repository is already sparse, backfill can stay aligned with the worktree's local area of focus. If the main problem is too many small lazy fetches, batching those reads ahead of time helps smooth that out.

It turns a trail of small round trips into one deliberate prefetch.

That is the concrete shape:

git backfill
git backfill --min-batch-size=100000

Without sparse-checkout, the plain command backfills missing blobs reachable from HEAD. --min-batch-size changes how aggressively Git groups those requests. If the repository already uses sparse-checkout, plain git backfill assumes --sparse unless --no-sparse is given, which keeps the request aligned with the current sparse area.
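
To see what a backfill run accomplished, one approach is to count the intentionally missing objects before and after. A sketch; git backfill requires a recent Git and is still experimental:

```shell
# How many reachable objects are still intentionally missing?
git rev-list --objects --missing=print HEAD | grep -c '^?'

# Batch-download the missing blobs:
git backfill

# The missing count should now be much smaller, often zero:
git rev-list --objects --missing=print HEAD | grep -c '^?'
```

In a sparse repository, repeat the check after git backfill --sparse to confirm the prefetch stayed scoped to the sparse area.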

Layered Promisors and Sparse Working Trees

Partial clone also supports multiple promisor remotes. The current documentation describes promisor remotes being tried in configuration order, with the remote named in extensions.partialClone, when that compatibility key is present, tried last.

That detail is easy to skip over, but it matters in practice. It means Git can support a layered fetch topology rather than assuming one origin must answer every missing-object request directly.

That opens the door to arrangements such as a nearby cache or mirror that answers most missing-object requests, with the central origin consulted only as a last resort, or CI fleets that share a local promisor instead of hammering the primary server.

This is more operator-oriented than everyday developer advice, but it matters in large installations.
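
In config terms, a layered setup might be sketched like this; the cache remote's name and URL are hypothetical:

```shell
# Add a second promisor remote; the URL here is made up:
git remote add cache https://git-cache.internal/team/repo.git
git config remote.cache.promisor true

# Promisor remotes are tried in configuration order, so list them
# to confirm which one will be asked first:
git config --get-regexp '^remote\..*\.promisor$'
```

The design choice is ordering: putting the cheap, nearby promisor earlier in the configuration means the expensive central remote only sees requests the cache could not satisfy.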

The previous chapter set this up, and this chapter should make the pairing feel concrete.

Sparse-checkout reduces how much of the tracked tree Git materializes into the working directory. Partial clone reduces how much of the object database Git transfers up front.

That means a strong modern pattern often looks like this:

  1. clone with --filter=blob:none
  2. apply sparse-checkout for the task's local working set
  3. enable sparse-index where it helps
  4. use git backfill --sparse if the task is likely to inspect historical content in that same area
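
Assembled end to end, the steps above might look like this; the URL, directory, and branch name are placeholders, and git backfill remains experimental:

```shell
# 1. Blobless clone; --no-checkout avoids hydrating files that
#    sparse-checkout is about to exclude anyway:
git clone --filter=blob:none --no-checkout <url> repo
cd repo

# 2 + 3. Cone-mode sparse-checkout with the sparse index enabled:
git sparse-checkout set --cone --sparse-index services/payments
git checkout main

# 4. Prefetch historical blobs for the sparse area before a
#    history-heavy task:
git backfill --sparse
```

Delaying the checkout until after the sparse patterns are set means the first hydration only fetches blobs the task will actually see.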

That stack works because each feature attacks a different cost: the filter cuts initial transfer, sparse-checkout cuts materialization, the sparse index cuts per-command index work, and backfill cuts lazy-fetch round trips.

Once you keep those costs separate, large-repository tuning gets much easier.

Partial clone benefits from plain config-level guidance more than abstract praise, and the safest advice separates new clones from existing ones.

# Best when you are creating a new clone on purpose:
git clone --filter=blob:none <url> repo
cd repo

For an existing clone, the corresponding future-fetch policy looks like this:

# If an existing clone should use that policy for future fetches:
git config remote.origin.promisor true
git config remote.origin.partialclonefilter blob:none
git fetch --refetch origin

Those lines mean different things, and the why matters: a fresh --filter clone never downloads the omitted blobs in the first place, while setting remote.origin.promisor and remote.origin.partialclonefilter on an existing clone only changes future fetches. The objects already on disk stay, and git fetch --refetch renegotiates the transfer under the new filter as if the clone were fresh.

That is also why partial-clone advice should stay measured. If engineers repeatedly need broad offline history or many immediate blob reads, blob:none may just move the pain into lazy fetches. If the real goal is a smaller clone from the beginning, a fresh blobless clone is usually the honest answer. If the existing clone already has the data and offline completeness matters, leaving it fuller may be the better answer.

What Changes, and What the Tradeoff Really Is

For a long time, the intuitive model was that a clone holds every object reachable from its refs, so a missing object can only mean corruption.

Partial clone asks for a different model: the clone holds the full commit graph, but some object payloads are promised rather than present, and Git fetches them when a command needs them.

That is a subtle but important change. The repository still has normal commits, trees, refs, and object identities. What changes is when some object bytes arrive.

Git's meaning does not change. Transfer policy changes.

Partial clone is powerful, but the tradeoff should stay visible.

If a command needs an object that was intentionally omitted, Git may need network access to continue. That makes partial clone less self-contained than a fully hydrated clone.

That is usually acceptable when the network to the promisor remote is fast and reliable, when clones are short-lived as they are in CI, or when the savings in initial transfer dwarf the occasional lazy fetch.

It is less attractive when engineers work offline, when a clone must stay self-contained for archival or air-gapped use, or when workflows routinely read broad swaths of historical file content.

Partial clone is best thought of as a policy choice, not as an automatic upgrade for every repository.

When partial clone feels disappointing, the first question is usually:

"Which cost am I actually seeing?"

If the pain is the initial transfer, the filter is doing its job and the complaint lies elsewhere. If the pain is a trail of small lazy fetches, the fix is batching with git backfill, a looser filter such as blob:limit, or simply a fuller clone.

It is more useful to ask when and where partial clone makes you pay for object access than to say "partial clone is fast" or "partial clone is slow."

Sparse-checkout narrows the working tree. Partial clone narrows the initial object transfer.

Promisor remotes make delayed object availability normal. Lazy fetch decides when the missing objects show up. git backfill is there when you already know a wider read is coming.

That is the whole shift: not less Git, just fewer object bytes up front.