Chapter 11: Partial Clone and Promisor Remotes

Pencil sketch of two people watching a lighthouse across rough water from the shoreline.

Sparse-checkout narrows the local checkout. Partial clone narrows the object data Git downloads in the first place. They pair well: one reduces materialization, the other reduces transfer. Treat partial clone as partial object transfer, separate from shallow clone's way of shortening apparent history.

Partial Clone, Shallow Clone, and Promisor Remotes

Many large-repository users want full commit reachability, normal merge-base behavior, and a complete sense of the graph while still avoiding the cost of downloading every historical blob up front. The split is:

shallow clone reduces history depth
partial clone reduces transferred object content

Both can make a clone smaller, in very different ways.

Partial Clone: Repository mode where Git intentionally omits some objects during transfer and can fetch them later on demand.

Partial clone exists for repositories where a full up-front transfer is too expensive. Shallow clone was not enough for many of those cases because it changes history depth. Partial clone preserves the commit graph while allowing some object payloads to arrive later, so commands can fetch missing data on demand.
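A quick way to see the difference is to clone the same small history both ways. The sketch below is a throwaway demo, not a recommended setup:

- It builds a hypothetical local repository (srv, three commits that each rewrite README.md) in a temporary directory.
- It serves that repository over file:// so both clones go through the normal transfer path.
- It counts the commits each clone can reach.

The two uploadpack settings let this local "server" honor filters and the by-object-ID requests that later lazy fetches send.

```shell
set -e
# Throwaway identity and scratch directory for the demo.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
WORK=$(mktemp -d)

# Hypothetical "server": three commits, each rewriting README.md.
git init -q -b main "$WORK/srv"
for i in 1 2 3; do
  echo "version $i" > "$WORK/srv/README.md"
  git -C "$WORK/srv" add README.md
  git -C "$WORK/srv" commit -q -m "commit $i"
done
git -C "$WORK/srv" config uploadpack.allowFilter true
git -C "$WORK/srv" config uploadpack.allowAnySHA1InWant true

# Shallow: history is cut to one commit (Git records the cut in .git/shallow).
git clone -q --depth=1 "file://$WORK/srv" "$WORK/shallow"
# Partial: every commit arrives; only historical blob payloads are deferred.
git clone -q --filter=blob:none "file://$WORK/srv" "$WORK/partial"

SHALLOW_COMMITS=$(git -C "$WORK/shallow" rev-list --count HEAD)
PARTIAL_COMMITS=$(git -C "$WORK/partial" rev-list --count HEAD)
echo "shallow clone sees $SHALLOW_COMMITS commit(s); partial clone sees $PARTIAL_COMMITS"
```

The shallow clone reports one commit. The blobless clone still walks the full three-commit graph.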

In an ordinary clone, the default assumption is simple: if an object is reachable and should be present locally, a missing object is a problem. Partial clone introduces a more nuanced state where some objects may be omitted intentionally at clone or fetch time, with the expectation that Git can retrieve them later if a command actually needs them. Missing objects are therefore no longer always a sign of corruption; in a partial clone, some missing objects are expected.

Promisor Remote: Remote that promises to provide omitted objects later for a partial clone when Git asks for them.

The mechanism that makes this safe is explicit. Filtered packfiles are marked as promisor packfiles with a .promisor sidecar file. Objects referenced by promisor objects can be absent locally without immediately being treated as corruption, because Git knows they can be requested from a promisor remote later.

That bookkeeping detail is what turns partial clone from a fragile hack into a real repository mode. Git records which packs came from a promisor remote and which remotes can supply missing objects. That lets it tell an intentional, recoverable absence from real corruption.
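To watch that bookkeeping at work, the sketch below makes a blobless clone of a hypothetical local repository. It then counts the objects Git knows are absent, using git rev-list --missing=print. That option reports missing objects as "?<oid>" lines instead of fetching them. Finally it runs git fsck, which in a partial clone should accept the promised absences rather than flag them. The repository layout and names are invented for the demo.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
WORK=$(mktemp -d)

# Hypothetical "server": three commits, each rewriting README.md.
git init -q -b main "$WORK/srv"
for i in 1 2 3; do
  echo "version $i" > "$WORK/srv/README.md"
  git -C "$WORK/srv" add README.md
  git -C "$WORK/srv" commit -q -m "commit $i"
done
git -C "$WORK/srv" config uploadpack.allowFilter true
git -C "$WORK/srv" config uploadpack.allowAnySHA1InWant true

git clone -q --filter=blob:none "file://$WORK/srv" "$WORK/repo"

# --missing=print does not trigger lazy fetches, so this counts exactly
# what the clone deliberately skipped (the older README.md blobs).
MISSING=$(git -C "$WORK/repo" rev-list --objects --all --missing=print |
  awk 'substr($0, 1, 1) == "?" { n++ } END { print n + 0 }')
echo "objects intentionally absent: $MISSING"

# fsck accepts absences that are referenced from promisor packs.
if git -C "$WORK/repo" fsck --no-progress; then FSCK=clean; else FSCK=errors; fi
echo "fsck: $FSCK"
```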

Blobless Clone Is the Common Starting Point

For most source repositories, partial clone usually means blobless clone. You ask git clone for this with --filter=<filter-spec>, which applies the same object filtering language used by git rev-list. The most important filter for everyday engineering work is --filter=blob:none, which tells Git to transfer commits and trees but defer blob payloads until they are actually needed. That tends to be the most practical trade because:

commits and trees are enough to reason about history shape
tree data is enough to reason about path structure
historical blobs often dominate transfer volume

Blobless clone preserves a great deal of repository meaning while cutting a large amount of initial transfer.

You can see the setup directly:

git clone --filter=blob:none <url> repo
cd repo

git config --get-regexp '^remote\..*\.(promisor|partialclonefilter)$'
find .git/objects/pack -name '*.promisor'

The remote config shows which remotes are marked as promisors and which filter shaped the clone. The .promisor sidecar files mark packs whose missing referenced objects are allowed to be fetched later.
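The same inspection can be run end to end without a real hosting service. This sketch assumes a hypothetical throwaway repository served over file://, clones it blobless, and captures the two config values and the number of .promisor sidecar files so they can be checked.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
WORK=$(mktemp -d)

# Hypothetical "server": three commits, each rewriting README.md.
git init -q -b main "$WORK/srv"
for i in 1 2 3; do
  echo "version $i" > "$WORK/srv/README.md"
  git -C "$WORK/srv" add README.md
  git -C "$WORK/srv" commit -q -m "commit $i"
done
git -C "$WORK/srv" config uploadpack.allowFilter true
git -C "$WORK/srv" config uploadpack.allowAnySHA1InWant true

git clone -q --filter=blob:none "file://$WORK/srv" "$WORK/repo"

# The two inspections from the text, run against the local clone.
git -C "$WORK/repo" config --get-regexp '^remote\..*\.(promisor|partialclonefilter)$'
find "$WORK/repo/.git/objects/pack" -name '*.promisor'

PROMISOR=$(git -C "$WORK/repo" config remote.origin.promisor)
FILTER=$(git -C "$WORK/repo" config remote.origin.partialclonefilter)
SIDECARS=$(find "$WORK/repo/.git/objects/pack" -name '*.promisor' | wc -l | tr -d ' ')
```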

The git rev-list documentation defines several filter types, even if blob:none remains the default recommendation for most teams.

Useful examples include:

blob:limit=<n> to omit blobs whose size is at least <n> bytes (k, m, and g suffixes are accepted)
tree:<depth> to omit trees and blobs whose depth from the root tree is <depth> or more

At the aggressive end, tree:0 produces what is often called a treeless clone. That can make sense in highly scripted or ephemeral environments where the next steps are predictable, but it is a much less comfortable default for exploratory human use because Git may need to fetch both tree and blob data before it can answer ordinary path-oriented questions. Partial clone is a transfer policy with several choices, not a single switch.
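To see a size-based filter in action, the sketch below builds a hypothetical history with two files. One is a roughly 4 KiB file (big.bin) that is later deleted; the other is a tiny file that stays. The repository is then cloned with --filter=blob:limit=1k. Under the documented rule that blobs of at least the limit are omitted, only the large historical blob should be deferred. File names and sizes are assumptions chosen for the demo.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
WORK=$(mktemp -d)

# Hypothetical history: a ~4 KiB file that is later deleted, plus a tiny file.
git init -q -b main "$WORK/srv"
printf '%4096s\n' '' > "$WORK/srv/big.bin"
echo small > "$WORK/srv/small.txt"
git -C "$WORK/srv" add big.bin small.txt
git -C "$WORK/srv" commit -q -m "add files"
git -C "$WORK/srv" rm -q big.bin
git -C "$WORK/srv" commit -q -m "drop big.bin"
git -C "$WORK/srv" config uploadpack.allowFilter true
git -C "$WORK/srv" config uploadpack.allowAnySHA1InWant true
BIG=$(git -C "$WORK/srv" rev-parse HEAD~1:big.bin)

# blob:limit=1k omits blobs of 1024 bytes or more; everything else transfers.
git clone -q --filter=blob:limit=1k "file://$WORK/srv" "$WORK/repo"

# List the object IDs the clone deferred (strip the leading "?").
ABSENT=$(git -C "$WORK/repo" rev-list --objects --all --missing=print | sed -n 's/^?//p')
echo "deferred: $ABSENT (big.bin is $BIG)"
```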

Lazy Fetch, Chatty Workloads, and git backfill

The core runtime behavior is simple: when Git needs a missing object, it can perform a dynamic fetch from the promisor remote. The benefits are obvious:

smaller initial clones
less up-front transfer
less time transferring objects you may never read

The tradeoff is that transfer cost moves rather than disappears. A blobless clone may feel excellent until a command starts touching historical file content at scale, at which point the cost can reappear as many on-demand fetches.

Commands that mostly need commits and trees often work well immediately:

many history walks
ref operations
path-structure inspection

Commands that need file payloads can trigger more fetching:

checkout of paths whose blobs are absent
show or diff against older content
history commands that inspect many historical file versions
blame in parts of history whose blobs were never transferred

Partial clone often feels excellent for some workflows and unexpectedly chatty for others. The repository has not become incomplete in a broken way; missing objects are simply fetched on demand. Treat partial clone as an object-availability feature and command behavior becomes predictable.

You can feel that difference with a few ordinary commands:

git log -- README.md
git show <older-commit>:path/to/file
git blame path/to/file

git log -- <path> may stay mostly in commit and tree data. git show against older content and git blame are much more likely to need blob payloads, especially when they reach into history that was never hydrated locally.
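You can also measure that difference instead of feeling it. This sketch uses a hypothetical three-commit repository and counts missing objects at three points: after the blobless clone, after a path-limited git log, and after git show reads an old version of README.md. The expectation is that log, which compares tree entries rather than file contents, leaves the count unchanged, while show faults in the blob it needs.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
WORK=$(mktemp -d)

# Hypothetical "server": three commits, each rewriting README.md.
git init -q -b main "$WORK/srv"
for i in 1 2 3; do
  echo "version $i" > "$WORK/srv/README.md"
  git -C "$WORK/srv" add README.md
  git -C "$WORK/srv" commit -q -m "commit $i"
done
git -C "$WORK/srv" config uploadpack.allowFilter true
git -C "$WORK/srv" config uploadpack.allowAnySHA1InWant true

git clone -q --filter=blob:none "file://$WORK/srv" "$WORK/repo"

# Count absent objects without triggering any lazy fetch.
missing() {
  git -C "$WORK/repo" rev-list --objects --all --missing=print |
    awk 'substr($0, 1, 1) == "?" { n++ } END { print n + 0 }'
}

BEFORE=$(missing)
git -C "$WORK/repo" log --oneline -- README.md    # commits and trees only
AFTER_LOG=$(missing)
OLD=$(git -C "$WORK/repo" show HEAD~2:README.md)  # needs the version-1 blob
AFTER_SHOW=$(missing)
echo "missing: $BEFORE at clone, $AFTER_LOG after log, $AFTER_SHOW after show"
```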

git backfill: Command that batch-downloads missing blobs for a partial clone ahead of time instead of faulting them in one by one.

The command is still marked experimental, so its behavior may change in future releases.

It sits in the middle ground between:

fetching everything
waiting for commands to fault in individual blobs one by one

It is most useful when you know a task is about to touch a wider slice of historical content than ordinary lazy fetching handles gracefully. Batching those reads ahead of time turns a trail of small round trips into a few deliberate, larger requests. If the repository is already sparse, backfill can stay aligned with the worktree's area of focus.

That is the concrete shape:

git backfill
git backfill --min-batch-size=100000

Without sparse-checkout, the plain command backfills missing blobs reachable from HEAD. --min-batch-size sets the minimum number of missing objects Git groups into each request to the server (the documented default is 50,000), so it controls how aggressively requests are batched. If the repository already uses sparse-checkout, plain git backfill assumes --sparse unless --no-sparse is given, which keeps the request aligned with the current sparse area.
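Here is a sketch of that batching against a hypothetical throwaway repository. Because git backfill is experimental and only present in recent Git releases, the sketch checks whether the command ran instead of assuming it exists. When it does run, every blob reachable from HEAD should be local afterward.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
WORK=$(mktemp -d)

# Hypothetical "server": three commits, each rewriting README.md.
git init -q -b main "$WORK/srv"
for i in 1 2 3; do
  echo "version $i" > "$WORK/srv/README.md"
  git -C "$WORK/srv" add README.md
  git -C "$WORK/srv" commit -q -m "commit $i"
done
git -C "$WORK/srv" config uploadpack.allowFilter true
git -C "$WORK/srv" config uploadpack.allowAnySHA1InWant true

git clone -q --filter=blob:none "file://$WORK/srv" "$WORK/repo"

missing() {
  git -C "$WORK/repo" rev-list --objects --all --missing=print |
    awk 'substr($0, 1, 1) == "?" { n++ } END { print n + 0 }'
}

BEFORE=$(missing)
# One batched prefetch instead of many small lazy fetches (if available).
if git -C "$WORK/repo" backfill --min-batch-size=100000 2>/dev/null; then
  BACKFILLED=yes
else
  BACKFILLED=no
fi
AFTER=$(missing)
echo "backfill ran: $BACKFILLED; missing before $BEFORE, after $AFTER"
```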

Layered Promisors and Sparse Working Trees

Partial clone also supports multiple promisor remotes. According to the current documentation, Git tries promisor remotes in the order they appear in the configuration. The exception is the remote named by extensions.partialClone: when that compatibility key is present, its remote is tried last.

Git can support a layered fetch topology rather than assuming one origin must answer every missing-object request directly. That opens the door to arrangements such as a nearer cache or mirror as one promisor remote with the primary origin as another.
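A minimal sketch of that layering, under several assumptions:

- A hypothetical bare copy (mirror.git) stands in for a nearer cache.
- The client is built by hand so that the mirror's promisor entry appears first in the config.
- The origin directory is moved away after the initial filtered fetch, to show the mirror can still answer lazy fetches for both the checkout and an older README.md.

This demonstrates fallback to a second promisor. It does not by itself prove the lookup order beyond what the config listing shows.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
WORK=$(mktemp -d)

# Hypothetical origin: three commits, each rewriting README.md.
git init -q -b main "$WORK/srv"
for i in 1 2 3; do
  echo "version $i" > "$WORK/srv/README.md"
  git -C "$WORK/srv" add README.md
  git -C "$WORK/srv" commit -q -m "commit $i"
done
git -C "$WORK/srv" config uploadpack.allowFilter true
git -C "$WORK/srv" config uploadpack.allowAnySHA1InWant true

# Hypothetical nearer cache: a full bare copy of origin.
git clone -q --bare "file://$WORK/srv" "$WORK/mirror.git"
git -C "$WORK/mirror.git" config uploadpack.allowFilter true
git -C "$WORK/mirror.git" config uploadpack.allowAnySHA1InWant true

# Client: the mirror remote is configured first, origin second.
git init -q -b main "$WORK/repo"
git -C "$WORK/repo" remote add mirror "file://$WORK/mirror.git"
git -C "$WORK/repo" remote add origin "file://$WORK/srv"
git -C "$WORK/repo" config remote.mirror.promisor true
git -C "$WORK/repo" config remote.origin.promisor true
git -C "$WORK/repo" config remote.origin.partialclonefilter blob:none
git -C "$WORK/repo" fetch -q origin   # commits and trees only
FIRST=$(git -C "$WORK/repo" config --get-regexp '^remote\..*\.promisor$' | sed -n 1p)

# Make origin unreachable; lazy fetches must now be served by the mirror.
mv "$WORK/srv" "$WORK/srv-unreachable"
git -C "$WORK/repo" checkout -q --detach origin/main
OLD=$(git -C "$WORK/repo" show HEAD~2:README.md)
echo "first promisor: $FIRST; old README: $OLD"
```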

Remember: sparse-checkout reduces how much of the tracked tree Git materializes into the working directory, while partial clone reduces how much of the object database Git transfers up front.

That means a strong modern pattern often looks like this:

clone with --filter=blob:none
apply sparse-checkout for the task's local working set
enable sparse-index where it helps
use git backfill --sparse if the task is likely to inspect historical content in that same area

That stack works because each feature attacks a different cost:

transfer
working-tree size
index size
lazy historical blob fetches

Once you keep those costs separate, large-repository tuning gets much easier.

Concrete configuration guidance helps more than abstract praise, and the safest advice separates new clones from existing ones.

# Best when you are creating a new clone on purpose:
git clone --filter=blob:none <url> repo
cd repo

For an existing clone, the corresponding future-fetch policy looks like this:

# If an existing clone should use that policy for future fetches:
git config remote.origin.promisor true
git config remote.origin.partialclonefilter blob:none
git fetch --refetch origin

Those lines mean different things, and the why matters:

git clone --filter=blob:none is the cleanest way to get a smaller initial transfer and a repository that is partial from the start.
remote.origin.promisor=true tells Git that missing objects may be fetched from that remote later. That is the core policy change that makes omitted objects normal instead of repository corruption.
remote.origin.partialclonefilter=blob:none is the concrete "most history walks do not need every blob up front" policy. Use it when transfer volume is what hurts and most tasks do not immediately need broad historical file contents.
git fetch --refetch origin matters after changing the filter on an existing clone. The new config only shapes future fetches, and an ordinary fetch only negotiates objects you do not already have. --refetch instead requests objects as a fresh clone would, with the new filter applied. It does not retroactively dehydrate blobs that are already present in the local object store.

I try to stay measured about partial-clone advice. If engineers repeatedly need broad offline history or many immediate blob reads, blob:none may just move the pain into lazy fetches. If the real goal is a smaller clone from the beginning, a fresh blobless clone is usually the honest answer. If the existing clone already has the data and offline completeness matters, leaving it fuller may be the better answer.

What Changes, and What the Tradeoff Really Is

For a long time, the intuitive model was that a local clone equals a locally complete object store. Partial clone asks for a different one: a local clone equals a locally sufficient object store, with more available on demand.

That is a subtle but important change. The repository still has normal commits, trees, refs, and object identities. What changes is when some object bytes arrive. Git's meaning does not change; transfer policy does.

The tradeoff remains real. If a command needs an object that was intentionally omitted, Git may need network access to continue, which makes partial clone less self-contained than a fully hydrated clone. That is usually acceptable when the repository is very large, the network is reasonably available, and most tasks only touch a subset of history. It is less attractive when offline work is common, historical inspection must be instant and broad, or the workflow depends on deep local completeness more than transfer savings.
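A small offline simulation makes the tradeoff concrete. The sketch makes a blobless clone of a hypothetical repository and then moves the "server" directory away to stand in for lost network access. It expects three outcomes:

- Commit walks still succeed.
- Reading checked-out content still succeeds.
- Reading never-fetched historical content fails.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
WORK=$(mktemp -d)

# Hypothetical "server": three commits, each rewriting README.md.
git init -q -b main "$WORK/srv"
for i in 1 2 3; do
  echo "version $i" > "$WORK/srv/README.md"
  git -C "$WORK/srv" add README.md
  git -C "$WORK/srv" commit -q -m "commit $i"
done
git -C "$WORK/srv" config uploadpack.allowFilter true
git -C "$WORK/srv" config uploadpack.allowAnySHA1InWant true

git clone -q --filter=blob:none "file://$WORK/srv" "$WORK/repo"

# Simulate going offline: the promisor remote's URL no longer resolves.
mv "$WORK/srv" "$WORK/srv-offline"

# History walks still work: commits and trees are local.
COMMITS=$(git -C "$WORK/repo" rev-list --count HEAD)
# Checked-out content is local too.
CURRENT=$(git -C "$WORK/repo" show HEAD:README.md)
# Historical content that was never fetched now needs the remote.
if git -C "$WORK/repo" show HEAD~2:README.md >/dev/null 2>&1; then
  OLD_READ=ok
else
  OLD_READ=failed
fi
echo "commits: $COMMITS; current: $CURRENT; old read: $OLD_READ"
```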

I treat partial clone as a policy choice, not as an automatic upgrade for every repository.

When partial clone feels disappointing, the first question is "which cost am I actually seeing?" If the pain is:

initial clone size, partial clone may help
unexpected lazy fetch latency, git backfill or a less aggressive filter (such as blob:limit=<n> instead of blob:none) may help
local checkout and index cost, sparse-checkout and sparse-index belong here instead
lack of offline completeness, the repository may need fuller hydration for that workflow

It is more useful to ask when and where partial clone makes object access more expensive than to say "partial clone is fast" or "partial clone is slow."

Sparse-checkout narrows the working tree. Partial clone narrows the initial object transfer. Promisor remotes make delayed object availability normal, lazy fetch decides when the missing objects show up, and git backfill is there when you already know a wider read is coming.
