High Performance Git

Section IV · Large-Repo Operations, Transport, and Scale

Chapter 13

Scalar, Prefetch, Large Repositories

Pencil sketch of two people watching a lighthouse across rough water from the shoreline.

Earlier chapters covered the main large-repository tools. That leaves an operational question: how do you put those pieces together without hand-tuning every repository from scratch?

scalar is one answer. It packages a set of large-repository defaults and background tasks without changing Git's core model.

Availability varies by distribution. Some Git builds package Scalar separately or do not ship it at all, so confirm that either scalar -h or git help scalar works on your machine before standardizing on it.
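If you want that check in a setup script, a minimal sketch follows. The function name scalar_status is hypothetical, and the sketch only tests whether a scalar executable is on PATH; it does not prove that your distribution's build behaves like upstream Scalar.

```shell
# Report whether a scalar executable is reachable on PATH.
# scalar_status is a hypothetical helper name used only for this sketch.
scalar_status() {
    if command -v scalar >/dev/null 2>&1; then
        echo "scalar: available at $(command -v scalar)"
    else
        echo "scalar: not found (check your distribution's Git packaging)"
    fi
}

scalar_status
git --version   # worth recording next to the result, since packaging varies by Git build
```

Checking PATH with command -v avoids depending on the exit status of scalar -h, which is a usage message rather than a health check.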


Scalar Is a Bundle Around Existing Git Features

Scalar is not a separate version-control system or a parallel Git implementation. It is an operational bundle around core Git features such as sparse-checkout, partial clone, and background maintenance.

Scalar grew out of Microsoft's VFS for Git work, and it combines several of Git's large-repository features into one repeatable local setup.

Scalar
Git repository management tool that configures and maintains large repositories using an opinionated bundle of features and defaults.

Earlier chapters covered individual mechanisms:

Scalar pulls several of those into one tool. The command set is small:

scalar clone <url>
scalar register
scalar reconfigure

Scalar is a repeatable way to clone, register, and maintain a repository with a large-repo-oriented configuration bundle. That is less romantic than inventing a whole new system, but it is usually more useful.

Clone Shape and Enlistment Change the Local Experience

scalar clone shows the pattern. By default, Scalar clones only commit and tree objects, enables sparse-checkout unless --full-clone is used, and initially materializes only files in the top-level directory. Scalar starts with a narrower local footprint instead of assuming that a full, dense checkout is the obvious first step.
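The same clone shape can be approximated with plain Git. The sketch below is an illustration under stated assumptions, not a reproduction of what scalar clone does internally: it builds a throwaway local "server" repository (the paths and file names are invented for the demo), then makes a blobless partial clone with sparse-checkout enabled.

```shell
set -e
demo=$(mktemp -d)

# Build a tiny stand-in "server" repository: one top-level file, one nested file.
git init -q "$demo/server"
mkdir -p "$demo/server/services/api"
echo "top-level readme" > "$demo/server/README.md"
echo "nested source" > "$demo/server/services/api/main.txt"
git -C "$demo/server" add .
git -C "$demo/server" -c user.name=Demo -c user.email=demo@example.invalid commit -q -m "initial"

# A local repository must opt in before it will serve filtered (partial) clones.
git -C "$demo/server" config uploadpack.allowFilter true
git -C "$demo/server" config uploadpack.allowAnySHA1InWant true

# Blobless partial clone plus sparse-checkout: only top-level files are materialized.
git clone -q --filter=blob:none --sparse "file://$demo/server" "$demo/clone"
before=$(ls "$demo/clone")
echo "checked out after clone: $before"

# Widen the checkout on demand; the missing blob is fetched at this point.
git -C "$demo/clone" sparse-checkout add services/api
ls "$demo/clone/services/api"
```

The point of the demo is the order of events: the nested file is absent after the clone and only appears, along with its blob, once the sparse-checkout is widened.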

It means the first user experience of a very large repository can be closer to:

instead of:

Enlistment
Scalar's top-level project directory, which usually contains the Git worktree in a src/ subdirectory.

Scalar also introduces a concept that is more important than it first appears: the enlistment.

By default, Scalar places the worktree under a src/ directory inside a larger enlistment directory. That encourages separation between tracked files inside src/ and untracked files such as build artifacts outside src/.
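The effect of that boundary is easy to see with plain Git. This is a hand-built imitation of the layout, not output from scalar clone; the directory names big-repo and out and the file app.o are hypothetical.

```shell
# Imitate an enlistment by hand: the worktree lives in src/, build output beside it.
enlistment="$(mktemp -d)/big-repo"
mkdir -p "$enlistment"
git init -q "$enlistment/src"
mkdir -p "$enlistment/out"
echo "object code" > "$enlistment/out/app.o"

# The artifact sits outside the worktree, so Git never has to scan or ignore it.
status=$(git -C "$enlistment/src" status --porcelain)
echo "untracked entries seen by Git: ${status:-none}"
```

No .gitignore rule was needed. The artifact is simply outside the directory Git is responsible for.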

Large repositories often struggle with:

The enlistment model gives those things a cleaner local boundary. It reflects the same theme as worktrees and sparse-checkout: arrange the local environment to reduce unnecessary cost and confusion. Sometimes ergonomics is just putting the mess in the right room.

Background Maintenance and Prefetch Are Part of the Product

By default, scalar clone and scalar register configure background maintenance unless --no-maintenance is used. scalar run fetch maps to the prefetch maintenance task, while scalar run pack-files maps to incremental-repack.

Scalar treats maintenance as part of the default setup, not as a rescue move for already-unhealthy repositories.

This matches the earlier maintenance chapter. At scale, Git works better when it:

Scalar packages that approach into a tool.

Prefetch
Maintenance task that fetches objects into refs/prefetch/ ahead of time so later foreground fetches need less transfer work.

The prefetch task fetches from registered remotes, stores the results under refs/prefetch/, and avoids moving ordinary remote-tracking branches. The goal is to gather the objects needed for a later real fetch without surprising the user by moving the refs they watch every day.

Users notice refs moving. Prefetch says:

"Download the objects for a future fetch now, but do not move the remote-tracking refs until the user runs a normal fetch."

Git is separating object transfer from ref movement here. The objects can arrive earlier. The visible branch updates can wait for the foreground fetch the user actually asked for. That split is small, but it is the whole trick.

You can inspect that directly:

scalar run fetch
# or, without Scalar installed:
git fetch --prefetch origin
git for-each-ref refs/prefetch --format='%(refname)'

The first command runs the prefetch task through Scalar. The second shows the same ref-shaping behavior with plain Git. In either case, the final command shows that the fetched refs landed under refs/prefetch/ instead of advancing the ordinary remote-tracking refs the user watches day to day.

The incremental strategy schedules:

The schedule shows how Git wants large repositories to behave over time. Prefetch is part of the normal rhythm of keeping a repository warm. It is especially useful in monorepos where many developers pull frequently, object transfer remains significant, and foreground fetch latency is visible in daily work.
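The strategy itself is ordinary configuration, so it can be read back like any other setting. The sketch below works in a throwaway repository and only writes and reads config; it does not register anything with the system scheduler. The daily override for prefetch is an arbitrary value chosen to show that a per-task schedule can sit alongside the strategy, not a recommendation.

```shell
repo=$(mktemp -d)
git init -q "$repo"

# Select the incremental strategy for this repository only.
git -C "$repo" config maintenance.strategy incremental
# Optional per-task override (illustrative value).
git -C "$repo" config maintenance.prefetch.schedule daily

# Read back everything maintenance-related that is set locally.
settings=$(git -C "$repo" config --local --get-regexp '^maintenance\.')
echo "$settings"
```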

If prefetch has already brought most of the needed objects down, the next foreground fetch can be much closer to ref update and negotiation bookkeeping than to a large transfer event.

Prefetch stores fetched refs under refs/prefetch/, and Scalar recommends excluding refs/prefetch/* from log decoration. That keeps two views separate:

That separation makes prefetch practical. You get the transfer benefit without a bunch of noisy extra decorations in ordinary history views.
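In plain Git, that exclusion is the log.excludeDecoration setting. The sketch below fakes a prefetch ref with git update-ref in a throwaway repository, purely so there is something to hide, and then checks that a decorated log does not mention it. Note that some newer Git versions already hide refs/prefetch/ from decorations by default, so the explicit setting mainly matters on older versions.

```shell
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" -c user.name=Demo -c user.email=demo@example.invalid commit -q --allow-empty -m "first"

# Simulate a ref left behind by the prefetch task.
git -C "$repo" update-ref refs/prefetch/remotes/origin/main HEAD

# Keep prefetch refs out of log decorations.
git -C "$repo" config --add log.excludeDecoration 'refs/prefetch/*'

decorated=$(git -C "$repo" log -1 --oneline --decorate=short)
echo "$decorated"
```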

Scalar's Config Bundle Is Opinionated for Good Reason

Scalar includes a long section of recommended config values. Together they form a coherent large-repository philosophy rather than a bag of toggles.

Examples:

Taken together, the defaults do a few consistent things:

If you want to see that bundle as configuration instead of prose:

git config --list | rg '^(maintenance\.|fetch\.|index\.|status\.)'

Scalar gets easier to inspect once you look at the config directly. The defaults are mostly ordinary Git settings arranged to keep large repositories usable in the foreground.
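If rg is not installed, git config can do the filtering itself with --get-regexp. The sketch below sets two settings in a throwaway repository so there is something to list; the two keys are real Git settings picked for illustration, and the demo does not claim to reproduce the full bundle.

```shell
repo=$(mktemp -d)
git init -q "$repo"

# Two illustrative settings so the listing is not empty.
git -C "$repo" config status.aheadBehind false
git -C "$repo" config fetch.showForcedUpdates false

# List only the families of settings this chapter cares about.
# Git prints key names in lowercase, so expect status.aheadbehind in the output.
bundle=$(git -C "$repo" config --local --get-regexp '^(maintenance|fetch|index|status)\.')
echo "$bundle"
```

Dropping --local widens the listing to system and global config as well, which is usually what you want when inspecting a real enlistment.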

If you want to apply a similar posture by hand in a plain Git clone, the Configuration Playbook chapter covers each setting with version caveats and compatibility guidance. Scalar's config bundle matters not because it is "a lot of Git settings," but because it is a coherent answer to which costs belong in the background and which costs should stay off the foreground path entirely.

Scalar Is a Better Way to Talk About Monorepo Ergonomics

Too much monorepo advice is still phrased in sweeping cultural terms: monorepos are good, monorepos are bad, Git can handle them, Git cannot handle them.

Scalar turns the question into concrete details:

It makes large-repository setup less ideological and more concrete. Thank God.

Scalar does not make sparse-checkout, partial clone, maintenance, or index tuning irrelevant. It is a way of composing them.

Scalar Is Especially Useful When Repositories Are Shared and Long-Lived

Some repositories are small and local enough that a hand-tuned setup is perfectly fine. Scalar matters more when the repository is:

In that world, one bundled setup has real value. It gives teams a more repeatable answer to:

"How should this repo be cloned, maintained, and operated locally?"

That beats leaving every engineer to rediscover the same performance lessons on their own machine. A lot of "local preference" is just distributed suffering.

What Scalar Bundles

Large-repository usability rarely comes from one heroic Git feature. It comes from a bundle of choices that reduce transfer, narrow checkout scope, keep metadata fresh, and move expensive upkeep out of the critical path. Scalar packages that bundle into one tool.

When the setup still feels mysterious, scalar diagnose is a useful final command to keep in mind:

scalar diagnose

It is outside the steady-state path, but it fits the same philosophy: large-repository operation should be inspectable, supportable, and fast when nothing is wrong.