Chapter 12: Scalar, Prefetch, Large Repositories

Pencil sketch of a quiet harbor with docks, small boats, and a lighthouse across the water.

We've covered the main large-repository tools. That leaves an operational question: how do you put those pieces together without hand-tuning every repository from scratch? Scalar is one answer. It packages a set of large-repository defaults and background tasks without changing Git's core model.

Availability varies by distribution. Some Git builds package Scalar separately or do not ship it at all, so confirm that scalar -h or git help scalar works before standardizing on it.
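For provisioning scripts, a non-interactive check is easier to automate than reading help output. A minimal sketch, assuming a POSIX shell; the fallback message is illustrative and not produced by any Git or Scalar tool:

```shell
#!/bin/sh
# Sketch: check for Scalar before a provisioning script relies on it.
# `command -v` is POSIX and only looks scalar up on PATH; it does not run it.
if command -v scalar >/dev/null 2>&1; then
  have_scalar=yes
  echo "scalar found at $(command -v scalar)"
else
  have_scalar=no
  # Hypothetical fallback: plain Git with a similar clone shape.
  echo "scalar not found; consider: git clone --filter=blob:none --sparse <url>"
fi
```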

Scalar Is a Bundle Around Existing Git Features

Scalar runs on top of stock Git rather than alongside it as a separate version-control system. What it bundles is mostly defaults: sparse-checkout enabled, partial clone enabled, background maintenance scheduled, and a few important config keys set. It adds a handful of commands that wrap the underlying machinery, so a team can adopt that posture without each engineer reproducing it from scratch.

It grew out of Microsoft's VFS for Git work and got thinner over time as the underlying Git features matured. Today it is closer to a curated set of Git config values with a few wrapper commands than to a separate tool.

Scalar: Git repository management tool that configures and maintains large repositories using an opinionated bundle of features and defaults.

Earlier chapters covered the individual mechanisms: sparse working trees, filtered clone, background maintenance, commit-graph updates, repack strategy, index tuning. The Scalar surface area pulls those into a small command set:

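scalar clone <url>     # create a new enlistment with Scalar's defaults
scalar register        # register an existing repository and start background maintenance
scalar reconfigure     # reapply Scalar's recommended config, e.g. after an upgrade
scalar run <task>      # run one maintenance task immediately
scalar diagnose        # collect diagnostic information into a zip archive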

Clone Shape and Enlistment Change the Local Experience

By default, Scalar clones only commit and tree objects, enables sparse-checkout unless --full-clone is used, and initially materializes only files in the top-level directory. File contents (blobs) are downloaded on demand when Git needs them, such as when a file enters the checkout. In other words, Scalar starts narrow instead of assuming that a full, dense checkout is the obvious first step.

The first user experience of a very large repository becomes "get the graph and tree layout, materialize a small working area, expand deliberately as needed" rather than "transfer everything, materialize everything, discover later that the repository is too large to use comfortably."

Enlistment: Scalar's top-level project directory, which usually contains the Git worktree in a src/ subdirectory.

Scalar also introduces a concept that is more important than it first appears: the enlistment.

By default, Scalar places the worktree in a src/ subdirectory of a larger enlistment directory (scalar clone --no-src skips that extra layer). That encourages separation between tracked files inside src/ and untracked files, such as build artifacts, outside it.

Large repositories often struggle with huge build outputs, many untracked directories, local tooling state, and confusion about what belongs in the repository versus adjacent to it. The enlistment model gives those things a cleaner local boundary: anything outside src/ is outside the worktree, so git status never has to scan it. It reflects the same theme as worktrees and sparse-checkout: arrange the local environment to reduce unnecessary cost and confusion. Sometimes ergonomics is just putting the mess in the right room.
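One way to see that boundary is to put the same build output on either side of src/ and ask git status about each. A minimal sketch with a hand-made layout; the out/ and build/ directory names are hypothetical, and Scalar does not create them:

```shell
#!/bin/sh
# Sketch of the enlistment boundary: output beside src/ is invisible to Git,
# while the same output inside src/ shows up as untracked.
set -eu
work=$(mktemp -d)
mkdir -p "$work/enlistment"
cd "$work/enlistment"

git init -q -b main src
echo "hello" > src/main.c
git -C src add main.c
git -C src -c user.name=demo -c user.email=demo@example.com commit -q -m init

# Build artifacts beside the worktree: outside Git's view entirely.
mkdir -p out && echo "binary" > out/app.bin
outside=$(git -C src status --porcelain)

# The same artifact inside the worktree: status now has something to report.
mkdir -p src/build && echo "binary" > src/build/app.bin
inside=$(git -C src status --porcelain)

echo "status with out/ beside src/:   '${outside}'"
echo "status with build/ inside src/: '${inside}'"
```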

Background Maintenance and Prefetch Are Part of the Product

scalar clone and scalar register configure background maintenance unless you pass --no-maintenance. Among the scalar run tasks, fetch maps to the prefetch maintenance task, and pack-files maps to incremental-repack.

Scalar treats maintenance as part of the default setup, not as a rescue move for already-unhealthy repositories.

This matches the earlier maintenance chapter. At scale, Git works better when it keeps accelerators fresh, keeps object layout healthy, keeps transfer cheaper for later foreground commands, and avoids giant cleanup events when possible. Scalar packages that approach into a tool.

Prefetch: Maintenance task that fetches objects into refs/prefetch/ ahead of time so later foreground fetches need less transfer work.

The prefetch task fetches from registered remotes, stores the results under refs/prefetch/, and moves neither the ordinary remote-tracking branches nor tags. The goal is to gather the objects needed for a later real fetch without surprising the user by moving the refs they watch every day.

Users notice refs moving. Prefetch says:

"Download the objects for a future fetch now, but do not move the remote-tracking refs until the user runs a normal fetch."

Git is separating object transfer from ref movement. The objects can arrive earlier. The user-facing branch updates can wait for the foreground fetch the user actually asked for.

You can inspect that directly:

scalar run fetch
# or, without Scalar installed:
git fetch --prefetch origin
git for-each-ref refs/prefetch --format='%(refname)'

The first command runs the prefetch task through Scalar. The second shows the same ref-shaping behavior with plain Git. In either case, the final command shows that the fetched refs landed under refs/prefetch/ instead of advancing the ordinary remote-tracking refs the user watches day to day.

The incremental strategy schedules:

commit-graph hourly
prefetch hourly
loose-objects daily
incremental-repack daily

The schedule shows how Git wants large repositories to behave over time. Prefetch is part of the normal rhythm of keeping a repository warm. It is especially useful in monorepos where many developers pull frequently, object transfer remains significant, and foreground fetch latency shows up in daily work. If prefetch has already brought most of the needed objects down, the next foreground fetch can be much closer to ref update and negotiation bookkeeping than to a large transfer event.
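That rhythm can also be written down as ordinary Git config. A sketch in a scratch repository, assuming a recent Git whose git maintenance reads maintenance.strategy and maintenance.<task>.schedule; the strategy alone supplies these schedules, and the explicit per-task keys only show which settings would override it:

```shell
#!/bin/sh
# Sketch: express the incremental schedule as config in a throwaway repo.
# maintenance.strategy=incremental already implies these schedules; the
# per-task keys restate them and would take precedence if changed.
set -eu
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"

git config maintenance.strategy incremental
git config maintenance.commit-graph.schedule hourly
git config maintenance.prefetch.schedule hourly
git config maintenance.loose-objects.schedule daily
git config maintenance.incremental-repack.schedule daily

git config --get-regexp '^maintenance\.'
```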

Prefetch stores fetched refs under refs/prefetch/, and Scalar sets log.excludeDecoration=refs/prefetch/* so those refs are left out of log decoration. That keeps two views separate:

the object store may already be warmed with future data
the refs the user looks at every day should still reflect deliberate foreground operations
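The decoration setting is easy to check locally. A minimal sketch that simulates a prefetched ref with git update-ref instead of fetching from a real remote; the repository path is a hypothetical scratch directory:

```shell
#!/bin/sh
# Sketch: keep refs/prefetch/ out of log decoration, as Scalar's config does.
# The prefetch ref is simulated with update-ref; no network is involved.
set -eu
work=$(mktemp -d)
git init -q -b main "$work/repo"
cd "$work/repo"
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m one

# Simulate what a prefetch would leave behind.
git update-ref refs/prefetch/remotes/origin/main HEAD

git config log.excludeDecoration 'refs/prefetch/*'
decorated=$(git log --decorate=full --oneline -1)
echo "$decorated"    # shows refs/heads/main, not refs/prefetch/...
```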

Scalar's Config Bundle Is Opinionated for Good Reason

Scalar includes a long section of recommended config values. Together they form a coherent large-repository philosophy rather than a bag of toggles.

Examples:

commitGraph.changedPaths=true so background commit-graph writes include changed-path Bloom filters
fetch.unpackLimit=1 so fetched packs stay packed instead of spilling into loose objects
fetch.writeCommitGraph=false so foreground fetches do not spend time writing the commit-graph; this looks counterintuitive, but Scalar's hourly background commit-graph task keeps the graph current instead
gc.auto=0 because background maintenance replaces ad hoc automatic garbage collection, which is only safe once that background path is actually enabled
index.version=4 and index.skipHash=true to make large indexes smaller (path-prefix compression) and cheaper to write (no trailing checksum), a compatibility trade that only works cleanly when every Git touching the checkout is recent enough
status.aheadBehind=false to avoid an often-expensive ahead/behind calculation during git status

Taken together, the defaults do a few consistent things: reduce foreground surprise, keep large local data structures cheaper, move expensive upkeep into background tasks, and trade some flexibility for a more usable large-repo setup. If you want to see that bundle as configuration instead of prose:

git config --list | grep -E '^(commitgraph|fetch|gc|index|log|maintenance|status)\.'

Most of Scalar is ordinary Git settings arranged to keep large repositories usable in the foreground.

If you want to apply a similar posture by hand in a plain Git clone, the Configuration Playbook chapter covers each setting with version caveats and compatibility guidance. Scalar's config bundle matters not because it is "a lot of Git settings," but because it is a coherent answer to which costs belong in the background and which costs should stay off the foreground path entirely.
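As a rough sketch of that by-hand route, several of the keys listed earlier can be set directly in a scratch repository. This assumes a Git recent enough to understand index.skipHash and commitGraph.changedPaths, and it deliberately leaves gc.auto alone, since turning off automatic gc is only safe once scheduled maintenance is running:

```shell
#!/bin/sh
# Sketch: apply part of Scalar's config bundle by hand to a throwaway repo.
# gc.auto is intentionally untouched; see the caveat in the text.
set -eu
work=$(mktemp -d)
git init -q -b main "$work/repo"
cd "$work/repo"
echo "hello" > file.txt
git add file.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m init

git config commitGraph.changedPaths true
git config fetch.unpackLimit 1
git config fetch.writeCommitGraph false
git config index.version 4
git config index.skipHash true
git config status.aheadBehind false

# index.version only applies to newly created index files,
# so convert the existing index explicitly.
git update-index --index-version 4
git status --short

git config --list --local | grep -E '^(commitgraph|fetch|index|status)\.'
```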

Scalar Is a Better Way to Talk About Monorepo Ergonomics

Too much monorepo advice is still phrased in sweeping cultural terms: monorepos are good, monorepos are bad, Git can handle them, Git cannot handle them. Scalar turns the question into concrete details:

how much is cloned up front?
how much is checked out?
what maintenance runs automatically?
which foreground commands are allowed to stay expensive?
which background tasks are allowed to prepare work in advance?

That makes large-repository setup less ideological and more concrete, and it does so by composing sparse-checkout, partial clone, maintenance, and index tuning into a coherent whole rather than replacing any of them.

When Scalar Is Worth It

Some repositories are small and local enough that a hand-tuned setup is perfectly fine. I reach for Scalar when the repository is large, long-lived, shared across many developers, provisioned repeatedly on workstations, and sensitive to small latency costs in status, fetch, and checkout behavior. There, one bundled setup avoids reproducing the same tuning across the team. It gives teams a more repeatable answer to: "How should this repo be cloned, maintained, and operated locally?"

What Scalar Bundles

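Large-repository usability is a bundle: reduce transfer, narrow checkout scope, keep metadata fresh, and move expensive upkeep out of the critical path. Scalar packages that bundle into one tool. When the setup still seems off, scalar diagnose collects diagnostic information about the Git build and the repository's state into a zip archive that can be inspected or attached to a bug report: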

scalar diagnose