Chapter 0: Introduction | High Performance Git

[Illustration: pencil sketch of a harbor village with a dock, several workers, and small boats on calm water.]

Git looks like one tool, but underneath it is several systems layered together: a content-addressed object store, a filesystem cache, a history graph, and a transfer protocol.

That layered and somewhat haphazard design is why Git performance can be confusing and why this book exists. A slow git status, a slow git log -- <path>, a large clone, and a noisy fetch are usually different problems.

I wrote this book for engineers who need Git to stay fast as their repositories, histories, and teams get larger: build and CI engineers, monorepo owners, developer-experience teams, and the people who wind up debugging strange Git behavior in the dark.

A few ideas come back throughout:

- Git has a logical model built from blobs, trees, commits, refs, and the index. It also has a separate storage layer that supports that logical model but looks very different from it.
- Git performance is a collection of different costs, each with its own fix.
- Git is also a good way to study broader systems and infrastructure questions about storage, indexing, caching, transfer, and scale.
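
To make the split between the logical model and the storage layer concrete, here is a minimal Python sketch. It assumes a repository using the default SHA-1 object format, and the function names (git_blob_id, loose_object_bytes, loose_object_path) are illustrative, not part of Git. The object ID is computed from a small header plus the raw content, while a loose object on disk holds a zlib-compressed copy of those same bytes.

```python
import hashlib
import zlib


def _blob_payload(content: bytes) -> bytes:
    # Git prefixes blob content with "blob <size>\0" before hashing or storing it.
    return b"blob " + str(len(content)).encode("ascii") + b"\0" + content


def git_blob_id(content: bytes) -> str:
    """Logical view: the SHA-1 object ID of a blob (assumes the SHA-1 object format)."""
    return hashlib.sha1(_blob_payload(content)).hexdigest()


def loose_object_bytes(content: bytes) -> bytes:
    """Storage view: zlib-compressed header plus content, as in a loose object.

    The exact compressed bytes can differ from what Git writes (compression
    settings vary), but they decompress to the same payload.
    """
    return zlib.compress(_blob_payload(content))


def loose_object_path(object_id: str) -> str:
    # Loose objects are fanned out by the first two hex digits of the ID.
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


if __name__ == "__main__":
    data = b"hello\n"
    oid = git_blob_id(data)
    print(oid)
    print(loose_object_path(oid))
    print(len(loose_object_bytes(data)), "compressed bytes")
```

The ID printed for b"hello\n" should match what `git hash-object` reports for a file with the same content. Packfiles, which hold most objects in a real repository, are a further storage form that this sketch does not cover.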

The early chapters cover Git's logical model, because the later performance discussions only make sense once objects, refs, the index, and history walks are clear. Most of the book, though, is spent lower down, in the storage and metadata layers, because that is where Git's performance and scaling tradeoffs show up. Even if you have used Git for years, the logical-model material should serve as a useful reset rather than a detour, and much of the storage-layer material may be new to you.