Git looks like one tool, but it very much is not. Underneath, it is several systems layered together: a content-addressed object store, a filesystem cache, a history graph, and a transfer protocol.
That layered and somewhat haphazard design is why Git performance can be confusing and why this book exists. A slow git status, a slow git log -- path, a large clone, and a noisy fetch are usually different problems.
I wrote this book for engineers who need Git to stay fast as their repositories, histories, and teams get larger: build and CI engineers, monorepo owners, developer-experience teams, and the people who wind up debugging strange Git behavior when easy explanations stop working.
This book keeps coming back to a few ideas:
- Git has a logical model built from blobs, trees, commits, refs, and the index. It also has a performance-sensitive storage layer that supports that logical model but looks very different from it.
- Git performance is a collection of different costs, not one problem with one fix.
- Git is also a good way to study broader systems and infrastructure questions about storage, indexing, caching, transfer, and scale.
The early chapters spend time on Git's logical model, because the later performance discussions only make sense once objects, refs, the index, and history walks are clear. Most of the book, though, is spent lower down, in the storage and metadata layers, because that is where Git's performance and scaling tradeoffs become visible. Even if you have used Git for years, the logical-model material should serve as a useful reset rather than a detour, and much of the storage-layer material may be new to you.
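As a small taste of what "content-addressed" means in practice, here is a minimal Python sketch of how Git derives the ID of a blob in a repository that uses the default SHA-1 object format: it hashes a short header (the object type, the content length in bytes, and a NUL byte) followed by the content itself. The function name git_blob_id is my own, chosen for illustration; it is not part of Git.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git would assign to a blob with this content.

    git_blob_id is a hypothetical helper name used only for this sketch.
    It assumes the default SHA-1 object format, not SHA-256.
    """
    # Git hashes a header ("blob <size-in-bytes>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # The empty blob has a well-known ID in SHA-1 repositories.
    print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

For the same bytes, the result should match what git hash-object prints. The point of the sketch is only that the ID is a function of the content: identical files share one object no matter where or how often they appear in history.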