Git history is not like a saved movie. There is no file that lists commits, or anything like one. History is a graph that Git walks every time you ask a question.
git log, git show, git blame, git bisect, and git merge-base all start from the same basic move: pick commits, walk parent links, and keep the results that match the question you asked.
This chapter takes that in three layers:
- how Git names commits and ranges
- how Git walks the graph
- how Git computes path-limited history on top of that walk in commands like git log -- path, git rev-list -- path, and git blame
Revisions Are Expressions, Not Just Commit IDs
In everyday use, a "revision" often means "some commit I want to refer to," but Git's revision syntax is (for better or worse) broader than that. A revision expression can name a ref, a commit ID, a relative ancestor, a set difference, or a merge-related relationship that Git has to compute. If you have seen tildes and carets in revisions and forgotten what they mean, this is where they come from.
- Revision: an expression that Git resolves to one or more commits or objects.
If revision syntax feels dense at first, that is because Git is giving you a compact language for describing commit sets rather than merely pointing at one object.
- HEAD means the current commit
- main means the commit named by the main ref
- a1b2c3d means the commit with that object ID prefix
- HEAD^ means the first parent of HEAD
- HEAD~3 means the first-parent ancestor three steps back
- merge^2 means the second parent of a merge commit
The important distinction is that ^ picks a specific parent edge, while ~ stays on the first-parent line and keeps walking. A bare ^ means ^1, so HEAD~3 and HEAD^^^ always name the same commit, while merge~1 (the first parent) and merge^2 (the second parent) can point in completely different directions.
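The difference is easy to see in a throwaway repository with one merge. This is a minimal bash sketch, assuming a reasonably recent Git on the PATH; the commit names A, B, C, S, and M and the side branch are invented for the demonstration.

```bash
#!/usr/bin/env bash
set -eu
# Throwaway repository; identity comes from the environment, not your config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch main regardless of init.defaultBranch

# Mainline A - B - C; side branches off at B and gets S; M merges side into main.
git commit -q --allow-empty -m A
git commit -q --allow-empty -m B
git branch side
git commit -q --allow-empty -m C
git checkout -q side
git commit -q --allow-empty -m S
git checkout -q main
git merge -q --no-ff -m M side

tilde=$(git rev-parse 'HEAD~3')    # first parent, three times
carets=$(git rev-parse 'HEAD^^^')  # bare ^ means ^1, so this is the same walk
first=$(git rev-parse 'HEAD~1')    # M's first parent
second=$(git rev-parse 'HEAD^2')   # M's second parent

git log -1 --format='HEAD~3 and HEAD^^^ -> %s' "$tilde"
git log -1 --format='HEAD~1 -> %s' "$first"
git log -1 --format='HEAD^2 -> %s' "$second"
```

The merge's first parent is C because main was checked out when the merge ran; that is why ~ tracks the mainline while ^2 jumps to the merged-in side.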
Revisions, in short, are expressions that Git resolves, not mere IDs. One more symbol adds to the density.
The Separator To Reduce Ambiguity
One of Git's smallest lifesavers is also one of the easiest to overlook: --.
- Pathspec: a Git path-matching expression used to limit a command to particular files, directories, or patterns.
In commands such as git log, git show, or git diff, Git has to tell revision arguments apart from path arguments. A revision argument is one of the expressions described above; a path argument is a pathspec.
The -- marker tells Git that everything after it should be treated as a pathspec rather than as another revision.
A pathspec is Git's language for saying which parts of the tree you mean. Sometimes that is a literal path such as src/main.c. Sometimes it is a directory such as docs/. Sometimes it is a pattern. A revision changes where the history walk starts. A pathspec narrows which paths matter during that walk.
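Here is a small sketch of those three pathspec shapes against one starting revision. It assumes bash and Git on the PATH; the file names are invented, and rev-list --count is used only to keep the output short.

```bash
#!/usr/bin/env bash
set -eu
# Throwaway repository; identity comes from the environment, not your config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main

mkdir -p src docs
printf 'int main(void) { return 0; }\n' > src/main.c
git add src/main.c && git commit -q -m 'add main.c'
printf 'a\n' > docs/a.txt
git add docs/a.txt && git commit -q -m 'add a.txt'
printf 'b\n' > docs/b.md
git add docs/b.md && git commit -q -m 'add b.md'

all=$(git rev-list --count HEAD)                       # no pathspec: every commit
literal=$(git rev-list --count HEAD -- src/main.c)     # a literal path
directory=$(git rev-list --count HEAD -- docs/)        # a directory
pattern=$(git rev-list --count HEAD -- 'docs/*.txt')   # quoted so Git, not the shell, matches it
echo "all=$all literal=$literal directory=$directory pattern=$pattern"
```

The revision (HEAD) is the same in every query; only the pathspec changes, and with it the set of commits that survive.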
These two commands do different things:
git log main src
git log main -- src
In the first form, src is still in revision-argument position, so Git has to decide whether it is another revision or a path. In the second, main is the history starting point and src is a pathspec that limits the query to that part of the tree.
A Tiny -- Probe
You can make the ambiguity visible with a branch and a directory that share the same name:
git init -b main demo-double-meaning
cd demo-double-meaning
mkdir -p src docs
printf 'one\n' > src/file.txt
git add .
git commit -m 'add src file'
git branch src
printf 'note\n' > docs/note.txt
git add docs/note.txt
git commit -m 'docs only'
git show-ref --heads
git log --oneline main src
git log --oneline main -- src
show-ref proves that src is a branch name in this repository, but there is also a src/ directory. git log main src runs straight into that ambiguity, and Git stops with an "ambiguous argument" error instead of guessing. git log main -- src removes it by saying, plainly, "walk from main, then treat src as a path."
Ranges Are (Simple) Set Algebra
Git history ranges make a little more sense once you stop reading them as punctuation tricks and start reading them as set operations over reachable commits.
Git begins with one or more starting commits, walks backward through parent links, and accumulates the commits reachable from those starting points. Range syntax changes which starting points are included and which reachable commits are excluded.
The most common forms are:
- A B means "walk from both A and B"
- ^A B means "walk from B, but exclude commits reachable from A"
- A..B is shorthand for ^A B
- A...B in git log means the symmetric difference: commits reachable from either side, but not from both
That last form is easy to misread. In git log, A...B is a set operation over history. In git diff, the same spelling means "the diff from the merge base of A and B to B." The punctuation is shared, but the command still matters.
If you keep the set model in mind, many Git queries get easier to parse. "Show me what is on this branch but not that one" is reachable-set subtraction. "Show me the commits unique to either side" is symmetric difference. Git is doing set algebra here, just with more carets than anyone would choose on purpose.
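A small bash sketch makes the set algebra checkable. The branch layout and commit names are invented for the example; the counts in the comments follow from that layout.

```bash
#!/usr/bin/env bash
set -eu
# Throwaway repository; identity comes from the environment, not your config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main

# A - B on main; topic forks at B and gets T1, T2; then main gains C.
git commit -q --allow-empty -m A
git commit -q --allow-empty -m B
git checkout -q -b topic
git commit -q --allow-empty -m T1
git commit -q --allow-empty -m T2
git checkout -q main
git commit -q --allow-empty -m C

union=$(git rev-list --count main topic)         # A B C T1 T2
only_topic=$(git rev-list --count main..topic)   # T1 T2
only_main=$(git rev-list --count topic..main)    # C
sym=$(git rev-list --count main...topic)         # C T1 T2
# A..B and ^A B are the same query; sort so the comparison ignores ordering.
dots=$(git rev-list main..topic | sort)
caret=$(git rev-list '^main' topic | sort)

echo "union=$union only_topic=$only_topic only_main=$only_main sym=$sym"
git rev-list --left-right --count main...topic   # left (main) count, tab, right (topic) count
```

The symmetric difference is exactly the two one-sided subtractions put together, which is what --left-right splits back apart.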
rev-parse and rev-list
At the plumbing level, keep in mind the difference between naming and walking.
- git rev-parse resolves revision expressions
- git rev-list walks commits
- git log is largely rev-list plus formatting and presentation
As a rough implementation sketch: first Git decides what commits you mean. Then it walks the graph. Then, depending on the command, it formats or filters the result.
git log sometimes feels fast and sometimes expensive for exactly this reason. Despite the name, it is doing traversal work rather than simply reading from a log.
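To see naming and walking as separate steps, compare what each command prints for the same range. This is a minimal bash sketch, assuming Git on the PATH; the branch and commit names are invented.

```bash
#!/usr/bin/env bash
set -eu
# Throwaway repository; identity comes from the environment, not your config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main

git commit -q --allow-empty -m A
git checkout -q -b topic
git commit -q --allow-empty -m T1
git commit -q --allow-empty -m T2
git checkout -q main

# Naming: resolve the expression into tips to include and exclude (^). No walk.
named=$(git rev-parse main..topic)
printf '%s\n' "$named"

# Walking: follow parent links and emit the commits in the set.
walked=$(git rev-list main..topic | sort)

# Presentation: log walks the same set and formats each commit.
git log --format='%h %s' main..topic
logged=$(git log --format=%H main..topic | sort)
```

rev-parse stops after resolving the expression, printing the topic tip and the main tip with a ^ in front. rev-list and log both walk from there and should produce the same set of commits.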
A Few Commands Make Revision Syntax Concrete
This gets easier once you ask Git to resolve and walk a few expressions directly:
git rev-parse HEAD HEAD^ HEAD~3
git rev-list --count main..topic
git rev-list --left-right --count main...topic
git merge-base main topic
git log --first-parent main
Those five commands cover a lot of ground: naming one commit, naming relative ancestors, subtracting one reachable set from another, finding a merge base, and reading a mainline view instead of the full graph.
If you want the set operations to look like actual commits instead of counts:
git rev-list --oneline main..topic | sed -n '1,5p'
git rev-list --left-right --oneline main...topic | sed -n '1,5p'
main..topic is one-sided subtraction. main...topic is symmetric difference, and --left-right marks which side each commit came from.
How Git Walks the Graph
Once Git has a set of starting revisions, the basic traversal model is straightforward: read a commit, emit it if it survives the filters, follow its parent links, and keep going until the walk is exhausted.
What varies is how Git orders and prunes that walk.
For example:
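- topological order never shows a parent before all of its children, and avoids interleaving commits from different lines of history
- date order keeps the same parents-after-children rule but otherwise sorts by commit timestamp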
- first-parent traversal follows only the mainline parent at merges
- exclusion ranges prune parts of the reachable graph before they are printed
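The ordering rules can be checked mechanically. This bash sketch builds a one-merge history and uses --parents so each output line lists a commit followed by its parents; the awk helper (a hypothetical name, parents_after_children) reports a violation if any parent is printed before one of its children. The repository and commit names are invented.

```bash
#!/usr/bin/env bash
set -eu
# Throwaway repository; identity comes from the environment, not your config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main

git commit -q --allow-empty -m A
git commit -q --allow-empty -m B
git branch side
git commit -q --allow-empty -m C
git checkout -q side
git commit -q --allow-empty -m S
git checkout -q main
git merge -q --no-ff -m M side

# Reads "commit parent1 parent2 ..." lines; prints ok if every parent
# appears later in the output than each commit that names it.
parents_after_children() {
  awk '{ pos[$1] = NR; line[NR] = $0 }
       END {
         for (i = 1; i <= NR; i++) {
           n = split(line[i], f, " ")
           for (j = 2; j <= n; j++)
             if (pos[f[j]] <= i) { print "violation"; exit 1 }
         }
         print "ok"
       }'
}

git rev-list --topo-order --parents main | parents_after_children
git rev-list --date-order --parents main | parents_after_children
```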
One history graph can therefore produce very different views depending on the command options. A release engineer reading git log --first-parent main is asking a different question from someone reading the full branch topology of a topic branch.
Commit-graph files help here too. The graph itself already exists in the commit objects, but the commit-graph file stores derived metadata about those commits, such as parent positions, commit dates, and generation numbers, so Git can answer ancestry questions without parsing every commit object again on every run.
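If you want to see the file itself, a sketch like the following writes and verifies one. It assumes a reasonably recent Git and the default single-file layout under .git/objects/info/ (not the split commit-graph chain).

```bash
#!/usr/bin/env bash
set -eu
# Throwaway repository; identity comes from the environment, not your config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main

for n in 1 2 3; do git commit -q --allow-empty -m "commit $n"; done

# Derive a commit-graph file from every commit reachable from a ref.
git commit-graph write --reachable
ls -l .git/objects/info/commit-graph

# Check the derived file against the commit objects it summarizes.
git commit-graph verify && echo "commit-graph verified"
```

Deleting that file loses nothing but speed: the commits and their parent links are still in the object database.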
Merge Bases Are Computed, Not Stored
- Merge Base: the best common ancestor Git computes for two or more commits.
Git does not store a permanent "common ancestor" field for every possible pair of commits. When a command needs a merge base, Git computes it by walking ancestry and finding the best common ancestor or ancestors for the commits in question.
That matters in more places than just merges. Merge bases show up in:
- git merge-base
- three-dot revision ranges
- many diff and review workflows
- rebase planning
- conflict computation
Git is not reading a canned answer here. It computes merge bases from the commit graph.
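The relationship between merge bases and three-dot syntax can be checked directly. This bash sketch (invented file and branch names) compares main...topic with an explicit "both tips, minus the merge bases" query, and compares the three-dot diff with a diff taken from the merge base.

```bash
#!/usr/bin/env bash
set -eu
# Throwaway repository; identity comes from the environment, not your config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main

printf 'base\n' > a.txt
git add a.txt && git commit -q -m base
git checkout -q -b topic
printf 'topic\n' > b.txt
git add b.txt && git commit -q -m 'topic adds b.txt'
git checkout -q main
printf 'main\n' >> a.txt
git commit -q -am 'main edits a.txt'

base=$(git merge-base main topic)   # computed now, by walking ancestry
git log -1 --format='merge base: %s' "$base"

# Three-dot in log: both tips, minus everything reachable from the merge base(s).
sym=$(git rev-list main...topic | sort)
# Unquoted on purpose: --all may print more than one merge base.
manual=$(git rev-list main topic --not $(git merge-base --all main topic) | sort)

# Three-dot in diff: merge base to topic, so main's edit to a.txt is not shown.
git diff --stat main...topic
```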
Path-Limited History Is Another Layer on Top
Plain history traversal walks commits and parent links. Path-limited history adds another job: deciding whether the path or paths you asked about changed across those commits.
Commands like git log -- path/to/file, git rev-list -- path/to/file, and git blame path/to/file are not a plain history walk that simply prints fewer results. Git may have to inspect trees while it walks the graph in order to determine whether the path's content or location changed between a commit and its parents.
Path history is often more expensive than branch history for the same reason. The graph walk is still there, but now tree comparisons join the query.
A Tiny Path-History Probe
You can watch that extra work happen in a tiny repository:
git init demo-path-history
cd demo-path-history
mkdir -p src docs
printf 'one\n' > src/file.txt
git add .
git commit -m 'add file'
printf 'note\n' > docs/note.txt
git add docs/note.txt
git commit -m 'docs only'
git mv src/file.txt src/renamed.txt
git commit -m 'rename file'
printf 'two\n' >> src/renamed.txt
git commit -am 'edit file'
git rev-list --oneline HEAD
git log --oneline -- src/renamed.txt
git log --oneline --follow -- src/renamed.txt
git rev-list --reverse HEAD | while read commit
do
printf '\n== %s ==\n' "$commit"
git diff-tree --root --find-renames --name-status -r "$commit"
done
rev-list shows the whole reachable commit set. The path-limited log commands ask a narrower question about one file. diff-tree makes the extra job visible: Git still walks the graph, but it also compares tree state at each step, and --follow adds rename inference on top.
The basic shape looks like this:
- choose starting commits
- walk parent links
- for each commit of interest, inspect trees as needed
- keep or discard commits based on whether the path changed
That is already more work than a plain commit walk, and rename-following adds another bill.
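Those four steps can be imitated by hand, which makes the extra tree work visible. This is a deliberately simplified bash sketch, not how Git implements path limiting: the helpers entry and walk_path are hypothetical names, the comparison looks only at the first parent, and there is no merge simplification or rename following. On a linear history like this one it should agree with git log -- path.

```bash
#!/usr/bin/env bash
set -eu
# Throwaway repository; identity comes from the environment, not your config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main

mkdir -p src docs
printf 'one\n' > src/file.txt
git add . && git commit -q -m 'add file'
printf 'note\n' > docs/note.txt
git add docs/note.txt && git commit -q -m 'docs only'
git mv src/file.txt src/renamed.txt && git commit -q -m 'rename file'
printf 'two\n' >> src/renamed.txt && git commit -q -am 'edit file'

# Hypothetical helper: blob ID at PATH ($2) in COMMIT ($1), or "" if absent.
entry() { git rev-parse -q --verify "$1:$2" 2>/dev/null || true; }

# Hypothetical helper: walk parent links (rev-list), inspect the tree entry for
# one path in each commit and its first parent, keep commits where it changed.
walk_path() {
  git rev-list HEAD | while read -r commit; do
    parent_entry=""
    if git rev-parse -q --verify "$commit^" >/dev/null 2>&1; then
      parent_entry=$(entry "$commit^" "$1")
    fi
    if [ "$(entry "$commit" "$1")" != "$parent_entry" ]; then
      echo "$commit"
    fi
  done
}

walk_path src/renamed.txt
```

Every commit costs extra object lookups here even when the answer is "unchanged", which is the overhead that makes path history heavier than a plain walk.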
Rename Detection Is Heuristic
Git does not store "rename" as a first-class event in commit history. As Chapter 2 showed, trees store names and blobs store content; if a path disappears and a similar path appears, Git may infer that one became the other.
That means path history with rename-following is not simply reading metadata. It is asking Git to compare content similarity across snapshots and infer whether one path should be treated as the continuation of another.
That has a few consequences:
- rename detection is query-time work
- it can be expensive on large histories
- results depend on heuristics and thresholds
- --follow applies to one path and has practical limitations around merges and complex renames
Git's data model shows up directly in user experience here. Git gets flexibility because renames are inferred rather than stored, and the bill shows up at read time.
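Thresholds are easy to observe with git diff-tree, which detects renames only when asked with -M. In this bash sketch (file names invented) one of twenty lines changes during a rename, so the pair clears the default 50% similarity threshold but not an exact-match 100% threshold.

```bash
#!/usr/bin/env bash
set -eu
# Throwaway repository; identity comes from the environment, not your config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main

seq 1 20 > old.txt
git add old.txt && git commit -q -m 'add old.txt'
git mv old.txt new.txt
sed 's/^20$/twenty/' new.txt > tmp && mv tmp new.txt   # change 1 of 20 lines
git add new.txt && git commit -q -m 'rename and edit'

# Default threshold: the deletion and the addition are paired as a rename (R).
git diff-tree -r --no-commit-id --name-status -M HEAD
# Exact matches only: the same commit reads as a delete (D) plus an add (A).
git diff-tree -r --no-commit-id --name-status -M100% HEAD
```

Nothing in the commit changed between the two queries; only the inference did.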
Why Path History Gets Slow
When a path-limited command is slow, several costs may be stacking up at once:
- the reachable commit set may be large
- merge-heavy history may widen the walk
- tree inspection may be needed at many commits
- rename detection may require extra similarity work
- ordering, filtering, or formatting may keep commits in memory longer
Performance advice for path-limited history commands looks different from advice for git status. The first problem is primarily graph and tree work; the second is primarily index and filesystem work.
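Changed-path Bloom filters help precisely because they let Git skip some tree inspections during path-limited history queries. They do not replace the commit graph or change the logical history model. They are a fast negative test layered on top of the existing walk: a filter can say that a path definitely did not change in a commit, or only that it might have, in which case Git still compares the trees.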
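A quick way to confirm that the filters change cost but not meaning is to run the same path-limited query before and after writing them. This bash sketch assumes Git 2.27 or later, where git commit-graph write accepts --changed-paths; in a repository this small there is no measurable speedup, only the identical answer.

```bash
#!/usr/bin/env bash
set -eu
# Throwaway repository; identity comes from the environment, not your config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main

mkdir -p src docs
for n in 1 2 3; do
  printf '%s\n' "$n" >> src/app.txt
  printf '%s\n' "$n" >> docs/notes.txt
  git add . && git commit -q -m "round $n"
done
printf 'extra\n' >> docs/notes.txt && git commit -q -am 'docs only'

before=$(git log --format=%H -- src/app.txt)

# Add changed-path Bloom filters to the commit-graph file.
git commit-graph write --reachable --changed-paths

after=$(git log --format=%H -- src/app.txt)
# Same answer either way; the filters only let Git skip some tree comparisons.
[ "$before" = "$after" ] && echo "same history"
```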
First-Parent History Is a Different View
One especially useful history mode is --first-parent. It tells Git to follow only the first parent at merge commits, which turns a fully branched history into a cleaner mainline view.
This is useful when the question is not "what is every commit reachable from here?" but something narrower, such as:
- what landed on main
- what release commits were integrated
- what the integration history of a branch looks like
That option does not reveal a truer history. It reveals a different slice of the history graph, tuned for a different kind of reading.
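A sketch of the integration-history reading: two topic branches merged into main with --no-ff, then read both ways. This assumes bash and Git on the PATH; branch names and commit messages are invented. --merges keeps only merge commits, so combined with --first-parent it lists what landed on main.

```bash
#!/usr/bin/env bash
set -eu
# Throwaway repository; identity comes from the environment, not your config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main

git commit -q --allow-empty -m 'initial'
for t in a b; do
  git checkout -q -b "topic-$t"
  git commit -q --allow-empty -m "work on $t (1)"
  git commit -q --allow-empty -m "work on $t (2)"
  git checkout -q main
  git merge -q --no-ff -m "merge topic-$t" "topic-$t"
done

echo "full graph:   $(git rev-list --count main) commits"
echo "first parent: $(git rev-list --first-parent --count main) commits"
# What landed on main: only the merges along the mainline.
git log --first-parent --merges --format=%s main
```

Both views come from the same seven commits; the first-parent walk simply never steps onto the topic branches.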
History Is Computed Every Time You Ask for It
The main idea of this chapter is simple: Git history is a computed view over immutable objects and parent links.
When you ask a history question, Git has to:
- resolve revision expressions
- seed include and exclude sets
- walk parent links
- optionally compute merge-base relationships
- optionally inspect trees and blobs
- optionally apply rename heuristics
- format the surviving commits for the command you ran
Once that model is in place, several things get easier to understand at once:
- why some history queries are cheap and others are expensive
- why range syntax matters
- why path-limited history is qualitatively different from plain branch history
- why commit-graph and Bloom filters help without changing Git's meaning