Chapter 4: Revisions and History Traversal

Pencil sketch of people fishing from a long wooden pier with waterfront houses and boats in the distance.

Git history is a graph Git walks every time you ask a question. Commits are objects with parent pointers; the history view you see is computed from those walks on demand. git log, git show, git blame, git bisect, and git merge-base all start from the same basic move: pick commits, walk parent links, and keep the results that match the question you asked.

This chapter takes that in three layers:

how Git names commits and ranges
how Git walks the graph
how Git computes path-limited history on top of that walk in commands like git log -- path, git rev-list -- path, and git blame

Revisions Are Expressions, Not Just Commit IDs

In everyday use, a "revision" often means "some commit I want to refer to," but Git's revision syntax is (for better or worse) broader than that. A revision expression can name a ref, a commit ID, a relative ancestor, a set difference, or a merge-related relationship that Git has to compute. If you have seen tildes and carets in revisions and forgotten what they mean, this is the syntax they belong to.

Revision: Expression that Git resolves to one or more commits or objects.
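The definition covers more than commits. This short probe shows one revision expression resolving to each object type. It is a sketch that assumes Git 2.28 or later (for git init -b) and sets a made-up, repository-local identity so the commit succeeds.

```shell
set -e
cd "$(mktemp -d)"                        # throwaway repository
git init -q -b main                      # -b needs Git 2.28 or later
git config user.name demo                # made-up identity, local to this repo
git config user.email demo@example.com

mkdir src
printf 'one\n' > src/file.txt
git add . && git commit -q -m 'add file'

git cat-file -t HEAD                     # commit
git cat-file -t 'HEAD^{tree}'            # tree: the commit's root tree
git cat-file -t HEAD:src                 # tree: a directory inside that tree
git cat-file -t HEAD:src/file.txt        # blob: the file's content
# An abbreviated ID is just another spelling of the same commit.
git rev-parse "$(git rev-parse --short HEAD)"
```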

If revision syntax feels dense at first, that is because Git is giving you a compact language for describing commit sets.

HEAD means the current commit
main means the commit named by the main ref
a1b2c3d means the commit with that object ID prefix
HEAD^ means the first parent of HEAD
HEAD~3 means the first-parent ancestor three steps back
merge^2 means the second parent of a merge commit

The important distinction is that ^ picks a specific parent edge, while ~ stays on the first-parent line and keeps walking. So HEAD~3 and HEAD^^^ always name the same commit, while merge~1 (the same commit as merge^1) and merge^2 can point in completely different directions.
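A throwaway repository with one merge makes the difference concrete. This is a sketch assuming Git 2.28 or later; commit_file is a hypothetical helper that adds one new file per commit so the merge cannot conflict.

```shell
set -e
cd "$(mktemp -d)"                        # throwaway repository
git init -q -b main                      # -b needs Git 2.28 or later
git config user.name demo                # made-up identity, local to this repo
git config user.email demo@example.com
# Hypothetical helper: one new file per commit, so merges never conflict.
commit_file() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit_file base
git branch topic
commit_file m1
commit_file m2
git checkout -q topic
commit_file t1
git checkout -q main
git merge -q --no-ff -m 'merge topic' topic   # HEAD is now a merge commit

# ~ keeps following first parents, so these two always agree (base).
git rev-parse HEAD~3 HEAD^^^
# ^1 and ~1 both name the first parent (m2, on main) ...
git rev-parse HEAD^1 HEAD~1
# ... while ^2 steps onto the merged branch (t1, the tip of topic).
git rev-parse HEAD^2 topic
```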

Revisions are expressions that Git resolves to objects, with raw IDs being only one possible form. One more symbol is worth covering before the range syntax.

The Separator To Reduce Ambiguity

A small but easy-to-miss bit of revision syntax: --.

Pathspec: Git path-matching expression used to limit a command to particular files, directories, or patterns.

In commands such as git log, git show, or git diff, Git has to distinguish revision arguments from path arguments, and the -- marker tells Git that everything after it should be treated as a pathspec rather than as another revision. A pathspec is Git's language for saying which parts of the tree you mean: a literal path such as src/main.c, a directory such as docs/, or a pattern. The distinction matters because a revision changes where the history walk starts, while a pathspec narrows which paths matter during that walk.

These two commands do different things:

git log main src
git log main -- src

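In the first form, Git has to work out on its own whether src is a revision or a path in the working tree, and when it could be either, Git stops with an error instead of guessing. In the second, main is the history starting point and src is a pathspec that limits the query to that part of the tree.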

A Tiny `--` Probe

You can see this with a branch and a directory that share the same name:

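git init -b main demo-double-meaning
cd demo-double-meaning

mkdir -p src docs
printf 'one\n' > src/file.txt
git add .
git commit -m 'add src file'

# A branch named src, alongside the src/ directory.
git branch src

printf 'note\n' > docs/note.txt
git add docs/note.txt
git commit -m 'docs only'

git show-ref --heads
# Fails: src is both a revision and a path, so Git refuses to guess.
git log --oneline main src
# Works: main is the starting point, src is a pathspec.
git log --oneline main -- src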

show-ref confirms that src is a branch name in this repository, and there is also a src/ directory. git log main src runs straight into that ambiguity: Git reports an "ambiguous argument" error rather than pick one meaning. git log main -- src removes it by saying, plainly, "walk from main, then treat src as a path."

Ranges Are (Simple) Set Algebra

Git history ranges make a little more sense once you stop reading them as punctuation tricks and start reading them as set operations over reachable commits.

Reachability: Whether an object — a commit, tree, or blob — can be reached by following parent links and content references starting from a given commit or set of tips.

Git begins with one or more starting commits, walks backward through parent links, and accumulates the commits reachable from those starting points. Range syntax changes which starting points are included and which reachable commits are excluded.

A B means "walk from both A and B"
^A B means "walk from B, but exclude commits reachable from A"
A..B is shorthand for ^A B
A...B in git log means the symmetric difference: commits reachable from either side, but not from both

That last form is easy to misread. In git log, A...B is a set operation over history. In git diff, the same spelling compares the merge base of A and B with B, so it shows what B changed since the two sides diverged; git diff A..B, by contrast, is simply git diff A B. The punctuation is shared, but the command still matters.
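A small sketch makes the git diff difference visible. It assumes Git 2.28 or later and uses a hypothetical commit_file helper; each branch adds one file after the fork point.

```shell
set -e
cd "$(mktemp -d)"                        # throwaway repository
git init -q -b main                      # -b needs Git 2.28 or later
git config user.name demo                # made-up identity, local to this repo
git config user.email demo@example.com
# Hypothetical helper: one new file per commit.
commit_file() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit_file base
git branch topic
commit_file on-main                      # exists only on main
git checkout -q topic
commit_file on-topic                     # exists only on topic
git checkout -q main

# Two dots: compare the tips directly, so on-main.txt appears deleted
# and on-topic.txt appears added.
git diff --name-status main..topic
# Three dots: start from the merge base, so only topic's own change shows.
git diff --name-status main...topic
git diff --name-status "$(git merge-base main topic)" topic
```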

If you keep the set model in mind, many Git queries get easier to parse. "Show me what is on this branch but not that one" is reachable-set subtraction. "Show me the commits unique to either side" is symmetric difference. Git is doing set algebra here, just with more carets than anyone would choose on purpose.
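The set reading can be checked directly. This sketch (Git 2.28 or later assumed, hypothetical commit_file helper) compares A..B with ^A B, then rebuilds A...B by hand with sort and comm as "reachable from either side" minus "reachable from both".

```shell
set -e
cd "$(mktemp -d)"                        # throwaway repository
git init -q -b main                      # -b needs Git 2.28 or later
git config user.name demo                # made-up identity, local to this repo
git config user.email demo@example.com
# Hypothetical helper: one new file per commit.
commit_file() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit_file base
git branch topic
commit_file m1
git checkout -q topic
commit_file t1
commit_file t2
git checkout -q main

# A..B is shorthand for ^A B: same walk, same output.
git rev-list main..topic
git rev-list ^main topic

sets=$(mktemp -d)                                       # scratch space
git rev-list main  | sort > "$sets/main"
git rev-list topic | sort > "$sets/topic"
sort -u "$sets/main" "$sets/topic" > "$sets/either"     # union
comm -12 "$sets/main" "$sets/topic" > "$sets/both"      # intersection
comm -23 "$sets/either" "$sets/both" > "$sets/by-hand"  # union minus intersection
git rev-list main...topic | sort > "$sets/three-dot"
diff "$sets/by-hand" "$sets/three-dot" && echo 'main...topic matches the hand-built set'
```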

`rev-parse` and `rev-list`

At the plumbing level, I keep naming and walking as two separate ideas: git rev-parse resolves revision expressions, git rev-list walks commits, and git log is largely rev-list plus formatting and presentation. Roughly, Git first decides what commits you mean, then walks the graph, then formats or filters the result depending on the command.

git log sometimes feels fast and sometimes slow for exactly this reason. Despite the name, it is doing graph traversal, not reading a log file.
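The split between naming and walking is visible from the shell. In this sketch (Git 2.28 or later assumed, hypothetical commit_file helper), rev-parse expands a range into an include tip and a ^-prefixed exclude tip, rev-list --stdin walks exactly that list, and log walks the same set with formatting on top.

```shell
set -e
cd "$(mktemp -d)"                        # throwaway repository
git init -q -b main                      # -b needs Git 2.28 or later
git config user.name demo                # made-up identity, local to this repo
git config user.email demo@example.com
# Hypothetical helper: one new file per commit.
commit_file() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit_file base
git branch topic
commit_file m1
git checkout -q topic
commit_file t1
commit_file t2
git checkout -q main

# Naming: the range becomes one include line and one ^exclude line.
git rev-parse main..topic
# Walking: rev-list reads that same include/exclude list from stdin.
git rev-parse main..topic | git rev-list --stdin
# Formatting: log performs the same walk and adds presentation.
git log --format='%h %s' main..topic
```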

A Few Commands Make Revision Syntax Concrete

Revision syntax gets easier once you resolve and walk a few expressions directly:

git rev-parse HEAD HEAD^ HEAD~3
git rev-list --count main..topic
git rev-list --left-right --count main...topic
git merge-base main topic
git log --first-parent main

Those five commands cover a lot of ground: naming one commit and two of its ancestors, subtracting one reachable set from another, counting each side of a symmetric difference, finding a merge base, and reading a mainline view instead of the full graph.

If you want the set operations to look like actual commits instead of counts:

git rev-list --oneline main..topic | sed -n '1,5p'
git rev-list --left-right --oneline main...topic | sed -n '1,5p'

main..topic is one-sided subtraction. main...topic is symmetric difference, and --left-right marks which side each commit came from.
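The commands above assume that branches named main and topic already exist. Here is a minimal fixture for trying them, under the same assumptions as the other probes (Git 2.28 or later, hypothetical commit_file helper). It also shows that the two --left-right counts are just the two one-sided subtractions.

```shell
set -e
cd "$(mktemp -d)"                        # throwaway repository
git init -q -b main                      # -b needs Git 2.28 or later
git config user.name demo                # made-up identity, local to this repo
git config user.email demo@example.com
# Hypothetical helper: one new file per commit.
commit_file() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit_file base
git branch topic
commit_file m1                           # only on main
git checkout -q topic
commit_file t1                           # only on topic
commit_file t2                           # only on topic
git checkout -q main

git rev-list --count main..topic         # 2: t1 and t2
git merge-base main topic                # the base commit
# < marks commits only on the left (main), > commits only on the right (topic).
git rev-list --left-right --oneline main...topic
# Left count is topic..main, right count is main..topic.
git rev-list --left-right --count main...topic
git rev-list --count topic..main
git rev-list --count main..topic
```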

How Git Walks the Graph

Once Git has a set of starting revisions, the basic traversal model is straightforward: read a commit, emit it if it survives the filters, follow its parent links, and keep going until the walk is exhausted.
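That loop can be sketched by hand. The hypothetical walk_all function below does a plain breadth-first walk: emit a commit, queue its parents (read with git show -s --format=%P), and skip anything already seen. Git's own walker is far more optimized and orders its queue by commit date, so this sketch only checks that it reaches the same set of commits as git rev-list, not the same order. It assumes Git 2.28 or later and a hypothetical commit_file helper.

```shell
set -e
cd "$(mktemp -d)"                        # throwaway repository
git init -q -b main                      # -b needs Git 2.28 or later
git config user.name demo                # made-up identity, local to this repo
git config user.email demo@example.com
# Hypothetical helper: one new file per commit, so merges never conflict.
commit_file() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit_file base
git branch topic
commit_file m1
git checkout -q topic
commit_file t1
git checkout -q main
git merge -q --no-ff -m 'merge topic' topic

# Hypothetical helper: breadth-first walk over parent links from one tip.
walk_all() {
  queue=$(git rev-parse "$1")
  seen=' '
  while :; do
    set -- $queue                        # split the queue into words
    [ $# -gt 0 ] || break                # queue exhausted: walk is done
    c=$1; shift; queue=$*
    case $seen in *" $c "*) continue ;; esac   # already emitted
    seen="$seen$c "
    echo "$c"
    # %P lists parent IDs separated by spaces (empty for a root commit).
    queue="$queue $(git show -s --format=%P "$c")"
  done
}

walk_all HEAD
[ "$(walk_all HEAD | sort)" = "$(git rev-list HEAD | sort)" ] && echo 'same commit set as git rev-list HEAD'
```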

What varies is how Git orders and prunes that walk.

For example:

topological order never shows a parent before all of its children
date order keeps that rule but otherwise follows commit timestamps
first-parent traversal follows only the mainline parent at merges
exclusion ranges prune parts of the reachable graph before they are printed

One history graph can therefore produce very different views depending on the command options. A release engineer reading git log --first-parent main is asking a different question from someone reading the full branch topology of a topic branch.
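The ordering rule can be checked mechanically. The hypothetical check_parents_after_children helper below reads git rev-list --parents output (each line is a commit followed by its parents) and reports whether any parent was printed before one of its children. It is a sketch that assumes Git 2.28 or later and a hypothetical commit_file helper.

```shell
set -e
cd "$(mktemp -d)"                        # throwaway repository
git init -q -b main                      # -b needs Git 2.28 or later
git config user.name demo                # made-up identity, local to this repo
git config user.email demo@example.com
# Hypothetical helper: one new file per commit, so merges never conflict.
commit_file() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit_file base
git branch topic
commit_file m1
git checkout -q topic
commit_file t1
commit_file t2
git checkout -q main
git merge -q --no-ff -m 'merge topic' topic

# Hypothetical helper: flag any parent whose line comes before a child's line.
check_parents_after_children() {
  awk '
    { pos[$1] = NR; line[NR] = $0 }
    END {
      for (i = 1; i <= NR; i++) {
        n = split(line[i], f, " ")
        for (j = 2; j <= n; j++)
          if ((f[j] in pos) && pos[f[j]] < i) bad = 1
      }
      print (bad ? "parent shown before a child" : "every parent follows its children")
    }'
}

git rev-list --topo-order --parents HEAD | check_parents_after_children
git rev-list --date-order --parents HEAD | check_parents_after_children
```

Both orderings should pass. With the default order and skewed commit timestamps the same check may fail, which is the situation those options exist to handle.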

Commit-graph files help here too. The commit graph itself already exists in the commit objects, but the commit-graph file stores derived metadata (parent links, commit dates, and generation numbers for the commits it covers) that lets Git answer many ancestry questions without opening and parsing each commit object on every run.
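A minimal sketch of that split between data and derived metadata, assuming Git 2.28 or later and a hypothetical commit_file helper: write a commit-graph file, verify it, and confirm that an ancestry query gives the same answer when Git is told to ignore the file through core.commitGraph=false.

```shell
set -e
cd "$(mktemp -d)"                        # throwaway repository
git init -q -b main                      # -b needs Git 2.28 or later
git config user.name demo                # made-up identity, local to this repo
git config user.email demo@example.com
# Hypothetical helper: one new file per commit.
commit_file() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit_file base
git branch topic
commit_file m1
git checkout -q topic
commit_file t1
git checkout -q main

# Build the derived file for every commit reachable from the refs.
git commit-graph write --reachable
ls .git/objects/info/commit-graph
git commit-graph verify

# Same answer with the file in use and with Git told to ignore it.
git merge-base main topic
git -c core.commitGraph=false merge-base main topic
```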

Merge Bases and Path-Limited History

Merge Base: Best common ancestor Git computes for two or more commits.

Git does not store a permanent "common ancestor" field for every possible pair of commits. When a command needs a merge base, Git computes it by walking ancestry and finding the best common ancestor or ancestors for the commits in question.

That matters in more places than just merges. In these cases, Git is computing merge bases from the commit graph instead of reading a canned history:

git merge-base
three-dot revision ranges
many diff and review workflows
rebase planning
conflict computation
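The "or ancestors" case from the merge-base definition is easy to trigger. In a criss-cross history, where each branch merges the other's earlier work, two commits can both be best common ancestors, and git merge-base --all reports both. The sketch assumes Git 2.28 or later and a hypothetical commit_file helper.

```shell
set -e
cd "$(mktemp -d)"                        # throwaway repository
git init -q -b main                      # -b needs Git 2.28 or later
git config user.name demo                # made-up identity, local to this repo
git config user.email demo@example.com
# Hypothetical helper: one new file per commit, so merges never conflict.
commit_file() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit_file base
git branch topic
commit_file m1
git checkout -q topic
commit_file t1

# Criss-cross: each branch merges the other's tip from before the merges.
git checkout -q main
git merge -q --no-ff -m 'main merges t1' topic     # parents: m1, t1
git checkout -q topic
git merge -q --no-ff -m 'topic merges m1' main~1   # parents: t1, m1
git checkout -q main

# Neither m1 nor t1 is an ancestor of the other, so both are "best".
git merge-base --all main topic
```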

Plain history traversal walks commits and parent links. Path-limited history adds another job: deciding whether the path or paths you asked about changed across those commits.

Commands like git log -- path/to/file, git rev-list -- path/to/file, and git blame path/to/file do more than a plain history walk with fewer results. Git may have to inspect trees while it walks the graph in order to determine whether the path's content or location changed between a commit and its parents.

Path history is often more expensive than branch history for the same reason. The graph walk is still there, but now tree comparisons join the query.

A Tiny Path-History Probe

You can watch that extra work happen in a tiny repository:

git init demo-path-history
cd demo-path-history

mkdir -p src docs
printf 'one\n' > src/file.txt
git add .
git commit -m 'add file'

printf 'note\n' > docs/note.txt
git add docs/note.txt
git commit -m 'docs only'

git mv src/file.txt src/renamed.txt
git commit -m 'rename file'

printf 'two\n' >> src/renamed.txt
git commit -am 'edit file'

git rev-list --oneline HEAD
git log --oneline -- src/renamed.txt
git log --oneline --follow -- src/renamed.txt

git rev-list --reverse HEAD | while read -r commit
do
  printf '\n== %s ==\n' "$commit"
  git diff-tree --root --find-renames --name-status -r "$commit"
done

rev-list shows the whole reachable commit set. The path-limited log commands ask a narrower question about one file. diff-tree makes the extra job clear: Git still walks the graph, but it also compares tree state at each step, and --follow adds rename inference on top.

The basic shape looks like this:

choose starting commits
walk parent links
for each commit of interest, inspect trees as needed
keep or discard commits based on whether the path changed

That is already more work than a plain commit walk, and rename-following adds another layer of cost.
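The steps above can be sketched by hand for a single path. The hypothetical path_history helper below walks git rev-list HEAD and keeps a commit when the object stored at the path (looked up with <rev>:<path> syntax) differs from the one in its first parent. It assumes linear history, ignores mode-only changes, and does no rename detection, so it models plain git log -- path, not --follow. Git 2.28 or later is assumed for git init -b.

```shell
set -e
cd "$(mktemp -d)"                        # throwaway repository
git init -q -b main                      # -b needs Git 2.28 or later
git config user.name demo                # made-up identity, local to this repo
git config user.email demo@example.com

mkdir -p src docs
printf 'one\n' > src/file.txt
git add . && git commit -q -m 'add file'
printf 'note\n' > docs/note.txt
git add docs/note.txt && git commit -q -m 'docs only'
git mv src/file.txt src/renamed.txt && git commit -q -m 'rename file'
printf 'two\n' >> src/renamed.txt && git commit -q -am 'edit file'

# Hypothetical helper: compare the object at $1 in each commit and in its
# first parent; "none" stands for "path absent" (or no parent at all).
path_history() {
  for c in $(git rev-list HEAD); do
    now=$(git rev-parse --verify -q "$c:$1") || now=none
    before=$(git rev-parse --verify -q "$c^:$1") || before=none
    [ "$now" = "$before" ] || echo "$c"  # changed here: keep the commit
  done
}

path_history src/renamed.txt
git log --format=%H -- src/renamed.txt
```

Passing a directory such as src also works, because the object at that path is then a tree whose ID changes whenever anything under it changes.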

Rename Detection Is Heuristic

Git does not store "rename" as a first-class event in commit history. As Chapter 2 showed, trees store names and blobs store content; if a path disappears and a similar path appears, Git may infer that one became the other.

That means path history with rename-following asks Git to compare content similarity across snapshots and infer whether one path should be treated as the continuation of another. It is real query-time work, well beyond reading metadata.

That has a few consequences:

rename detection is query-time work
it can be expensive on large histories
results depend on heuristics and thresholds
--follow applies to one path and has practical limitations around merges and complex renames

Git's data model shows up directly in user experience here. Git gets flexibility because renames are inferred rather than stored, and the cost shows up at read time.
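The dependence on thresholds is easy to demonstrate. In the sketch below (Git 2.28 or later assumed), a ten-line file is renamed and three of its lines are rewritten, which leaves roughly 70% of its content in common. With a 50% similarity threshold Git reports a rename; with a 90% threshold the same change shows up as a deletion plus an addition.

```shell
set -e
cd "$(mktemp -d)"                        # throwaway repository
git init -q -b main                      # -b needs Git 2.28 or later
git config user.name demo                # made-up identity, local to this repo
git config user.email demo@example.com

# Ten distinct lines of equal length.
for i in 01 02 03 04 05 06 07 08 09 10; do echo "line $i"; done > old.txt
git add old.txt && git commit -q -m 'add old.txt'

# Rename, then rewrite the last three lines: 7 of 10 lines survive.
git mv old.txt new.txt
for i in 01 02 03 04 05 06 07; do echo "line $i"; done > new.txt
for i in 08 09 10; do echo "LINE $i"; done >> new.txt
git add new.txt && git commit -q -m 'rename and edit'

git diff --name-status -M50% HEAD~1 HEAD   # one R line (R plus a similarity score)
git diff --name-status -M90% HEAD~1 HEAD   # old.txt deleted, new.txt added
```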

Why Path History Gets Slow

When a path-limited command is slow, several costs may be stacking up at once:

the reachable commit set may be large
merge-heavy history may widen the walk
tree inspection may be needed at many commits
rename detection may require extra similarity work
ordering, filtering, or formatting may keep commits in memory longer

Performance advice for path-limited history commands looks different from advice for git status. The first problem is primarily graph and tree work; the second is primarily index and filesystem work.

Changed-path Bloom filters help precisely because they let Git skip some tree inspections during path-limited history queries. They sit on top of the existing walk as a fast negative test, leaving the commit graph and the logical history model intact.
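Here is a sketch of that fast negative test in practice, assuming Git 2.28 or later (which also covers --changed-paths). It writes a commit-graph with changed-path Bloom filters, then checks that a path-limited log gives the same answer when the commit-graph is disabled. It only shows that the results agree; it does not show which tree comparisons the filters let Git skip.

```shell
set -e
cd "$(mktemp -d)"                        # throwaway repository
git init -q -b main                      # -b needs Git 2.28 or later
git config user.name demo                # made-up identity, local to this repo
git config user.email demo@example.com

# Alternate commits between two directories.
mkdir src docs
for n in 1 2 3; do
  echo "$n" > src/code.txt;  git add src;  git commit -q -m "code $n"
  echo "$n" > docs/note.txt; git add docs; git commit -q -m "docs $n"
done

# Store a changed-path Bloom filter for every reachable commit.
git commit-graph write --reachable --changed-paths

# A Bloom filter answers only "definitely unchanged" or "maybe changed",
# so the result must match a run that ignores the commit-graph file.
git log --format=%s -- docs
git -c core.commitGraph=false log --format=%s -- docs
```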

First-Parent History Is a Different View

One especially useful history mode is --first-parent. It tells Git to follow only the first parent at merge commits, which turns a fully branched history into a cleaner mainline view.

This is useful when the question is not "what is every commit reachable from here?" but something narrower, such as:

what landed on main
what release commits were integrated
what the integration history of a branch looks like

That option reveals a different slice of the history graph, tuned for a different kind of reading. It is one valid view among several, each shaped to the question being asked.
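First-parent traversal is simple enough to write by hand. The hypothetical first_parent_walk helper below follows ^1 from each commit until it reaches a root, and should print the same list as git rev-list --first-parent. The sketch assumes Git 2.28 or later and a hypothetical commit_file helper. The feature branch and its --no-ff merge are there so that the full walk and the first-parent walk differ.

```shell
set -e
cd "$(mktemp -d)"                        # throwaway repository
git init -q -b main                      # -b needs Git 2.28 or later
git config user.name demo                # made-up identity, local to this repo
git config user.email demo@example.com
# Hypothetical helper: one new file per commit.
commit_file() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit_file base
git checkout -q -b feature
commit_file f1
commit_file f2
git checkout -q main
git merge -q --no-ff -m 'merge feature' feature   # force a real merge commit
commit_file release

# Hypothetical helper: follow only the first parent until a root commit.
first_parent_walk() {
  c=$(git rev-parse "$1")
  while [ -n "$c" ]; do
    echo "$c"
    c=$(git rev-parse --verify -q "$c^1") || c=   # empty once no parent is left
  done
}

first_parent_walk main
git log --oneline --first-parent main    # release, merge, base
git log --oneline main                   # also includes f1 and f2
```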

History Is Computed Every Time You Ask for It

The main idea of this chapter is simple: Git history is a computed view over immutable objects and parent links.

When you ask a history question, Git has to:

resolve revision expressions
seed include and exclude sets
walk parent links
optionally compute merge-base relationships
optionally inspect trees and blobs
optionally apply rename heuristics
format the surviving commits for the command you ran

Once that model is in place, several things get easier to understand at once:

why some history queries are cheap and others are expensive
why range syntax matters
why path-limited history is qualitatively different from plain branch history
why commit-graph and Bloom filters help without changing Git's meaning