Every senior engineer I respected at Google, Meta and Amazon had the same slightly unnerving habit: they could open a 200,000-line codebase they had never seen before and, within an afternoon, tell you three things that were wrong with it. They weren't guessing. They were reading. And they were reading in a way the rest of us, mostly, were not.
Reading code well is the single most undertaught skill in the industry. Every bootcamp teaches you how to write it. Almost none teach you how to read it. Here is how I've tried to get better, drawn from a decade of being around people who were better at it than I was.
Start with the README and the tests, not with main
The most common mistake is to open the entry point and start tracing function calls. You will get lost in thirty minutes. Instead:
- Read the README once. Skim.
- Open the test directory and read three or four tests. Tests are executable documentation of the intended use.
- Now open the entry point.
The tests anchor you. Without them, you are doing the equivalent of reading a novel by picking a random page and trying to infer the plot.
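Here is a minimal Python sketch of "executable documentation", using the built-in unittest module. The `parse_duration` function and its rules are invented for illustration. What matters is that each test name, read on its own, tells you one thing the author intended, including the edge cases.

```python
import unittest


def parse_duration(text: str) -> int:
    """Hypothetical helper: convert strings like "90s", "5m" or "2h" to seconds."""
    units = {"s": 1, "m": 60, "h": 3600}
    text = text.strip()
    if not text or text[-1] not in units:
        raise ValueError(f"unrecognised duration: {text!r}")
    return int(text[:-1]) * units[text[-1]]


class ParseDurationTest(unittest.TestCase):
    # Read top to bottom, the test names document the intended use.
    def test_minutes_are_converted_to_seconds(self):
        self.assertEqual(parse_duration("5m"), 300)

    def test_surrounding_whitespace_is_ignored(self):
        self.assertEqual(parse_duration("  2h "), 7200)

    def test_missing_unit_is_rejected(self):
        with self.assertRaises(ValueError):
            parse_duration("42")
```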
Hunt for the shape, not the details
On a first pass your goal is not to understand what every line does. Your goal is to answer two questions: where does the data come in, and where does it go out. That's it. Input, output. Everything between is, for now, a black box.
Once the in-and-out is clear, ask: what are the three or four transformations that happen along the way, at the level of abstraction of a whiteboard? You can almost always find them by looking at the directory structure and the names of the top-level modules. Codebases that have been designed deliberately will tell you their shape just by the way the folders are laid out. When the layout tells you nothing, that is a useful red flag in itself.
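For the directory-structure pass, a few lines of Python can give you a rough first map: how many source files sit under each top-level folder. This is a sketch built on my own assumptions. I chose which suffixes count as source and decided to skip hidden directories such as .git. A file count is also only a crude proxy for where the weight of a codebase lives.

```python
from __future__ import annotations

from collections import Counter
from pathlib import Path, PurePath

# Assumption: which file types count as "source". Adjust for the codebase at hand.
SOURCE_SUFFIXES = (".py", ".go", ".ts", ".java")


def shape_from_paths(paths, suffixes=SOURCE_SUFFIXES) -> dict[str, int]:
    """Group relative file paths by top-level directory and count them, largest first."""
    counts: Counter = Counter()
    for p in paths:
        rel = PurePath(p)
        if rel.suffix not in suffixes:
            continue
        if any(part.startswith(".") for part in rel.parts):
            continue  # skip .git, .venv and other hidden directories
        top = rel.parts[0] if len(rel.parts) > 1 else "."  # "." means a file at the root
        counts[top] += 1
    return dict(counts.most_common())


def top_level_shape(root: str, suffixes=SOURCE_SUFFIXES) -> dict[str, int]:
    """Walk `root` and report how many source files live under each top-level directory."""
    root_path = Path(root)
    rel_paths = (f.relative_to(root_path) for f in root_path.rglob("*") if f.is_file())
    return shape_from_paths(rel_paths, suffixes)
```

Printing `top_level_shape("path/to/repo")` (a placeholder path) gives you a short list of candidate modules to match against the whiteboard-level transformations.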
Use the debugger as a reading tool
This one changed my reading speed by a factor of about three. Instead of trying to trace control flow by eye, set a breakpoint somewhere interesting and run the test suite. Step through a single request. Watch what actually happens.
You will find, almost immediately, that the code does not do what you thought it did. That gap — between the model in your head and the model the computer is actually executing — is where every interesting bug lives.
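In Python, the interactive route is to put `breakpoint()` (Python 3.7+) on the code path a test exercises, then step with pdb's `n` and `s` commands. If you want a record of the run rather than an interactive session, `sys.settrace` can log every Python-level call made during a single run. The sketch below does that. The `handle_request` pipeline is hypothetical and stands in for whatever single request you want to follow.

```python
import sys


def record_calls(fn, *args, **kwargs):
    """Run fn once and return (result, calls).

    calls is a list of (depth, function_name) for every Python-level call
    made while fn runs; C functions such as str.strip are not recorded.
    """
    calls = []
    depth = 0

    def local_tracer(frame, event, arg):
        nonlocal depth
        if event == "return":  # also fires when the frame exits via an exception
            depth -= 1
        return local_tracer  # keep receiving events for this frame

    def global_tracer(frame, event, arg):
        nonlocal depth
        if event == "call":
            calls.append((depth, frame.f_code.co_name))
            depth += 1
            return local_tracer
        return None

    previous = sys.gettrace()  # restore any tracer (debugger, coverage) afterwards
    sys.settrace(global_tracer)
    try:
        result = fn(*args, **kwargs)
    finally:
        sys.settrace(previous)
    return result, calls


# A hypothetical request pipeline, standing in for the code you are reading.
def parse(raw):
    return raw.strip().lower()


def validate(text):
    if not text:
        raise ValueError("empty request")
    return text


def render(text):
    return f"<p>{text}</p>"


def handle_request(raw):
    return render(validate(parse(raw)))


if __name__ == "__main__":
    result, calls = record_calls(handle_request, "  Hello ")
    for depth, name in calls:
        print("  " * depth + name)
    # handle_request
    #   parse
    #   validate
    #   render
```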
Read the git log backwards
When you come to the tricky bit (the file that doesn't make sense, the function that seems to do three things at once), run git log -p on it and read the history backwards. You will almost always find that the current shape is the accumulation of three or four historical decisions, each of which made sense at the time. Understanding a file as a sequence of deltas is much easier than understanding it as a static blob.
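If you take "backwards" to mean oldest-first, git can do the flipping for you with `git log --reverse -p -- <file>`. The Python sketch below wraps that command and splits the output into one chunk per commit, so you can read the deltas in the order they happened. The helper names are my own. It assumes `git` is on your PATH, that you call it from inside the repository, and that git's default log format is in use.

```python
from __future__ import annotations

import subprocess


def history_oldest_first_cmd(path: str) -> list[str]:
    """git command that prints every commit touching `path`, with diffs, oldest first."""
    # -p: include each commit's patch; --reverse: flip git's newest-first default;
    # --no-color / --no-decorate keep the output plain enough to parse.
    return ["git", "log", "--reverse", "-p", "--no-color", "--no-decorate", "--", path]


def split_commits(log_text: str) -> list[tuple[str, str]]:
    """Split `git log -p` output into (sha, chunk) pairs, one per commit, in order."""
    commits: list[tuple[str, str]] = []
    sha, lines = None, []
    for line in log_text.splitlines():
        # In the default format each commit starts with "commit <sha>" at column 0;
        # message lines are indented and diff lines carry a prefix, so they never match.
        if line.startswith("commit "):
            if sha is not None:
                commits.append((sha, "\n".join(lines)))
            sha, lines = line.split()[1], [line]
        elif sha is not None:
            lines.append(line)
    if sha is not None:
        commits.append((sha, "\n".join(lines)))
    return commits


def read_history(path: str, repo: str = ".") -> list[tuple[str, str]]:
    """Run git in `repo` and return the file's history as one chunk per commit."""
    result = subprocess.run(
        history_oldest_first_cmd(path),
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,
    )
    return split_commits(result.stdout)
```

Printing the chunks from `read_history("path/to/tricky_file.py")` (a placeholder path) one at a time gives you the file as a sequence of deltas rather than as one static blob.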
Ask the dumb question on purpose
When you find yourself not understanding a block of code, resist the temptation to stare at it for another ten minutes. Instead, write down the specific question: "why does this function check user.locale before dispatching the webhook?" Then find the person most likely to know (it is almost always the author git blame shows for those lines) and ask them.
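One rough way to decide who to ask is to count whose name git blame shows on the lines you are stuck on. The sketch below does this with `git blame --line-porcelain -L <start>,<end>`, which repeats the author header for every line. The function names are hypothetical, and the sketch assumes `git` is on your PATH. The last person to touch a line is only a proxy for who knows why it exists, not a guarantee.

```python
from __future__ import annotations

import subprocess
from collections import Counter


def blame_cmd(path: str, start: int, end: int) -> list[str]:
    """git blame for lines start..end of `path`, repeating full headers on every line."""
    return ["git", "blame", "--line-porcelain", "-L", f"{start},{end}", "--", path]


def authors_by_line_count(porcelain: str) -> Counter:
    """Count how many of the blamed lines each author last changed."""
    counts: Counter = Counter()
    for line in porcelain.splitlines():
        # Header lines look like "author Jane Doe". Related headers such as
        # "author-mail" use a hyphen, and source lines begin with a tab,
        # so neither is counted.
        if line.startswith("author "):
            counts[line[len("author "):]] += 1
    return counts


def who_to_ask(path: str, start: int, end: int, repo: str = ".") -> list[tuple[str, int]]:
    """Authors of the given line range, most lines first."""
    result = subprocess.run(
        blame_cmd(path, start, end),
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,
    )
    return authors_by_line_count(result.stdout).most_common()
```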
The dumb question is how senior engineers read code fast. They ask far more questions than junior engineers, not fewer. The confidence to look uninformed is, paradoxically, the thing that makes them look informed.
Make a small change on purpose
Once you have read a section, commit to a tiny, reversible change in it. Rename a variable. Extract a helper. Add a comment. The act of modifying the code forces you to confirm your understanding in a way that reading alone will not. You will often discover, in the act of changing a single line, that your mental model was wrong.
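Here is a miniature of that idea in Python. Everything here is hypothetical: `total` stands in for the code you are reading, and the rename plus the extracted helper are the tiny, reversible change. The assertion loop at the bottom checks the new version against the old one on a handful of inputs, which is where a wrong mental model tends to surface.

```python
# Before: a hypothetical function you are reading for the first time.
def total(items):
    t = 0
    for i in items:
        if i["qty"] > 0:
            t += i["qty"] * i["unit_cents"]
    return t


# After: the tiny change. A clearer name, plus the inner expression extracted
# into a helper whose name states what you believe it does.
def line_total_cents(item):
    """Cents for one line item; non-positive quantities contribute nothing."""
    if item["qty"] <= 0:
        return 0
    return item["qty"] * item["unit_cents"]


def order_total_cents(items):
    return sum(line_total_cents(item) for item in items)


# Pin the old behaviour on a few inputs. If you had assumed negative quantities
# were refunds and dropped the qty check, the last sample would fail (-500 vs 0).
SAMPLES = [
    [],
    [{"qty": 2, "unit_cents": 150}],
    [{"qty": 0, "unit_cents": 999}, {"qty": 3, "unit_cents": 100}],
    [{"qty": -1, "unit_cents": 500}],
]

for sample in SAMPLES:
    assert order_total_cents(sample) == total(sample), sample
```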
Read code you think is good
The most neglected practice. We spend an enormous amount of time reading the code we maintain, which has every reason to be average. We almost never sit down and read code that was deliberately written well.
A short list of things I've learned a lot from reading, in the hope that at least one will be new to you: redis/redis, the SQLite source, tsoding's streams, the early commits of django/django, caddyserver/caddy, and the whole of the Go standard library's net/http. Spend a month reading one of those properly and your sense of what good code looks like will shift.
Most engineers write code for years and spend ten thousand hours on output and maybe a hundred hours on input. The ratio is wrong. Get the input hours up.
— Nivaan