System Design Interviews: The Framework That Got Me to L6

Most system design interview advice on the internet is written by people who have given about ten interviews. I have given somewhere close to three hundred, at two of the three FAANG companies I worked for, and I've sat on the other side of the table for every Staff-and-above hire in my team for the last four years. The framework below is what I actually wanted candidates to do. It is also what got me through the L6 loop at Google in 2019 after two earlier failed attempts.

The short version: there is no clever trick. What you are being judged on is not whether you know a particular bit of architecture. It is whether you can take a huge, underspecified problem and reduce it to something a team of five could actually build. Almost everything I say below is in service of that.

Step 1 — Scope the problem out loud (5 minutes)

The instinct of an anxious candidate is to leap into boxes-and-arrows. Don't. Open with a paragraph about what you think the product actually is, and who uses it, and what they would be upset about if it stopped working. "Upset" is the operative word. An interviewer who asks you to design Twitter wants to know that you have thought about celebrity fan-out. They do not want you to start drawing load balancers.

Ask two or three questions. Not ten. You are not a product manager. The questions I rate highest are: "who is the primary user here, and what is the most common thing they do?" and "what order of magnitude are we designing for: a hundred thousand users or a hundred million?" Anchor the whole conversation on those answers.

Step 2 — Write the functional requirements as a bulleted list

Physically write them down on the whiteboard. There should be four to seven. Any more than that and you will not get through the design in the time you have. I like to mark two or three as "out of scope" and explicitly say so; it shows you can trim.

Step 3 — State the non-functional requirements with numbers

This is the single biggest differentiator between a mid-level and a senior candidate. A mid-level candidate says "it should be highly available". A senior candidate says "99.95% availability, which gives us roughly 4.4 hours of allowed downtime per year; p99 write latency under 200 ms; eventual consistency is fine for the feed but read-after-write consistency is required for a user's own posts."

You will not always be right. It does not matter. What matters is that you are reasoning in numbers. You are demonstrating that you understand availability is a budget, not a compliment.
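
To make the budget idea concrete, here is a minimal sketch of the arithmetic behind that availability figure. The function name and the 365-day year are my own simplifications for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; ignores leap years


def downtime_budget_hours(availability_pct: float) -> float:
    """Hours of downtime per year permitted by an availability target."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


# Compare a few common targets side by side.
for target in (99.0, 99.9, 99.95, 99.99):
    print(f"{target}% -> {downtime_budget_hours(target):.2f} h/year")
```

At 99.95% this comes out at about 4.38 hours per year. Each extra nine shrinks the budget by a factor of ten, which is a good reason to say the number out loud before committing to it.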

Step 4 — Back-of-the-envelope sizing

Take about three minutes. Traffic, data volume, storage growth per year. The point is not accuracy; the point is to make the decisions later in the interview defensible. If you've said the system handles 10,000 writes per second of roughly 1 KB each, you've earned the right to pick a particular storage layer on the grounds that it can handle 10 MB/s of ingest comfortably.
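
The arithmetic behind that claim fits in a few lines. This is a sketch, assuming decimal units (1 KB = 1,000 bytes), a steady write rate all year, and no replication or index overhead; the function names are illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000; ignores leap years


def ingest_mb_per_s(writes_per_s: int, bytes_per_write: int) -> float:
    """Raw ingest bandwidth in MB/s (decimal megabytes)."""
    return writes_per_s * bytes_per_write / 1e6


def storage_tb_per_year(writes_per_s: int, bytes_per_write: int) -> float:
    """Raw storage growth in TB/year, before replication or indexes."""
    return writes_per_s * bytes_per_write * SECONDS_PER_YEAR / 1e12


ingest = ingest_mb_per_s(10_000, 1_000)
growth = storage_tb_per_year(10_000, 1_000)
print(f"ingest: {ingest:.0f} MB/s, growth: {growth:.0f} TB/year")
```

The same inputs that give 10 MB/s of ingest also give a little over 300 TB of raw growth per year, which is the number that tends to drive the storage conversation.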

A trick I use: always convert everything into the same unit within a minute of saying it. 100 million daily active users at 10 requests each is 10⁹ requests per day, which is roughly 11,600 QPS on average and perhaps four times that at peak. Saying "roughly 50k QPS peak" before drawing a single box anchors everything that comes next.
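
The same conversion as a sketch. The 4x peak factor is the rule of thumb from the paragraph above, not a measured value, and the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 3600  # 86,400


def average_qps(daily_active_users: int, requests_per_user: int) -> float:
    """Average queries per second, assuming traffic spread evenly over a day."""
    return daily_active_users * requests_per_user / SECONDS_PER_DAY


def peak_qps(avg_qps: float, peak_factor: float = 4.0) -> float:
    """Peak QPS as a simple multiple of the average (assumed factor)."""
    return avg_qps * peak_factor


avg = average_qps(100_000_000, 10)
print(f"average: {avg:,.0f} QPS, peak: {peak_qps(avg):,.0f} QPS")
```

The average lands near 11,600 QPS and the peak near 46,000, which is why rounding up to 50k in the interview is a reasonable thing to say.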

Step 5 — Draw the happy path, and only the happy path

Client. Edge. API gateway. One or two services. One database. A cache if you need one. That is it. Draw it. Walk your interviewer through a single request end to end, narrating what happens at each step. Mention in passing where things can go wrong, but do not draw the failure handling yet; that comes in Step 7.

Resist the urge to draw seventeen microservices. You are not being graded on the number of boxes. I have failed candidates who drew beautiful diagrams with a queue between every pair of services and could not explain why any particular queue was there.

Step 6 — Deep dive on the interesting bit

Every system has one area where the interesting problem lives. In a URL shortener it is ID generation. In a news feed it is fan-out. In a ride-hailing system it is matching. In a payments system it is idempotency and reconciliation. Identify that area out loud — "the most interesting piece here is the fan-out problem, because a celebrity with ten million followers turns one write into ten million" — and then spend most of your remaining time on it.

This is where your technical credibility is actually demonstrated. Everything before this step is structure. This step is depth.
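
As one illustration of what depth can look like for the news-feed case, here is a toy, in-memory sketch of a hybrid approach to fan-out: push posts into followers' timelines at write time for ordinary accounts, and pull celebrity posts at read time. This is one common way to handle the problem, not the only one. The class name, the threshold, and the data layout are all hypothetical, and a real system would use persistent stores rather than dictionaries.

```python
from collections import defaultdict


class HybridFeed:
    """Toy hybrid fan-out: push for ordinary authors, pull for celebrities."""

    def __init__(self, celebrity_threshold=10_000):
        # Hypothetical cutoff; a real value would come from measurement.
        self.celebrity_threshold = celebrity_threshold
        self.followers = defaultdict(set)   # author -> users following them
        self.following = defaultdict(set)   # user -> authors they follow
        self.timelines = defaultdict(list)  # user -> posts pushed at write time
        self.posts = defaultdict(list)      # author -> that author's own posts

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) >= self.celebrity_threshold

    def post(self, author, ts, text):
        item = (ts, author, text)
        self.posts[author].append(item)
        # Fan-out on write only for ordinary authors; a celebrity's post
        # stays in one place instead of being copied to every follower.
        if not self.is_celebrity(author):
            for follower in self.followers[author]:
                self.timelines[follower].append(item)

    def read(self, user, limit=10):
        # Start from the pushed timeline, then pull celebrity posts.
        # A set removes duplicates if an author crossed the threshold
        # after some of their posts had already been pushed.
        items = set(self.timelines[user])
        for author in self.following[user]:
            if self.is_celebrity(author):
                items.update(self.posts[author])
        return sorted(items, reverse=True)[:limit]  # newest first


feed = HybridFeed(celebrity_threshold=2)  # tiny threshold for the demo
feed.follow("u1", "star")
feed.follow("u2", "star")
feed.follow("u1", "friend")
feed.post("friend", 1, "hi")    # pushed to u1's timeline
feed.post("star", 2, "hello")   # not pushed; pulled at read time
print(feed.read("u1"))  # [(2, 'star', 'hello'), (1, 'friend', 'hi')]
```

The design choice worth narrating is the trade: ordinary posts pay a small cost at write time so reads stay cheap, while celebrity posts pay at read time so a single write never turns into ten million.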

Step 7 — Talk about what breaks

In the last five minutes, volunteer three specific failure modes: what happens when a region goes down, what happens when a hot partition develops, and what happens when a deploy rolls out a bug to a downstream service. Propose a mitigation for each.

Interviewers at Staff level almost always want to see this. It's the cheapest way to separate a candidate who has run production systems from one who has read about them.

Two things that will kill you even if the framework is right

Silence. Think out loud. An interviewer cannot give you credit for reasoning they cannot hear.
Inflexibility. If an interviewer pushes back on a decision, don't dig in. Say "fair, if that's the constraint then I'd change X to Y" and carry on. Big-tech design interviews are as much about collaborative technical conversation as they are about the answer.

The third time I did the L6 loop at Google I used exactly the seven steps above. The first two times I'd been jumping straight to boxes-and-arrows within a minute. The difference was not that I'd become a better engineer in nine months. The difference was that I'd learned to narrate.

— Nivaan