System Design Interview Preparation

65% of Senior Candidates Fail System Design Rounds. Here’s the Gap Nobody Talks About.

Sixty-five percent. That’s the rejection rate for system design interviews at mid-to-large product companies, according to internal hiring data from three Bengaluru-based startups I spoke with in late 2025. Not coding rounds — system design specifically. Candidates who breeze through DSA problems with perfect time complexity? They walk into a 45-minute whiteboard session about designing a URL shortener and just… freeze.

I’ve been on both sides of that table. Fumbled my first system design interview at a Series B startup in 2021 — drew some boxes, said “load balancer” a few times, couldn’t explain why I’d pick Cassandra over Postgres. Got a polite rejection email three days later. Sat on the interviewer’s side for the first time in early 2023. Watched a candidate with six years of experience struggle to estimate how much storage a chat app would need.

Here’s what I’ve figured out since: the gap isn’t knowledge. Most senior devs know what a load balancer does. They know caching exists. They’ve probably even set up database replication at some point. The gap is structure. Knowing how to walk through a problem in 45 minutes, making trade-off decisions out loud, and convincing someone you could actually build the thing — that’s the skill nobody teaches you on the job.

So let’s fix that. I’m going to walk you through system design interviews the way they actually happen — as questions. Because that’s the format, right? Someone asks you a question, you work through it. We’ll cover the framework first, then do a full deep dive on a URL shortener (still the most common question in 2026, somehow), and then hit the building blocks that show up in every single design problem.

“How Would You Approach This Problem?” — The Framework Question

Every interviewer starts here, even if they don’t say it out loud. What they’re really asking: do you have a repeatable process, or are you going to wing it?

I’ve watched maybe 40 candidates over the past two years. The ones who pass almost always follow some version of the same five steps. Not because there’s one correct framework — plenty of valid approaches exist — but because having any structure beats having none.

Step 1: Clarify Requirements (roughly 5 minutes)

Don’t start drawing boxes. Please. The number of candidates who jump straight into “so we’ll have a load balancer here…” is genuinely painful to watch. Spend the first few minutes asking questions.

Functional requirements: what should the system actually do? Non-functional requirements: how fast, how available, how consistent? For a URL shortener, you might ask: How many URLs per day? What’s the expected read-to-write ratio? Do shortened URLs expire? Do we need analytics?

Asking these questions isn’t just about getting information. It signals maturity. It tells the interviewer you’ve built real systems before, where getting the requirements wrong means rebuilding everything three sprints later.

Step 2: Estimate Scale (about 3 minutes)

Back-of-the-envelope math. Nobody expects you to nail the exact number. What matters is the reasoning. A system handling 100 million URLs needs a fundamentally different architecture than one handling 10,000. Your storage decisions change. Your caching strategy changes. Everything changes.

Quick trick I picked up: start with the monthly number, divide by 30 for daily, then by 86,400 for per-second. Round aggressively. Interviewers care about orders of magnitude, not decimal points.

Step 3: Define the API (about 3 minutes)

Sketch out the key endpoints. For a URL shortener:

POST /shorten — takes a long URL, returns a short one
GET /{shortCode} — redirects to the original URL

Why bother with this step? Because it grounds your design in something concrete. Instead of talking about abstract “data flows,” you’re now designing around specific operations. Way easier to reason about.
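To make those two endpoints concrete, here’s a minimal in-memory sketch. It is not a real web server: `ShortenerAPI`, `Response`, the `sho.rt` domain, and the zero-padded placeholder codes are all made up for illustration. The one real design choice it shows is returning a 302 rather than a 301, which is one common way to keep redirects flowing through your servers if the interviewer says analytics matter.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    status: int
    headers: dict = field(default_factory=dict)
    body: dict = field(default_factory=dict)


class ShortenerAPI:
    """In-memory stand-in for the two endpoints (hypothetical, no networking)."""

    def __init__(self, base_url: str = "https://sho.rt"):
        self.base_url = base_url   # hypothetical short domain
        self._urls = {}            # short code -> long URL
        self._next_id = 1

    def post_shorten(self, long_url: str) -> Response:
        """POST /shorten: store the long URL, return the short one."""
        if not long_url.startswith(("http://", "https://")):
            return Response(400, body={"error": "invalid URL"})
        code = f"{self._next_id:07d}"  # placeholder, not a real encoding scheme
        self._next_id += 1
        self._urls[code] = long_url
        return Response(201, body={"short_url": f"{self.base_url}/{code}"})

    def get_code(self, code: str) -> Response:
        """GET /{shortCode}: redirect to the original URL, or 404."""
        long_url = self._urls.get(code)
        if long_url is None:
            return Response(404, body={"error": "not found"})
        return Response(302, headers={"Location": long_url})


api = ShortenerAPI()
created = api.post_shorten("https://example.com/some/long/path")
redirect = api.get_code("0000001")
```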

Step 4: High-Level Architecture (roughly 10 minutes)

Now you draw boxes. Clients, load balancers, application servers, databases, caches. Show how a request moves through the system from start to finish. Keep it simple at first — you’ll add complexity in the next step.

Step 5: Deep Dive (about 15 minutes)

Pick two or three components and go deep. Database choice and why. Caching strategy and why. Sharding approach and why. Notice the pattern? Every decision needs a “why.” That’s really what the interviewer is evaluating — your reasoning, not your memorized architecture diagrams.

“Can You Design a URL Shortener?” — Still the Most Common Question in 2026

Yep, still. I asked five friends who interviewed at Indian product companies between October 2025 and February 2026. Three of them got asked this exact question. One got asked to design a “link management platform” which is, you know, the same thing with a fancier name.

Let’s work through it properly.

Requirements (what we’d clarify with the interviewer)

Generate a unique short URL for any given long URL. Redirect users from the short URL to the original. Handle 500 million new URLs per month. Support 10 billion redirects per month. URLs have a configurable expiration, defaulting to 5 years.

Scale Estimation (the back-of-envelope part)

Write operations: 500 million per month. Divide by 30 days, then by 86,400 seconds. That gives us roughly 200 writes per second. Not huge.

Read operations: 10 billion redirects per month means about 4,000 reads per second. That’s a 20:1 read-to-write ratio. Immediately tells you caching will be critical.

Storage: each URL record needs maybe 500 bytes (short code, original URL, timestamps, metadata). Over 5 years at 500 million new URLs per month? That’s 30 billion records. At 500 bytes each, roughly 15 TB of data. A single database server isn’t going to cut it.
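If you want to sanity-check that arithmetic before an interview, it fits in a few lines. This is just the estimate above written out, using the same assumptions (30-day months, 500 bytes per record); the variable names are mine.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_MONTH = 30  # rounding assumption, same as in the text


def per_second(monthly: float) -> float:
    """Convert a monthly count into an average per-second rate."""
    return monthly / DAYS_PER_MONTH / SECONDS_PER_DAY


writes_per_sec = per_second(500_000_000)           # about 193, call it 200
reads_per_sec = per_second(10_000_000_000)         # about 3,858, call it 4,000
read_write_ratio = reads_per_sec / writes_per_sec  # 20 to 1

total_records = 500_000_000 * 12 * 5               # 30 billion over 5 years
storage_tb = total_records * 500 / 1e12            # 500 bytes each -> 15 TB

print(f"{writes_per_sec:.0f} writes/s, {reads_per_sec:.0f} reads/s")
print(f"{total_records:,} records, {storage_tb:.0f} TB")
```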

“How Would You Generate the Short URLs?” — The Encoding Question

Interviewers love this one because it has real trade-offs. Two main approaches work, and picking between them reveals how you think.

Approach 1: Hash-based. Take the long URL, run it through MD5 or SHA-256, Base62 encode the result, and grab the first 7 characters. Simple. But you’ve got a collision problem — different long URLs could produce the same short code. You’d need a check-and-retry loop, which adds latency and complexity.
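Here’s roughly what that check-and-retry loop looks like. Treat it as a sketch under assumptions: the `store` dict stands in for the database, SHA-256 is used as the hash, and re-hashing with a `#attempt` suffix is just one simple retry scheme I picked for the example.

```python
import hashlib

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def base62(n: int) -> str:
    """Encode a non-negative integer using the 62-character alphabet."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def short_code_for(long_url: str, store: dict, max_attempts: int = 5) -> str:
    """Hash the URL, keep 7 characters, and retry with a suffix on collision."""
    for attempt in range(max_attempts):
        salted = long_url if attempt == 0 else f"{long_url}#{attempt}"
        digest = hashlib.sha256(salted.encode("utf-8")).digest()
        code = base62(int.from_bytes(digest, "big")).rjust(7, ALPHABET[0])[:7]
        existing = store.get(code)
        if existing is None:
            store[code] = long_url   # free slot: claim it
            return code
        if existing == long_url:
            return code              # same URL shortened before: reuse its code
        # otherwise a different URL owns this code, so try again
    raise RuntimeError("could not find a free short code")


store = {}
first = short_code_for("https://example.com/a", store)
again = short_code_for("https://example.com/a", store)
```

Notice that every create costs at least one lookup before the write, and in a real database that check-then-insert would also need to be atomic. That’s the latency and complexity the paragraph above is talking about.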

Approach 2: Counter-based. Use a distributed ID generator (something like Twitter’s Snowflake algorithm, which they open-sourced back in 2010 and people are still using in 2026, which is a good sign). Generate a unique numeric ID, Base62 encode it. No collisions possible because each ID is unique by definition.

Why Base62? Because it uses a-z, A-Z, and 0-9, characters that are safe in URLs without encoding. With 7 characters, you get 62^7 combinations. That’s roughly 3.5 trillion unique short codes. For 30 billion URLs over 5 years, you’ve got headroom for centuries. One caveat worth saying out loud: a raw 64-bit Snowflake ID encodes to as many as 11 Base62 characters, so if you want to stay at 7, the counter has to stay below 62^7 (for example, by handing each app server its own block of a plain sequential counter).
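The encoding itself is tiny. A minimal sketch, assuming a digits-then-lowercase-then-uppercase alphabet (the ordering is arbitrary, it just has to be fixed), plus the capacity math:

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
BASE = len(ALPHABET)  # 62


def encode(n: int) -> str:
    """Turn a non-negative counter value into a Base62 string."""
    if n < 0:
        raise ValueError("counter must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def decode(code: str) -> int:
    """Inverse of encode: recover the counter value from a short code."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n


capacity = BASE ** 7                                 # 3,521,614,606,208 codes
years_of_headroom = capacity / (500_000_000 * 12)    # at 500M new URLs a month

# A full-size 64-bit ID does not fit in 7 characters.
snowflake_sized_length = len(encode(2**63 - 1))      # 11 characters
```

At 500 million new URLs a month, `years_of_headroom` comes out to a bit under 590 years, which is where “headroom for centuries” comes from.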

My preference? Counter-based. Collision handling is annoying to test, annoying to debug at 3 AM when your collision rate spikes because of some edge case you didn’t think of. I’d rather pay the operational cost of running a distributed ID generator than deal with probabilistic uniqueness guarantees.

“What Database Would You Use?” — The Storage Question

Probably the question where candidates stumble most, honestly. Not because they don’t know databases — because they don’t articulate why they’d pick one over another.

Our URL shortener needs a single table, more or less:

id            BIGINT
short_code    VARCHAR(7)   -- indexed
original_url  TEXT
created_at    TIMESTAMP
expires_at    TIMESTAMP

The access pattern is almost entirely point lookups: given a short code, find the original URL. No complex joins. No aggregation queries. No full-text search. Just key-value lookups at massive scale.

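That screams NoSQL. DynamoDB or Cassandra would both handle this beautifully. Cassandra in particular: point lookups by partition key are exactly what it’s built for, it was designed for far heavier write loads than our 200 writes/second, and it scales horizontally by adding nodes. (Our workload is read-heavy, not write-heavy, so the write throughput is a bonus rather than the reason to pick it.) DynamoDB is probably easier to operate if you’re on AWS, though you give up some control over partitioning.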

Could you use Postgres? Sure, with proper sharding. But you’d be fighting against the grain. Relational databases shine when you need relationships, transactions across tables, complex queries. We need none of that here. Pick the tool that fits the job, right?

Use the short_code as your partition key. It distributes evenly across nodes because both Cassandra and DynamoDB hash the partition key before placing it, so even counter-based codes (which are just as sequential as the IDs behind them) end up spread out. And it’s the field you’ll query by 99.9% of the time.

“How Do You Handle 4,000 Reads Per Second?” — The Caching Question

Four thousand reads per second isn’t actually that many for a well-configured database cluster. But here’s the thing: those reads will have a massively skewed distribution. Some short URLs go viral. A few hundred links might account for 80% of your traffic on any given day. And hitting the database for the same URL mapping thousands of times per second when the answer never changes? That’s just wasteful.

Put Redis or Memcached in front of your database. In-memory key-value stores that respond in sub-millisecond time. For our URL shortener, the cache key is the short code and the value is the original URL.

How much memory do we need? Apply the 80-20 rule: 20% of URLs generate 80% of traffic. Daily read requests: about 345 million (4,000 reads per second times 86,400 seconds). If we cache the top 20% of those, treating every request as if it were a distinct URL (a deliberate overestimate), that’s roughly 69 million entries. At 500 bytes each, that’s around 35 GB. Totally feasible for a Redis cluster. I’ve seen production Redis instances at Indian startups running 64 GB or more without breaking a sweat.
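Same estimate as a quick calculation, so you can swap in different numbers. The 20% hot fraction and 500 bytes per entry are the assumptions from the paragraph above, nothing more.

```python
reads_per_day = 4_000 * 86_400               # 345,600,000 requests a day
cached_entries = reads_per_day * 20 // 100   # hot 20% -> 69,120,000 entries
cache_gb = cached_entries * 500 / 1e9        # 500 bytes each -> about 34.6 GB

print(f"{cached_entries:,} entries, about {cache_gb:.1f} GB")
```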

Cache Strategy Details

Read path: Request comes in. Check Redis first. Cache hit? Return the original URL instantly. Cache miss? Query the database, store the result in Redis with a TTL, then return the URL. Simple and effective.

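Write path: Use write-through caching. When a new short URL gets created, write it to the database and to Redis as part of the same request. That way any subsequent read finds the data in cache right away. If the cache write fails, nothing breaks: the read path just falls back to the database on the first miss.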

Eviction policy: LRU (Least Recently Used). When the cache fills up, kick out the entries nobody’s accessed recently. Works beautifully for our use case because popular URLs stay cached and obscure ones get evicted naturally.
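Those three pieces (read path, write path, eviction) fit together like this. It’s a sketch, not Redis: `LRUCache` is a toy built on `OrderedDict`, a plain dict plays the database, `UrlStore` and `db_reads` are names I made up, and the TTL is left out to keep it short.

```python
from collections import OrderedDict


class LRUCache:
    """Toy LRU cache standing in for Redis."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()

    def __contains__(self, key) -> bool:
        return key in self._items

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)          # mark as most recently used
        return self._items[key]

    def put(self, key, value) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)   # evict the least recently used


class UrlStore:
    def __init__(self, cache: LRUCache, database: dict):
        self.cache = cache
        self.database = database
        self.db_reads = 0                     # counts cache misses, for the demo

    def create(self, code: str, long_url: str) -> None:
        """Write path: write-through to the database and the cache."""
        self.database[code] = long_url
        self.cache.put(code, long_url)

    def resolve(self, code: str):
        """Read path: cache first, database on a miss, then populate the cache."""
        cached = self.cache.get(code)
        if cached is not None:
            return cached
        self.db_reads += 1
        long_url = self.database.get(code)
        if long_url is not None:
            self.cache.put(code, long_url)
        return long_url


store = UrlStore(LRUCache(capacity=2), database={})
store.create("a", "https://example.com/a")
store.create("b", "https://example.com/b")
store.resolve("a")                            # "a" is now the most recent
store.create("c", "https://example.com/c")    # cache is full, so "b" is evicted
```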

Interview tip: Mentioning specific cache sizes and calculations impresses interviewers. It shows you’ve actually operated systems at scale, not just read about them in a textbook.

“What Happens When One Server Isn’t Enough?” — The Load Balancing Question

With 4,000+ reads per second hitting our application layer, a single server is a bad idea. One crash and the entire service goes down. Plus, individual servers have throughput limits.

Stick a Layer 7 (application layer) load balancer in front of your application servers. Layer 7 specifically because it can inspect the URL path and make smart routing decisions. Layer 4 (transport layer) just looks at IP addresses and ports — less useful here.

Three algorithms you should know cold:

Round-robin: requests go to servers in sequence. Server 1, server 2, server 3, back to server 1. Dead simple. Works fine when all servers have identical capacity.

Least connections: send the next request to whichever server currently has the fewest active connections. Better when request processing times vary — like if some redirects require database lookups (cache misses) while others return instantly from cache.

Consistent hashing: route requests for the same short code to the same server. Why would you want this? Because each application server can maintain its own local cache. If short code “abc1234” always goes to server 5, that server builds up a hot local cache for frequently accessed URLs. Fewer Redis round-trips, lower latency.
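The first two are simple enough to sketch in a few lines, which is handy if the interviewer asks “okay, but how does least connections actually decide?” The class names and the `release` method are hypothetical; a real load balancer tracks connections itself.

```python
import itertools


class RoundRobin:
    """Hand out servers in a fixed rotation."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self) -> str:
        return next(self._cycle)


class LeastConnections:
    """Send each request to the server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self) -> str:
        server = min(self.active, key=self.active.get)  # ties go to the first listed
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        """Call when a request finishes."""
        self.active[server] -= 1


rr = RoundRobin(["s1", "s2", "s3"])
rr_order = [rr.pick() for _ in range(4)]   # s1, s2, s3, then back to s1

lc = LeastConnections(["s1", "s2"])
a = lc.pick()        # s1 (both idle, first listed wins)
b = lc.pick()        # s2
lc.release("s2")     # s2's request was a fast cache hit
c = lc.pick()        # s2 again, because s1 is still busy
```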

For our URL shortener, I’d probably go with consistent hashing. The read-heavy, highly cacheable nature of the workload makes local caching extremely valuable. But I’d mention the trade-off to the interviewer: consistent hashing makes server additions and removals more disruptive than round-robin. You might see a temporary spike in cache misses when scaling up.

“How Do You Scale the Database to 15 TB?” — The Sharding Question

Alright, so we’ve got 15 TB of data projected over 5 years. A single database node isn’t realistic. We need to split the data across multiple servers — that’s database sharding.

Two strategies come up in basically every interview:

Hash-based sharding: run the short_code through a hash function, modulo the number of shards. Short code “abc1234” might hash to shard 3, while “xyz7890” goes to shard 7. Data distributes evenly. But range queries become painful — if you ever need to find “all URLs created between January and March,” you’d have to query every shard.

Range-based sharding: short codes starting with a-f go to shard 1, g-m to shard 2, and so on. Simpler to understand and implement. But if certain character ranges end up more popular (not really an issue with Base62-encoded counter IDs, but could happen with hash-based short codes), you get hot shards.

For our URL shortener, hash-based sharding wins. Our access pattern is purely point lookups — we never need range queries. We always know the exact short code we’re looking for. Even distribution matters more than query flexibility here.

Use consistent hashing for the sharding function (yes, consistent hashing appears again — it’s genuinely useful across multiple layers). When you add a new shard, only a fraction of keys need to migrate, instead of reshuffling everything. I saw a team at a Pune-based fintech add two Cassandra nodes to their cluster in mid-2025 — with consistent hashing, data rebalancing took 20 minutes. Without it? They estimated 4+ hours of downtime.
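If you want to see the “only a fraction of keys migrate” claim for yourself, here’s a small experiment. It’s a sketch under assumptions: `HashRing` is a bare-bones ring with 100 virtual nodes per shard (my choice), SHA-256 is used only because it’s stable across runs, and the shard names are invented. It compares plain hash-modulo sharding against the ring when going from 4 shards to 5.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    """64-bit hash that is the same on every run (unlike built-in hash())."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes: int = 100):
        self.vnodes = vnodes
        self._points = []    # sorted positions on the ring
        self._owners = {}    # position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        """Place `vnodes` virtual points for this node on the ring."""
        for i in range(self.vnodes):
            point = stable_hash(f"{node}#{i}")
            self._owners[point] = node
            bisect.insort(self._points, point)

    def lookup(self, key: str) -> str:
        """Walk clockwise from the key's position to the next virtual point."""
        idx = bisect.bisect_right(self._points, stable_hash(key)) % len(self._points)
        return self._owners[self._points[idx]]


keys = [f"code{i}" for i in range(5_000)]

# Plain modulo: changing the shard count remaps most keys.
moved_modulo = sum(stable_hash(k) % 4 != stable_hash(k) % 5 for k in keys) / len(keys)

# Consistent hashing: only keys claimed by the new shard move.
ring = HashRing([f"shard-{i}" for i in range(4)])
before = {k: ring.lookup(k) for k in keys}
ring.add("shard-4")
after = {k: ring.lookup(k) for k in keys}
moved_ring = sum(before[k] != after[k] for k in keys) / len(keys)

print(f"modulo: {moved_modulo:.0%} moved, ring: {moved_ring:.0%} moved")
```

In expectation, modulo moves about four out of five keys, while the ring moves about one in five, and every key that moves lands on the new shard.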

Replication for Fault Tolerance

Each shard should have at least two replicas, ideally in different availability zones. If your primary shard in Mumbai’s ap-south-1a goes down, a replica in ap-south-1b picks up the traffic.

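If you end up on a leader-based store (sharded Postgres or MySQL, say), use leader-follower replication: the leader handles all writes, followers handle reads. Cassandra and DynamoDB handle replication for you; Cassandra is leaderless, and you pick a consistency level per query instead. Either way, our system can tolerate eventual consistency. If a newly created URL takes 200 milliseconds to become available for redirects, nobody’s going to notice, so asynchronous replication works fine. Synchronous replication would guarantee immediate consistency but adds latency to every write. Not worth the trade-off here.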

“What Other Systems Should I Be Ready to Design?” — The Prep Question

Beyond URL shorteners, here are the questions I’ve seen come up most frequently in Indian tech interviews during 2025-2026:

“Design a chat system like WhatsApp.” Focus areas: WebSocket connections for real-time messaging, message delivery guarantees (sent, delivered, read receipts), presence indicators (online/offline status), and message storage. The tricky part? Managing millions of concurrent WebSocket connections. You’ll likely need a connection broker layer.

“Design a news feed like Twitter.” The big debate here is push vs. pull. Push (fan-out on write): when someone tweets, immediately write that tweet to every follower’s timeline cache. Fast reads, expensive writes. Pull (fan-out on read): when someone opens their feed, pull recent tweets from everyone they follow and merge them. Cheap writes, expensive reads. Most production systems use a hybrid — push for regular users, pull for celebrities with millions of followers.
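A toy version of the hybrid makes the trade-off easier to explain on a whiteboard. Everything here is illustrative: the `Feed` class, the in-memory dicts, the logical clock, and especially the celebrity threshold of 3 followers, which is tiny so the demo stays small.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 3  # unrealistically small, for the demo only


class Feed:
    def __init__(self):
        self.followers = defaultdict(set)    # author -> users following them
        self.following = defaultdict(set)    # user -> authors they follow
        self.timelines = defaultdict(list)   # user -> posts pushed to them
        self.posts = defaultdict(list)       # author -> their own posts
        self._clock = 0                      # logical timestamp

    def follow(self, user: str, author: str) -> None:
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author: str) -> bool:
        return len(self.followers[author]) >= CELEBRITY_THRESHOLD

    def post(self, author: str, text: str) -> None:
        self._clock += 1
        item = (self._clock, author, text)
        self.posts[author].append(item)
        if not self.is_celebrity(author):
            for follower in self.followers[author]:
                self.timelines[follower].append(item)   # push: fan-out on write

    def read(self, user: str, limit: int = 10) -> list:
        items = set(self.timelines[user])               # already pushed
        for author in self.following[user]:
            if self.is_celebrity(author):
                items.update(self.posts[author])        # pull: fan-out on read
        return sorted(items, reverse=True)[:limit]      # newest first


feed = Feed()
for user in ["u1", "u2", "u3"]:
    feed.follow(user, "star")
feed.follow("u1", "friend")
feed.post("friend", "hi")      # pushed to u1's timeline
feed.post("star", "hello")     # stored once, pulled at read time
u1_feed = feed.read("u1")
```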

“Design a file storage system like Google Drive.” Chunked uploads (break files into 4-8 MB chunks so large uploads can resume after network failures), deduplication (don’t store the same file twice — hash the content and check), and sync conflict resolution (what happens when two people edit the same file offline?). Operational Transform or CRDTs usually come up in the deep dive.
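Chunking and deduplication combine nicely if you hash each chunk rather than the whole file. That’s a variation on the whole-file check described above, and the sketch below assumes it. `ChunkStore` and the manifest format are hypothetical, and the dict stands in for blob storage.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB, the low end of the range above


class ChunkStore:
    def __init__(self, chunk_size: int = CHUNK_SIZE):
        self.chunk_size = chunk_size
        self.chunks = {}   # SHA-256 hex digest -> chunk bytes

    def upload(self, data: bytes) -> list:
        """Split into chunks, store each unique chunk once, return the manifest."""
        manifest = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)   # skip chunks we already have
            manifest.append(digest)
        return manifest

    def download(self, manifest: list) -> bytes:
        """Reassemble a file from its ordered list of chunk digests."""
        return b"".join(self.chunks[digest] for digest in manifest)


store = ChunkStore(chunk_size=4)             # tiny chunks so the demo is readable
manifest = store.upload(b"aaaabbbbaaaa")     # three chunks, two of them identical
restored = store.download(manifest)
```

A resumable upload falls out of the same structure: the client can ask which digests the server already has and send only the missing chunks.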

“Design a rate limiter.” Token bucket and sliding window are the two algorithms you need. Token bucket: each user gets a bucket that fills with tokens at a fixed rate; each request costs one token; empty bucket means rate limited. Sliding window: count requests in a rolling time window; exceed the threshold, get blocked. Token bucket handles burst traffic more gracefully.
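Both algorithms are short enough to write from memory, which is worth practicing. A minimal single-user sketch: the `clock` parameter is my addition so the behavior can be tested without sleeping, and the sliding window here is the log variant that stores a timestamp per request.

```python
import time
from collections import deque


class TokenBucket:
    def __init__(self, capacity: int, refill_per_sec: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = float(capacity)   # start full, so bursts are allowed
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1            # each request costs one token
            return True
        return False                    # empty bucket: rate limited


class SlidingWindowLog:
    def __init__(self, limit: int, window_sec: float, clock=time.monotonic):
        self.limit = limit
        self.window_sec = window_sec
        self.clock = clock
        self.hits = deque()             # timestamps of accepted requests

    def allow(self) -> bool:
        now = self.clock()
        while self.hits and self.hits[0] <= now - self.window_sec:
            self.hits.popleft()         # drop requests that left the window
        if len(self.hits) < self.limit:
            self.hits.append(now)
            return True
        return False


fake_now = [0.0]
bucket = TokenBucket(capacity=2, refill_per_sec=1.0, clock=lambda: fake_now[0])
burst = [bucket.allow(), bucket.allow(), bucket.allow()]   # two pass, third blocked
fake_now[0] = 1.0                                          # one second later
after_refill = bucket.allow()                              # one token is back
```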

Building Blocks That Show Up Everywhere

Regardless of which specific system you’re designing, these concepts appear over and over. I’d put about 60% of your prep time into mastering these rather than memorizing specific system designs:

Consistent hashing — for distributing data across servers with minimal redistribution when nodes join or leave. Shows up in sharding, load balancing, caching.

Message queues (Kafka, RabbitMQ) — for decoupling services and handling async processing. Kafka specifically when you need high-throughput ordered event streams. RabbitMQ when you need flexible routing and acknowledgment patterns.

CDNs — for serving static content from edge locations close to users. In India specifically, CDN choice matters a lot because of the geographic spread — a user in Guwahati and a user in Kochi have very different latencies to a Mumbai origin server.

SQL vs. NoSQL trade-offs — SQL when you need ACID transactions, complex joins, strong consistency. NoSQL when you need horizontal scaling, flexible schemas, and your access patterns are simple (key-value lookups, time-series writes). Most real systems use both — it’s not an either/or decision.

CAP theorem — in a distributed system, you can’t simultaneously guarantee Consistency, Availability, and Partition tolerance. Since network partitions are inevitable, the real choice is between consistency and availability during a partition. Our URL shortener chose availability (eventual consistency with async replication). A banking system would choose consistency.

Microservices vs. monoliths — and honestly, for interview purposes, know when each makes sense rather than defaulting to microservices because it sounds impressive. A URL shortener? Probably doesn’t need microservices. A complex e-commerce platform with separate teams handling payments, inventory, recommendations, and shipping? Microservices start making sense. The boundary should be organizational, not technical.

Resources That Actually Helped Me (and Some That Didn’t)

“Designing Data-Intensive Applications” by Martin Kleppmann remains the single best book on this topic, even in 2026. Not a quick read — took me about three weeks going chapter by chapter in early 2024 — but it fundamentally changed how I think about distributed systems. Every concept in this post has a deeper treatment in that book.

The System Design Primer on GitHub (donnemartin/system-design-primer) is a solid free alternative. Not as deep as Kleppmann’s book, but covers more breadth. Good for filling gaps in your knowledge.

Mock interviews on Pramp helped me more than I expected. Reading about system design and doing it live are genuinely different skills. The time pressure, the need to think out loud, the follow-up questions — you can’t simulate that by reading.

What didn’t help much? Watching YouTube videos at 2x speed the night before. I tried that before my second system design interview. Retained maybe 10% of it. You need active practice — grab a whiteboard (or honestly, a blank sheet of paper and a pen work fine), pick a system, set a 45-minute timer, and design it start to finish. Explain it out loud, even if nobody’s listening. One system per day for 4-6 weeks is the sweet spot I’d recommend.

My Honest Take: What Actually Matters Most

After sitting through dozens of system design interviews from both sides, here’s what I genuinely believe matters — and it might not be what you’d expect.

The number one thing? Communication. Not knowledge. Not cleverness. Communication. I’ve seen candidates with deep expertise in distributed systems fail because they couldn’t explain their reasoning clearly. And I’ve seen candidates with moderate knowledge pass because they walked through their thought process out loud, acknowledged trade-offs honestly, and said “I’m not sure about this, but here’s my reasoning” instead of bullshitting.

Second: structured thinking beats encyclopedic knowledge. You don’t need to memorize every caching algorithm or know the exact replication protocol Cassandra uses. You need to approach a problem methodically, make reasonable assumptions, and justify your choices. Interviewers aren’t checking answers against a rubric — they’re evaluating whether they’d want to work with you on a real design problem.

Third: trade-offs are the whole point. If you’re presenting a design with no downsides, the interviewer knows you’re either faking it or haven’t thought deeply enough. Every architectural choice has costs. Caching adds complexity and staleness risks. Sharding makes certain queries harder. Microservices add operational overhead. Saying “the downside of this approach is X, and here’s why I think it’s acceptable for our use case” — that’s what a senior engineer sounds like.

And finally — and I know this sounds obvious — relax during the interview. It’s a conversation, not an exam. I’ve had my best interviews (both as candidate and interviewer) when it felt like two engineers whiteboarding a problem together over chai. The worst ones felt like oral exams. You can’t entirely control the interviewer’s style, but you can set a collaborative tone by asking questions, inviting feedback (“does this direction make sense?”), and treating the interviewer as a co-designer rather than a judge.

System design interviews are learnable. That 65% failure rate I mentioned at the top? It’s not because the material is impossibly hard. It’s because most people don’t practice the format. They know the concepts but haven’t practiced structuring a 45-minute design conversation. Fix that gap, and you’re ahead of most candidates walking into the room.

Good luck out there. And if you’re interviewing at Indian tech companies specifically — the chai reference always lands. Trust me on that one.