Why System Design Interviews Matter
System design interviews have become a critical part of the hiring process at companies of all sizes, from startups to tech giants like Google, Amazon, and Flipkart. Unlike coding interviews that test algorithmic thinking, system design rounds evaluate your ability to build scalable, reliable, and maintainable systems. These interviews assess how you think about trade-offs, handle ambiguity, and communicate complex technical decisions. Whether you are targeting a senior developer role at a product company or a lead engineer position at a startup, strong system design skills will set you apart from other candidates. This guide covers the core concepts you need and walks through a complete example to sharpen your preparation.
The Framework: How to Approach Any System Design Question
Most system design interviews follow a predictable structure. Mastering this framework gives you a repeatable approach regardless of the specific problem.
Step 1: Clarify Requirements (5 minutes). Never jump into designing immediately. Ask questions to understand functional requirements (what the system should do) and non-functional requirements (performance, availability, consistency). For example, if asked to design a URL shortener, clarify: How many URLs per day? What is the expected read-to-write ratio? Do shortened URLs expire? Is analytics required? This demonstrates maturity and prevents wasted effort on wrong assumptions.
Step 2: Estimate Scale (3 minutes). Back-of-the-envelope calculations help you choose appropriate technologies. If the system handles 100 million URLs, that changes your storage and caching decisions compared to 10,000 URLs. Calculate requests per second, storage needs over 5 years, and bandwidth requirements. Interviewers care more about your reasoning than exact numbers.
Step 3: Define the API (3 minutes). Outline the key API endpoints. For a URL shortener: POST /shorten takes a long URL and returns a short one, and GET /{shortCode} redirects to the original URL. This grounds your design in concrete interfaces.
Step 4: Design the High-Level Architecture (10 minutes). Draw the major components: clients, load balancers, application servers, databases, and caches. Explain data flow from request to response. Keep it simple first and add complexity only as needed.
Step 5: Deep Dive into Components (15 minutes). This is where you demonstrate depth. Pick two or three critical components and discuss their design in detail. Explain trade-offs you are making and why.
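To make Step 3 concrete, the two URL shortener endpoints could be sketched as plain Python functions that return a status code and a response body. This is a minimal illustration, not a production handler: the function names, the in-memory dictionary, the placeholder code format, and the choice of a 302 redirect are all assumptions made for the example.

```python
import itertools
from typing import Dict, Tuple

# Hypothetical in-memory stand-ins for the real datastore and ID generator.
_store: Dict[str, str] = {}
_counter = itertools.count(1)


def post_shorten(long_url: str) -> Tuple[int, Dict[str, str]]:
    """POST /shorten: accept a long URL and return a short code."""
    if not long_url.startswith(("http://", "https://")):
        return 400, {"error": "long_url must be an http(s) URL"}
    # Placeholder code format; real code generation is a separate design decision.
    short_code = f"c{next(_counter)}"
    _store[short_code] = long_url
    return 201, {"short_code": short_code}


def get_redirect(short_code: str) -> Tuple[int, Dict[str, str]]:
    """GET /{shortCode}: redirect to the original URL, or report 404."""
    long_url = _store.get(short_code)
    if long_url is None:
        return 404, {"error": "unknown short code"}
    # 302 is assumed here; a permanent 301 is the other common option.
    return 302, {"Location": long_url}
```

Writing the interface down this way, even informally, makes it easier to discuss status codes, validation, and error cases with the interviewer before drawing any architecture.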
Designing a URL Shortener: Step by Step
Let us apply the framework to one of the most commonly asked system design questions.
Requirements: The system should generate a unique short URL for any given long URL, redirect users from the short URL to the original, handle 500 million new URLs per month, and support 10 billion redirects per month. URLs should have a configurable expiration with a default of 5 years.
Scale Estimation: Write operations come to about 200 per second (500 million divided by the roughly 2.6 million seconds in a 30-day month). The 10 billion monthly redirects are 20 times the write volume, giving us roughly 4,000 reads per second. Each URL record needs about 500 bytes of storage. Over 5 years, 500 million URLs per month adds up to 30 billion records, which totals approximately 15 TB of data. This scale demands a distributed database and aggressive caching for reads.
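The arithmetic behind these estimates is easy to check with a few lines of Python. The sketch below uses only the figures stated in the requirements; the 30-day month, decimal terabytes, and the variable names are simplifying assumptions.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # 2,592,000, assuming a 30-day month

new_urls_per_month = 500_000_000
redirects_per_month = 10_000_000_000
bytes_per_record = 500  # assumed average record size
retention_years = 5

writes_per_second = new_urls_per_month / SECONDS_PER_MONTH
reads_per_second = redirects_per_month / SECONDS_PER_MONTH
read_write_ratio = redirects_per_month / new_urls_per_month

total_records = new_urls_per_month * 12 * retention_years
total_storage_tb = total_records * bytes_per_record / 1e12  # decimal terabytes

print(f"writes/s: {writes_per_second:.0f}")
print(f"reads/s: {reads_per_second:.0f}")
print(f"read:write ratio: {read_write_ratio:.0f}:1")
print(f"records over {retention_years} years: {total_records:,}")
print(f"storage: {total_storage_tb:.1f} TB")
```

In an interview you would round these to 200 writes per second and 4,000 reads per second; the point is to show the reasoning, not the decimals.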
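Short URL Generation: Use Base62 encoding (a-z, A-Z, 0-9) with 7 characters, which gives 62 to the power of 7 or roughly 3.5 trillion unique combinations, comfortably more than the 30 billion URLs expected over 5 years. Two approaches work here. First, you can hash the long URL using MD5 or SHA-256, Base62 encode the digest, and take the first 7 characters. This is simple, but truncating the hash means two different URLs can produce the same code, so it requires collision handling. Second, you can use a counter-based approach: obtain a unique integer from a distributed ID generator, then Base62 encode it. This guarantees uniqueness without collision checks. Note that IDs from Twitter’s Snowflake algorithm are 64 bits wide and can need up to 11 Base62 characters, so keeping codes at 7 characters requires a generator whose values stay below 62 to the power of 7, such as a sequential counter handed out in ranges.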
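Both approaches can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: the alphabet ordering, the helper names, the salt-and-retry collision strategy, and the dictionary standing in for the database are all choices made for the example, not a prescribed implementation.

```python
import hashlib
import string
from typing import Dict

# 62 symbols; the ordering (digits, lowercase, uppercase) is an arbitrary choice.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)
CODE_LENGTH = 7


def base62_encode(number: int) -> str:
    """Encode a non-negative integer using the Base62 alphabet."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, BASE)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def code_from_counter(counter_value: int) -> str:
    """Counter approach: encode a unique integer, left-padded to 7 characters."""
    if counter_value >= BASE ** CODE_LENGTH:
        raise ValueError("counter exceeds the 7-character code space")
    return base62_encode(counter_value).rjust(CODE_LENGTH, ALPHABET[0])


def code_from_hash(long_url: str, existing: Dict[str, str], max_attempts: int = 5) -> str:
    """Hash approach: truncate a SHA-256 digest, retrying with a salt on collision.

    `existing` maps short codes to long URLs and stands in for the database.
    """
    for attempt in range(max_attempts):
        salted = f"{long_url}#{attempt}" if attempt else long_url
        digest = hashlib.sha256(salted.encode("utf-8")).digest()
        encoded = base62_encode(int.from_bytes(digest, "big"))
        code = encoded.rjust(CODE_LENGTH, ALPHABET[0])[:CODE_LENGTH]
        owner = existing.get(code)
        if owner is None or owner == long_url:
            existing[code] = long_url
            return code
    raise RuntimeError("could not find a free short code")
```

The counter version never needs to consult existing codes, while the hash version must look up each candidate before accepting it. That extra read on every write is the practical cost of collision handling.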
Database Design: A single table with columns for id, short_code (indexed), original_url, created_at, and expires_at. The access pattern is almost entirely key-value lookups by short_code, with no joins, so a NoSQL database like DynamoDB or Cassandra is a natural fit and scales horizontally with less operational effort than a relational database at this size. The short_code serves as the partition key, ensuring even distribution across nodes.
Load Balancing and Caching Strategies
Load balancers distribute incoming traffic across multiple application servers. For our URL shortener, a Layer 7 (application layer) load balancer works best because it can make routing decisions based on the URL path. Common algorithms include round-robin for evenly distributing requests, least connections for sending traffic to the least busy server, and consistent hashing for routing the same short codes to the same servers, which improves cache hit rates.
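The first two algorithms are simple enough to sketch directly. The classes and server names below are hypothetical, and a real load balancer would also track health checks and timeouts; the sketch only shows how each policy picks a server.

```python
import itertools
from typing import Dict, List


class RoundRobinBalancer:
    """Hand out servers in a fixed rotation."""

    def __init__(self, servers: List[str]) -> None:
        self._cycle = itertools.cycle(servers)

    def pick(self) -> str:
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Send each new request to the server with the fewest active connections."""

    def __init__(self, servers: List[str]) -> None:
        self._active: Dict[str, int] = {server: 0 for server in servers}

    def acquire(self) -> str:
        # Ties go to the server listed first.
        server = min(self._active, key=lambda name: self._active[name])
        self._active[server] += 1
        return server

    def release(self, server: str) -> None:
        # Call when a request finishes so the count reflects live connections.
        if self._active[server] > 0:
            self._active[server] -= 1
```

Round-robin needs no state about the servers, which makes it cheap but blind to slow requests. Least connections adapts to uneven request durations at the cost of tracking every open connection.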
Caching is critical for read-heavy systems. Our URL shortener has a 20:1 read-to-write ratio, making it an ideal candidate for aggressive caching. Use an in-memory cache like Redis or Memcached to store the most frequently accessed URL mappings. Apply the 80-20 rule: 20% of URLs generate 80% of traffic. At roughly 4,000 reads per second, the system serves about 345 million redirects per day. Caching the mappings behind 20% of that daily traffic, at about 500 bytes each, requires about 35 GB of memory, which is feasible for a Redis cluster.
For the cache eviction policy, use Least Recently Used (LRU), which discards the entry that has gone longest without being read. Reads follow the cache-aside pattern: when a read request comes in, first check the cache. On a cache hit, return the original URL immediately. On a cache miss, query the database, store the result in the cache, and then return the URL. For writes, use a write-through strategy where each new entry is written to both the database and the cache as part of the same operation, so the cache never lags behind the database for newly created URLs.
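The read and write paths described above can be sketched with an in-process LRU cache. This is a simplified stand-in: in practice the cache would be Redis or Memcached and the database a distributed store, and the class and method names here are hypothetical.

```python
from collections import OrderedDict
from typing import Dict, Optional


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._items: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, value: str) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # drop the least recently used entry

    def __contains__(self, key: str) -> bool:
        return key in self._items  # membership test does not affect recency


class UrlStore:
    """Cache-aside reads and write-through writes over a dict acting as the database."""

    def __init__(self, cache: LRUCache, database: Dict[str, str]) -> None:
        self.cache = cache
        self.database = database

    def resolve(self, short_code: str) -> Optional[str]:
        cached = self.cache.get(short_code)
        if cached is not None:
            return cached  # cache hit
        long_url = self.database.get(short_code)  # cache miss: go to the database
        if long_url is not None:
            self.cache.put(short_code, long_url)
        return long_url

    def create(self, short_code: str, long_url: str) -> None:
        # Write-through: update the database and the cache together.
        self.database[short_code] = long_url
        self.cache.put(short_code, long_url)
```

With a capacity of 2, reading codes a, b, a, and then c evicts b rather than a, because a was touched more recently. That recency bias is what keeps popular links in memory.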
Database Sharding and Replication
As data grows beyond what a single database server can handle, you need sharding. Two common strategies apply here. Hash-based sharding uses a hash function on the short_code to determine which shard stores the data. This distributes data evenly but makes range queries difficult. Range-based sharding assigns short codes starting with certain characters to specific shards, which is simpler to implement but can lead to uneven distribution if certain character ranges are more popular.
For our URL shortener, hash-based sharding on the short_code is the better choice since we primarily need point lookups, not range queries. Use consistent hashing to minimize data movement when adding or removing shards.
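A consistent hash ring can be sketched as a sorted list of hash points, each owned by a shard. The version below is a minimal illustration: the class name, the use of SHA-256, the shard names, and the 100 virtual nodes per shard are assumptions, and it ignores the (extremely unlikely) case of two points hashing to the same value.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(key: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, shards: List[str], virtual_nodes: int = 100) -> None:
        self._virtual_nodes = virtual_nodes
        self._ring: List[int] = []        # sorted hash points
        self._owner: Dict[int, str] = {}  # hash point -> shard name
        for shard in shards:
            self.add_shard(shard)

    def add_shard(self, shard: str) -> None:
        # Each shard gets many points so that load spreads evenly.
        for i in range(self._virtual_nodes):
            point = _hash(f"{shard}#{i}")
            self._owner[point] = shard
            bisect.insort(self._ring, point)

    def remove_shard(self, shard: str) -> None:
        self._ring = [p for p in self._ring if self._owner[p] != shard]
        self._owner = {p: s for p, s in self._owner.items() if s != shard}

    def shard_for(self, short_code: str) -> str:
        if not self._ring:
            raise LookupError("ring has no shards")
        # First point clockwise from the key's hash, wrapping around at the end.
        index = bisect.bisect_right(self._ring, _hash(short_code)) % len(self._ring)
        return self._owner[self._ring[index]]
```

When a fourth shard joins a three-shard ring, the only keys that change owner are those that now fall on the new shard's points, roughly a quarter of the total. With plain modulo hashing, most keys would move.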
Replication provides fault tolerance and read scaling. Each shard should have at least two replicas in different availability zones. Use leader-follower replication where the leader handles writes and followers handle reads. Since our system can tolerate eventual consistency (a slight delay before a newly created URL is readable from every follower is acceptable), asynchronous replication works well and avoids the latency penalty of synchronous replication.
Common System Design Topics to Study
Beyond the URL shortener, prepare for these frequently asked design problems. Design a chat system like WhatsApp, focusing on WebSocket connections, message delivery guarantees, and presence indicators. Design a news feed like Twitter, covering fan-out strategies (push vs pull), timeline generation, and ranking algorithms. Design a file storage system like Google Drive, addressing chunked uploads, deduplication, and sync conflict resolution. Design a rate limiter, covering token bucket and sliding window algorithms.
For each problem, practice applying the same five-step framework. Study the building blocks that appear across multiple designs: consistent hashing, message queues (Kafka, RabbitMQ), CDNs, reverse proxies, SQL vs NoSQL trade-offs, CAP theorem, and eventual consistency patterns.
Resources for practice include “Designing Data-Intensive Applications” by Martin Kleppmann, the System Design Primer on GitHub, and mock interview platforms like Pramp. Dedicate 4 to 6 weeks of focused study, designing one system per day on paper or a whiteboard. Explain your designs out loud to build the communication skills that interviewers evaluate alongside your technical knowledge.