🏗️ Architecture
System Design Cheatsheet
Scalability, caching, databases, load balancing, CAP theorem and microservices.
📖 10 sections
✅ Quizzes included
01 Fundamentals
System Design
Planning the architecture of large-scale systems. Focus: scalability, reliability, availability, performance.
Functional reqs
What the system should DO: features, APIs, user stories.
Non-functional reqs
How the system should PERFORM: latency, throughput, availability, consistency.
Throughput
Requests per second (RPS) a system can handle.
Latency
Time for a single request to complete, usually reported as p50, p95, p99 percentiles.
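The sketch below shows one way to compute p50, p95 and p99 from a batch of recorded latencies. It uses the nearest-rank method, which is only one of several percentile definitions; monitoring tools often use histograms or interpolation instead. The sample latencies are made up.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are <= it. Other tools may use interpolation or histograms.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b); // numeric sort on a copy
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

// Hypothetical latencies (ms) for 10 requests; two are very slow.
const latenciesMs = [12, 15, 11, 13, 240, 14, 12, 16, 13, 900];

console.log('p50', percentile(latenciesMs, 50)); // p50 13
console.log('p95', percentile(latenciesMs, 95)); // p95 900
console.log('p99', percentile(latenciesMs, 99)); // p99 900
```

The median looks healthy at 13 ms, but the tail percentiles expose the slow requests. That is why latency targets are usually stated as p95 or p99 rather than as an average.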
Availability
Uptime percentage. 99.9% ≈ 8.76 hrs downtime/year; 99.99% ≈ 52.6 min.
SLA
Service Level Agreement. Contractual availability promise.
Back-of-envelope
Estimate scale before designing: 1B daily users × 100 requests/day ÷ 86,400 s/day ≈ 1.16M RPS on average (peaks are higher).
99.9% uptime ('three nines'): ≈ 8.76 hours downtime per year
99.99% uptime ('four nines'): ≈ 52.6 minutes downtime per year
1 billion users: ~1M QPS at 100 req/user/day (back-of-envelope calc)
Storage estimate (for data sizing): 1 TB = 10^12 bytes, 1 PB = 10^15 bytes
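The availability and scale figures above come from simple arithmetic. The sketch below reproduces them. The function names are illustrative, and the storage inputs (100M records/day at 1 KB each, kept for 5 years) are made-up numbers.

```javascript
const HOURS_PER_YEAR = 365 * 24;      // 8760 (leap years ignored)
const SECONDS_PER_DAY = 24 * 60 * 60; // 86400

// Allowed downtime per year for an availability target such as 99.9 (%).
function downtimeHoursPerYear(availabilityPct) {
  return HOURS_PER_YEAR * (1 - availabilityPct / 100);
}

// Average (not peak) requests per second.
function averageRps(dailyUsers, requestsPerUserPerDay) {
  return (dailyUsers * requestsPerUserPerDay) / SECONDS_PER_DAY;
}

// Raw storage for a retention period, before replication or indexes.
function storageBytes(recordsPerDay, bytesPerRecord, days) {
  return recordsPerDay * bytesPerRecord * days;
}

console.log(downtimeHoursPerYear(99.9).toFixed(2));         // 8.76 (hours)
console.log((downtimeHoursPerYear(99.99) * 60).toFixed(1)); // 52.6 (minutes)
console.log(Math.round(averageRps(1e9, 100)));              // 1157407, i.e. ~1.16M RPS
// Hypothetical: 100M records/day, 1 KB each, kept 5 years.
console.log(storageBytes(1e8, 1000, 365 * 5) / 1e12);       // 182.5 (TB)
```

These are the unrounded values behind the "three nines", "four nines" and ~1M RPS rules of thumb. Real estimates usually multiply the average by a peak factor and account for replication.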
02 Scalability
Vertical scaling
Bigger server: more CPU, RAM. Simple but has limits. Single point of failure.
Horizontal scaling
More servers. Distributes load. Requires stateless design + load balancer.
Stateless design
Server holds no session state. State stored in DB/cache. Required for horizontal scale.
CDN
Content Delivery Network. Cache static assets geographically close to users.
Data partitioning
Split a large dataset across multiple servers (sharding), e.g. by hashing a key.
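The simplest form of hash partitioning is `shard = hash(key) mod N`, sketched below. FNV-1a is used only because it is short and stable; any well-spread hash would do. The key format `user:<id>` is hypothetical.

```javascript
// 32-bit FNV-1a hash of a string (stable across processes and restarts).
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // 32-bit multiply, keep unsigned
  }
  return h >>> 0;
}

// Route a key to one of shardCount shards.
function shardFor(key, shardCount) {
  return fnv1a(key) % shardCount;
}

// How many keys change shard when the cluster grows from 4 to 5 shards?
const keys = Array.from({ length: 1000 }, (_, i) => `user:${i}`);
const moved = keys.filter((k) => shardFor(k, 4) !== shardFor(k, 5)).length;
console.log(`${moved} of ${keys.length} keys moved`);
```

A key stays put only when `hash mod 4` equals `hash mod 5`. For a well-spread hash that is about 1 key in 5, so roughly 80% of the data would have to migrate. This cost is what consistent hashing avoids.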
Read replicas
Copy the database to multiple servers to serve read traffic. Replication is usually asynchronous, so replicas can lag (eventual consistency).
Denormalization
Duplicate data to avoid expensive JOINs at scale. Trade storage for speed.
Async processing
Offload heavy work to background jobs/queues. Fast API response + slower processing.
Scalability checklist
1. Identify bottlenecks (DB? App? Network?)
2. Add caching (Redis) for frequently read data
3. Use CDN for static assets
4. Scale horizontally with load balancer
5. Split read/write to primary + replicas
6. Partition data (sharding) for massive scale
7. Use async queues for heavy processing
8. Move to microservices if monolith becomes bottleneck
03 Load Balancing
Load balancer
Distributes requests across servers so that no single server is overloaded.
Round robin
Each server gets requests in turn. Simple, equal distribution.
Least connections
Route to server with fewest active connections. Better for varied request times.
IP hash
A hash of the client IP picks the server, so the same client hits the same server while the pool is unchanged. Useful for session affinity.
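The three strategies above can be sketched as plain selection functions. The server addresses reuse those from the Nginx config in this section. The hash in `ipHash` and the shape of the `active` connection-count map are illustrative assumptions, not how any particular load balancer is implemented.

```javascript
const servers = ['10.0.0.1:3000', '10.0.0.2:3000', '10.0.0.3:3000'];

// Round robin: hand out servers in a fixed cycle.
function makeRoundRobin(pool) {
  let next = 0;
  return () => {
    const server = pool[next];
    next = (next + 1) % pool.length;
    return server;
  };
}

// Least connections: pick the server with the fewest in-flight requests.
// `active` maps server -> count; the caller increments it on dispatch
// and decrements it when the response completes. Ties go to the earlier server.
function leastConnections(pool, active) {
  return pool.reduce((best, s) =>
    (active.get(s) ?? 0) < (active.get(best) ?? 0) ? s : best);
}

// IP hash: the same client IP maps to the same server while the pool is unchanged.
function ipHash(pool, clientIp) {
  let h = 0;
  for (const ch of clientIp) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return pool[h % pool.length];
}
```

In Nginx, `least_conn` (used in the config in this section) selects the least-connections behaviour, `ip_hash` gives IP-based stickiness, and omitting both gives the default round robin.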
Layer 4 LB
Routes based on IP/TCP. Fast, no content inspection. AWS NLB.
Layer 7 LB
Routes based on HTTP content (URL, headers). AWS ALB, Nginx, HAProxy.
Health checks
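The LB periodically probes each server (e.g. a TCP connect or an HTTP request to a health endpoint), stops routing to servers that fail, and re-adds them once they recover.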
Failover
If the primary fails, traffic shifts automatically to a standby (active-passive), or the remaining nodes absorb it (active-active).
Nginx load balancer config
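# Fragment for the http { } context of nginx.conf
upstream backend {
    least_conn;                                         # strategy: fewest active connections
    server 10.0.0.1:3000 max_fails=3 fail_timeout=30s;  # passive health checks
    server 10.0.0.2:3000 max_fails=3 fail_timeout=30s;
    server 10.0.0.3:3000 max_fails=3 fail_timeout=30s;
    keepalive 32;                    # idle upstream connections cached per worker
}

server {
    listen 80;
    location / {
        proxy_pass http://backend;
        proxy_http_version 1.1;          # required for upstream keepalive
        proxy_set_header Connection "";  # don't forward "close", so connections are reused
        proxy_set_header Host $host;
    }
}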
04 Caching
Cache hit
Data found in cache. Fast response. No DB query.
Cache miss
Data NOT in cache. Must fetch from DB, then store in cache.
Cache eviction
Remove entries when the cache is full. LRU (least recently used) is the most common policy.
TTL
Time-to-live. How long before cache entry expires and refreshes.
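Eviction and TTL can be combined in a small in-process cache, sketched below. It relies on JavaScript `Map` preserving insertion order, so the first key is always the least recently used. The class name and the injectable clock are illustrative. This is not how Redis works internally; Redis uses an approximated LRU.

```javascript
// LRU cache with a per-entry TTL.
class LruCache {
  constructor(capacity, ttlMs, now = () => Date.now()) {
    this.capacity = capacity;
    this.ttlMs = ttlMs;
    this.now = now;       // injectable clock (useful in tests)
    this.map = new Map(); // key -> { value, expiresAt }, oldest first
  }

  get(key) {
    const entry = this.map.get(key);
    if (!entry) return undefined;            // miss
    if (this.now() >= entry.expiresAt) {
      this.map.delete(key);                  // expired: treat as a miss
      return undefined;
    }
    this.map.delete(key);                    // re-insert to mark as
    this.map.set(key, entry);                // most recently used
    return entry.value;                      // hit
  }

  set(key, value) {
    this.map.delete(key);
    this.map.set(key, { value, expiresAt: this.now() + this.ttlMs });
    if (this.map.size > this.capacity) {
      // Evict the least recently used entry: the first key in iteration order.
      this.map.delete(this.map.keys().next().value);
    }
  }
}
```

Reading a key moves it to the back of the order. When the cache grows past `capacity`, the front entry is the one that has gone longest without being read or written.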
Cache aside
App checks cache → miss → fetch DB → write to cache. Most common pattern.
Write through
Every write updates both the DB AND the cache before returning. Consistent but slower writes.
Write back
Write to cache first, async flush to DB later. Fast writes but risk of data loss.
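The two write policies above can be contrasted with a toy sketch. Plain `Map`s stand in for the cache and the database, and the class names are illustrative. Write-through touches both stores on every write. Write-back acknowledges after updating only the cache and persists dirty keys later in `flush()`.

```javascript
class WriteThroughCache {
  constructor(db) { this.db = db; this.cache = new Map(); }
  write(key, value) {
    this.db.set(key, value);    // 1. persist
    this.cache.set(key, value); // 2. update cache; both finish before returning
  }
  read(key) { return this.cache.has(key) ? this.cache.get(key) : this.db.get(key); }
}

class WriteBackCache {
  constructor(db) { this.db = db; this.cache = new Map(); this.dirty = new Set(); }
  write(key, value) {
    this.cache.set(key, value); // fast: only the cache is touched
    this.dirty.add(key);        // remember to persist it later
  }
  read(key) { return this.cache.has(key) ? this.cache.get(key) : this.db.get(key); }
  flush() {
    // Typically run on a timer or on eviction. Dirty keys not yet
    // flushed are lost if the cache process crashes.
    for (const key of this.dirty) this.db.set(key, this.cache.get(key));
    this.dirty.clear();
  }
}
```

The gap between `write()` and `flush()` in the write-back version is exactly the data-loss window mentioned above.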
Cache invalidation
Often called one of the hardest problems: deciding when to update or remove stale data from the cache.
Redis caching pattern
// Cache-aside pattern (Node.js). Assumes an ioredis-style client `redis`
// and a data-access object `db`; both are placeholders.
async function getUser(id) {
  // 1. Check the cache
  const cached = await redis.get(`user:${id}`);
  if (cached !== null) return JSON.parse(cached);  // HIT

  // 2. Cache miss: fetch from the DB
  const user = await db.users.findById(id);

  // 3. Store in the cache with a TTL of 3600 s (1 hour)
  await redis.setex(`user:${id}`, 3600, JSON.stringify(user));

  return user;  // MISS
}

// Cache invalidation on update: write the DB first, then delete the key
// so the next read repopulates the cache with fresh data.
async function updateUser(id, data) {
  await db.users.update(id, data);
  await redis.del(`user:${id}`);  // invalidate cache
}
💡
Cache what's read often and written rarely. Don't cache fast-changing or personalized data without very short TTL.
05 Databases
SQL
Structured, relations, ACID, schema. Best for complex queries, transactions. PostgreSQL, MySQL.
NoSQL
Flexible schema, horizontal scale. Best for large volumes, variety. MongoDB, Cassandra, DynamoDB.
ACID
Atomicity, Consistency, Isolation, Durability. Guaranteed for transactions by relational databases (some NoSQL stores offer it too).
BASE
Basically Available, Soft state, Eventual consistency. NoSQL trade-off for scale.
Replication
The primary node accepts writes; replicas serve reads. If the primary dies, a replica is promoted (failover).
Sharding
Split data across multiple DB servers by shard key. Increases write capacity.
Consistent hashing
Place nodes and keys on a hash ring; each key belongs to the next node clockwise. Adding or removing a node moves only about 1/N of the keys.
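Below is a minimal hash-ring sketch with virtual nodes. Assumptions: MD5 from Node's built-in `crypto` module is used only to spread keys evenly (not for security), 100 virtual nodes per server is an arbitrary choice, and the node names are hypothetical.

```javascript
const { createHash } = require('node:crypto');

// First 4 bytes of MD5 as an unsigned 32-bit ring position.
function hash32(str) {
  return createHash('md5').update(str).digest().readUInt32BE(0);
}

class HashRing {
  constructor(nodes = [], vnodes = 100) {
    this.vnodes = vnodes; // virtual nodes per server even out the load
    this.ring = [];       // sorted array of { hash, node }
    for (const n of nodes) this.addNode(n);
  }
  addNode(node) {
    for (let i = 0; i < this.vnodes; i++) {
      this.ring.push({ hash: hash32(`${node}#${i}`), node });
    }
    this.ring.sort((a, b) => a.hash - b.hash);
  }
  removeNode(node) {
    this.ring = this.ring.filter((e) => e.node !== node);
  }
  // A key belongs to the first ring point at or after its hash (clockwise).
  getNode(key) {
    if (this.ring.length === 0) throw new Error('empty ring');
    const h = hash32(key);
    let lo = 0;
    let hi = this.ring.length;
    while (lo < hi) { // binary search for the first point with hash >= h
      const mid = (lo + hi) >>> 1;
      if (this.ring[mid].hash < h) lo = mid + 1; else hi = mid;
    }
    return this.ring[lo % this.ring.length].node; // wrap around past the end
  }
}

const ring = new HashRing(['db-a', 'db-b', 'db-c', 'db-d']);
console.log(ring.getNode('user:42'));
```

When a node is added, a key moves only if one of the new node's points lands between the key and its old owner, so every moved key goes to the new node. With plain `hash mod N`, most keys would change servers.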
Indexing
B-tree: equality and range queries. Hash: equality only. Index columns used often in WHERE/JOIN clauses; each index speeds reads but slows writes and uses storage.
SQL choice: relational data, complex queries, ACID transactions (banking, ERP)
MongoDB choice: flexible schema, rapid iteration, nested data (startups, CMS)
Redis choice: sub-millisecond reads, pub/sub, sessions, queues (cache, real-time)
Cassandra choice: massive write throughput, time-series, global distribution (IoT, logs, metrics)
06 Messaging Queues
Message queue
Asynchronous buffer between producer and consumer. Decouples services.
Producer
Sends messages to the queue. Doesn't wait for processing.
Consumer
Reads and processes messages from the queue. Can scale independently.
At-least-once
Each message is delivered at least once. Duplicates are possible, so make consumers idempotent.
Exactly-once
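Each message takes effect exactly once. Hard in practice; usually built from at-least-once delivery plus deduplication or transactions (e.g. Kafka's idempotent producer and transactions).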
Dead letter queue
Messages that still fail after a set number of retries go here for inspection or replay.
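Idempotency, retries and a dead letter queue fit together in a consumer like the sketch below. The message shape `{ id, body }`, the in-memory `Set` and array, and `maxAttempts = 3` are all assumptions. A real broker adds acknowledgements and redelivery, and a real consumer keeps processed IDs in durable storage.

```javascript
// At-least-once consumer: skip redelivered duplicates by message id,
// retry a failing handler, then park the message in a dead letter queue.
function makeConsumer(handler, maxAttempts = 3) {
  const processedIds = new Set(); // durable (e.g. a DB table) in a real system
  const deadLetters = [];         // stand-in for a real DLQ

  function consume(message) {
    if (processedIds.has(message.id)) return 'duplicate'; // no repeated side effects
    let lastError;
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        handler(message.body);
        processedIds.add(message.id); // mark done only after success
        return 'ok';
      } catch (err) {
        lastError = err; // real consumers usually back off before retrying
      }
    }
    deadLetters.push({ message, error: String(lastError) });
    return 'dead-lettered';
  }

  return { consume, deadLetters };
}
```

If the process crashes between `handler()` and `processedIds.add()`, the message can still be handled twice. This is why the handler's own side effects should be idempotent, or recorded in the same transaction as the ID.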
Pub/Sub
Publisher sends to topic. Multiple subscribers receive. Kafka, SNS.
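The fan-out behaviour of pub/sub can be shown in-process with Node's built-in `EventEmitter`, standing in for a broker topic. It delivers synchronously and has no persistence, unlike Kafka or SNS. The topic name matches the queue example in this section; the subscriber names are hypothetical.

```javascript
const { EventEmitter } = require('node:events');

// In-process stand-in for a topic: every subscriber receives every message.
const bus = new EventEmitter();
const log = [];

bus.on('user.registered', (user) => log.push(`email service: welcome ${user.name}`));
bus.on('user.registered', (user) => log.push(`analytics service: count ${user.id}`));

bus.emit('user.registered', { id: 1, name: 'Ann' });
console.log(log);
// [ 'email service: welcome Ann', 'analytics service: count 1' ]
```

In a work queue, each message goes to one consumer. With pub/sub, each subscriber gets its own copy, so new consumers can be added without touching the publisher.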
Use cases
Email sending, image resizing, order processing, notifications, log ingestion.
Message queue use case
// Express-style handlers; db, queue, sendWelcomeEmail and generateAvatar are placeholders.

// Without queue (bad)
app.post('/register', async (req, res) => {
  await db.createUser(req.body);       // fast
  await sendWelcomeEmail(req.body);    // slow! blocks response
  await generateAvatar(req.body);      // slow!
  res.json({ success: true });         // user waits seconds!
});

// With queue (good)
app.post('/register', async (req, res) => {
  const user = await db.createUser(req.body);    // fast
  await queue.publish('user.registered', user);  // fast: only enqueues a message
  res.json({ success: true });                   // responds without waiting for slow work
  // queue consumers handle email, avatar, analytics asynchronously
});
07 API Design
REST API best practices
// Resource naming (nouns, not verbs)
GET    /users          → list users
GET    /users/:id      → get user
POST   /users          → create user
PUT    /users/:id      → replace user
PATCH  /users/:id      → partial update
DELETE /users/:id      → delete user

// HTTP status codes
200 OK             201 Created         204 No Content
400 Bad Request    401 Unauthorized    403 Forbidden
404 Not Found      409 Conflict        422 Unprocessable
429 Too Many Reqs  500 Server Error    503 Unavailable

// Pagination
GET /posts?page=2&limit=20
GET /posts?cursor=abc123&limit=20  (cursor-based, better for large or frequently changing datasets)

// Versioning
GET /api/v1/users    (URL versioning — most common)
GET /api/users  Accept: application/vnd.api.v2+json  (header)

// Rate limiting response headers
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 999
X-RateLimit-Reset: 1640000000
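The rate-limit headers above can be produced by a fixed-window counter, sketched below. Assumptions: per-client counters live in a local `Map` (across several servers a shared store such as Redis is typical, and this map is never pruned), time is in Unix seconds, and `1000 per hour` is an example limit.

```javascript
// Fixed-window rate limiter returning X-RateLimit-* style headers.
function makeRateLimiter(limit, windowSec, nowSec = () => Math.floor(Date.now() / 1000)) {
  const counters = new Map(); // clientId -> { windowStart, count }

  return function check(clientId) {
    const now = nowSec();
    const windowStart = now - (now % windowSec); // windows aligned to multiples of windowSec
    let entry = counters.get(clientId);
    if (!entry || entry.windowStart !== windowStart) {
      entry = { windowStart, count: 0 }; // first request in a new window
      counters.set(clientId, entry);
    }
    const allowed = entry.count < limit;
    if (allowed) entry.count++;
    return {
      allowed, // when false, respond with 429 Too Many Requests
      headers: {
        'X-RateLimit-Limit': String(limit),
        'X-RateLimit-Remaining': String(limit - entry.count),
        'X-RateLimit-Reset': String(windowStart + windowSec), // Unix time the window ends
      },
    };
  };
}

const limiter = makeRateLimiter(1000, 3600); // e.g. 1000 requests per hour
console.log(limiter('client-1').headers['X-RateLimit-Remaining']); // 999
```

A fixed window is simple but allows a burst of up to twice the limit around a window boundary. Sliding-window or token-bucket limiters smooth this out.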
Idempotent
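GET, PUT, DELETE: repeating the same request leaves the server in the same state as sending it once (responses may differ, e.g. a repeated DELETE may return 404). POST is NOT idempotent; PATCH is not guaranteed to be.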
HATEOAS
Hypermedia As The Engine Of Application State: links in responses tell the client which actions are available next. Required for REST in Fielding's original sense.
08 CAP Theorem
CAP Theorem
CAP = Consistency + Availability + Partition Tolerance

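When a network partition occurs, a distributed system must give up either C or A (partitions can't be ruled out, so in practice the choice is CP or AP):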

CP (Consistent + Partition tolerant):
  - Returns error if can't guarantee consistency
  - Example: HBase, MongoDB (in certain configs)
  - Use: Banking, inventory (can't show wrong balance)

AP (Available + Partition tolerant):
  - Returns possibly stale data during network partition
  - Example: CouchDB, Cassandra, DynamoDB (default settings; Cassandra and DynamoDB also offer stronger consistency options)
  - Use: Social feeds, product catalogue (slightly stale is ok)

CA (Consistent + Available):
  - No partition tolerance — only works on single node
  - Example: Traditional SQL on single machine
  - Not useful for distributed systems

PACELC extension (for normal operation):
  If Partition: tradeoff C vs A
  Else: tradeoff L (Latency) vs C (Consistency)
PACELC
More realistic model: even without partitions, tradeoff between Latency and Consistency
Eventual consistency
Given enough time, all replicas will agree. Used by DNS, social media feeds.
Strong consistency
All reads see latest write. Slower. Used by banks, payment systems.
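Eventual consistency can be illustrated with two replicas that accept writes independently and later exchange state. The sketch uses last-writer-wins (LWW) merging, which is one simple conflict-resolution rule among several (others include vector clocks and CRDTs). The integer timestamps and replica names are assumptions.

```javascript
// Replica holding one value, merged with last-writer-wins.
class LwwReplica {
  constructor(name) {
    this.name = name;
    this.value = undefined;
    this.ts = 0;      // timestamp of the current value
    this.writer = ''; // replica that wrote it; breaks timestamp ties deterministically
  }
  write(value, ts) {
    this.apply({ value, ts, writer: this.name });
  }
  apply({ value, ts, writer }) {
    // Keep the newest write; on equal timestamps the higher writer name wins.
    if (ts > this.ts || (ts === this.ts && writer > this.writer)) {
      this.value = value;
      this.ts = ts;
      this.writer = writer;
    }
  }
  state() { return { value: this.value, ts: this.ts, writer: this.writer }; }
  syncFrom(other) { this.apply(other.state()); } // anti-entropy exchange
}

const r1 = new LwwReplica('r1');
const r2 = new LwwReplica('r2');
r1.write('v1', 1); // both replicas accept writes during a partition
r2.write('v2', 2);
r1.syncFrom(r2);   // after the partition heals, state flows both ways
r2.syncFrom(r1);
console.log(r1.value, r2.value); // v2 v2
```

The replicas disagree until they sync, then converge. LWW silently discards the older concurrent write and depends on reasonably synchronized clocks, which is acceptable for a feed but not for a bank balance.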
09 Microservices
Monolith
Single deployable unit. Simple to start. Hard to scale independently.
Microservices
Small independent services, one per business function. Each typically owns its own database.
Service discovery
Services locate each other by name through a registry or DNS (Consul, Kubernetes DNS), because instance IP addresses change.
API Gateway
Single entry point for all clients. Routes to services, handles auth, rate limiting.
Circuit breaker
If calls to service X keep failing, stop calling it for a while and return a fallback; try again after a timeout. (Resilience4j; Netflix Hystrix, now in maintenance mode)
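A circuit breaker is a small state machine: closed, then open, then half-open. The sketch below assumes synchronous calls, a simple consecutive-failure threshold and an injectable clock. Libraries such as Resilience4j add async support, sliding-window failure rates and metrics.

```javascript
class CircuitBreaker {
  constructor({ failureThreshold = 3, resetTimeoutMs = 10000, now = () => Date.now() } = {}) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;
    this.state = 'closed';
    this.failures = 0;
    this.openedAt = 0;
  }

  call(fn, fallback) {
    if (this.state === 'open') {
      if (this.now() - this.openedAt < this.resetTimeoutMs) return fallback(); // fail fast
      this.state = 'half-open'; // timeout elapsed: allow a trial call
    }
    try {
      const result = fn();
      this.state = 'closed';    // any success, including the trial, closes the circuit
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures++;
      if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
        this.state = 'open';    // stop calling the failing service for a while
        this.openedAt = this.now();
      }
      return fallback();
    }
  }
}
```

While the circuit is open, callers get the fallback immediately instead of waiting on timeouts. This keeps one failing service from tying up threads and connections in its callers.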
Saga pattern
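A distributed transaction split into local transactions, one per service; if a step fails, compensating actions undo the earlier steps. Coordinated by events (choreography) or by a central orchestrator.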
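Below is a minimal orchestrated saga. A central function runs each local step and, on failure, runs the compensations of the completed steps in reverse order. This is the orchestration variant; with choreography, each service would react to the others' events instead. Step names are hypothetical, steps are synchronous, and compensations are assumed to succeed.

```javascript
// Run steps in order; on failure, undo completed steps in reverse order.
function runSaga(steps) {
  const done = [];
  for (const step of steps) {
    try {
      step.action();
      done.push(step);
    } catch (err) {
      for (const s of [...done].reverse()) s.compensate();
      return { ok: false, failedStep: step.name, error: String(err) };
    }
  }
  return { ok: true };
}

// Hypothetical order flow where the stock reservation fails.
const log = [];
const result = runSaga([
  { name: 'createOrder', action: () => log.push('order created'),
    compensate: () => log.push('order cancelled') },
  { name: 'chargePayment', action: () => log.push('payment charged'),
    compensate: () => log.push('payment refunded') },
  { name: 'reserveStock', action: () => { throw new Error('out of stock'); },
    compensate: () => log.push('stock released') },
]);
console.log(result.failedStep, log);
// reserveStock [ 'order created', 'payment charged', 'payment refunded', 'order cancelled' ]
```

The failed step itself is not compensated, because it never committed. Only the earlier, completed steps are undone, newest first.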
Service mesh
Network layer handling service-to-service communication. (Istio, Linkerd)
12-Factor App
Methodology for building scalable, portable microservices.
Microservices example
# Monolith (everything in one)
App → UserService → ProductService → OrderService → DB

# Microservices (separate deployments)
API Gateway
  ├─ User Service  (owns user-db)
  ├─ Product Service (owns product-db)
  ├─ Order Service (owns order-db)
  ├─ Notification Service
  └─ Payment Service

Communication:
  Sync:  REST / gRPC (request-response)
  Async: Kafka / RabbitMQ (event-driven)
10 Mini Quizzes
❓ Quiz 1
What does CAP theorem state?
CAP: in a distributed system experiencing a network partition, you must choose between Consistency (return an error) and Availability (return possibly stale data). You cannot have all three simultaneously.
❓ Quiz 2
What is the purpose of a message queue in system design?
Message queues buffer work between producer and consumer. The producer enqueues tasks (e.g., 'send welcome email') and returns immediately. Consumers process asynchronously. This improves latency and reliability.