System Design Cheatsheet — BitWithBite

Scalability, caching, databases, load balancing, CAP theorem and microservices.

📖 10 sections

⏱ 28 min read

✅ Quizzes included

01 Fundamentals

System Design

Planning architecture of large-scale systems. Focus: scalability, reliability, availability, performance.

Functional reqs

What the system should DO: features, APIs, user stories.

Non-functional reqs

How the system should PERFORM: latency, throughput, availability, consistency.

Throughput

Requests per second (RPS) a system can handle.

Latency

Time for a single request to complete. p50, p95, p99 percentiles.

Availability

Uptime percentage. 99.9% = about 8.76 hrs downtime/year. 99.99% = about 52.6 min.

SLA

Service Level Agreement. Contractual availability promise.

Back-of-envelope

Estimate scale before designing: 1B users × 100 requests/user/day = 100B requests/day; divided by 86,400 s/day that is ~1.16M RPS on average (peak load is higher).

99.9% uptime | about 8.76 hours downtime per year | 'Three nines'
99.99% uptime | about 52.6 minutes downtime per year | 'Four nines'
1 billion users | ~1.16M QPS at 100 req/user/day | Back-of-envelope calc
Storage estimate | 1 TB = 10^12 bytes, 1 PB = 10^15 bytes | For data sizing
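
The availability and traffic figures above follow from simple arithmetic. A minimal sketch, assuming a 365-day year and evenly spread traffic; the function names are illustrative and not part of the original cheatsheet:

```javascript
// Back-of-envelope helpers (illustrative names).
const SECONDS_PER_DAY = 24 * 60 * 60; // 86,400
const HOURS_PER_YEAR = 365 * 24;      // 8,760 (leap years ignored)

// Allowed downtime in hours per year for a given availability percentage.
function downtimeHoursPerYear(availabilityPercent) {
  return HOURS_PER_YEAR * (1 - availabilityPercent / 100);
}

// Average requests per second for a user base and a per-user daily request count.
function averageRps(users, requestsPerUserPerDay) {
  return (users * requestsPerUserPerDay) / SECONDS_PER_DAY;
}

console.log(downtimeHoursPerYear(99.9).toFixed(2));         // "8.76" hours (three nines)
console.log((downtimeHoursPerYear(99.99) * 60).toFixed(1)); // "52.6" minutes (four nines)
console.log(Math.round(averageRps(1e9, 100)));              // 1157407 requests per second
```

These are averages; real capacity planning would usually multiply by a peak factor.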

02 Scalability

Vertical scaling

Bigger server: more CPU, RAM. Simple but has limits. Single point of failure.

Horizontal scaling

More servers. Distributes load. Requires stateless design + load balancer.

Stateless design

Server holds no session state. State stored in DB/cache. Required for horizontal scale.

CDN

Content Delivery Network. Cache static assets geographically close to users.

Data partitioning

Split large dataset across multiple servers (sharding).

Read replicas

Copy database to multiple servers for read traffic. Replication is usually asynchronous, so replicas can lag the primary (eventual consistency).

Denormalization

Duplicate data to avoid expensive JOINs at scale. Trade storage for speed.

Async processing

Offload heavy work to background jobs/queues. Fast API response + slower processing.

DESIGN: Scalability checklist

1. Identify bottlenecks (DB? App? Network?)
2. Add caching (Redis) for frequently read data
3. Use CDN for static assets
4. Scale horizontally with load balancer
5. Split read/write to primary + replicas
6. Partition data (sharding) for massive scale
7. Use async queues for heavy processing
8. Move to microservices if monolith becomes bottleneck
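
Step 6 (sharding) needs a rule that maps each record to a shard. A minimal sketch of hash-based routing, assuming a string shard key and a fixed shard count; `fnv1a` and `shardFor` are illustrative names, and FNV-1a is only a stand-in for whatever hash a real system would use:

```javascript
// FNV-1a: a small non-cryptographic 32-bit string hash.
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, keep 32 bits
  }
  return hash;
}

// Map a shard key to a shard index in the range [0, shardCount).
function shardFor(key, shardCount) {
  return fnv1a(key) % shardCount;
}

// The same key always routes to the same shard.
console.log(shardFor("user:42", 4) === shardFor("user:42", 4)); // true
```

The weakness of plain modulo routing is that changing `shardCount` remaps most keys, which is the problem consistent hashing (section 05) is meant to reduce.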

03 Load Balancing

Load balancer

Distributes requests across servers. Prevents any one server from overload.

Round robin

Each server gets requests in turn. Simple, equal distribution.

Least connections

Route to server with fewest active connections. Better for varied request times.

IP hash

A hash of the client IP picks the server, so the same client hits the same server while the server pool is unchanged. Useful for session affinity.

Layer 4 LB

Routes based on IP address and TCP/UDP port. Fast, no content inspection. AWS NLB.

Layer 7 LB

Routes based on HTTP content (URL, headers). AWS ALB, Nginx, HAProxy.

Health checks

LB periodically probes servers (TCP connect or HTTP request). Removes unhealthy servers from rotation automatically.

Failover

If primary fails, traffic automatically shifts to standby. Active-passive or active-active.

DESIGN: Nginx load balancer config

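upstream backend {
    least_conn;            # strategy: fewest active connections
    server 10.0.0.1:3000;
    server 10.0.0.2:3000;
    server 10.0.0.3:3000;
    keepalive 32;          # idle upstream connections kept open per worker
}

server {
    listen 80;
    location / {
        proxy_pass http://backend;
        proxy_http_version 1.1;          # needed for upstream keepalive
        proxy_set_header Connection "";  # clear the header so connections are reused
    }
}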
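
The round robin and least connections strategies described above can be sketched in a few lines. This is an in-memory illustration only, with hypothetical class names; a real load balancer also tracks health, weights and timeouts:

```javascript
// Round robin: hand out servers in turn.
class RoundRobinBalancer {
  constructor(servers) {
    this.servers = servers;
    this.next = 0;
  }
  pick() {
    const server = this.servers[this.next];
    this.next = (this.next + 1) % this.servers.length;
    return server;
  }
}

// Least connections: pick the server with the fewest active connections.
class LeastConnectionsBalancer {
  constructor(servers) {
    this.active = new Map(servers.map((s) => [s, 0]));
  }
  acquire() {
    let best = null;
    for (const [server, count] of this.active) {
      if (best === null || count < this.active.get(best)) best = server;
    }
    this.active.set(best, this.active.get(best) + 1); // one more open connection
    return best;
  }
  release(server) {
    this.active.set(server, Math.max(0, this.active.get(server) - 1));
  }
}

const rr = new RoundRobinBalancer(["10.0.0.1:3000", "10.0.0.2:3000", "10.0.0.3:3000"]);
console.log(rr.pick(), rr.pick(), rr.pick(), rr.pick()); // wraps back to the first server
```

Least connections needs the caller to report when a request finishes (`release`), which is why it copes better with requests of very different durations.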

04 Caching

Cache hit

Data found in cache. Fast response. No DB query.

Cache miss

Data NOT in cache. Must fetch from DB, then store in cache.

Cache eviction

Remove old data when cache full. LRU (least recently used) most common.

TTL

Time-to-live. How long before cache entry expires and refreshes.

Cache aside

App checks cache → miss → fetch DB → write to cache. Most common pattern.

Write through

Write to cache and DB in the same operation; the write completes only after both succeed. Consistent but slower writes.

Write back

Write to cache first, async flush to DB later. Fast writes but risk of data loss.

Cache invalidation

Hardest problem: when to remove stale data from cache.

DESIGN: Redis caching pattern

// Cache aside pattern (Node.js)
// Assumes `redis` is a connected client exposing get/setex/del (e.g. ioredis)
// and `db.users` is the application's data access layer.
async function getUser(id) {
  // 1. Check cache
  const cached = await redis.get(`user:${id}`);
  if (cached) return JSON.parse(cached);  // HIT

  // 2. Cache miss: fetch from DB
  const user = await db.users.findById(id);
  if (!user) return null;  // don't cache missing users

  // 3. Store in cache with a 1-hour TTL (in seconds)
  await redis.setex(`user:${id}`, 3600, JSON.stringify(user));

  return user;  // MISS
}

// Cache invalidation on update
async function updateUser(id, data) {
  await db.users.update(id, data);
  await redis.del(`user:${id}`);  // invalidate so the next read repopulates
}

💡

Cache what's read often and written rarely. Don't cache fast-changing or personalized data without a very short TTL.
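
LRU eviction, mentioned above as the most common policy, can be sketched with a JavaScript `Map`, which iterates keys in insertion order. The `LruCache` class is an illustrative in-process example, not how Redis implements eviction:

```javascript
class LruCache {
  constructor(capacity) {
    this.capacity = capacity;
    this.entries = new Map(); // oldest entry first, most recently used last
  }
  get(key) {
    if (!this.entries.has(key)) return undefined; // cache miss
    const value = this.entries.get(key);
    // Re-insert to mark the key as most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }
  set(key, value) {
    if (this.entries.has(key)) this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // Evict the least recently used entry (first key in iteration order).
      const oldest = this.entries.keys().next().value;
      this.entries.delete(oldest);
    }
  }
}

const cache = new LruCache(2);
cache.set("a", 1);
cache.set("b", 2);
cache.get("a");      // "a" is now the most recently used
cache.set("c", 3);   // evicts "b"
console.log(cache.get("b")); // undefined
```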

05 Databases

SQL

Structured, relations, ACID, schema. Best for complex queries, transactions. PostgreSQL, MySQL.

NoSQL

Flexible schema, horizontal scale. Best for large volumes, variety. MongoDB, Cassandra, DynamoDB.

ACID

Atomicity+Consistency+Isolation+Durability. Guaranteed by transactional databases (most relational engines), not by the SQL language itself.

BASE

Basically Available, Soft state, Eventual consistency. NoSQL trade-off for scale.

Replication

Primary node accepts writes. Replicas for reads. Failover if primary dies.

Sharding

Split data across multiple DB servers by shard key. Increases write capacity.

Consistent hashing

Map keys and nodes onto a hash ring so that adding or removing a node remaps only a small share of keys.

Indexing

B-tree for range queries. Hash for equality. Index fields used in frequent WHERE/JOIN clauses; each extra index slows writes.

SQL choice | Relational data, complex queries, ACID transactions | Banking, ERP
MongoDB choice | Flexible schema, rapid iteration, nested data | Startups, CMS
Redis choice | Sub-millisecond reads, pub/sub, sessions, queues | Cache, real-time
Cassandra choice | Massive write throughput, time-series, global distribution | IoT, logs, metrics
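
Consistent hashing, defined earlier in this section, can be sketched as a sorted ring of hash points. This is a toy version under stated assumptions: `HashRing` and its methods are illustrative names, FNV-1a stands in for a production hash, and 100 virtual nodes per server is an arbitrary choice:

```javascript
// FNV-1a: a small non-cryptographic 32-bit string hash.
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

class HashRing {
  constructor(nodes = [], virtualNodes = 100) {
    this.virtualNodes = virtualNodes;
    this.ring = []; // sorted list of { point, node }
    for (const node of nodes) this.addNode(node);
  }
  addNode(node) {
    // Each node owns many points on the ring to even out the load.
    for (let i = 0; i < this.virtualNodes; i++) {
      this.ring.push({ point: fnv1a(`${node}#${i}`), node });
    }
    this.ring.sort((a, b) => a.point - b.point);
  }
  removeNode(node) {
    this.ring = this.ring.filter((entry) => entry.node !== node);
  }
  nodeFor(key) {
    if (this.ring.length === 0) return undefined;
    const h = fnv1a(key);
    // Binary search for the first ring point at or after the key's hash.
    let lo = 0;
    let hi = this.ring.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.ring[mid].point < h) lo = mid + 1;
      else hi = mid;
    }
    return this.ring[lo % this.ring.length].node; // wrap around past the last point
  }
}

const ring = new HashRing(["db-1", "db-2", "db-3"]);
const keys = Array.from({ length: 1000 }, (_, i) => `user:${i}`);
const before = keys.map((k) => ring.nodeFor(k));
ring.addNode("db-4");
const moved = keys.filter((k, i) => ring.nodeFor(k) !== before[i]).length;
console.log(`${moved} of ${keys.length} keys moved after adding db-4`);
```

Every key that moves lands on the new node; keys never shuffle between the existing nodes, which is the property that makes resizing cheap.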

06 Messaging Queues

Message queue

Asynchronous buffer between producer and consumer. Decouples services.

Producer

Sends messages to the queue. Doesn't wait for processing.

Consumer

Reads and processes messages from the queue. Can scale independently.

At-least-once

Message delivered at least once. Possible duplicates — design idempotency.

Exactly-once

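Each message takes effect exactly once. Hard to guarantee end to end; usually built from at-least-once delivery plus idempotent or transactional processing (e.g. Kafka transactions).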

Dead letter queue

Failed messages go here for inspection/retry.

Pub/Sub

Publisher sends to topic. Multiple subscribers receive. Kafka, SNS.

Use cases

Email sending, image resizing, order processing, notifications, log ingestion.

DESIGN: Message queue use case

// Without queue (bad)
app.post('/register', async (req, res) => {
  await db.createUser(req.body);       // fast
  await sendWelcomeEmail(req.body);    // slow! blocks response
  await generateAvatar(req.body);      // slow!
  res.json({ success: true });         // user waits seconds!
});

// With queue (good)
app.post('/register', async (req, res) => {
  const user = await db.createUser(req.body);  // fast
  await queue.publish('user.registered', user); // instant
  res.json({ success: true });  // responds immediately!
  // queue consumers handle: email, avatar, analytics async
});
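
On the consumer side, at-least-once delivery means the same message can arrive twice, and some messages will keep failing. A minimal in-memory sketch of an idempotent consumer with a dead letter list; the class name, the `id` field on messages and the retry count are assumptions for illustration, and a real system would keep processed IDs in a durable store:

```javascript
class IdempotentConsumer {
  constructor(handler, maxAttempts = 3) {
    this.handler = handler;
    this.maxAttempts = maxAttempts;
    this.processedIds = new Set(); // IDs already handled (in memory for this sketch)
    this.deadLetters = [];         // messages that failed every attempt
  }

  // Returns "processed", "duplicate" or "dead-lettered".
  receive(message) {
    if (this.processedIds.has(message.id)) return "duplicate"; // skip redelivery
    for (let attempt = 1; attempt <= this.maxAttempts; attempt++) {
      try {
        this.handler(message);
        this.processedIds.add(message.id);
        return "processed";
      } catch (err) {
        if (attempt === this.maxAttempts) {
          this.deadLetters.push({ message, error: String(err) });
        }
      }
    }
    return "dead-lettered";
  }
}

const sentEmails = [];
const consumer = new IdempotentConsumer((msg) => {
  if (!msg.email) throw new Error("missing email");
  sentEmails.push(msg.email);
});

console.log(consumer.receive({ id: "m1", email: "ada@example.com" })); // processed
console.log(consumer.receive({ id: "m1", email: "ada@example.com" })); // duplicate
console.log(consumer.receive({ id: "m2" }));                           // dead-lettered
```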

07 API Design

DESIGN: REST API best practices

// Resource naming (nouns, not verbs)
GET    /users          → list users
GET    /users/:id      → get user
POST   /users          → create user
PUT    /users/:id      → replace user
PATCH  /users/:id      → partial update
DELETE /users/:id      → delete user

// HTTP status codes
200 OK             201 Created         204 No Content
400 Bad Request    401 Unauthorized    403 Forbidden
404 Not Found      409 Conflict        422 Unprocessable
429 Too Many Reqs  500 Server Error    503 Unavailable

// Pagination
GET /posts?page=2&limit=20
GET /posts?cursor=abc123&limit=20  (cursor-based, better for large)

// Versioning
GET /api/v1/users    (URL versioning — most common)
GET /api/users  Accept: application/vnd.api.v2+json  (header)

// Rate limiting response headers
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 999
X-RateLimit-Reset: 1640000000
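
Cursor-based pagination, listed above as the better option for large collections, can be sketched as follows. The sketch assumes items sorted by ascending numeric `id` and uses the last returned `id` as the cursor; `listPosts` and the sample data are hypothetical:

```javascript
// Hypothetical data set: 50 posts sorted by ascending id.
const posts = Array.from({ length: 50 }, (_, i) => ({ id: i + 1, title: `Post ${i + 1}` }));

// Return up to `limit` items that come after `cursor`, plus the cursor for the next page.
function listPosts(items, { cursor = null, limit = 20 } = {}) {
  const afterId = cursor === null ? 0 : Number(cursor);
  const page = items.filter((item) => item.id > afterId).slice(0, limit);
  const last = page[page.length - 1];
  const hasMore = last !== undefined && items.some((item) => item.id > last.id);
  return { data: page, nextCursor: hasMore ? String(last.id) : null };
}

const first = listPosts(posts, { limit: 20 });
const second = listPosts(posts, { cursor: first.nextCursor, limit: 20 });
console.log(first.nextCursor, second.data[0].id); // "20" 21
```

Unlike `page=2`, a cursor stays stable when rows are inserted or deleted ahead of the current position, and a database can serve it with an indexed `id > ?` lookup instead of a growing offset.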

Idempotent

GET/PUT/DELETE leave the server in the same state if called multiple times. POST is NOT idempotent, and PATCH is not guaranteed to be.

HATEOAS

Hypertext links in responses tell client what actions are available. Full REST.
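
The idempotency distinction above matters for client retries. A small in-memory sketch, with a hypothetical `createUserStore` helper standing in for the `/users` resource, shows why repeating a PUT is safe and repeating a POST is not:

```javascript
function createUserStore() {
  const users = new Map();
  let nextId = 1;
  return {
    // POST /users: every call creates a new resource with a fresh id.
    post(body) {
      const id = nextId++;
      users.set(id, { id, ...body });
      return id;
    },
    // PUT /users/:id: replaces the resource at a client-chosen id.
    put(id, body) {
      users.set(id, { id, ...body });
    },
    // DELETE /users/:id: removing an absent user leaves the same end state.
    remove(id) {
      users.delete(id);
    },
    count() {
      return users.size;
    },
  };
}

const store = createUserStore();
store.put(7, { name: "Ada" });
store.put(7, { name: "Ada" });  // retry: still exactly one user
console.log(store.count());     // 1
store.post({ name: "Ada" });
store.post({ name: "Ada" });    // retry: creates a second copy
console.log(store.count());     // 3
```

This is why retried POST requests are often paired with a client-supplied idempotency key.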

08 CAP Theorem

DESIGN: CAP Theorem

CAP = Consistency + Availability + Partition Tolerance

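You can only guarantee 2 of 3. Since partitions cannot be ruled out in a distributed system, the practical choice is C or A while a partition lasts: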

CP (Consistent + Partition tolerant):
  - Returns error if can't guarantee consistency
  - Example: HBase, MongoDB (in certain configs)
  - Use: Banking, inventory (can't show wrong balance)

AP (Available + Partition tolerant):
  - Returns possibly stale data during network partition
  - Example: CouchDB, Cassandra, DynamoDB
  - Use: Social feeds, product catalogue (slightly stale is ok)

CA (Consistent + Available):
  - No partition tolerance — only works on single node
  - Example: Traditional SQL on single machine
  - Not useful for distributed systems

PACELC extension (for normal operation):
  If Partition: tradeoff C vs A
  Else: tradeoff L (Latency) vs C (Consistency)

PACELC

More realistic model: even without partitions, tradeoff between Latency and Consistency

Eventual consistency

Given enough time, all replicas will agree. Used by DNS, social media feeds.

Strong consistency

All reads see latest write. Slower. Used by banks, payment systems.
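
The CP versus AP choice can be made concrete with a toy replica that has lost contact with the primary. This is a deliberately simplified model with hypothetical names, not how any of the databases listed above are implemented:

```javascript
class Replica {
  constructor(mode) {
    this.mode = mode;         // "CP" or "AP"
    this.value = null;        // last value received from the primary
    this.partitioned = false; // true while cut off from the primary
  }
  // Called by the primary for each write; dropped during a partition.
  replicate(value) {
    if (!this.partitioned) this.value = value;
  }
  read() {
    if (this.partitioned && this.mode === "CP") {
      // CP: refuse to answer rather than risk returning stale data.
      throw new Error("unavailable: cannot confirm latest value");
    }
    return this.value; // AP: answer anyway, possibly stale
  }
}

const ap = new Replica("AP");
const cp = new Replica("CP");
for (const r of [ap, cp]) r.replicate("balance=100");
for (const r of [ap, cp]) r.partitioned = true;
for (const r of [ap, cp]) r.replicate("balance=50"); // write never arrives

console.log(ap.read()); // "balance=100" (stale but available)
try {
  cp.read();
} catch (err) {
  console.log(err.message); // "unavailable: cannot confirm latest value"
}
```

Once the partition heals and replication catches up, the AP replica converges on the latest value, which is eventual consistency as defined above.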

09 Microservices

Monolith

Single deployable unit. Simple to start. Hard to scale independently.

Microservices

Small independent services per business function. Each has own DB.

Service discovery

Services find each other (Consul, Kubernetes DNS). IP addresses change.

API Gateway

Single entry point for all clients. Routes to services, handles auth, rate limiting.

Circuit breaker

If service X fails repeatedly, stop calling it. Return fallback. (Hystrix, Resilience4j)

Saga pattern

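Distributed transaction split into local transactions per service, each with a compensating action to undo it on failure. Coordinated by event-driven choreography or a central orchestrator.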

Service mesh

Network layer handling service-to-service communication. (Istio, Linkerd)

12-Factor App

Methodology for building scalable, portable software-as-a-service apps; widely applied to microservices.

DESIGN: Microservices example

# Monolith (everything in one)
App → UserService → ProductService → OrderService → DB

# Microservices (separate deployments)
API Gateway
  ├─ User Service  (owns user-db)
  ├─ Product Service (owns product-db)
  ├─ Order Service (owns order-db)
  ├─ Notification Service
  └─ Payment Service

Communication:
  Sync:  REST / gRPC (request-response)
  Async: Kafka / RabbitMQ (event-driven)
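
The circuit breaker described above guards the synchronous calls in this diagram. A minimal sketch of the closed, open and half-open states, assuming synchronous actions and an injectable clock to keep it easy to follow; the class and option names are illustrative and do not mirror the Hystrix or Resilience4j APIs:

```javascript
class CircuitBreaker {
  constructor({ failureThreshold = 3, resetTimeoutMs = 10000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold; // consecutive failures before opening
    this.resetTimeoutMs = resetTimeoutMs;     // how long to stay open before a trial call
    this.now = now;
    this.failures = 0;
    this.state = "CLOSED";
    this.openedAt = 0;
  }

  call(action, fallback) {
    if (this.state === "OPEN") {
      if (this.now() - this.openedAt < this.resetTimeoutMs) {
        return fallback(); // fail fast without touching the broken service
      }
      this.state = "HALF_OPEN"; // timeout elapsed: allow one trial call
    }
    try {
      const result = action();
      this.failures = 0;
      this.state = "CLOSED";
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.state === "HALF_OPEN" || this.failures >= this.failureThreshold) {
        this.state = "OPEN";
        this.openedAt = this.now();
      }
      return fallback();
    }
  }
}

const breaker = new CircuitBreaker({ failureThreshold: 2 });
const flaky = () => { throw new Error("payment service down"); };
const cachedAnswer = () => "fallback";
breaker.call(flaky, cachedAnswer);
breaker.call(flaky, cachedAnswer);
console.log(breaker.state); // OPEN
```

A production breaker would wrap asynchronous calls and usually tracks a failure rate over a time window rather than a simple consecutive count.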

10 Mini Quizzes

❓ Quiz 1

What does CAP theorem state?

CAP: In a distributed system experiencing a network partition, you must choose between Consistency (return an error) and Availability (return possibly stale data). You cannot guarantee all three at once.

❓ Quiz 2

What is the purpose of a message queue in system design?

Message queues buffer work between producer and consumer. The producer enqueues tasks (e.g., 'send welcome email') and returns immediately. Consumers process asynchronously. This improves latency and reliability.