Chapter 19: Caching System

Metadata Card

Attribute	Content
Difficulty	(Intermediate)
Prerequisites	Chapter 14 (In-Memory Database), understanding of read-heavy workloads
Keywords	Cache, CDN, Redis, Memcached, cache aside, read through, write through, cache invalidation, TTL, eviction policy
Code	Python, Redis commands

Your Progress

"Setting up a quick-access table at the warehouse entrance — caching solves the problem of reading hot data fast."

Why Cache?

Latency: Memory is 1000x faster than disk
Throughput: Databases have limits; caches absorb traffic
Cost: Caching reduces database load, delaying expensive scaling
Hot data: 80% of reads hit 20% of data (Pareto principle)

Cache Strategies

Strategy	How It Works	Pros	Cons
Cache-Aside	App checks cache → misses → reads DB → writes cache	Simple, app controls	Cache inconsistency
Read-Through	Cache layer reads DB on miss	Transparent to app	Complex cache setup
Write-Through	Write to cache first, cache writes to DB	Strong consistency	Higher write latency
Write-Behind	Write to cache, async flush to DB	Very fast writes	Risk of data loss
Write-Around	Write directly to DB, invalidate cache	Avoids cache pollution	More cache misses

Why Cache Invalidation is Hard

"Computer science has only two hard problems: cache invalidation and naming things." (Phil Karlton)

Stale data problem: If you update in DB but not in cache, users see old data. Solutions:

TTL (Time To Live): Accept eventual consistency, data expires after TTL
Explicit invalidation: Delete/update cache entry when DB changes
Version numbers: Store version with cached data, reject stale versions

Eviction Policies

Policy	Description	Best For
LRU	Evict least recently used	General-purpose
LFU	Evict least frequently used	Stable popularity
FIFO	Evict oldest entry	Cache with bounded lifetime
TTL	Evict expired entries	Fixed validity period
Random	Evict random entry	Minimal overhead

Redis

Most popular caching system. Features:

In-memory key-value store with data structures (strings, hashes, lists, sets, sorted sets)
Persistence: RDB (snapshots) and AOF (append-only file)
Replication: Leader-follower
Sentinel: High availability (automatic failover)
Cluster: Auto-sharding across nodes

python

import redis

r = redis.Redis(host='localhost', port=6379, decode_responses=True)

# Basic cache operations
r.set('user:1001', '{"name":"Alice","level":42}', ex=3600)  # TTL=1 hour
user = r.get('user:1001')

# Cache-aside pattern
def get_user(user_id):
    cache_key = f'user:{user_id}'
    user = r.get(cache_key)
    if user is None:
        user = db.query(f"SELECT * FROM users WHERE id = {user_id}")
        r.setex(cache_key, 3600, json.dumps(user))
    return json.loads(user)

Memcached

Simple, fast, distributed memory cache. Key-value only (no data structures). Distributed via consistent hashing on the client side.

Cache Topologies

Topology	Description	When to Use
Local cache	Cache in application process (e.g., Guava, Caffeine)	Single-node, ultra-low latency
Distributed cache	External cache cluster (Redis, Memcached)	Multi-node, shared state
Multi-tier	Local + distributed combination	Extreme scale

Cache Stampede (Thundering Herd)

When many requests miss cache simultaneously and all hit the DB. Solutions:

Mutex: Only one request loads into cache, others wait
Probabilistic early expiration: Randomize TTL expiry
Backup cache: Layer 2 cache with longer TTL

Hot Keys

When a single key receives disproportionate traffic. Solutions:

Replication: Distribute hot key reads across replicas
Local caching: Read replicas have local caches
Sharding by request: Distribute hot key load

Traveler's Notes

Caching is the most cost-effective performance optimization. Adding a cache layer can reduce database load by 90%+ with minimal effort. But cache invalidation is genuinely hard — you'll encounter stale data, cache stampedes, and hot keys. Start with simple TTL-based caching and add invalidation logic as needed. And remember: a cache is only as useful as its hit rate.

Chapter 19: Caching System ​

Metadata Card ​

Your Progress ​

Why Cache? ​

Cache Strategies ​

Why Cache Invalidation is Hard ​

Eviction Policies ​

Redis ​

Memcached ​

Cache Topologies ​

Cache Stampede (Thundering Herd) ​

Hot Keys ​

Traveler's Notes ​