Chapter 16: Distributed Database
Metadata Card
| Attribute | Content |
|---|---|
| Difficulty | (Advanced) |
| Prerequisites | Chapter 15 (NoSQL), distributed systems basics |
| Keywords | Sharding, replication, consistency, consensus, Raft, Paxos, gossip, distributed SQL |
| Code | Python (conceptual) |
Your Progress
"Starting to build a cross-city branch warehouse network — how to shard, replicate, and keep data consistent."
Why Distribute?
- Data volume: More than a single machine can store
- Read throughput: Too many queries per second
- Write throughput: Write-heavy workloads
- Geo-distribution: Serve users worldwide with low latency
- Fault tolerance: Survive machine failures
Partitioning (Sharding)
Split data across multiple nodes:
| Strategy | Description | Example |
|---|---|---|
| Hash partitioning | hash(key) % N | Simple, even distribution |
| Range partitioning | Key ranges to nodes | Efficient range scans |
| Consistent hashing | Ring-based, minimal movement on resharding | Cassandra, DynamoDB |
| Directory-based | External lookup service | HBase |
Re-sharding: When data grows beyond current capacity. Challenging — requires data migration.
Replication
Copy data across multiple nodes:
| Type | Description | Consistency |
|---|---|---|
| Single-leader | One primary writer, many read replicas | Strong read-after-write possible |
| Multi-leader | Multiple writers | Conflict resolution needed |
| Leaderless | Any node can write | Read repair, hinted handoff |
Consistency Models
| Model | Description | Example |
|---|---|---|
| Strong Consistency | Reads always see latest write | Single-node |
| Eventual Consistency | Reads may return stale data | DNS |
| Causal Consistency | Related updates seen in order | Social feeds |
| Read Your Writes | Writer sees their own writes | User profiles |
| Monotonic Reads | Read staleness doesn't go backwards | Session consistency |
Consensus Algorithms
| Algorithm | Used By | Key Feature |
|---|---|---|
| Paxos | Google Chubby, Spanner | Proven correct, hard to implement |
| Raft | etcd, Consul, TiKV, MongoDB | Understandable, modular |
| ZAB | ZooKeeper | Atomic broadcast |
| Viewstamped Replication | — | Similar to Raft (earlier) |
Raft in Brief
- Leader election: Nodes elect a leader
- Log replication: Leader receives client writes, replicates to followers
- Commit: Once a majority of nodes store an entry, it's committed
- Safety: Leader ensures all committed entries are preserved
Distributed Transactions
- Two-Phase Commit (2PC): Coordinator asks "prepare?" then "commit!" — blocking on coordinator failure
- Three-Phase Commit (3PC): Non-blocking, but more round trips
- Distributed MVCC / OCC: Timestamp-based distributed concurrency control
- Google Spanner: TrueTime API for globally consistent transactions
Traveler's Notes
Distributed databases represent the convergence of ACID semantics and horizontal scaling. The trade-offs are governed by the CAP theorem, but real systems offer tunable knobs. Modern distributed SQL databases (Spanner, CockroachDB, TiDB) provide familiar SQL interfaces with global scale — the holy grail of database engineering.