Skip to content

Chapter 16: Distributed Database


Metadata Card

AttributeContent
Difficulty(Advanced)
PrerequisitesChapter 15 (NoSQL), distributed systems basics
KeywordsSharding, replication, consistency, consensus, Raft, Paxos, gossip, distributed SQL
CodePython (conceptual)

Your Progress

"Starting to build a cross-city branch warehouse network — how to shard, replicate, and keep data consistent."

Why Distribute?

  • Data volume: More than a single machine can store
  • Read throughput: Too many queries per second
  • Write throughput: Write-heavy workloads
  • Geo-distribution: Serve users worldwide with low latency
  • Fault tolerance: Survive machine failures

Partitioning (Sharding)

Split data across multiple nodes:

StrategyDescriptionExample
Hash partitioninghash(key) % NSimple, even distribution
Range partitioningKey ranges to nodesEfficient range scans
Consistent hashingRing-based, minimal movement on reshardingCassandra, DynamoDB
Directory-basedExternal lookup serviceHBase

Re-sharding: When data grows beyond current capacity. Challenging — requires data migration.

Replication

Copy data across multiple nodes:

TypeDescriptionConsistency
Single-leaderOne primary writer, many read replicasStrong read-after-write possible
Multi-leaderMultiple writersConflict resolution needed
LeaderlessAny node can writeRead repair, hinted handoff

Consistency Models

ModelDescriptionExample
Strong ConsistencyReads always see latest writeSingle-node
Eventual ConsistencyReads may return stale dataDNS
Causal ConsistencyRelated updates seen in orderSocial feeds
Read Your WritesWriter sees their own writesUser profiles
Monotonic ReadsRead staleness doesn't go backwardsSession consistency

Consensus Algorithms

AlgorithmUsed ByKey Feature
PaxosGoogle Chubby, SpannerProven correct, hard to implement
Raftetcd, Consul, TiKV, MongoDBUnderstandable, modular
ZABZooKeeperAtomic broadcast
Viewstamped ReplicationSimilar to Raft (earlier)

Raft in Brief

  1. Leader election: Nodes elect a leader
  2. Log replication: Leader receives client writes, replicates to followers
  3. Commit: Once a majority of nodes store an entry, it's committed
  4. Safety: Leader ensures all committed entries are preserved

Distributed Transactions

  • Two-Phase Commit (2PC): Coordinator asks "prepare?" then "commit!" — blocking on coordinator failure
  • Three-Phase Commit (3PC): Non-blocking, but more round trips
  • Distributed MVCC / OCC: Timestamp-based distributed concurrency control
  • Google Spanner: TrueTime API for globally consistent transactions

Traveler's Notes

Distributed databases represent the convergence of ACID semantics and horizontal scaling. The trade-offs are governed by the CAP theorem, but real systems offer tunable knobs. Modern distributed SQL databases (Spanner, CockroachDB, TiDB) provide familiar SQL interfaces with global scale — the holy grail of database engineering.

Built with VitePress | Software Systems Atlas