Chapter 15: NoSQL Family
Metadata Card
| Attribute | Content |
|---|---|
| Difficulty | (Intermediate) |
| Prerequisites | Chapter 1 (SQL), understanding of ACID vs BASE |
| Keywords | NoSQL, key-value, document, wide-column, graph, CAP theorem, eventual consistency |
| Code | Python |
Your Progress
"You step outside the main warehouse and see specialized workshops — different storage methods tailored for special goods."
Why NoSQL?
- Scale: Billions of users, petabytes of data
- Schema flexibility: Evolving data shapes
- Performance: Specific workloads optimized
- Distributed: Built for horizontal scaling
Four Families of NoSQL
| Family | Model | Examples | Best For |
|---|---|---|---|
| Key-Value | Global hash map | Redis, DynamoDB, Memcached | Caching, session store |
| Document | JSON/BSON documents | MongoDB, Couchbase | Content management, catalogs |
| Wide-Column | Column families, sorted | Cassandra, HBase, Bigtable | Time-series, IoT, analytics |
| Graph | Nodes + edges | Neo4j, JanusGraph | Social networks, recommendations |
CAP Theorem
A distributed data system can have at most 2 of 3: | Consistency | All nodes see the same data at the same time | | Availability | Every request gets a response (even if stale) | | Partition Tolerance | System continues despite network splits |
Since partitions are inevitable, you choose CP or AP:
| Choice | Systems | Behavior |
|---|---|---|
| CP | HBase, etcd, Zookeeper | Sacrifice availability during partition |
| AP | Cassandra, DynamoDB, Couchbase | Accept eventual consistency |
BASE vs ACID
| ACID | BASE |
|---|---|
| Strong consistency | Eventual consistency |
| Pessimistic (locking) | Optimistic |
| Isolation | Read outdated OK |
| Focus on correctness | Focus on availability |
Key-Value Stores
Simplest model: get(key), put(key, value), delete(key). Value is opaque (any blob). Memcached: pure cache. Redis: data structures + persistence. DynamoDB: fully managed, auto-scaling.
Document Stores
Store JSON/BSON documents. Schema-flexible — different documents in the same collection can have different fields. MongoDB is the most popular. Supports secondary indexes, aggregation pipeline.
Wide-Column Stores
Inspired by Google Bigtable. Data organized as tables with rows and columns, but columns are grouped into column families. Cassandra is AP (eventual consistency, tunable consistency levels). HBase is CP (strong consistency, based on HDFS).
Graph Databases
Store entities as nodes and relationships as edges. Optimized for graph traversals: "Find friends of friends of user X who like chess." Uses adjacency (index-free) for efficient traversals.
NewSQL
Hybrid systems that provide SQL semantics with NoSQL-scale distributed architecture. Examples: CockroachDB, TiDB, YugabyteDB, Google Spanner.
Traveler's Notes
"NoSQL" doesn't mean "SQL is bad" — it means "Not Only SQL." Each NoSQL family excels at specific use cases. The key skill is choosing the right tool: relational for structured data with complex queries, key-value for caching, document for flexible schemas, wide-column for time-series, graph for relationships. Modern applications often use multiple systems (polyglot persistence).