Chapter 14: In-Memory Database
Metadata Card
| Dimension | Value |
|---|---|
| Difficulty | Intermediate |
| Prerequisites | Chapter 4 (B+ Tree), Chapter 11 (ACID), Chapter 12 (MVCC) |
| Keywords | In-memory database, main-memory database, RAM, cache line, tuple storage, durability, checkpointing |
| Code Language | C (conceptual), Python (simulation) |
"At the top floor of the fortress, the rapid tasting room — all goods right in front of you, no need to go down to the basement to search."
Why In-Memory?
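Traditional databases keep the primary copy of their data on disk and cache hot pages in RAM. In-memory databases keep the entire dataset in RAM, which removes disk I/O from the read path. Disk is still used for durability (see Interaction with Disk).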
Performance gap (rough, order-of-magnitude latency per access; actual figures vary by device):
| Operation | Disk (SSD) | RAM |
|---|---|---|
| Sequential read | ~100 μs (~1,000x slower) | ~100 ns |
| Random access | ~50 μs (~500x slower) | ~100 ns |
In-memory databases exploit this gap for orders-of-magnitude speedups.
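To make the gap concrete, here is a back-of-the-envelope calculation. It assumes roughly 50 μs per random SSD access and 100 ns per RAM access, and that the lookups are issued one after another with no parallelism. These are illustrative assumptions, not benchmark results.

```python
# Rough cost of one million dependent random lookups.
# Latencies are order-of-magnitude assumptions, not measurements.
SSD_RANDOM_NS = 50_000   # ~50 us per random SSD read (assumed)
RAM_RANDOM_NS = 100      # ~100 ns per random RAM access (assumed)


def total_seconds(lookups: int, latency_ns: int) -> float:
    """Time for `lookups` accesses issued strictly one after another."""
    return lookups * latency_ns / 1e9


lookups = 1_000_000
ssd_s = total_seconds(lookups, SSD_RANDOM_NS)
ram_s = total_seconds(lookups, RAM_RANDOM_NS)
print(f"SSD: {ssd_s:.1f} s, RAM: {ram_s:.1f} s, ratio: {ssd_s / ram_s:.0f}x")
# prints "SSD: 50.0 s, RAM: 0.1 s, ratio: 500x"
```

Real systems overlap I/O requests, so the observed gap is usually smaller than this serial model suggests, but the per-access ratio is what in-memory designs build on.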
Tuple Storage
Because data never has to move in disk-block units, in-memory databases are not tied to fixed-size pages. Tuples are commonly stored in:
- Arrays: Simple and cache-friendly (for OLTP)
- Hash tables: Fast point lookups
- Columns: For analytical workloads (column store in memory)
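The three layouts above can be sketched in a few lines of Python. The table contents and the names `rows`, `ids`, `prices`, and `pos_by_id` are hypothetical, chosen only for illustration; a real engine would manage raw memory rather than Python objects.

```python
from array import array

# Hypothetical three-row table with columns (id, price).
rows = [(1, 9.5), (2, 3.0), (3, 7.25)]       # row layout: one tuple per record

# Column layout: one contiguous typed array per attribute.
ids = array("q", (r[0] for r in rows))        # 64-bit signed integers
prices = array("d", (r[1] for r in rows))     # 64-bit floats

# Hash table keyed on id for point lookups; values are positions in the arrays.
pos_by_id = {row_id: i for i, row_id in enumerate(ids)}


def lookup(row_id):
    """Point lookup: one hash probe, then an index into each column."""
    i = pos_by_id[row_id]
    return ids[i], prices[i]


def total_price():
    """Analytical scan: touches only the price column."""
    return sum(prices)


print(lookup(2))       # prints (2, 3.0)
print(total_price())   # prints 19.75
```

The point lookup reads one position from every column, while the aggregate reads one column end to end. That difference is why row-oriented layouts suit transactional access and columnar layouts suit analytical scans.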
Indexing
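B+ Tree nodes are sized to match disk pages, so without disk pages the B+ Tree is no longer the obvious choice. In-memory databases often use structures tuned for CPU caches instead: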
- T-Tree: Balanced tree optimized for memory (popularized by TimesTen)
- Masstree: Cache-friendly, optimized for modern CPU architectures
- Skip lists: Simple concurrent implementation
- Hash indexes: For point lookups
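As one example from the list above, here is a minimal skip list sketch. It is single-threaded, uses a fixed maximum height of 8 and a promotion probability of 0.5, and the class and method names are hypothetical. Concurrent variants, which are what make skip lists attractive in practice, are not shown.

```python
import random


class _Node:
    __slots__ = ("key", "value", "forward")

    def __init__(self, key, value, level):
        self.key = key
        self.value = value
        self.forward = [None] * level   # forward[i] = next node at level i


class SkipList:
    """Single-threaded ordered map backed by a skip list (illustrative sketch)."""

    MAX_LEVEL = 8

    def __init__(self, seed=0):
        self._rng = random.Random(seed)            # seeded for repeatable runs
        self._head = _Node(None, None, self.MAX_LEVEL)
        self._level = 1                            # levels currently in use

    def _random_level(self):
        level = 1
        while level < self.MAX_LEVEL and self._rng.random() < 0.5:
            level += 1
        return level

    def _predecessors(self, key):
        """For each level, the last node whose key is strictly less than `key`."""
        update = [self._head] * self.MAX_LEVEL
        node = self._head
        for lvl in range(self._level - 1, -1, -1):
            while node.forward[lvl] is not None and node.forward[lvl].key < key:
                node = node.forward[lvl]
            update[lvl] = node
        return update

    def insert(self, key, value):
        update = self._predecessors(key)
        nxt = update[0].forward[0]
        if nxt is not None and nxt.key == key:
            nxt.value = value                      # overwrite existing key
            return
        level = self._random_level()
        self._level = max(self._level, level)
        node = _Node(key, value, level)
        for lvl in range(level):                   # splice in at each level
            node.forward[lvl] = update[lvl].forward[lvl]
            update[lvl].forward[lvl] = node

    def get(self, key, default=None):
        nxt = self._predecessors(key)[0].forward[0]
        if nxt is not None and nxt.key == key:
            return nxt.value
        return default

    def items(self):
        """Yield (key, value) pairs in ascending key order."""
        node = self._head.forward[0]
        while node is not None:
            yield node.key, node.value
            node = node.forward[0]


index = SkipList()
for k in (5, 1, 9, 3):
    index.insert(k, f"row-{k}")
print([k for k, _ in index.items()])   # prints [1, 3, 5, 9]
print(index.get(3))                    # prints row-3
```

Unlike a hash index, the bottom level keeps keys in sorted order, so range scans are a simple walk along `forward[0]`.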
Interaction with Disk
RAM is volatile, so a pure in-memory database loses all of its data on power loss or a crash. Common solutions:
Durability:
- Periodic snapshots: Take full database snapshots to disk (checkpointing)
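- Command logging: Append each transaction's command (the operation and its parameters) to a log on disk so it can be re-executed after a crash; this is a logical form of redo logging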
- Replication: Maintain copies on other nodes
Recovery on restart (in order):
- Load the latest snapshot into memory
- Replay the command log entries written after that snapshot
- Start accepting queries
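The snapshot, command log, and recovery steps above can be simulated with a toy key-value store. This is a minimal sketch under stated assumptions: `TinyStore`, its file names, and the JSON formats are all hypothetical, every command is fsynced individually (real systems usually batch), and replication is not modeled.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy store: dict in RAM + snapshot file + command log (hypothetical design)."""

    def __init__(self, directory):
        self.snapshot_path = os.path.join(directory, "snapshot.json")
        self.log_path = os.path.join(directory, "commands.log")
        self.data = {}
        self._recover()
        self._log = open(self.log_path, "a", encoding="utf-8")

    def _apply(self, cmd):
        op, key, value = cmd
        if op == "set":
            self.data[key] = value
        elif op == "del":
            self.data.pop(key, None)

    def execute(self, op, key, value=None):
        """Make the command durable first, then apply it in memory."""
        self._log.write(json.dumps([op, key, value]) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())
        self._apply((op, key, value))

    def checkpoint(self):
        """Write a full snapshot atomically, then start a fresh log."""
        tmp = self.snapshot_path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.snapshot_path)      # atomic rename
        # A crash right here would replay old commands on restart; that is
        # harmless in this toy because "set" and "del" are idempotent.
        self._log.close()
        self._log = open(self.log_path, "w", encoding="utf-8")

    def _recover(self):
        """Load the latest snapshot, then replay commands logged after it."""
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path, encoding="utf-8") as f:
                self.data = json.load(f)
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as f:
                for line in f:
                    if line.strip():
                        self._apply(json.loads(line))

    def close(self):
        self._log.close()


with tempfile.TemporaryDirectory() as d:
    store = TinyStore(d)
    store.execute("set", "a", 1)
    store.checkpoint()                 # snapshot now holds {"a": 1}
    store.execute("set", "b", 2)       # logged after the snapshot
    store.execute("del", "a")          # logged after the snapshot
    store.close()                      # simulate a restart with no final checkpoint
    recovered = TinyStore(d)           # snapshot load + log replay
    recovered_data = dict(recovered.data)
    recovered.close()

print(recovered_data)   # prints {'b': 2}
```

Writing to the log before touching memory is the key ordering: once `execute` returns, the command survives a crash even though the in-memory state does not.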
Example: Modern Systems
| Database | Type | Storage and persistence |
|---|---|---|
| Redis | Key-value | In-memory, optional periodic snapshots (RDB) + append-only file (AOF) |
| Memcached | Cache | In-memory only, no persistence |
| VoltDB | Relational SQL | In-memory + command logging + replication |
| SAP HANA | Relational SQL | In-memory + column store + snapshots |
| SingleStore | Relational SQL | In-memory rowstore + disk-backed columnstore |
| DuckDB | Analytical SQL | Embedded OLAP engine; runs in memory or against a single database file |
Cache Line Awareness
Modern CPUs fetch memory in cache lines (typically 64 bytes). In-memory databases lay out data so that each fetched cache line contains as many useful bytes as possible:
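#define N 1024  // illustrative row count (assumed)
// Array of structs (AoS): a scan over `value` alone still pulls every full row through the cache
struct Row { char name[32]; int value; long timestamp; };
struct Row rows[N];
// Struct of arrays (SoA): each field is contiguous, so a scan over `values` uses every byte of each cache line
struct Rows { char names[N][32]; int values[N]; long timestamps[N]; };
Non-Volatile Memory (NVM)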
Non-volatile memory is an emerging technology that combines near-RAM speed with persistence. Its best-known product, Intel Optane persistent memory, was discontinued in 2022. Databases have been adapting to NVM's byte-addressable, persistent storage model.
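The byte-addressable, persistent programming model can be imitated with an ordinary memory-mapped file. This is only a stand-in: the file, the `bump` helper, and the 8-byte counter layout are assumptions for illustration, and real persistent memory is flushed with CPU cache-line instructions rather than a file-system sync.

```python
import mmap
import os
import struct
import tempfile


def bump(path):
    """Increment an 8-byte counter in place through a memory mapping."""
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as mm:
        (count,) = struct.unpack_from("<Q", mm, 0)   # read bytes 0..7 directly
        struct.pack_into("<Q", mm, 0, count + 1)     # update them in place
        mm.flush()                                   # write the change back to the file
        return count + 1


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "counter.bin")
    with open(path, "wb") as f:
        f.write(b"\x00" * mmap.PAGESIZE)             # reserve one zeroed page
    first = bump(path)
    second = bump(path)    # a fresh mapping sees the persisted value

print(first, second)   # prints 1 2
```

There is no serialization step and no explicit read or write call on the data itself: the program updates a memory location and then makes that update durable, which is the model NVM-aware databases are designed around.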
Traveler's Notes
In-memory databases turn the old advice that "the cheap solution to memory is to use less of it" on its head: with modern hardware, RAM is large enough and affordable enough to hold entire working datasets. The trade-off is durability versus speed, but with smart checkpointing and replication, many use cases can safely run in memory.