Chapter 14: In-Memory Database

Metadata Card

Dimension	Value
Difficulty	(Intermediate)
Prerequisites	Chapter 4 (B+ Tree), Chapter 11 (ACID), Chapter 12 (MVCC)
Keywords	In-memory database, main-memory database, RAM, cache line, tuple storage, Durability, checkpointing
Code Language	C (conceptual), Python (simulation)

Your Progress

"At the top floor of the fortress, the rapid tasting room — all goods right in front of you, no need to go down to the basement to search."

Why In-Memory?

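Traditional databases keep the primary copy of their data on disk and cache hot pages in RAM. In-memory databases keep the entire dataset in RAM, which removes disk I/O from the read path; disk is touched only to make changes durable.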

Performance gap (latency):

Operation	Disk (SSD)	RAM	Disk / RAM
Sequential read	~100 μs	~100 ns	~1000x
Random access	~50 μs	~100 ns	~500x

In-memory databases exploit this gap for orders-of-magnitude speedups.
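To make the gap concrete, here is a small back-of-the-envelope calculation using the random-access figures from the table. The workload (one million dependent lookups, performed one after another) is an assumption chosen for illustration; real systems overlap I/O and will not match these numbers exactly.

```python
# Rough latencies from the table above, in nanoseconds.
SSD_RANDOM_NS = 50_000   # ~50 μs
RAM_RANDOM_NS = 100      # ~100 ns


def total_seconds(lookups: int, latency_ns: int) -> float:
    """Time for `lookups` dependent accesses done one after another."""
    return lookups * latency_ns / 1e9


lookups = 1_000_000  # hypothetical workload size
ssd_seconds = total_seconds(lookups, SSD_RANDOM_NS)
ram_seconds = total_seconds(lookups, RAM_RANDOM_NS)
speedup = SSD_RANDOM_NS // RAM_RANDOM_NS

print(f"SSD: {ssd_seconds:.1f} s, RAM: {ram_seconds:.1f} s, speedup: {speedup}x")
```

Under these assumptions the same work drops from tens of seconds to a fraction of a second, which is the gap the rest of the chapter is built around.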

Tuple Storage

In-memory databases don't need fixed-size pages. They store tuples in:

Arrays: Simple and cache-friendly (for OLTP)
Hash tables: Fast point lookups
Columns: For analytical workloads (column store in memory)
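The three layouts above can be sketched in a few lines of Python. The table contents and the names `row_store`, `by_id`, and `balances` are hypothetical, and Python lists and dicts only approximate what a real engine does with raw memory, so treat this as a picture of the access patterns rather than of the performance.

```python
from array import array

# Hypothetical three-row table: (id, name, balance).
rows = [(1, "ada", 120), (2, "bob", 75), (3, "cy", 310)]

# Array of tuples: whole rows sit together, which suits reading or
# updating one row at a time.
row_store = list(rows)

# Hash table keyed by id: point lookups without scanning.
by_id = {row[0]: row for row in rows}

# Column layout: one contiguous typed array per attribute, which suits
# scans that touch a single column.
ids = array("q", (r[0] for r in rows))
balances = array("q", (r[2] for r in rows))

third_row = row_store[2]        # positional access to a full row
lookup = by_id[2]               # point lookup by key
total_balance = sum(balances)   # column scan that never touches names

print(third_row, lookup, total_balance)
```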

Indexing

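Without disk pages, the B+ Tree's page-sized nodes are no longer the obvious choice: the cost to minimize is CPU cache misses rather than disk reads. In-memory databases use: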

T-Tree: Balanced tree optimized for memory (popularized by TimesTen)
Masstree: Cache-friendly, optimized for modern CPU architectures
Skip lists: Simple concurrent implementation
Hash indexes: For point lookups
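As an illustration of one of these structures, the following is a minimal single-threaded skip list. It is a sketch, not how any particular database implements its index: the class names, `MAX_LEVEL = 8`, and the promotion probability of 1/2 are assumptions, and the locking or compare-and-swap logic that makes skip lists attractive for concurrent use is left out.

```python
import random


class Node:
    def __init__(self, key, value, level):
        self.key = key
        self.value = value
        self.forward = [None] * level   # forward[i] = next node on level i


class SkipList:
    MAX_LEVEL = 8

    def __init__(self, seed=0):
        self.head = Node(None, None, self.MAX_LEVEL)  # sentinel, never matched
        self.level = 1                                # levels currently in use
        self.rng = random.Random(seed)

    def _random_level(self):
        # Each extra level is kept with probability 1/2.
        level = 1
        while level < self.MAX_LEVEL and self.rng.random() < 0.5:
            level += 1
        return level

    def _find_predecessors(self, key):
        # update[i] = last node on level i whose key is smaller than `key`.
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for i in reversed(range(self.level)):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        return update

    def insert(self, key, value):
        update = self._find_predecessors(key)
        candidate = update[0].forward[0]
        if candidate is not None and candidate.key == key:
            candidate.value = value       # overwrite an existing key
            return
        level = self._random_level()
        self.level = max(self.level, level)
        node = Node(key, value, level)
        for i in range(level):            # splice the node into each level
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node

    def get(self, key):
        candidate = self._find_predecessors(key)[0].forward[0]
        if candidate is not None and candidate.key == key:
            return candidate.value
        return None

    def keys(self):
        # Level 0 links every node in sorted order.
        node = self.head.forward[0]
        while node is not None:
            yield node.key
            node = node.forward[0]


index = SkipList()
for k in [30, 10, 20, 40]:
    index.insert(k, f"row-{k}")

print(list(index.keys()), index.get(20), index.get(25))
```

Because the bottom level is a sorted linked list, the same structure serves both point lookups and ordered range scans, which a plain hash index cannot do.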

Interaction with Disk

Pure in-memory databases are vulnerable to power loss. Solutions:

Durability:

Periodic snapshots: Take full database snapshots to disk (checkpointing)
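Command logging: Log each transaction's command (operation plus parameters) to disk so it can be re-executed during recovery, filling the role a redo log plays in disk-based systems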
Replication: Maintain copies on other nodes

Recovery on restart:

Load latest snapshot into memory
Replay command log after snapshot
Ready for queries
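The snapshot-plus-log scheme and the recovery steps above can be simulated with a toy key-value store. Everything here is hypothetical (the `TinyStore` class, the JSON file formats, the file names); it is a sketch of the idea under those assumptions, not the mechanism of any specific product.

```python
import json
import os
import tempfile


class TinyStore:
    """Hypothetical store: dict in RAM, snapshot + command log on disk."""

    def __init__(self, directory):
        self.snapshot_path = os.path.join(directory, "snapshot.json")
        self.log_path = os.path.join(directory, "commands.log")
        self.data = {}
        self._recover()

    def set(self, key, value):
        # Log the command and force it to disk before applying it in memory.
        with open(self.log_path, "a") as log:
            log.write(json.dumps({"op": "set", "key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.data[key] = value

    def checkpoint(self):
        # Write the snapshot to a temporary file, then atomically swap it in.
        tmp_path = self.snapshot_path + ".tmp"
        with open(tmp_path, "w") as snap:
            json.dump(self.data, snap)
            snap.flush()
            os.fsync(snap.fileno())
        os.replace(tmp_path, self.snapshot_path)
        # Logged commands are now covered by the snapshot; start a fresh log.
        open(self.log_path, "w").close()

    def _recover(self):
        # Step 1: load the latest snapshot, if there is one.
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path) as snap:
                self.data = json.load(snap)
        # Step 2: replay the commands logged after that snapshot.
        if os.path.exists(self.log_path):
            with open(self.log_path) as log:
                for line in log:
                    cmd = json.loads(line)
                    if cmd["op"] == "set":
                        self.data[cmd["key"]] = cmd["value"]
        # Step 3: the store is ready for queries.


with tempfile.TemporaryDirectory() as directory:
    store = TinyStore(directory)
    store.set("a", 1)
    store.checkpoint()
    store.set("b", 2)                  # exists only in the command log
    recovered = TinyStore(directory)   # simulates a restart after a crash
    recovered_data = dict(recovered.data)

print(recovered_data)
```

If a crash lands between the snapshot swap and the log truncation, the old log is replayed on top of a snapshot that already contains it. That is harmless here because `set` is idempotent; commands that are not idempotent would need a sequence number recorded in the snapshot.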

Example: Modern Systems

Database	Type	Storage and durability
Redis	Key-value	In-memory, optional snapshots (RDB) + append-only file (AOF)
Memcached	Cache	In-memory only, no persistence
VoltDB	Relational SQL	In-memory + command logging + replication
SAP HANA	Relational SQL	In-memory + column store + snapshots
SingleStore	Relational SQL	In-memory + row/column hybrid
DuckDB	Analytical SQL	In-process ("embedded") OLAP, in-memory or single-file storage

Cache Line Awareness

Modern CPUs transfer memory in cache lines (typically 64 bytes). In-memory databases lay out data so that each cache line fetched carries as many useful bytes as possible:

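#define N 1024  /* illustrative row count */

// Array of structs (row layout): a scan over `value` alone still pulls
// every row's name and timestamp through the cache.
struct Row { char name[32]; int value; long timestamp; };

// Struct of arrays (column layout): values are contiguous, so one 64-byte
// cache line holds 16 of them (assuming a 4-byte int).
struct Rows { char names[N][32]; int values[N]; long timestamps[N]; };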
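The difference between the two layouts can be quantified with a little arithmetic. The sketch below uses Python's `ctypes` to mirror `struct Row`, with fixed-width `c_int32` and `c_int64` standing in for `int` and `long`. The 64-byte line size and the resulting struct size are assumptions that hold on typical 64-bit platforms but are not guaranteed everywhere.

```python
import ctypes

CACHE_LINE = 64  # bytes; typical on x86-64, an assumption here


class Row(ctypes.Structure):
    # Mirrors `struct Row`, with fixed-width stand-ins for int and long.
    _fields_ = [
        ("name", ctypes.c_char * 32),
        ("value", ctypes.c_int32),
        ("timestamp", ctypes.c_int64),
    ]


row_size = ctypes.sizeof(Row)                # includes alignment padding
value_size = ctypes.sizeof(ctypes.c_int32)   # 4 bytes

# When scanning only `value`:
# the row layout yields one useful value per `row_size` bytes fetched,
row_layout_values_per_line = CACHE_LINE / row_size
# while the column layout packs values back to back.
column_layout_values_per_line = CACHE_LINE // value_size

print(f"row size: {row_size} bytes")
print(f"values per cache line, row layout: {row_layout_values_per_line:.2f}")
print(f"values per cache line, column layout: {column_layout_values_per_line}")
```

The row layout is not wrong in general: when a query needs every field of one row, having them adjacent is exactly what you want. The column layout wins when a scan reads one field across many rows.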

Non-Volatile Memory (NVM)

Non-volatile memory, such as Intel Optane persistent memory (since discontinued), is byte-addressable like RAM but keeps its contents across power loss, at near-RAM speed. Databases are adapting to this byte-addressable, persistent storage model.

Traveler's Notes

In-memory databases invert the old advice that "the cheap solution to memory is to use less of it": with modern hardware, RAM is large enough and affordable enough to hold entire working datasets. The trade-off is durability versus speed, but with smart checkpointing and replication, many use cases can safely run in memory.
