
Chapter 14: In-Memory Database


Metadata Card

| Dimension | Value |
| --- | --- |
| Difficulty | Intermediate |
| Prerequisites | Chapter 4 (B+ Tree), Chapter 11 (ACID), Chapter 12 (MVCC) |
| Keywords | In-memory database, main-memory database, RAM, cache line, tuple storage, durability, checkpointing |
| Code Language | C (conceptual), Python (simulation) |

"At the top floor of the fortress, the rapid tasting room — all goods right in front of you, no need to go down to the basement to search."

Why In-Memory?

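Traditional databases keep the primary copy of their data on disk and cache hot pages in RAM. In-memory databases keep the entire working dataset in RAM, which removes disk I/O from the read path; disk is touched only for durability.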

Performance gap (latency):

| Operation | Disk (SSD) | RAM |
| --- | --- | --- |
| Sequential read | ~100 μs | ~100 ns |
| Random access | ~50 μs (500x slower) | ~100 ns |

In-memory databases exploit this gap for orders-of-magnitude speedups.
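
To make the gap concrete, the following back-of-the-envelope sketch multiplies the approximate random-access latencies from the table above by one million dependent lookups. The workload size and the assumption that accesses happen strictly one after another are illustrative choices, not measurements.

```python
# Rough cost of one million dependent random accesses, using the
# approximate latencies quoted in the table above (not measured values).
SSD_RANDOM_NS = 50_000   # ~50 μs per random SSD access
RAM_RANDOM_NS = 100      # ~100 ns per random RAM access
N_ACCESSES = 1_000_000   # hypothetical workload size


def total_seconds(accesses: int, latency_ns: int) -> float:
    """Total time if accesses are issued one after another (no overlap)."""
    return accesses * latency_ns / 1e9


ssd_s = total_seconds(N_ACCESSES, SSD_RANDOM_NS)
ram_s = total_seconds(N_ACCESSES, RAM_RANDOM_NS)
print(f"SSD: {ssd_s:.1f} s, RAM: {ram_s:.1f} s, ratio: {ssd_s / ram_s:.0f}x")
# prints: SSD: 50.0 s, RAM: 0.1 s, ratio: 500x
```

Real systems overlap many requests, so the absolute numbers are pessimistic for both devices, but the ratio is what matters here.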

Tuple Storage

In-memory databases don't need fixed-size pages. They store tuples in:

  • Arrays: Simple and cache-friendly (for OLTP)
  • Hash tables: Fast point lookups
  • Columns: For analytical workloads (column store in memory)
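
The three layouts can be sketched side by side in Python. This is a toy illustration only: the table contents, the `lookup` and `average_age` helpers, and the choice of `array("q")` for integer columns are all assumptions made for the example.

```python
from array import array

# Row-oriented: one record per tuple, kept in a plain list (an array of rows).
rows = [
    (1, "alice", 30),
    (2, "bob", 45),
    (3, "carol", 27),
]

# Hash table: primary key -> position in `rows`, for point lookups.
by_id = {row[0]: pos for pos, row in enumerate(rows)}

# Column-oriented: one contiguous sequence per attribute.
ids = array("q", (r[0] for r in rows))
names = [r[1] for r in rows]
ages = array("q", (r[2] for r in rows))


def lookup(key):
    """OLTP-style point lookup: fetch a whole tuple by primary key."""
    pos = by_id.get(key)
    return None if pos is None else rows[pos]


def average_age():
    """OLAP-style scan: reads only the `ages` column."""
    return sum(ages) / len(ages)


print(lookup(2))        # (2, 'bob', 45)
print(average_age())    # 34.0
```

The point lookup needs every attribute of one tuple, which favors the row layout; the aggregate needs one attribute of every tuple, which favors the column layout.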

Indexing

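A B+ Tree sizes its nodes to match disk pages so that each node costs one I/O. Once all data is in RAM, that design goal disappears, and the cost that matters is CPU cache misses. In-memory databases therefore often use:

  • T-Tree: Balanced binary tree whose nodes each hold many keys, designed for main memory (used in TimesTen)
  • Masstree: A trie of B+ Trees with a cache-conscious layout, designed for multicore CPUs
  • Skip lists: Ordered structure that is comparatively simple to make concurrent
  • Hash indexes: For point lookups (no range scans)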
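
As one example from the list above, here is a minimal skip list sketch. It is single-threaded; the concurrency control that real engines add is deliberately left out. The class name `SkipList`, the `scan` method, `MAX_LEVEL = 16`, and the promotion probability of 0.5 are assumptions chosen for illustration.

```python
import random


class _Node:
    __slots__ = ("key", "value", "forward")

    def __init__(self, key, value, level):
        self.key = key
        self.value = value
        self.forward = [None] * level  # forward[i] = next node on level i


class SkipList:
    MAX_LEVEL = 16  # illustrative cap on tower height

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._head = _Node(None, None, self.MAX_LEVEL)  # sentinel, holds no key
        self._level = 1  # number of levels currently in use

    def _random_level(self):
        # Each node is promoted one level higher with probability 1/2.
        level = 1
        while level < self.MAX_LEVEL and self._rng.random() < 0.5:
            level += 1
        return level

    def _find_predecessors(self, key):
        # update[i] = last node on level i whose key is < key.
        update = [self._head] * self.MAX_LEVEL
        node = self._head
        for i in range(self._level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        return update

    def insert(self, key, value):
        update = self._find_predecessors(key)
        candidate = update[0].forward[0]
        if candidate is not None and candidate.key == key:
            candidate.value = value  # overwrite existing key
            return
        level = self._random_level()
        self._level = max(self._level, level)  # unused levels already point at head
        node = _Node(key, value, level)
        for i in range(level):
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node

    def get(self, key, default=None):
        node = self._find_predecessors(key)[0].forward[0]
        if node is not None and node.key == key:
            return node.value
        return default

    def scan(self, lo, hi):
        """Yield (key, value) pairs with lo <= key < hi, in key order."""
        node = self._find_predecessors(lo)[0].forward[0]
        while node is not None and node.key < hi:
            yield node.key, node.value
            node = node.forward[0]


index = SkipList(seed=42)
for k in [30, 10, 50, 20, 40]:
    index.insert(k, f"row-{k}")
print(index.get(20))              # row-20
print(list(index.scan(15, 45)))   # [(20, 'row-20'), (30, 'row-30'), (40, 'row-40')]
```

Unlike a hash index, the bottom level keeps keys in sorted order, so range scans come for free.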

Interaction with Disk

Pure in-memory databases lose their contents on power loss or a process crash, because RAM is volatile. Solutions:

Durability:

  • Periodic snapshots: Take full database snapshots to disk (checkpointing)
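  • Command logging: Log each transaction's command (the operation and its arguments) to disk so it can be re-executed during recovery, a logical form of redo log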
  • Replication: Maintain copies on other nodes

Recovery on restart:

  1. Load latest snapshot into memory
  2. Replay command log after snapshot
  3. Ready for queries
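
The snapshot, command log, and recovery steps above can be simulated in a few lines. This is a toy sketch under stated assumptions: `TinyStore`, its JSON file formats, and the file names are hypothetical, and truncating the log right after a checkpoint is one simple policy among several.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy in-memory key-value store with a snapshot and a command log."""

    def __init__(self, directory):
        self.snapshot_path = os.path.join(directory, "snapshot.json")
        self.log_path = os.path.join(directory, "commands.log")
        self.data = {}

    def _apply(self, command):
        op, key, value = command
        if op == "set":
            self.data[key] = value
        elif op == "del":
            self.data.pop(key, None)

    def execute(self, op, key, value=None):
        # Command logging: persist the command before applying it in memory.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps([op, key, value]) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self._apply((op, key, value))

    def checkpoint(self):
        # Write the full snapshot to a temp file, then rename it into place.
        tmp = self.snapshot_path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.snapshot_path)
        # Logged commands are now covered by the snapshot. set/del are
        # idempotent, so replaying them after a crash here would be harmless.
        open(self.log_path, "w", encoding="utf-8").close()

    def recover(self):
        self.data = {}
        if os.path.exists(self.snapshot_path):      # 1. load latest snapshot
            with open(self.snapshot_path, encoding="utf-8") as f:
                self.data = json.load(f)
        if os.path.exists(self.log_path):           # 2. replay command log
            with open(self.log_path, encoding="utf-8") as log:
                for line in log:
                    if line.strip():
                        self._apply(json.loads(line))
        # 3. self.data is now ready for queries


workdir = tempfile.mkdtemp()
store = TinyStore(workdir)
store.execute("set", "a", 1)
store.execute("set", "b", 2)
store.checkpoint()
store.execute("set", "c", 3)
store.execute("del", "a")

restarted = TinyStore(workdir)   # simulate a crash: all in-memory state is gone
restarted.recover()
print(restarted.data)            # {'b': 2, 'c': 3}
```

The two commands issued after the checkpoint survive only because they were logged before being applied.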

Example: Modern Systems

| Database | Type | Engine |
| --- | --- | --- |
| Redis | Key-value | In-memory, optional persistence via periodic snapshots + AOF |
| Memcached | Cache | In-memory only, no persistence |
| VoltDB | Relational SQL | In-memory + command logging + replication |
| SAP HANA | Relational SQL | In-memory + column store + snapshots |
| SingleStore | Relational SQL | In-memory + row/column hybrid |
| DuckDB | Analytical SQL | In-process ("embedded") OLAP; can run fully in memory or against a database file |

Cache Line Awareness

Modern CPUs move data between RAM and cache in fixed-size cache lines (typically 64 bytes on x86-64). In-memory databases lay out data so that each cache line fetched carries as many useful bytes as possible:

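#define N 1024  // number of rows (illustrative value)

// Array of structs (one struct per row): cache-unfriendly for scans,
// because reading one field drags each row's other fields into the cache.
struct Row { char name[32]; int value; long timestamp; };

// Struct of arrays: each field is stored contiguously, so a scan of
// `values` uses every byte of each cache line it loads.
struct Rows { char names[N][32]; int values[N]; long timestamps[N]; };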
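
A small Python calculation shows how much the two layouts above differ when scanning only the `value` field. It assumes a typical LP64 platform (4-byte `int`, 8-byte `long`, so each `Row` is padded to 48 bytes), arrays that start on a cache-line boundary, and a hypothetical row count of 1,000.

```python
import math

CACHE_LINE = 64              # bytes per cache line
ROW_BYTES = 32 + 4 + 4 + 8   # name + int + padding + long = 48 (LP64 assumption)
VALUE_OFFSET = 32            # `value` follows the 32-byte name
VALUE_BYTES = 4


def lines_touched_aos(n_rows: int) -> int:
    """Distinct cache lines loaded when reading `value` from every Row."""
    lines = set()
    for i in range(n_rows):
        start = i * ROW_BYTES + VALUE_OFFSET
        lines.add(start // CACHE_LINE)
        lines.add((start + VALUE_BYTES - 1) // CACHE_LINE)
    return len(lines)


def lines_touched_soa(n_rows: int) -> int:
    """Cache lines loaded when reading the contiguous values[] array."""
    return math.ceil(n_rows * VALUE_BYTES / CACHE_LINE)


n = 1000
print(lines_touched_aos(n))   # 750
print(lines_touched_soa(n))   # 63
```

Under these assumptions the row layout loads roughly twelve times as many cache lines for the same scan. For workloads that read whole rows at a time, the comparison reverses, which is why the layout choice depends on the access pattern.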

Non-Volatile Memory (NVM)

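Non-volatile memory (NVM), such as Intel Optane persistent memory (since discontinued), is byte-addressable like RAM but retains its contents across power loss, at latencies far closer to DRAM than to SSDs. Database designs are adapting to this byte-addressable, persistent storage model.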


Traveler's Notes

In-memory databases turn the old advice that "the cheap solution to memory is to use less of it" on its head: with modern hardware, RAM is now large and affordable enough to hold entire working datasets. The trade-off is durability versus speed, but with smart checkpointing and replication, many use cases can safely run in memory.
