Chapter 5: Cache and Memory Hierarchy — Speed Buffers

Vol 3: Computer Core Expedition · Chapter 5

Metadata Card

Attribute	Value
Difficulty	(Advanced)
Prerequisites	Memory model (Chapter 3), Instruction pipeline (Chapter 4)
Keywords	Cache Line, Locality, MESI, False Sharing, Write-back

Your Progress

"You discover a strange phenomenon in the Core — the CPU runs blazingly fast, but memory can't keep up. To bridge this gap, engineers inserted a tiny buffer zone — the cache. Closer to the CPU means smaller but faster."

Memory hierarchy pyramid:

Registers (0.3-1 ns)
L1 Cache (1-3 ns)       ← personal notebook
L2 Cache (3-10 ns)      ← desktop reference
L3 Cache (10-40 ns)     ← shared bookshelf
DRAM (50-100 ns)        ← library
SSD/Disk (µs to ms)     ← remote archive
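
One common back-of-envelope way to see why a small fast buffer pays off is the average access time: hit time plus miss rate times miss penalty. The sketch below assumes a single cache level in front of DRAM; the function name `average_access_ns` and any numbers fed into it are illustrative assumptions, not measurements.

```cpp
// Average memory access time for one cache level in front of a slower level.
//   hit_ns          : latency when the data is found in the cache
//   miss_rate       : fraction of accesses that miss (0.0 to 1.0)
//   miss_penalty_ns : extra time spent fetching from the slower level on a miss
double average_access_ns(double hit_ns, double miss_rate, double miss_penalty_ns) {
    return hit_ns + miss_rate * miss_penalty_ns;
}
```

For example, with a 1 ns hit time, a 5% miss rate, and a 100 ns miss penalty (roughly L1 and DRAM from the pyramid above), the model gives about 6 ns per access on average instead of 100 ns with no cache at all.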

Encounter 1: Locality

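// Version A (good): row-major traversal, consecutive accesses are adjacent in memory
for (int i = 0; i < 1024; i++)
    for (int j = 0; j < 1024; j++)
        sum += matrix[i][j];

// Version B (bad): column-major traversal, each access jumps a full row ahead,
// so nearly every access lands on a different cache line
for (int j = 0; j < 1024; j++)
    for (int i = 0; i < 1024; i++)
        sum += matrix[i][j];

Version A is typically several times faster, and the gap can reach 10-50× on large matrices, depending on element size, cache sizes, and compiler optimizations.

Version A exploits spatial locality: data near a recent access is likely to be needed soon. Temporal locality is the companion idea: data accessed recently (like sum here) is likely to be accessed again soon.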
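
To try the comparison yourself, the two traversals can be wrapped in functions and timed. This is a minimal sketch, assuming the matrix is stored row-major in one contiguous `std::vector<int>`; the names `sum_row_major` and `sum_col_major` are hypothetical helpers, not part of the original text.

```cpp
#include <cstddef>
#include <vector>

// Sum an n x n matrix stored row-major in one contiguous vector.
// The inner loop walks adjacent addresses (stride of 1 element).
long long sum_row_major(const std::vector<int>& m, std::size_t n) {
    long long sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            sum += m[i * n + j];
    return sum;
}

// Same result, but the inner loop jumps n elements on every step (stride of n),
// so consecutive accesses usually fall on different cache lines.
long long sum_col_major(const std::vector<int>& m, std::size_t n) {
    long long sum = 0;
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < n; ++i)
            sum += m[i * n + j];
    return sum;
}
```

Both functions return the same value; only the access order differs. Timing them (for example with `std::chrono::steady_clock`) on a matrix larger than the last-level cache should show the gap, though an optimizing compiler may reorder or vectorize the loops and shrink it.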

Encounter 2: Cache Lines and Mapping

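Cache line = the minimum transfer unit between cache and memory, typically 64 bytes on current x86-64 CPUs
Direct-mapped: Each memory block maps to exactly one cache line (simple lookup, but prone to conflict misses)
N-way set-associative: Each memory block maps to one set and can occupy any of the N slots (ways) in that set
Fully-associative: Any memory block can go in any cache line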
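
The mapping schemes above all come down to how an address is split into an offset, a set index, and a tag. The sketch below assumes a simple modulo-indexed cache; `CacheGeometry`, `AddressParts`, and `split_address` are illustrative names, and real CPUs may hash the index bits differently.

```cpp
#include <cstdint>

// Hypothetical cache shape. Direct-mapped: one line per set.
// Fully-associative: num_sets == 1. N-way: num_sets = total_lines / N.
struct CacheGeometry {
    std::uint64_t line_bytes;  // e.g. 64
    std::uint64_t num_sets;    // number of sets
};

struct AddressParts {
    std::uint64_t offset;  // byte position inside the cache line
    std::uint64_t set;     // which set the memory block maps to
    std::uint64_t tag;     // distinguishes blocks that share a set
};

// Split an address into offset / set index / tag for the given geometry.
AddressParts split_address(std::uint64_t addr, const CacheGeometry& g) {
    std::uint64_t block = addr / g.line_bytes;  // memory block number
    return AddressParts{addr % g.line_bytes, block % g.num_sets, block / g.num_sets};
}
```

With 64-byte lines and 64 sets, addresses exactly 4096 bytes apart land in the same set with different tags. That is why a stride of 4096 bytes can cause conflict misses even when the cache is far from full.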

Encounter 3: Write Strategies

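Write-through: Every write updates the cache AND memory (slower, but memory always holds the latest data)
Write-back: Writes update the cache only and mark the line dirty; the line is written to memory when it is evicted (faster, but multi-core systems need a coherence protocol such as MESI)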
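
The difference in memory traffic can be seen in a toy model. This is a sketch under heavy simplification: a single cache line holding a single value, with `ToyCache` and its members invented purely for illustration.

```cpp
// Toy model: one cache line holding one value, in front of "memory".
struct ToyCache {
    bool write_back;        // true: write-back policy, false: write-through
    int cached = 0;         // value currently in the cache line
    int memory = 0;         // value currently in backing memory
    bool dirty = false;     // write-back only: cache differs from memory
    int memory_writes = 0;  // number of writes that reached memory

    explicit ToyCache(bool use_write_back) : write_back(use_write_back) {}

    void write(int v) {
        cached = v;
        if (write_back) {
            dirty = true;       // defer the memory update until eviction
        } else {
            memory = v;         // write-through: update memory right away
            ++memory_writes;
        }
    }

    void evict() {
        if (write_back && dirty) {
            memory = cached;    // flush the deferred update
            ++memory_writes;
            dirty = false;
        }
    }
};
```

Writing the same line three times costs three memory writes under write-through, but only one (at eviction) under write-back. The price is that memory is stale in between, which is the gap a coherence protocol has to cover.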

Encounter 4: MESI Protocol

State	Meaning
Modified	Only this core holds the line, and it has been modified (differs from memory)
Exclusive	Only this core holds the line, and it matches memory
Shared	Other cores may also hold copies; all copies match memory
Invalid	The line holds no valid data (stale or unused)
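
The four states can be read as a small state machine per cache line. The sketch below shows textbook MESI transitions from one core's point of view; the `Event` names and the `others_have_copy` flag are assumptions made for illustration, and real implementations add more states and bus messages.

```cpp
enum class Mesi { Modified, Exclusive, Shared, Invalid };

// Events as seen by one core's cache for one line (illustrative names).
enum class Event { LocalRead, LocalWrite, RemoteRead, RemoteWrite };

// Next state of this core's copy of the line.
// others_have_copy only matters when a local read misses (state Invalid).
Mesi next_state(Mesi s, Event e, bool others_have_copy = false) {
    switch (e) {
    case Event::LocalRead:
        if (s == Mesi::Invalid)
            return others_have_copy ? Mesi::Shared : Mesi::Exclusive;
        return s;                // M, E, S: read hit, state unchanged
    case Event::LocalWrite:
        return Mesi::Modified;   // any other copies are invalidated first
    case Event::RemoteRead:
        if (s == Mesi::Invalid)
            return s;
        return Mesi::Shared;     // a Modified line is written back first
    case Event::RemoteWrite:
        return Mesi::Invalid;    // another core now owns the line
    }
    return s;  // not reached; keeps compilers quiet
}
```

Note that a write from Exclusive to Modified needs no bus traffic, because no other core holds the line. That silent upgrade is the main benefit of having an Exclusive state at all.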

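Encounter 5: False Sharing

When two independent variables sit in the same cache line and are modified by different cores, every write invalidates the other core's copy, so the line bounces constantly between cores. The slowdown can be severe; figures as large as 75× are possible.

Fix: apply alignas(64) to each variable so they land on separate cache lines (assuming 64-byte lines).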
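
The layout difference can be checked at compile time. This is a minimal sketch assuming 64-byte cache lines; `PackedCounters`, `PaddedCounters`, `kCacheLine`, and `bump` are hypothetical names introduced for illustration.

```cpp
#include <atomic>
#include <cstddef>

constexpr std::size_t kCacheLine = 64;  // assumed cache line size

// Both counters fit in one cache line: two threads updating a and b
// separately would still fight over the same line (false sharing).
struct PackedCounters {
    std::atomic<long> a{0};
    std::atomic<long> b{0};
};

// Each counter starts on its own 64-byte boundary, so a and b
// can never share a cache line.
struct PaddedCounters {
    alignas(kCacheLine) std::atomic<long> a{0};
    alignas(kCacheLine) std::atomic<long> b{0};
};

static_assert(sizeof(PackedCounters) < kCacheLine, "packed counters share a line");
static_assert(alignof(PaddedCounters) == kCacheLine, "padded struct is line-aligned");
static_assert(sizeof(PaddedCounters) == 2 * kCacheLine, "one line per counter");

// Work for one thread: increment its own counter many times.
void bump(std::atomic<long>& counter, long times) {
    for (long i = 0; i < times; ++i)
        counter.fetch_add(1, std::memory_order_relaxed);
}
```

To observe the effect, run `bump` on `a` and `b` from two `std::thread`s, once with each struct, and compare wall-clock time. The padded version trades memory (128 bytes instead of 16 on a typical 64-bit Linux system) for independence. C++17 also defines `std::hardware_destructive_interference_size` in `<new>` as a portable stand-in for the hard-coded 64, though library support varies.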

Verification Checklist

[ ] Can explain spatial vs temporal locality
[ ] Can identify and fix false sharing
[ ] Can explain write-back vs write-through
[ ] Can use cache-friendly access patterns (row-major, SoA)

Traveler's Notes

Cache design is a prediction — the CPU bets you'll access nearby data
False sharing is the "quietest" performance killer: the code stays correct and just runs slowly. Separating per-thread data with alignas(64) usually fixes it
Keep hot data close to CPU, keep different threads' hot data far from each other

→ Next Stop Preview

Chapter 6: Virtual Memory — The Memory Illusion
