
Chapter 5: Cache and Memory Hierarchy — Speed Buffers

Vol 3: Computer Core Expedition · Chapter 5


Metadata Card

| Attribute | Value |
|---|---|
| Difficulty | Advanced |
| Prerequisites | Memory model (Chapter 3), Instruction pipeline (Chapter 4) |
| Keywords | Cache Line, Locality, MESI, False Sharing, Write-back |


"You discover a strange phenomenon in the Core — the CPU runs blazingly fast, but memory can't keep up. To bridge this gap, engineers inserted a tiny buffer zone — the cache. Closer to the CPU means smaller but faster."

Memory hierarchy pyramid (typical access latencies; exact figures vary by processor):

Registers (0.3-1 ns)
L1 Cache (1-3 ns)       ← personal notebook
L2 Cache (3-10 ns)      ← desktop reference
L3 Cache (10-40 ns)     ← shared bookshelf
DRAM (50-100 ns)        ← library
Disk/SSD (µs to ms)     ← remote archive

Encounter 1: Locality

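```c
// Assumes: int matrix[1024][1024]; long long sum = 0;
// C stores arrays row-major, so matrix[i][j] and matrix[i][j + 1] are adjacent.

// Good: row-major traversal (spatial locality)
for (int i = 0; i < 1024; i++)
    for (int j = 0; j < 1024; j++)
        sum += matrix[i][j];

// Bad: column-major traversal. Each step jumps 1024 * sizeof(int) bytes
// (4 KiB with 4-byte int), so consecutive accesses hit different cache lines
// and most of them miss.
for (int j = 0; j < 1024; j++)
    for (int i = 0; i < 1024; i++)
        sum += matrix[i][j];
```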

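The row-major version is typically several times faster, and the gap can exceed 10× for larger matrices; the exact ratio depends on matrix size, cache sizes, compiler optimizations, and hardware prefetching. This is spatial locality: data near a recent access is likely to be used soon. Its counterpart, temporal locality, is reuse of the same data over time, such as sum, which is updated on every iteration and stays in a register or L1.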

Encounter 2: Cache Lines and Mapping

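  • Cache line = 64 bytes on most modern x86-64 and ARM cores (the minimum unit transferred between cache and memory)
  • Direct-mapped: each memory block maps to exactly one cache line
  • N-way set-associative: each memory block maps to exactly one set and may occupy any of that set's N slots (ways)
  • Fully associative: a memory block can be placed in any cache line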

Encounter 3: Write Strategies

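  • Write-through: every write updates both the cache and memory (simple, and memory is always current, but every store generates memory traffic)
  • Write-back: writes update only the cache and mark the line dirty; the line is written to memory when evicted (far less memory traffic, but memory can be stale, so multicore systems rely on a coherence protocol such as MESI)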

Encounter 4: MESI Protocol

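| State | Meaning |
|---|---|
| Modified (M) | Only this core holds the line, and it has been modified (differs from memory; must be written back before eviction) |
| Exclusive (E) | Only this core holds the line, and it matches memory |
| Shared (S) | One or more cores may hold copies, all matching memory |
| Invalid (I) | This copy is not valid (stale or empty) and must be refetched before use |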

Encounter 5: False Sharing

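Two variables that share one cache line but are modified by different cores → each write invalidates the other core's copy → the line bounces between cores → slowdowns that can reach 75×, even though the threads never touch the same variable.

Fix: align (or pad) each variable to 64 bytes, e.g. with alignas(64), so that each one sits on its own cache line.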


Verification Checklist

  • [ ] Can explain spatial vs temporal locality
  • [ ] Can identify and fix false sharing
  • [ ] Can explain write-back vs write-through
  • [ ] Can use cache-friendly access patterns (row-major traversal, structure-of-arrays/SoA layouts)

Traveler's Notes

  • Cache design is a prediction — the CPU bets you'll access nearby data
  • False sharing is the "quietest" performance killer: no variable is actually shared, yet the code runs slowly. Padding hot variables with alignas(64) usually fixes it
  • Keep hot data close to CPU, keep different threads' hot data far from each other

Next Stop Preview

Chapter 6: Virtual Memory — The Memory Illusion

Built with VitePress | Software Systems Atlas