Chapter 5: Cache and Memory Hierarchy — Speed Buffers
Vol 3: Computer Core Expedition · Chapter 5
Metadata Card
| Attribute | Value |
|---|---|
| Difficulty | (Advanced) |
| Prerequisites | Memory model (Chapter 3), Instruction pipeline (Chapter 4) |
| Keywords | Cache Line, Locality, MESI, False Sharing, Write-back |
"You discover a strange phenomenon in the Core — the CPU runs blazingly fast, but memory can't keep up. To bridge this gap, engineers inserted a tiny buffer zone — the cache. Closer to the CPU means smaller but faster."
Memory hierarchy pyramid (approximate access latencies):
- Registers (0.3-1 ns)
- L1 Cache (1-3 ns) ← personal notebook
- L2 Cache (3-10 ns) ← desktop reference
- L3 Cache (10-40 ns) ← shared bookshelf
- DRAM (50-100 ns) ← library
- SSD (tens of µs) / HDD (ms) ← remote archive

Encounter 1: Locality
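Programs rarely touch memory at random. Temporal locality means data used recently is likely to be used again soon. Spatial locality means data near a recent access is likely to be used soon, which is why the cache fetches a whole line at a time.

```c
static int matrix[1024][1024];   // C stores 2-D arrays row by row
long long sum = 0;

// Good: row-major traversal (spatial locality)
for (int i = 0; i < 1024; i++)
    for (int j = 0; j < 1024; j++)
        sum += matrix[i][j];   // consecutive addresses: 16 ints share one 64-byte line

// Bad: column-major traversal (4 KB jump between consecutive accesses)
for (int j = 0; j < 1024; j++)
    for (int i = 0; i < 1024; i++)
        sum += matrix[i][j];   // nearly every access touches a different cache line
```

The row-major version is typically several times faster. Depending on the hardware and the matrix size, the gap can exceed 10×.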
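To check the speedup on your own machine, the sketch below wraps the two loops in functions and times each one with the standard `clock()`. It assumes the same 1024 × 1024 `int` matrix as above. The helper names (`fill_matrix`, `sum_row_major`, `sum_col_major`, `time_sum`, `compare_traversals`) are hypothetical.

```c
#include <assert.h>
#include <time.h>

#define N 1024   // matrix dimension, as in the loops above

static int matrix[N][N];

// Fills the matrix with a known pattern so both sums can be compared.
static void fill_matrix(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            matrix[i][j] = (i + j) % 7;
}

// Row-major: the inner loop walks consecutive addresses.
static long long sum_row_major(void) {
    long long sum = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += matrix[i][j];
    return sum;
}

// Column-major: the inner loop jumps N * sizeof(int) bytes per step.
static long long sum_col_major(void) {
    long long sum = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += matrix[i][j];
    return sum;
}

// Returns the CPU seconds spent in one call of f and stores its result in *out.
static double time_sum(long long (*f)(void), long long *out) {
    clock_t start = clock();
    *out = f();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

// Times both traversals. The sums must agree, because only the visiting order differs.
static void compare_traversals(double *row_s, double *col_s) {
    long long a, b;
    fill_matrix();
    *row_s = time_sum(sum_row_major, &a);
    *col_s = time_sum(sum_col_major, &b);
    assert(a == b);
}
```

Call `compare_traversals(&row_s, &col_s)` and compare the two times. Optimizing compilers may vectorize the row-major loop or interchange the column-major loops, which changes the measured gap. GCC, for example, enables `-floop-interchange` at `-O3`. Inspect the generated code or use a lower optimization level if the two versions time the same.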
Encounter 2: Cache Lines and Mapping
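- Cache line: the minimum unit transferred between cache and memory, typically 64 bytes on x86 and most ARM cores
- Direct-mapped: Each memory block maps to exactly one cache line
- N-way set-associative: Each memory block maps to exactly one set and can occupy any of that set's N lines (ways)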
- Fully-associative: Any memory block can go anywhere
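All three schemes locate a block by splitting its address into a tag, a set index, and a byte offset. The sketch below assumes a hypothetical 32 KiB, 8-way cache with 64-byte lines (64 sets). A direct-mapped cache is the special case of 1 way, and a fully-associative cache is the special case of 1 set. The names `split_address` and `cache_addr` are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical geometry for illustration: 32 KiB, 8-way, 64-byte lines.
#define LINE_SIZE  64u
#define NUM_WAYS   8u
#define CACHE_SIZE (32u * 1024u)
#define NUM_SETS   (CACHE_SIZE / (LINE_SIZE * NUM_WAYS))   // 64 sets

// Real caches use powers of two, so the split below is pure bit slicing.
static_assert((NUM_SETS & (NUM_SETS - 1u)) == 0, "set count must be a power of two");

typedef struct {
    uint64_t tag;     // distinguishes blocks that share the same set
    uint32_t set;     // the only set this block may be placed in
    uint32_t offset;  // byte position inside the 64-byte line
} cache_addr;

// Splits an address into tag | set index | offset.
static cache_addr split_address(uint64_t addr) {
    cache_addr a;
    a.offset = (uint32_t)(addr % LINE_SIZE);
    a.set    = (uint32_t)((addr / LINE_SIZE) % NUM_SETS);
    a.tag    = addr / LINE_SIZE / NUM_SETS;
    return a;
}
```

For example, `split_address(0x12345)` gives offset 5, set 13, and tag 18. With this geometry, addresses exactly 4096 bytes apart land in the same set. That is the stride of the column-major loop in Encounter 1 (assuming 4-byte `int`), so all of those accesses compete for the same 8 ways.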
Encounter 3: Write Strategies
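- Write-through: Every store updates both the cache and memory (memory always holds current data, but every store costs memory bandwidth)
- Write-back: Stores update only the cache and mark the line dirty; the line is written to memory when it is evicted (far less memory traffic, but memory can be stale, so multi-core systems need a coherence protocol such as MESI)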
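The traffic difference can be seen in a toy model of a single cache line that counts how many writes reach memory. This is a simplified sketch and the names are hypothetical. Real write-through caches usually add write buffers to hide latency, which this model ignores.

```c
#include <assert.h>
#include <stdbool.h>

// Toy model of one cache line (hypothetical, for illustration only).
typedef struct {
    bool dirty;           // write-back only: line differs from memory
    long memory_writes;   // number of writes that reached memory
} line_model;

// Write-through: every store also goes to memory.
static void store_write_through(line_model *l) {
    l->memory_writes++;
}

// Write-back: a store only marks the line dirty.
static void store_write_back(line_model *l) {
    l->dirty = true;
}

// On eviction, a dirty line is flushed to memory exactly once.
static void evict(line_model *l) {
    if (l->dirty) {
        l->memory_writes++;
        l->dirty = false;
    }
}

// Performs `stores` writes to the same line, evicts it, and reports memory writes.
static long memory_writes_for(bool write_back, long stores) {
    line_model l = { false, 0 };
    for (long i = 0; i < stores; i++) {
        if (write_back) store_write_back(&l);
        else            store_write_through(&l);
    }
    evict(&l);
    assert(!l.dirty);
    return l.memory_writes;
}
```

In this model, 100 stores to one line cost 100 memory writes under write-through but only 1 under write-back.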
Encounter 4: MESI Protocol
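Each line in each core's cache is in one of four states:

| State | Meaning |
|---|---|
| Modified (M) | Only this core holds the line, and it is dirty (differs from memory); it must be written back on eviction or when another core reads it |
| Exclusive (E) | Only this core holds the line, and it matches memory; a write upgrades it to M without notifying other cores |
| Shared (S) | One or more cores may hold clean copies; a write must first invalidate the other copies |
| Invalid (I) | The line holds no valid data; any access is a miss |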
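The table implies a small state machine for one line in one core's cache. The sketch below is simplified and its names are hypothetical. Events are "this core reads or writes" and "another core's read or write is snooped". `others_have_copy` stands in for the shared signal that the bus or directory provides. Write-backs and invalidation messages appear in comments but are not modeled.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { MODIFIED, EXCLUSIVE, SHARED, INVALID } mesi_state;

// What happened to the line, from this core's point of view.
typedef enum {
    LOCAL_READ,    // this core loads from the line
    LOCAL_WRITE,   // this core stores to the line
    REMOTE_READ,   // another core's read is snooped
    REMOTE_WRITE   // another core's write (read-for-ownership) is snooped
} mesi_event;

// Next state of one line in one core's cache.
// others_have_copy only matters on a local read miss.
static mesi_state mesi_next(mesi_state s, mesi_event e, bool others_have_copy) {
    switch (e) {
    case LOCAL_READ:
        if (s == INVALID)                     // read miss: fetch the line
            return others_have_copy ? SHARED : EXCLUSIVE;
        return s;                             // read hit: no change
    case LOCAL_WRITE:
        // E -> M is silent; from S or I, other copies are invalidated first.
        return MODIFIED;
    case REMOTE_READ:
        // An M line is written back or supplied; M and E drop to S.
        return (s == MODIFIED || s == EXCLUSIVE) ? SHARED : s;
    case REMOTE_WRITE:
        // Another core takes ownership; an M line is written back first.
        return INVALID;
    }
    assert(0 && "unknown MESI event");
    return s;
}
```

The E state is the reason a core can write a line it alone holds without any bus traffic. In the S state, the same write first has to broadcast an invalidation.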
Encounter 5: False Sharing
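Two variables in the same cache line, each modified by a different core, cause constant cache-line bouncing. Every write invalidates the other core's copy, even though the threads never touch each other's data. In tight update loops this can cause slowdowns of up to 75×.
Fix: align or pad each variable to the cache-line size, e.g. with alignas(64), so that each one lives on its own cache line.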
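A minimal sketch of the bad and fixed layouts follows. It assumes 64-byte cache lines, POSIX threads, and C11 `alignas`. The struct and function names (`counters_shared`, `counters_padded`, `bump`, `run_pair`) are hypothetical, and error checking on the pthread calls is omitted for brevity.

```c
#include <assert.h>
#include <pthread.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64          // assumed line size; check your target CPU
#define ITERS      10000000L   // increments per thread

// Bad: the struct starts a line, so both counters sit in that same line.
struct counters_shared {
    alignas(CACHE_LINE) volatile long a;
    volatile long b;
};

// Fixed: alignas puts each counter at the start of its own line.
struct counters_padded {
    alignas(CACHE_LINE) volatile long a;
    alignas(CACHE_LINE) volatile long b;
};

static_assert(sizeof(struct counters_shared) == CACHE_LINE, "one shared line");
static_assert(offsetof(struct counters_padded, b) == CACHE_LINE, "b starts a new line");

// Thread body: increments one counter. volatile forces a real load and store each time.
static void *bump(void *arg) {
    volatile long *counter = arg;
    for (long i = 0; i < ITERS; i++)
        (*counter)++;
    return NULL;
}

// Runs two threads, each incrementing its own counter.
static void run_pair(volatile long *x, volatile long *y) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, bump, (void *)x);
    pthread_create(&t2, NULL, bump, (void *)y);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
}
```

Timing `run_pair` on each layout, for example with `clock_gettime(CLOCK_MONOTONIC, ...)`, usually shows the padded version running noticeably faster on a multi-core machine. The exact ratio depends on the CPU. In C++17, `std::hardware_destructive_interference_size` from `<new>` can replace the hard-coded 64.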
Verification Checklist
- [ ] Can explain spatial vs temporal locality
- [ ] Can identify and fix false sharing
- [ ] Can explain write-back vs write-through
- [ ] Can use cache-friendly access patterns (row-major traversal, structure-of-arrays (SoA) layouts)
Traveler's Notes
- Cache design is a prediction — the CPU bets you'll access nearby data
- False sharing is the "quietest" performance killer: aligning hot variables to separate lines with alignas(64) usually fixes it
- Keep hot data close to the CPU, and keep different threads' hot data far from each other
→ Next Stop Preview
Chapter 6: Virtual Memory — The Memory Illusion