
Chapter 18: Performance Engineering

Vol 3: Computer Core Expedition · Chapter 18


Metadata Card

Attribute | Value
Keywords | Profiling, perf, Amdahl's Law, Cache-friendly, Performance Analysis Tools

Your Progress

"You've learned how computers work at every level — from electrons to system calls. Now it's time to put that knowledge to work: making programs fast."


Encounter 1: Amdahl's Law

The speedup of a system is limited by the portion that cannot be parallelized:

Speedup = 1 / ((1 - P) + P/N)

Where P is the parallelizable fraction and N is the number of processors.

Key insight: Even with infinite processors, the speedup is bounded by 1/(1-P).

Encounter 2: Profiling Tools

  • Linux perf: perf stat, perf record/perf report
  • gprof: GNU profiler (instrumentation-based)
  • Valgrind (Cachegrind / Callgrind): Cachegrind simulates cache misses; Callgrind records call graphs and per-function cost
  • Flame graphs: Visualize CPU time distribution

Encounter 3: Cache-Friendly Code

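```c
// Assumes int arr[N] and an accumulator (e.g. long sum = 0) are declared.

// Cache-friendly: sequential access uses every element of each cache line
for (int i = 0; i < N; i++)
    sum += arr[i];

// Cache-unfriendly: strided access touches a new cache line on every
// iteration (64 ints = 256 bytes, larger than a typical 64-byte line).
// Note that this loop reads only every 64th element.
for (int i = 0; i < N; i += 64)
    sum += arr[i];
```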

Data-Oriented Design: Organize data layout first, then design algorithms around it.

Encounter 4: Common Optimization Techniques

  1. Reduce memory allocations (arena allocators, object pools)
  2. Minimize cache misses (SoA layout, loop tiling)
  3. Avoid branch mispredictions (branchless programming, lookup tables)
  4. Use SIMD instructions (vectorization)
  5. Profile first, optimize second: as a rule of thumb, roughly 90% of run time is spent in about 10% of the code

Verification Checklist

  • [ ] Can explain Amdahl's Law
  • [ ] Can use perf stat to measure cache misses
  • [ ] Can identify cache-friendly vs cache-unfriendly access patterns
  • [ ] Can explain the "profile first" principle

Traveler's Notes

  • "Make it work, make it right, make it fast" — in that order
  • Premature optimization is the root of all evil (Knuth)
  • Use a profiler before you optimize — your intuition about bottlenecks is often wrong
  • Measure twice, optimize once

The Journey Continues

You've completed all three volumes of the Software Systems Atlas. From bits and logic gates to CPUs, operating systems, and performance engineering — you now understand how computers work at every level.

The code you write is never "just code." Every a = b + c is a journey through registers, caches, ALUs, and the operating system. Every function call is a stack frame being built and destroyed. Every malloc is a negotiation with the virtual memory system.

You are no longer just a programmer. You are a systems engineer.


— Software Systems Atlas · Completed —
