Chapter 9: Query Compilation

Metadata Card

Attribute	Content
Difficulty	(Advanced)
Prerequisites	Chapter 8 (Query Execution & Optimizer), compiler basics
Keywords	Query compilation, JIT, code generation, LLVM, interpreted execution, compiled execution
Code	Conceptual Python/C++

Your Progress

"Master Chen discovered that 'verbal instructions' were too slow. He started compiling instructions directly into mechanical operations — interpretation vs compilation."

The Interpretation Overhead

Traditional database execution uses the Volcano iterator model: each operator (scan, filter, join) implements next() that returns one tuple. The overhead per tuple includes:

Function calls between operators
Virtual method dispatch
Type checking
Null checking
Branch misprediction

For queries processing billions of rows, this overhead becomes significant.

Compiled Execution

Instead of interpreting the query plan at runtime, compile the query into native machine code. This eliminates interpretation overhead.

Approaches

Approach	Example	How
Template-based	Hyper, Databricks Photon	Generate C++ code from query plan, compile via gcc/LLVM
JIT-compiled	HyPer, Actian Vector	Compile to LLVM IR at runtime, JIT to native code
Bytecode	SQLite	Compile to custom bytecode, interpreted by a VM loop

Code Generation Example

SQL: SELECT name, value FROM items WHERE type = 'Weapon' AND value > 100

Template-based compiled version (pseudocode):

void query_scan(int page_count, Page* pages, OutputBuffer* output) {
    for (int p = 0; p < page_count; p++) {
        Page* page = &pages[p];
        int row_count = page->row_count;
        for (int r = 0; r < row_count; r++) {
            Row* row = &page->rows[r];
            // Filter: type = 'Weapon' AND value > 100
            if (strcmp(row->type, "Weapon") == 0 && row->value > 100) {
                // Project: name, value
                output->write(row->name, row->value);
            }
        }
    }
}

No function calls per row! The entire query is a tight loop with the filters and projections baked in.

Benefits

Aspect	Interpretation	Compilation
Per-tuple overhead	10-100 CPU cycles	< 5 CPU cycles
Branch prediction	Poor (operators switch)	Excellent (tight loops)
Cache utilization	Poor (scattered state)	Excellent (localized data)
Compile time	None	10-100ms (acceptable for complex queries)

Compilation vs Vectorization

	Vectorization	Compilation
Granularity	Batches (1024 rows)	Per-row (tight loop)
SIMD benefit	Natural	Can be added
Compile overhead	None	Medium
Best for	Simple scans, aggregations	Complex expressions, multi-way joins

Modern Systems

Hyper (Tableau): Compiles query to LLVM IR
SingleStore: JIT compilation for complex queries
CockroachDB: Vectorized execution fallback
Umbra: State-of-the-art compiled execution system

Traveler's Notes

Query compilation represents the frontier of database performance optimization. By eliminating interpretation overhead, compiled databases can execute queries tens of times faster than traditional systems. The trade-off is compilation time — but for complex analytical queries, the optimization is well worth it.

Chapter 9: Query Compilation ​

Metadata Card ​

Your Progress ​

The Interpretation Overhead ​

Compiled Execution ​

Approaches ​

Code Generation Example ​

Benefits ​

Compilation vs Vectorization ​

Modern Systems ​

Traveler's Notes ​