Chapter 8: Query Execution & Optimizer

Metadata Card

Attribute	Content
Difficulty	(Intermediate-Advanced)
Prerequisites	Chapter 2 (Relational Algebra), Chapter 4 (B+ Tree), SQL
Keywords	Query plan, optimizer, cost estimation, join algorithms, scan methods, cardinality estimation, push-down
Code	~150 lines

Your Progress

"Master Chen's strategist plans the optimal route for each query — scan methods, join algorithms, cost estimation."

Query Processing Pipeline

SQL Query → Parser → Rewriter → Optimizer → Executor → Result

Parser: SQL text → AST (Abstract Syntax Tree)
Rewriter: Apply logical transformations (view expansion, predicate push-down)
Optimizer: Find the best execution plan from many equivalent plans
Executor: Execute the chosen plan

Scan Methods

Method	Description	Best For
Sequential scan	Read all pages sequentially	Large tables, no index, wide access patterns
Index scan (B+ Tree)	Follow B+ Tree pointers	Point queries, range queries with conditions
Bitmap scan	Collect bitmap of matching pages, then fetch	Multiple conditions, partial matches

Join Algorithms

Algorithm	Description	Complexity	Best For
Nested Loop Join	For each outer row, scan inner table	O(N × M)	Small inner table, indexed inner table
Hash Join	Build hash table on one side, probe with the other	O(N + M) (average)	No index, equi-joins, large tables
Sort-Merge Join	Sort both sides, then merge	O(N log N + M log M)	Sorted inputs, range joins, non-equi joins

Nested Loop Join Variants

# Simple Nested Loop Join
for outer_row in outer_table:
    for inner_row in inner_table:
        if condition:
            emit(outer_row, inner_row)

# Index Nested Loop Join (inner table has index)
for outer_row in outer_table:
    inner_rows = index_lookup(inner_table, outer_row.join_key)
    for inner_row in inner_rows:
        emit(outer_row, inner_row)

# Block Nested Loop Join (page-level)
for outer_block in outer_table_blocks:
    for inner_block in inner_table_blocks:
        for outer_row in outer_block:
            for inner_row in inner_block:
                if condition:
                    emit(outer_row, inner_row)

Cost Estimation

The optimizer estimates the cost of each operation using:

Cardinality estimation: Number of rows expected from each operation
I/O cost: Number of disk pages read/written
CPU cost: Processing time per row

Cardinality estimation uses table statistics (row count, distinct values, histogram, most common values).

Query Optimization Strategies

Predicate push-down: Apply filters as early as possible (reduce data volume before joins)
Projection push-down: Remove unnecessary columns early
Join reordering: Choose optimal join order (smallest result sets first)
Materialization: Compute intermediate results to disk vs pipeline to next operator

Volcano Model vs Vectorized

Model	Processing Unit	Overhead
Volcano (Iterator)	One row at a time	High (function calls)
Vectorized	Batch of rows	Lower (amortized)
Compiled	Generated code	Lowest (no interpretation)

Traveler's Notes

The query optimizer is the "brain" of a database. It takes your declarative SQL and turns it into an efficient execution plan. Understanding how it works helps you write better SQL — and debug slow queries when the optimizer makes suboptimal choices.

Chapter 8: Query Execution & Optimizer ​

Metadata Card ​

Your Progress ​

Query Processing Pipeline ​

Scan Methods ​

Join Algorithms ​

Nested Loop Join Variants ​

Cost Estimation ​

Query Optimization Strategies ​

Volcano Model vs Vectorized ​

Traveler's Notes ​