Chapter 8: Query Execution & Optimizer
Metadata Card
| Attribute | Content |
|---|---|
| Difficulty | (Intermediate-Advanced) |
| Prerequisites | Chapter 2 (Relational Algebra), Chapter 4 (B+ Tree), SQL |
| Keywords | Query plan, optimizer, cost estimation, join algorithms, scan methods, cardinality estimation, push-down |
| Code | ~150 lines |
Your Progress
"Master Chen's strategist plans the optimal route for each query — scan methods, join algorithms, cost estimation."
Query Processing Pipeline
SQL Query → Parser → Rewriter → Optimizer → Executor → Result- Parser: SQL text → AST (Abstract Syntax Tree)
- Rewriter: Apply logical transformations (view expansion, predicate push-down)
- Optimizer: Find the best execution plan from many equivalent plans
- Executor: Execute the chosen plan
Scan Methods
| Method | Description | Best For |
|---|---|---|
| Sequential scan | Read all pages sequentially | Large tables, no index, wide access patterns |
| Index scan (B+ Tree) | Follow B+ Tree pointers | Point queries, range queries with conditions |
| Bitmap scan | Collect bitmap of matching pages, then fetch | Multiple conditions, partial matches |
Join Algorithms
| Algorithm | Description | Complexity | Best For |
|---|---|---|---|
| Nested Loop Join | For each outer row, scan inner table | O(N × M) | Small inner table, indexed inner table |
| Hash Join | Build hash table on one side, probe with the other | O(N + M) (average) | No index, equi-joins, large tables |
| Sort-Merge Join | Sort both sides, then merge | O(N log N + M log M) | Sorted inputs, range joins, non-equi joins |
Nested Loop Join Variants
# Simple Nested Loop Join
for outer_row in outer_table:
for inner_row in inner_table:
if condition:
emit(outer_row, inner_row)
# Index Nested Loop Join (inner table has index)
for outer_row in outer_table:
inner_rows = index_lookup(inner_table, outer_row.join_key)
for inner_row in inner_rows:
emit(outer_row, inner_row)
# Block Nested Loop Join (page-level)
for outer_block in outer_table_blocks:
for inner_block in inner_table_blocks:
for outer_row in outer_block:
for inner_row in inner_block:
if condition:
emit(outer_row, inner_row)Cost Estimation
The optimizer estimates the cost of each operation using:
- Cardinality estimation: Number of rows expected from each operation
- I/O cost: Number of disk pages read/written
- CPU cost: Processing time per row
Cardinality estimation uses table statistics (row count, distinct values, histogram, most common values).
Query Optimization Strategies
- Predicate push-down: Apply filters as early as possible (reduce data volume before joins)
- Projection push-down: Remove unnecessary columns early
- Join reordering: Choose optimal join order (smallest result sets first)
- Materialization: Compute intermediate results to disk vs pipeline to next operator
Volcano Model vs Vectorized
| Model | Processing Unit | Overhead |
|---|---|---|
| Volcano (Iterator) | One row at a time | High (function calls) |
| Vectorized | Batch of rows | Lower (amortized) |
| Compiled | Generated code | Lowest (no interpretation) |
Traveler's Notes
The query optimizer is the "brain" of a database. It takes your declarative SQL and turns it into an efficient execution plan. Understanding how it works helps you write better SQL — and debug slow queries when the optimizer makes suboptimal choices.