Skip to content

Chapter 8: Query Execution & Optimizer


Metadata Card

AttributeContent
Difficulty(Intermediate-Advanced)
PrerequisitesChapter 2 (Relational Algebra), Chapter 4 (B+ Tree), SQL
KeywordsQuery plan, optimizer, cost estimation, join algorithms, scan methods, cardinality estimation, push-down
Code~150 lines

Your Progress

"Master Chen's strategist plans the optimal route for each query — scan methods, join algorithms, cost estimation."

Query Processing Pipeline

SQL Query → Parser → Rewriter → Optimizer → Executor → Result
  1. Parser: SQL text → AST (Abstract Syntax Tree)
  2. Rewriter: Apply logical transformations (view expansion, predicate push-down)
  3. Optimizer: Find the best execution plan from many equivalent plans
  4. Executor: Execute the chosen plan

Scan Methods

MethodDescriptionBest For
Sequential scanRead all pages sequentiallyLarge tables, no index, wide access patterns
Index scan (B+ Tree)Follow B+ Tree pointersPoint queries, range queries with conditions
Bitmap scanCollect bitmap of matching pages, then fetchMultiple conditions, partial matches

Join Algorithms

AlgorithmDescriptionComplexityBest For
Nested Loop JoinFor each outer row, scan inner tableO(N × M)Small inner table, indexed inner table
Hash JoinBuild hash table on one side, probe with the otherO(N + M) (average)No index, equi-joins, large tables
Sort-Merge JoinSort both sides, then mergeO(N log N + M log M)Sorted inputs, range joins, non-equi joins

Nested Loop Join Variants

# Simple Nested Loop Join
for outer_row in outer_table:
    for inner_row in inner_table:
        if condition:
            emit(outer_row, inner_row)

# Index Nested Loop Join (inner table has index)
for outer_row in outer_table:
    inner_rows = index_lookup(inner_table, outer_row.join_key)
    for inner_row in inner_rows:
        emit(outer_row, inner_row)

# Block Nested Loop Join (page-level)
for outer_block in outer_table_blocks:
    for inner_block in inner_table_blocks:
        for outer_row in outer_block:
            for inner_row in inner_block:
                if condition:
                    emit(outer_row, inner_row)

Cost Estimation

The optimizer estimates the cost of each operation using:

  • Cardinality estimation: Number of rows expected from each operation
  • I/O cost: Number of disk pages read/written
  • CPU cost: Processing time per row

Cardinality estimation uses table statistics (row count, distinct values, histogram, most common values).

Query Optimization Strategies

  • Predicate push-down: Apply filters as early as possible (reduce data volume before joins)
  • Projection push-down: Remove unnecessary columns early
  • Join reordering: Choose optimal join order (smallest result sets first)
  • Materialization: Compute intermediate results to disk vs pipeline to next operator

Volcano Model vs Vectorized

ModelProcessing UnitOverhead
Volcano (Iterator)One row at a timeHigh (function calls)
VectorizedBatch of rowsLower (amortized)
CompiledGenerated codeLowest (no interpretation)

Traveler's Notes

The query optimizer is the "brain" of a database. It takes your declarative SQL and turns it into an efficient execution plan. Understanding how it works helps you write better SQL — and debug slow queries when the optimizer makes suboptimal choices.

Built with VitePress | Software Systems Atlas