Skip to content

Chapter 4: Hash Tables

Vol 2: Algorithm Forest Adventure · Station 4


Metadata Card

AttributeValue
ChapterVol 2 · Chapter 4
Difficulty(Core Challenge)
PrerequisitesArrays, Linked Lists (Chapter 2)
Core SkillsHash function design, collision handling, resizing, consistent hashing
KeywordsHash Table Hash Function Collision Chaining Open Addressing Load Factor Rehash Consistent Hashing

Your Progress

You encounter a strange thicket in the forest — every bush has a lock, and with the right key, the bush instantly pops out what you stored. This is the Hash Forest. Trees here live by hash values, not order.


Breakthrough · Origins

Encounter #1: Hash Function

A Hash Function takes any object and returns a fixed-range integer — the array index.

python
def simple_hash(key, table_size):
    return hash(key) % table_size

Collisions are inevitable (pigeonhole principle). When "apple" and "pen" hash to the same index, we have a Hash Collision.

Encounter #2: Separate Chaining

Each array slot holds a linked list. On collision, append to the list.

python
class ChainingHashTable:
    def __init__(self, capacity=8):
        self._table = [[] for _ in range(capacity)]
        self._capacity = capacity
        self._size = 0

    def _hash(self, key):
        return hash(key) % self._capacity

    def put(self, key, value):
        idx = self._hash(key)
        bucket = self._table[idx]
        for i, (k, v) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self._size += 1

    def get(self, key):
        idx = self._hash(key)
        for k, v in self._table[idx]:
            if k == key:
                return v
        raise KeyError(key)

    def remove(self, key):
        idx = self._hash(key)
        bucket = self._table[idx]
        for i, (k, v) in enumerate(bucket):
            if k == key:
                bucket.pop(i)
                self._size -= 1
                return
        raise KeyError(key)

Encounter #3: Open Addressing (Linear Probing)

On collision, walk forward to find the next empty slot.

python
class OpenAddressingHashTable:
    DELETED = object()

    def __init__(self, capacity=8):
        self._keys = [None] * capacity
        self._vals = [None] * capacity
        self._capacity = capacity
        self._size = 0

    def _hash(self, key):
        return hash(key) % self._capacity

    def put(self, key, value):
        idx = self._hash(key)
        while self._keys[idx] is not None and self._keys[idx] is not key:
            idx = (idx + 1) % self._capacity
        if self._keys[idx] is None:
            self._size += 1
        self._keys[idx] = key
        self._vals[idx] = value

    def get(self, key):
        idx = self._hash(key)
        while self._keys[idx] is not None:
            if self._keys[idx] is key:
                return self._vals[idx]
            idx = (idx + 1) % self._capacity
        raise KeyError(key)

Encounter #4: Load Factor and Rehashing

python
class AutoResizingHashTable(ChainingHashTable):
    LOAD_FACTOR_THRESHOLD = 0.75

    def put(self, key, value):
        super().put(key, value)
        if self._size / self._capacity > self.LOAD_FACTOR_THRESHOLD:
            self._resize(self._capacity * 2)

    def _resize(self, new_cap):
        old = self._table
        self._capacity = new_cap
        self._table = [[] for _ in range(new_cap)]
        self._size = 0
        for bucket in old:
            for k, v in bucket:
                super().put(k, v)

Encounter #5: Consistent Hashing (Preview)

Hash servers and keys onto a ring. Adding/removing a server only affects neighboring keys — no full rehashing.


Common Pitfalls

Two Sum: O(n²) → O(n)

python
def two_sum(nums, target):
    seen = {}
    for i, num in enumerate(nums):
        if target - num in seen:
            return [seen[target - num], i]
        seen[num] = i
    return None

Verification Checklist

  • [ ] Can explain why hash tables achieve O(1) lookup
  • [ ] Can hand-write separate chaining (insert/find/delete)
  • [ ] Can explain linear probing vs quadratic probing
  • [ ] Understands load factor and why 0.75 is standard
  • [ ] Can explain the core problem consistent hashing solves

Traveler's Notes

  1. Hash function → array index. Collisions are inevitable.
  2. Separate Chaining — chain on collision. Java 8+ upgrades to red-black tree after 8 nodes.
  3. Open Addressing — find next slot on collision. Memory-compact, tricky deletion.
  4. Load factor 0.75 — industry consensus for space-time balance.
  5. Consistent hashing — minimizes data migration in distributed caches.

Next Stop Preview

Chapter 5: Binary Trees and BST — "From linear to hierarchical — the first nonlinear structure."

Built with VitePress | Software Systems Atlas