Metadata Card
- Prerequisites: Chapter 6 — Combinatorics
- Estimated time: 50 minutes
- Core difficulty: Intermediate
- Completion marker: Understand probability space, conditional probability, Bayes' theorem, and grasp the concepts of random variables and expectation
Your Progress
In the third room of the Combinatorics Tower, dice and spinning wheels adorn the walls. The Librarian takes a coin from his pocket and tosses it into the air — "Heads or tails? You can't be certain. But what about ten thousand tosses?"
Your Task
Probability theory is the mathematical framework for understanding uncertainty in computer science. Why is cache hit rate 90% and not 100%? In A/B testing, what is the probability that the observed difference is not just coincidence? How do you interpret the confidence of a machine learning model's output? — You can't escape probability.
Chapter Layers
- Required reading: Probability space, conditional probability, Bayes' theorem, random variables
- Optional reading: Law of large numbers, central limit theorem
Breakthrough · Origin Story
During online monitoring, you notice that a certain API's response time is under 200ms in 95% of cases, but occasionally 5% of requests take over 2 seconds. Should you be worried? On the frequency dimension, "2 seconds" is an order of magnitude worse than "200ms", but on the probability dimension, it only affects 5% of requests. Probability gives you a language to measure "uncertainty," allowing you to quantify "occasionally slow."
Three elements of a probability space:
- Sample space Ω: The set of all possible outcomes. Coin toss Ω = {Heads, Tails}. Die roll Ω = {1, 2, 3, 4, 5, 6}.
- Event: A subset of Ω. "Even number" = {2, 4, 6}.
- Probability P: Assigns a number between 0 and 1 to each event, satisfying P(Ω) = 1, P(∅) = 0, and additivity for mutually exclusive events.
P(die roll is even) = 3/6 = 1/2.
Conditional probability: Reevaluating probability given some known information.
P(A|B) = P(A ∩ B) / P(B)
"Given that it's cloudy today, what is the probability of rain?" — conditional probability is this formula.
Bayes' theorem is the inverse of conditional probability:
P(A|B) = P(B|A) × P(A) / P(B)
It allows you to infer "cause" from "effect." A classic application: disease testing.
Suppose the incidence of a rare disease is P(disease) = 0.1%. Test accuracy: 99% positive if you have the disease, 2% false positive if you don't. If you test positive, what is the probability you actually have the disease?
You've seen a similar pattern in unit test coverage reports. A test case passing 99% of the time does not mean the tested code is 99% correct — because the bug occurrence rate itself is very low (like a rare disease). Bayes' formula tells you: under a low base rate, a 'positive' does not necessarily mean 'there is a problem.' This is the core intuition behind spam filtering and anomaly detection.
P(disease|positive) = P(positive|disease) × P(disease) / P(positive)
= 0.99 × 0.001 / (0.99 × 0.001 + 0.02 × 0.999)
≈ 0.047Only 4.7%. This counterintuitive result — positive but very likely not sick — is the "false positive" inflation effect in low-incidence scenarios. This principle appears repeatedly in spam filtering, anomaly detection, and ML classifiers.
Random variable: Assigns a numeric value to each outcome in the sample space. It's not a "variable" but a function from Ω to the real numbers.
- Coin toss: X(heads) = 1, X(tails) = 0
- Die roll: X(rolls a 6) = 1 (hit), otherwise = 0
Expectation E[X] is the probability-weighted average of a random variable. Expectation of a fair coin = 1 × 0.5 + 0 × 0.5 = 0.5. Expectation of a single die roll = (1+2+3+4+5+6) / 6 = 3.5.
Expectation often appears in engineering as "average latency." But note, averages can hide long-tail distributions. Your API's P99 might be 10 times the average — which is why operations teams use P99 instead of the mean.
Law of Large Numbers: The more trials you run, the closer the sample mean gets to the expected value. 10 coin tosses might yield 7 heads, but 10,000 tosses will be very close to 5,000 heads.
Common Pitfalls
- Ignoring prior probability. In Bayes' theorem, P(A) is the "prior." Many people only look at the likelihood P(B|A) and jump to conclusions.
- Confusing P(A|B) and P(B|A). "If you have a fever, you probably have a cold" ≠ "If you have a cold, you probably have a fever" — your AI model outputting "80% probability it's a cat" ≠ "80% of cat images are correctly classified."
- Expectation does not represent typical values. The expectation of a single die roll is 3.5, but you can never actually roll a 3.5. An average request latency of 200ms does not mean most requests are close to 200ms.
Challenge Questions
- A bag contains 3 red balls and 2 blue balls. Draw one without replacement, then a second. What is the probability the second ball is red?
- Write a simulation program: toss a coin 100 times, record the proportion of heads, and observe how this proportion approaches 0.5 as the number of tosses increases.
- In spam classification, the word "free" appears in 40% of spam emails and 1% of normal emails. Spam constitutes 20% of all emails. Use Bayes' theorem to calculate the probability that an email containing "free" is spam.
Traveler's Notes
Probability is a "measure of uncertainty." Conditional probability lets you update judgments using new information; Bayes' theorem lets you infer causes from effects; expectation tells you "on average" — but not the whole story.
→ Next stop: Some problems require integers and modular arithmetic. Next, we enter number theory — from clock arithmetic to RSA encryption.