
Base Rate Fallacy

Ignoring the prior probability (base rate) when interpreting test results. A 99% accurate test for a 1% prevalence disease gives only 50% positive predictive value.


Why This Matters

The base rate fallacy is the single most common error in interpreting classifier outputs. When a classifier says "positive," people assume this means "probably truly positive." But the positive predictive value depends on the base rate (prevalence) of the condition, not just the test accuracy. In ML, the same error occurs with imbalanced classes: a model with 99% accuracy on a dataset where 99% of examples are negative is useless. It has learned to always predict negative.

Setup

A disease affects 1% of the population. A test for the disease has 99% sensitivity (true positive rate) and 99% specificity (true negative rate). You test positive. What is the probability you actually have the disease?

Most people answer "99%." The correct answer is about 50%.

Definition

Base Rate

The base rate (or prior probability, or prevalence) is the unconditional probability of the condition before any test is administered. In the example above, $P(D) = 0.01$.

Definition

Positive Predictive Value

The positive predictive value is $P(D \mid +)$: the probability of having the disease given a positive test result. This is what you actually want to know after testing positive.

Main Theorems

Theorem

Positive Predictive Value via Bayes' Theorem

Statement

Given prevalence $\pi = P(D)$, sensitivity $s = P(+ \mid D)$, and specificity $r = P(- \mid \neg D)$, the positive predictive value is:

$$\text{PPV} = P(D \mid +) = \frac{s \cdot \pi}{s \cdot \pi + (1 - r)(1 - \pi)}$$

For $\pi = 0.01$, $s = 0.99$, $r = 0.99$:

$$\text{PPV} = \frac{0.99 \times 0.01}{0.99 \times 0.01 + 0.01 \times 0.99} = \frac{0.0099}{0.0099 + 0.0099} = 0.5$$

Intuition

Out of 10,000 people, 100 have the disease and 9,900 do not. The test correctly identifies 99 of the 100 sick people (sensitivity = 99%). But it also falsely flags 99 of the 9,900 healthy people (false positive rate = 1%). So there are 99 true positives and 99 false positives among the 198 positive results: exactly 50%.
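The counting argument can be checked in a few lines of Python. This is a sketch: the `ppv` helper is written here for illustration and is not part of any library.

```python
def ppv(prevalence, sensitivity, specificity):
    """Positive predictive value via Bayes' theorem."""
    true_pos_mass = sensitivity * prevalence
    false_pos_mass = (1 - specificity) * (1 - prevalence)
    return true_pos_mass / (true_pos_mass + false_pos_mass)

# Via the formula:
print(ppv(0.01, 0.99, 0.99))  # 0.5

# Via counting over a hypothetical population of 10,000 people:
n = 10_000
sick, healthy = n * 0.01, n * 0.99          # 100 sick, 9,900 healthy
true_pos = sick * 0.99                       # 99 correctly flagged
false_pos = healthy * 0.01                   # 99 falsely flagged
print(true_pos / (true_pos + false_pos))     # 0.5
```

Both routes give the same answer because the counting argument is just Bayes' theorem with probabilities scaled up to whole people.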

Proof Sketch

Direct application of Bayes' theorem:

$$P(D \mid +) = \frac{P(+ \mid D)\,P(D)}{P(+)} = \frac{P(+ \mid D)\,P(D)}{P(+ \mid D)\,P(D) + P(+ \mid \neg D)\,P(\neg D)}$$

Substituting $P(+ \mid D) = s$, $P(+ \mid \neg D) = 1 - r$, and $P(D) = \pi$:

$$P(D \mid +) = \frac{s\pi}{s\pi + (1-r)(1-\pi)}$$

Why It Matters

This formula shows that PPV depends on three quantities, not just test accuracy. When prevalence is low, even highly accurate tests produce many false positives relative to true positives. This is the core reason why screening tests for rare conditions require confirmation with a second, more specific test.

Failure Mode

The formula assumes test performance is constant across the population. In practice, sensitivity and specificity can vary by subgroup (age, genetics, disease severity). The formula also breaks down when tests are applied to selected populations rather than the general population, because the effective prevalence changes.

Connection to ML: Precision and Class Imbalance

In ML terminology, PPV is precision. Sensitivity is recall. The base rate fallacy explains why precision drops when classes are imbalanced:

  • Precision = $TP / (TP + FP)$: same as PPV
  • Recall = $TP / (TP + FN)$: same as sensitivity

A classifier with 99% accuracy on a 1% positive rate dataset can achieve this by predicting "negative" for every example. It has 99% accuracy, 0% recall, and undefined (0/0) precision on the positive class. Accuracy alone hides the failure.
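A minimal sketch of this degenerate classifier, using synthetic labels with a 1% positive rate. The metric calculations are written out by hand to make the 0/0 case explicit.

```python
# Synthetic dataset: 10 positives among 1,000 examples (1% positive rate).
y_true = [1] * 10 + [0] * 990
# "Classifier" that always predicts negative.
y_pred = [0] * 1000

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

accuracy = (tp + tn) / len(y_true)               # 0.99
recall = tp / (tp + fn)                          # 0.0
precision = tp / (tp + fp) if (tp + fp) else None  # 0/0: undefined
print(accuracy, recall, precision)               # 0.99 0.0 None
```

The 99% accuracy comes entirely from the 990 true negatives; on the class that matters, the model predicts nothing.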

Common Confusions

Watch Out

Test accuracy equals probability of disease given positive test

A test that is "99% accurate" does not mean a positive result has a 99% chance of being correct. The 99% refers to $P(+ \mid D)$ and $P(- \mid \neg D)$, not to $P(D \mid +)$. These are different quantities. The confusion is between $P(A \mid B)$ and $P(B \mid A)$.

Watch Out

High accuracy means a good classifier

On imbalanced datasets, accuracy is dominated by the majority class. A spam filter with 99.9% accuracy that never flags any email as spam (because only 0.1% of emails are spam) is useless. Use precision, recall, and F1 instead.

Watch Out

Repeated testing fixes the problem

A common suggestion is "just test again." If the second test is independent given disease status, the math does work: a second positive raises the posterior significantly. But in practice, the same test on the same patient often has correlated errors, reducing the benefit of retesting.
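Under the independence assumption, the update can be iterated: the posterior after the first positive becomes the prior for the second. A sketch (the helper function is ours, for illustration only):

```python
def posterior_after_positive(prior, sensitivity, specificity):
    """One Bayesian update on observing a positive test result."""
    true_pos = sensitivity * prior
    false_pos = (1 - specificity) * (1 - prior)
    return true_pos / (true_pos + false_pos)

# First positive: prior 1% -> posterior ~50%.
p1 = posterior_after_positive(0.01, 0.99, 0.99)
# Second positive (assumed independent given disease status): ~99%.
p2 = posterior_after_positive(p1, 0.99, 0.99)
print(p1, p2)
```

Correlated errors (e.g. the same assay tripped by the same interfering condition) would make the second update far less informative than this calculation suggests.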

Canonical Examples

Example

Disease screening with different prevalences

Fix sensitivity = 99%, specificity = 99%.

Prevalence    PPV
50%           99%
10%           91.7%
1%            50%
0.1%          9.0%

At 0.1% prevalence, a positive result means only a 9% chance of disease. The same test goes from nearly definitive to nearly useless as prevalence drops.
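The table can be reproduced by sweeping prevalence through the PPV formula. A sketch; `ppv` is a hypothetical helper implementing the formula from the theorem above.

```python
def ppv(prevalence, sensitivity=0.99, specificity=0.99):
    """PPV from prevalence, sensitivity, and specificity."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Sweep the prevalences from the table; output matches its rows.
for prev in [0.5, 0.1, 0.01, 0.001]:
    print(f"prevalence {prev:>6.1%}  ->  PPV {ppv(prev):.1%}")
```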

Exercises

ExerciseCore

Problem

A classifier has 95% recall and 90% specificity on a binary task where 5% of examples are positive. What is the precision?

ExerciseAdvanced

Problem

What specificity is needed to achieve 95% precision when prevalence is 1% and sensitivity is 99%?

References

Canonical:

  • Kahneman, Slovic, Tversky, Judgment Under Uncertainty (1982), Chapter on base rates
  • Gigerenzer, "Calculated Risks" (2002), Chapters 3-4

Current:

  • Saito & Rehmsmeier, "The Precision-Recall Plot Is More Informative than the ROC Plot", PLOS ONE (2015)
  • Hastie, Tibshirani, Friedman, The Elements of Statistical Learning (2009), Chapters 7-8
  • Shalev-Shwartz & Ben-David, Understanding Machine Learning (2014), Chapters 11-14
  • Murphy, Machine Learning: A Probabilistic Perspective (2012), Chapters 5-7

Last reviewed: April 2026
