Base Rate Fallacy
Ignoring the prior probability (base rate) when interpreting test results. A 99% accurate test for a 1% prevalence disease gives only 50% positive predictive value.
Why This Matters
The base rate fallacy is the single most common error in interpreting classifier outputs. When a classifier says "positive," people assume this means "probably truly positive." But the positive predictive value depends on the base rate (prevalence) of the condition, not just the test accuracy. In ML, the same error occurs with imbalanced classes: a model can achieve 99% accuracy on a dataset where 99% of examples are negative simply by always predicting negative, while being useless as a detector.
Setup
A disease affects 1% of the population. A test for the disease has 99% sensitivity (true positive rate) and 99% specificity (true negative rate). You test positive. What is the probability you actually have the disease?
Most people answer "99%." The correct answer is about 50%.
Base Rate
The base rate (or prior probability, or prevalence) is the unconditional probability of the condition before any test is administered. In the example above, $P(D) = 0.01$.
Positive Predictive Value
The positive predictive value is $P(D \mid +)$: the probability of having the disease given a positive test result. This is what you actually want to know after testing positive.
Main Theorems
Positive Predictive Value via Bayes' Theorem
Statement
Given prevalence $p = P(D)$, sensitivity $s = P(+ \mid D)$, and specificity $t = P(- \mid \neg D)$, the positive predictive value is:

$$\mathrm{PPV} = P(D \mid +) = \frac{s\,p}{s\,p + (1-t)(1-p)}$$

For $p = 0.01$, $s = 0.99$, $t = 0.99$:

$$\mathrm{PPV} = \frac{0.99 \times 0.01}{0.99 \times 0.01 + 0.01 \times 0.99} = \frac{0.0099}{0.0198} = 0.5$$
Intuition
Out of 10,000 people, 100 have the disease and 9,900 do not. The test correctly identifies 99 of the 100 sick people (sensitivity = 99%). But it also falsely flags 99 of the 9,900 healthy people (false positive rate = 1%). So there are 99 true positives and 99 false positives among the 198 positive results: exactly 50%.
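The counting argument above can be checked directly. A minimal Python sketch, using the values from the running example:

```python
# Frequency-count version of the 10,000-person argument.
population = 10_000
prevalence = 0.01
sensitivity = 0.99
specificity = 0.99

sick = population * prevalence                  # 100 people have the disease
healthy = population - sick                     # 9,900 do not
true_positives = sick * sensitivity             # 99 sick people flagged correctly
false_positives = healthy * (1 - specificity)   # 99 healthy people flagged incorrectly

ppv = true_positives / (true_positives + false_positives)
print(f"PPV = {ppv:.2%}")  # → 50.00%
```

Counting expected frequencies like this (rather than manipulating conditional probabilities) is the representation Gigerenzer recommends for avoiding the fallacy in the first place.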
Proof Sketch
Direct application of Bayes' theorem:

$$P(D \mid +) = \frac{P(+ \mid D)\,P(D)}{P(+ \mid D)\,P(D) + P(+ \mid \neg D)\,P(\neg D)}$$

Substituting $P(+ \mid D) = 0.99$, $P(D) = 0.01$, $P(+ \mid \neg D) = 0.01$:

$$P(D \mid +) = \frac{0.99 \times 0.01}{0.99 \times 0.01 + 0.01 \times 0.99} = 0.5$$
Why It Matters
This formula shows that PPV depends on three quantities, not just test accuracy. When prevalence is low, even highly accurate tests produce many false positives relative to true positives. This is the core reason why screening tests for rare conditions require confirmation with a second, more specific test.
Failure Mode
The formula assumes test performance is constant across the population. In practice, sensitivity and specificity can vary by subgroup (age, genetics, disease severity). The formula also breaks down when tests are applied to selected populations rather than the general population, because the effective prevalence changes.
Connection to ML: Precision and Class Imbalance
In ML terminology, PPV is precision. Sensitivity is recall. The base rate fallacy explains why precision drops when classes are imbalanced:
- Precision = $\frac{TP}{TP + FP}$: same as PPV
- Recall = $\frac{TP}{TP + FN}$: same as sensitivity
A classifier with 99% accuracy on a 1% positive rate dataset can achieve this by predicting "negative" for every example. It has 99% accuracy, 0% recall, and undefined (0/0) precision on the positive class. Accuracy alone hides the failure.
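The always-negative classifier can be demonstrated in a few lines. A sketch in plain Python (the 10,000-example dataset is illustrative):

```python
# Why accuracy hides the failure: an always-negative "classifier"
# on a dataset with a 1% positive rate.
n = 10_000
labels = [1] * 100 + [0] * 9_900   # 1% positives
preds = [0] * n                    # predict negative for everything

tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)

accuracy = (tp + tn) / n                                   # 0.99
recall = tp / (tp + fn)                                    # 0.0
precision = tp / (tp + fp) if (tp + fp) else float("nan")  # 0/0: undefined
print(accuracy, recall, precision)
```

Note that precision requires a guard for the 0/0 case; libraries differ on whether they report 0, NaN, or a warning when no positive predictions are made.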
Common Confusions
Test accuracy equals probability of disease given positive test
A test that is "99% accurate" does not mean a positive result has a 99% chance of being correct. The 99% refers to $P(+ \mid D)$ (sensitivity) and $P(- \mid \neg D)$ (specificity), not to $P(D \mid +)$. These are different quantities. The confusion is between $P(+ \mid D)$ and $P(D \mid +)$.
High accuracy means a good classifier
On imbalanced datasets, accuracy is dominated by the majority class. A spam filter with 99.9% accuracy that never flags any email as spam (because only 0.1% of emails are spam) is useless. Use precision, recall, and F1 instead.
Repeated testing fixes the problem
A common suggestion is "just test again." If the second test is independent given disease status, the math does work: a second positive raises the posterior significantly. But in practice, the same test on the same patient often has correlated errors, reducing the benefit of retesting.
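Under the independence assumption, the retesting math is just two Bayes updates in sequence, where the posterior after the first positive becomes the prior for the second. A sketch with the example's numbers:

```python
# Posterior after one vs. two positive results, assuming the second
# test's errors are fully independent given disease status.
def posterior(prior: float, sensitivity: float, specificity: float) -> float:
    """One Bayes update for a positive test result."""
    numerator = sensitivity * prior
    return numerator / (numerator + (1 - specificity) * (1 - prior))

p1 = posterior(0.01, 0.99, 0.99)  # after the first positive: 0.5
p2 = posterior(p1, 0.99, 0.99)    # after an independent second positive: 0.99
print(p1, p2)
```

If the tests' errors are correlated (the usual situation when rerunning the same assay on the same sample), the second update is too optimistic and the true posterior lies somewhere between p1 and p2.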
Canonical Examples
Disease screening with different prevalences
Fix sensitivity = 99%, specificity = 99%.
| Prevalence | PPV |
|---|---|
| 50% | 99% |
| 10% | 91.7% |
| 1% | 50% |
| 0.1% | 9.0% |
At 0.1% prevalence, a positive result means only a 9% chance of disease. The same test goes from nearly definitive to nearly useless as prevalence drops.
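The table can be reproduced from the PPV formula. A small sketch at fixed sensitivity = specificity = 99%:

```python
# PPV as a function of prevalence, sensitivity = specificity = 99%.
def ppv(prevalence: float, sensitivity: float = 0.99, specificity: float = 0.99) -> float:
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

for prev in (0.5, 0.10, 0.01, 0.001):
    print(f"prevalence {prev:>6.1%}: PPV = {ppv(prev):.1%}")
```

Sweeping prevalence like this is a quick sanity check worth running before deploying any screening test or rare-event classifier.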
Exercises
Problem
A classifier has 95% recall and 90% specificity on a binary task where 5% of examples are positive. What is the precision?
Problem
What specificity is needed to achieve 95% precision when prevalence is 1% and sensitivity is 99%?
References
Canonical:
- Kahneman, Slovic, Tversky, Judgment Under Uncertainty (1982), Chapter on base rates
- Gigerenzer, "Calculated Risks" (2002), Chapters 3-4
Current:
- Saito & Rehmsmeier, "The Precision-Recall Plot Is More Informative than the ROC Plot", PLOS ONE (2015)
- Hastie, Tibshirani, Friedman, The Elements of Statistical Learning (2009), Chapters 7-8
- Shalev-Shwartz & Ben-David, Understanding Machine Learning (2014), Chapters 11-14
- Murphy, Machine Learning: A Probabilistic Perspective (2012), Chapters 5-7
Next Topics
- Confusion matrices and classification metrics: the full framework for evaluating classifiers
- Simpson's paradox: another case where aggregation produces misleading results
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.
- Common Probability Distributions (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)