
Hypergeometric Distribution

The Hypergeometric distribution is the law of the success count in n draws without replacement from a finite population of N items, K of which are successes. The PMF is the ratio of three binomial coefficients. The mean is nK/N, identical to the Binomial with p = K/N, but the variance carries the finite-population correction (N-n)/(N-1). The Binomial is the n << N limit. Fisher's exact test, capture-recapture, and quality-control acceptance sampling read off the Hypergeometric directly.


Plain-Language Definition

Take a finite population of N items. Mark K of them as successes. Draw a sample of size n without replacement and count the number of successes drawn. The probability law of that count is the Hypergeometric distribution.

The classic example is an urn with K red balls and N - K blue balls. Pull out n balls without replacement; how many are red? Sampling without replacement is what distinguishes the Hypergeometric from the Binomial. The Binomial is the law for the same setup with replacement, or equivalently for an infinite population. As the population grows large relative to the sample, the Hypergeometric converges to the Binomial.

Definition

Definition

Hypergeometric Distribution

A random variable X has a Hypergeometric distribution with population size N, success count K \in \{0, 1, \dots, N\}, and sample size n \in \{0, 1, \dots, N\} if its PMF is

\mathbb{P}(X = k) = \frac{\binom{K}{k}\binom{N - K}{n - k}}{\binom{N}{n}}, \quad k \in \{\max(0, n - (N - K)), \dots, \min(n, K)\}.

The numerator counts the ways to choose k successes from the K available and n - k failures from the N - K available. The denominator counts the ways to choose any n items from N. The ratio is the equally-likely-outcomes probability for that count.

The support boundaries are tight. The lower bound \max(0, n - (N - K)) kicks in when the sample is so large that some successes are forced into it; the upper bound \min(n, K) caps the count at the smaller of the sample size and the available-success count.
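The PMF and its tight support translate directly into code. A minimal sketch using only the Python standard library (the helper name `hypergeom_pmf` is ours, not from any package):

```python
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k) for X ~ Hypergeometric(N, K, n); zero outside the support."""
    lo, hi = max(0, n - (N - K)), min(n, K)
    if not lo <= k <= hi:
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Sanity check: the PMF sums to 1 over the tight support.
N, K, n = 100, 8, 20
total = sum(hypergeom_pmf(k, N, K, n) for k in range(min(n, K) + 1))
assert abs(total - 1.0) < 1e-12
```

Because `math.comb` computes exact integers, the only rounding happens in the final division.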

Why This Matters

Three places in the canon use the Hypergeometric directly.

  1. Fisher's exact test for 2 \times 2 tables. The null distribution of one cell count given fixed margins is exactly Hypergeometric. The conditioning argument that produces it is the cleanest derivation of an exact small-sample test in classical statistics, and it works when the chi-squared approximation breaks down for small cell counts.

  2. Capture-recapture. Tag K animals, release them, draw n from the population, and observe k tagged. The Hypergeometric PMF gives a likelihood for the unknown N; the maximum-likelihood estimator \widehat N = \lfloor Kn / k \rfloor is the Lincoln-Petersen estimator. The same idea works for software-defect estimation in code review.

  3. Acceptance sampling. A lot of N items contains K defectives. Draw n items and accept the lot if no more than c are defective. The operating characteristic of the inspection plan is a sum of Hypergeometric PMFs in K.

Mean and Variance

Theorem

Hypergeometric Mean and Variance

Statement

\mathbb{E}[X] = n\frac{K}{N}, \qquad \operatorname{Var}(X) = n\frac{K}{N}\frac{N - K}{N}\frac{N - n}{N - 1}.

Intuition

The mean is identical to a Binomial with n trials and per-trial success probability K/N. The variance differs by the factor (N - n)/(N - 1), the finite-population correction. As N \to \infty with K/N \to p, the correction tends to 1 and the Hypergeometric variance approaches np(1 - p), the Binomial variance.

Proof Sketch

Write X = \sum_{i=1}^{n} I_i, where I_i is the indicator that the i-th draw is a success. By symmetry \mathbb{P}(I_i = 1) = K/N for every i, so \mathbb{E}[X] = nK/N by linearity. For the variance, \operatorname{Var}(X) = \sum_i \operatorname{Var}(I_i) + \sum_{i \neq j} \operatorname{Cov}(I_i, I_j). Each \operatorname{Var}(I_i) = (K/N)(1 - K/N). For i \neq j, \mathbb{E}[I_i I_j] = (K/N) \cdot (K - 1)/(N - 1) by the chain rule on a uniformly sampled pair, giving \operatorname{Cov}(I_i, I_j) = -K(N - K) / [N^2 (N - 1)]. Summing n variances and n(n - 1) covariances and collecting terms gives the stated variance.
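Both closed forms can be checked numerically by summing against the PMF. A short verification sketch (the `pmf` helper is ours, standard library only):

```python
from math import comb

def pmf(k, N, K, n):
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

N, K, n = 100, 8, 20
support = range(max(0, n - (N - K)), min(n, K) + 1)

mean = sum(k * pmf(k, N, K, n) for k in support)
var = sum((k - mean) ** 2 * pmf(k, N, K, n) for k in support)

p = K / N
assert abs(mean - n * p) < 1e-9                               # E[X] = nK/N
assert abs(var - n * p * (1 - p) * (N - n) / (N - 1)) < 1e-9  # with correction
print(round(mean, 3), round(var, 3))  # 1.6 1.189
```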

Why It Matters

The mean alone cannot tell the Hypergeometric apart from the Binomial. The finite-population correction in the variance is the diagnostic. In any setting where the sample is a meaningful fraction of the population (sampling 10 percent or more from a finite population), the Binomial variance overstates the true variance, sometimes substantially.

Failure Mode

Software libraries differ on parameter order. Some take (N, K, n), others (K, N - K, n), others (n, K, N). Read the docstring before plugging in numbers.
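One defensive pattern in your own code is keyword-only parameters plus input validation, so a mis-ordered call fails loudly instead of computing the wrong distribution. A sketch with our own convention (for reference, scipy.stats.hypergeom takes (M, n, N) = population size, success count, sample size):

```python
from math import comb

def hypergeom_pmf(k, *, N, K, n):
    """Keyword-only: N = population size, K = successes, n = draws."""
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K <= N and 0 <= n <= N")
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

p = hypergeom_pmf(2, N=100, K=8, n=20)  # every parameter must be named
# hypergeom_pmf(2, 100, 8, 20) would raise TypeError instead of silently
# treating the arguments in the wrong order.
```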

Binomial Limit

Theorem

Hypergeometric Converges to Binomial as N grows

Statement

For fixed n and k, \mathbb{P}(\operatorname{Hypergeometric}(N, K, n) = k) \to \binom{n}{k} p^k (1 - p)^{n - k} as N \to \infty with K/N \to p.

Intuition

When the population is much larger than the sample, removing a drawn item barely changes the success fraction in the remaining items. Sampling without replacement becomes indistinguishable from sampling with replacement, and the Hypergeometric reduces to the Binomial.

Proof Sketch

Expand the Hypergeometric PMF using \binom{K}{k} = K(K-1)\cdots(K-k+1)/k! and similarly for the other binomial coefficients. The numerator is a polynomial of total degree n in N, and the denominator \binom{N}{n} is also \Theta(N^n). Match leading coefficients: the k factors from K and the n - k factors from N - K give K^k (N - K)^{n - k} to leading order, while the denominator gives N^n / n! times a factor tending to 1 for large N. Reassembling the factorials into \binom{n}{k} and using K/N \to p produces the Binomial PMF.

Why It Matters

The rule of thumb in textbooks is to use the Binomial approximation when n \leq 0.05 N, i.e. the sample is at most five percent of the population. Below that threshold the Binomial PMF and the Hypergeometric PMF agree to two decimal places for moderate k. Above it, the finite-population correction matters and the exact Hypergeometric should be used.
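The convergence can be watched numerically: hold n and p fixed, grow N, and the worst-case PMF gap shrinks. A quick check with the standard library (helper names are ours):

```python
from math import comb

def hyper_pmf(k, N, K, n):
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3
gaps = []
for N in (100, 1_000, 10_000):
    K = int(p * N)  # keep K/N = 0.3 exactly at each population size
    gaps.append(max(abs(hyper_pmf(k, N, K, n) - binom_pmf(k, n, p))
                    for k in range(n + 1)))
print([f"{g:.2e}" for g in gaps])

assert gaps[0] > gaps[1] > gaps[2]  # the gap shrinks as N grows
```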

Failure Mode

The convergence is pointwise in k, not uniform in the extreme tails. For very small \mathbb{P}(X = k) at the support boundaries, the relative error from the Binomial approximation can be substantial even when n/N is small.

Worked Example: Acceptance Sampling

A shipment contains N = 100 widgets, of which K = 8 are defective. An inspector draws n = 20 at random without replacement.

What is the probability of seeing exactly k = 2 defectives?

\mathbb{P}(X = 2) = \frac{\binom{8}{2}\binom{92}{18}}{\binom{100}{20}}.

Compute the three binomial coefficients (in practice via log-gamma to avoid overflow). The numerical value is approximately 0.3068.

What is the probability of seeing no defectives at all? That is the acceptance probability for a lot-tolerance plan with c = 0:

\mathbb{P}(X = 0) = \frac{\binom{92}{20}}{\binom{100}{20}} = \prod_{i = 0}^{19} \frac{92 - i}{100 - i} \approx 0.1558.

The Binomial approximation with p = 0.08, n = 20 gives (1 - 0.08)^{20} \approx 0.1887, which overestimates the acceptance probability because it ignores the finite-population correction.

The mean and standard deviation: \mathbb{E}[X] = 20 \cdot 8 / 100 = 1.6, \operatorname{Var}(X) = 1.6 \cdot 0.92 \cdot (100 - 20)/99 \approx 1.190, \operatorname{SD}(X) \approx 1.091. The Binomial would give variance 1.6 \cdot 0.92 = 1.472, about 24 percent too large.
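The worked numbers can be reproduced with exact integer binomial coefficients, sidestepping overflow entirely (Python's math.comb is exact):

```python
from math import comb

N, K, n = 100, 8, 20  # lot size, defectives in the lot, sample size

def pmf(k):
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

p_two = pmf(2)              # exactly two defectives in the sample
p_accept = pmf(0)           # acceptance probability under a c = 0 plan
p_binom = (1 - K / N) ** n  # with-replacement (Binomial) approximation

print(round(p_two, 4), round(p_accept, 4), round(p_binom, 4))
# 0.3068 0.1558 0.1887
```

The exact computation confirms that the Binomial approximation overstates the acceptance probability here.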

Comparison to Closely Related Distributions

Watch Out

Hypergeometric and Binomial have the same mean but different variances

The two distributions agree on the mean (\mathbb{E}[X] = nK/N versus \mathbb{E}[X] = np with p = K/N). They differ in the variance by the finite-population correction (N - n)/(N - 1), which equals 1 when n = 1 and decreases monotonically to 0 when n = N. Reporting the Binomial standard error on data from a finite-population without-replacement design will overstate the uncertainty.

Watch Out

The Hypergeometric is symmetric in K and n

A Hypergeometric(N, K, n) has the same distribution as a Hypergeometric(N, n, K) by the symmetry of "which side of the table is the sample". The PMF is symmetric in the roles of "successes" and "draws". This is a useful computational trick when K \ll n or n \ll K. It is also a frequent source of confusion when reading code that swaps the two.
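The K and n symmetry is easy to confirm pointwise. A small check on assumed example parameters N = 30, K = 12, n = 7:

```python
from math import comb

def pmf(k, N, K, n):
    lo, hi = max(0, n - (N - K)), min(n, K)
    if not lo <= k <= hi:
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

N, K, n = 30, 12, 7
for k in range(max(K, n) + 1):
    # Swapping the roles of "successes" and "draws" leaves the PMF unchanged.
    assert abs(pmf(k, N, K, n) - pmf(k, N, n, K)) < 1e-12
print("Hypergeometric(30, 12, 7) == Hypergeometric(30, 7, 12) pointwise")
```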

Watch Out

Conditional on the margins, the Hypergeometric is exact, not approximate

In a 2 \times 2 contingency table with fixed margins, the conditional distribution of the upper-left cell under the null of independence is exactly Hypergeometric. Fisher's exact test reports a p-value computed from this exact distribution. The chi-squared approximation to the same test statistic uses a continuous distribution and is approximate; the two answers can differ for small cell counts.
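A one-sided Fisher p-value is just a tail sum of this Hypergeometric. A sketch on a made-up 2x2 table (all counts hypothetical, chosen only for illustration):

```python
from math import comb

def pmf(k, N, K, n):
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Hypothetical table with fixed margins:
#            success  failure
#  group A      9        1     (row margin n_A = 10)
#  group B      3        7
a, n_A = 9, 10          # observed upper-left cell and its row margin
K, N = 9 + 3, 10 + 10   # success-column margin and grand total

# One-sided p-value: tables at least as extreme as the one observed.
p_value = sum(pmf(k, N, K, n_A) for k in range(a, min(n_A, K) + 1))
print(round(p_value, 5))  # 0.00988
```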

Hypergeometric vs Binomial vs Negative Hypergeometric

The Hypergeometric, the Binomial, and the Negative Hypergeometric all answer questions about counting successes in a sequence of draws but differ in what is fixed and what is random.

  • Binomial. n independent trials with constant success probability p. Random variable: the number of successes. Sampling with replacement, or equivalently from an infinite population.
  • Hypergeometric. n draws without replacement from a finite population with K successes and N - K failures. Random variable: the number of successes. Sampling without replacement from a finite population.
  • Negative Hypergeometric. Draw without replacement until r failures occur. Random variable: the number of successes before the r-th failure. Less commonly encountered, but appears in stopping-time and waiting-time problems analogous to the Negative Binomial under sampling without replacement.

The relationships are clean: the Hypergeometric converges to the Binomial as N \to \infty with K/N \to p, and the Negative Hypergeometric stands to the Hypergeometric as the Negative Binomial stands to the Binomial.

Capture-Recapture

A wildlife biologist tags K = 100 fish in a lake, releases them, waits for mixing, and then draws n = 60 fish, of which k = 12 are tagged. The likelihood of the unknown population size N is

L(N) = \frac{\binom{100}{12}\binom{N - 100}{48}}{\binom{N}{60}}.

Because N is an integer, maximize by comparing successive likelihood ratios L(N)/L(N-1); the integer maximum gives the Lincoln-Petersen estimator \widehat N = \lfloor Kn / k \rfloor = \lfloor 100 \cdot 60 / 12 \rfloor = 500 fish.

The Lincoln-Petersen estimator assumes a closed population (no births, deaths, immigration, or emigration during the experiment), random mixing of tagged fish, and no tag loss. Violations are common in field work, and modified estimators (Schnabel for multiple recapture occasions, Jolly-Seber for open populations) extend the same Hypergeometric likelihood approach.
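The integer maximization can also be done by brute force over a grid of candidate population sizes. A sketch using exact rational arithmetic (the grid bound 2000 is an arbitrary choice for this example):

```python
from fractions import Fraction
from math import comb

K, n, k = 100, 60, 12  # tagged fish, recapture sample size, tagged seen again

def likelihood(N):
    # Exact rationals avoid floating-point noise when likelihoods tie.
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))

candidates = range(K + n - k, 2001)  # smallest feasible N has N - K >= n - k
best = max(likelihood(N) for N in candidates)
maximizers = [N for N in candidates if likelihood(N) == best]
print(maximizers)  # [499, 500] -- a tie; Lincoln-Petersen reports Kn/k = 500
```

The tie at 499 and 500 is exact: the likelihood ratio L(500)/L(499) = (400 · 440)/(500 · 352) = 1, which happens whenever Kn/k is an integer.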

Exercises

ExerciseCore

Problem

A deck of 52 cards contains 4 aces. Draw 5 cards without replacement. Find the probability of getting exactly 2 aces and the probability of getting at least 1 ace.

ExerciseCore

Problem

A factory ships boxes of N = 50 light bulbs. The quality-control protocol draws n = 5 bulbs without replacement and rejects the box if any are defective. If the box contains K = 3 defectives, what is the probability the box is accepted?

ExerciseCore

Problem

A lottery draws 6 numbers without replacement from 49. A ticket selects 6 numbers. Find the probability of matching exactly 4 of the 6 drawn numbers.

ExerciseAdvanced

Problem

Show that for fixed n, the variance ratio \operatorname{Var}_{\text{Hyper}}(X) / \operatorname{Var}_{\text{Bin}}(X) equals (N - n)/(N - 1), and interpret the two extremes n = 1 and n = N.

ExerciseAdvanced

Problem

A 2 \times 2 contingency table records the outcomes of 14 patients on two treatments. Treatment A: 6 of 8 successes. Treatment B: 1 of 6 successes. Compute Fisher's exact one-sided p-value for the null that treatments are equally effective, conditioning on the observed margins.

Sampling-Distribution Connections

The Hypergeometric is the without-replacement analogue of the Binomial and sits inside a small lattice of related distributions:

  • The Binomial is the with-replacement / infinite-population limit.
  • The Multivariate Hypergeometric generalizes to populations with more than two categories: N = \sum_c K_c items in categories c = 1, \dots, C, sample n, count the categories.
  • The Negative Hypergeometric changes the stopping rule from "fixed n" to "until r failures occur".
  • The Poisson is the rare-event limit of the Binomial and, transitively, a far-field limit of the Hypergeometric when both N and K grow large with K/N \to 0 and nK/N \to \lambda.

References

  • Casella, G., and Berger, R. L. (2002). Statistical Inference, 2nd ed., Duxbury. Chapter 3.2 covers discrete distributions including the Hypergeometric, with the Binomial limit worked through.
  • Blitzstein, J. K., and Hwang, J. (2019). Introduction to Probability, 2nd ed., Chapman and Hall / CRC. Chapter 3 has a chess-board treatment of the Hypergeometric and a clean derivation of the Lincoln-Petersen estimator.
  • Lehmann, E. L., and Romano, J. P. (2005). Testing Statistical Hypotheses, 3rd ed., Springer. Section 4.6 covers Fisher's exact test and the role of the Hypergeometric null distribution in conditional inference.

Last reviewed: May 12, 2026
