

Chi-Squared Distribution and Tests

The Chi-squared distribution as sum of squared standard Normals and as the sampling distribution of the scaled sample variance, plus the two Pearson Chi-squared tests: goodness of fit for cell counts and independence in contingency tables.


Why This Matters

The Chi-squared distribution serves two distinct roles. As a sampling distribution, it is the law of the sum of $k$ independent squared standard Normals, equivalently the Gamma with shape $k/2$ and rate $1/2$. As a test-statistic distribution, it is the asymptotic null distribution of the Pearson Chi-squared statistic for cell counts and contingency tables.

Both uses rely on the same fact: a quadratic form in Normal random variables, normalized appropriately, is Chi-squared. For the sample variance, the relevant quadratic form is $\sum_i (X_i - \bar X)^2$. For Pearson Chi-squared, it is $\sum_i (O_i - E_i)^2/E_i$, which is asymptotically a quadratic form in standardized Normal summands by the central limit theorem applied to multinomial cell counts. This page derives both: the exact sampling distribution for the Normal model and the asymptotic null distribution for the multinomial.

The Chi-Squared Distribution

Definition

Chi-Squared Distribution

A random variable $X$ has a Chi-squared distribution with $k$ degrees of freedom, $k \in \{1, 2, \dots\}$, if its density is

$$f_X(x) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\, x^{k/2-1} e^{-x/2}, \qquad x > 0.$$

Equivalently, $X = \sum_{i=1}^k Z_i^2$ where $Z_1,\dots,Z_k$ are i.i.d. standard Normal. The mean is $k$ and the variance is $2k$.

In the Gamma family, $\chi^2_k = \operatorname{Gamma}(k/2,\, 1/2)$ (shape $k/2$, rate $1/2$). The Chi-squared is the half-integer-shape slice of the Gamma; every Chi-squared identity follows from a Gamma identity.
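A quick numerical check of this identification, as a sketch assuming NumPy and SciPy are available. Note that SciPy's gamma is parameterized by shape and scale, so rate $1/2$ becomes scale $2$:

```python
import numpy as np
from scipy import stats

# chi2(k) should coincide with Gamma(shape = k/2, rate = 1/2);
# SciPy uses shape/scale, so scale = 1/rate = 2.
k = 7
x = np.linspace(0.1, 30, 200)
chi2_pdf = stats.chi2.pdf(x, df=k)
gamma_pdf = stats.gamma.pdf(x, a=k / 2, scale=2.0)

print(np.max(np.abs(chi2_pdf - gamma_pdf)))  # ~1e-16: the densities agree
```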

Theorem

Sum of Squared Normals

Statement

If $Z_1,\dots,Z_k$ are i.i.d. $\mathcal{N}(0,1)$, then $\sum_{i=1}^k Z_i^2 \sim \chi^2_k$. More generally, for $a \neq 0$, $aZ_1$ is Normal with variance $a^2$, so $(aZ_1)^2/a^2 = Z_1^2 \sim \chi^2_1$; summing $k$ such terms gives the result.

Intuition

Squaring a standard Normal discards its sign and gives a positive random variable with density proportional to $e^{-z/2}/\sqrt z$, which is the $\chi^2_1$ density. Summing $k$ independent squared Normals adds the shape parameters (Gamma additivity).

Proof Sketch

Compute the density of $Y = Z_1^2$ for $Z_1 \sim \mathcal{N}(0,1)$ by the change-of-variables formula: for $y > 0$,

$$f_Y(y) = \frac{\varphi(\sqrt y) + \varphi(-\sqrt y)}{2\sqrt y} = \frac{\varphi(\sqrt y)}{\sqrt y} = \frac{e^{-y/2}}{\sqrt{2\pi y}}.$$

This matches the $\chi^2_1$ density. Sum over $k$ independent copies and apply Gamma additivity with common rate $1/2$.
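A Monte Carlo sanity check of the theorem, as a sketch assuming NumPy and SciPy: simulate sums of $k$ squared standard Normals and compare the empirical law to $\chi^2_k$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
k, n_sims = 5, 100_000

# Sum k squared standard Normals per simulation.
samples = (rng.standard_normal((n_sims, k)) ** 2).sum(axis=1)

print(samples.mean(), samples.var())  # close to k = 5 and 2k = 10
# Kolmogorov-Smirnov distance to chi2(k) should be negligible.
print(stats.kstest(samples, stats.chi2(df=k).cdf))
```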

Why It Matters

Every classical sampling distribution involving sums of squared deviations from a Normal model is Chi-squared. The sample variance of an i.i.d. Normal sample satisfies $(n-1)S^2/\sigma^2 \sim \chi^2_{n-1}$. The residual sum of squares in linear regression with Normal errors is Chi-squared with degrees of freedom equal to $n$ minus the number of regression coefficients.

Failure Mode

The result requires independence and unit-variance Normality. For non-Normal samples, $\sum_i Z_i^2$ is not Chi-squared; finite-sample tests that assume a Chi-squared law for quadratic forms of non-Normal data are misspecified. The asymptotic version (Pearson Chi-squared below) makes a weaker assumption and recovers the Chi-squared in the limit.

Sample Variance as Chi-Squared

Theorem

Sample Variance Distribution for Normal Data

Statement

Let $X_1,\dots,X_n$ be i.i.d. $\mathcal{N}(\mu,\sigma^2)$ and let $S^2 = \frac{1}{n-1}\sum_i (X_i - \bar X_n)^2$ be the unbiased sample variance. Then

$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1},$$

and the sample mean $\bar X_n$ is independent of $S^2$.

Intuition

The sample mean is the projection of the sample vector onto the all-ones direction. The deviations $X_i - \bar X_n$ live in the orthogonal hyperplane, a space of dimension $n - 1$. Their squared norm, divided by $\sigma^2$, is the squared length of an $(n-1)$-dimensional standard Normal vector, which is $\chi^2_{n-1}$. The independence of mean and variance is exactly this orthogonality.

Proof Sketch

This is the orthogonal-decomposition argument from the normal distribution page. Apply an orthogonal change of basis with first axis along the all-ones direction. The transformed sample is standard Normal in all $n$ coordinates; the first coordinate captures $\bar X_n$ and the remaining $n - 1$ coordinates capture the deviations. The squared norm of $n - 1$ independent standard Normals is $\chi^2_{n-1}$, and the first coordinate is independent of the rest. Multiplying back by $\sigma^2$ gives the stated scaling.
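A simulation sketch of the theorem (assuming NumPy and SciPy): the scaled sample variance should follow $\chi^2_{n-1}$, and the sample mean and sample variance should be uncorrelated.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, mu, sigma2, n_sims = 10, 3.0, 4.0, 50_000
data = rng.normal(mu, np.sqrt(sigma2), size=(n_sims, n))

xbar = data.mean(axis=1)
s2 = data.var(axis=1, ddof=1)            # unbiased sample variance
scaled = (n - 1) * s2 / sigma2

print(scaled.mean(), scaled.var())       # close to n-1 = 9 and 2(n-1) = 18
print(stats.kstest(scaled, stats.chi2(df=n - 1).cdf))
print(np.corrcoef(xbar, s2)[0, 1])       # close to 0, reflecting independence
```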

Why It Matters

This is the building block of the Student-$t$ and $F$ sampling distributions. The $t$-statistic $(\bar X_n - \mu)/(S/\sqrt n)$ is the ratio of a standard Normal to the square root of an independent Chi-squared divided by its degrees of freedom, so it is exactly $t_{n-1}$. The $F$ statistic in the F distribution and ANOVA is a ratio of two independent scaled sample variances, so it is exactly $F_{d_1,d_2}$.
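The $t$-construction can be checked the same way, as a simulation sketch: build $t_{n-1}$ from an independent standard Normal and a $\chi^2_{n-1}$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
df, n_sims = 9, 100_000

z = rng.standard_normal(n_sims)          # Normal numerator
v = rng.chisquare(df, n_sims)            # independent chi2(df)
t_built = z / np.sqrt(v / df)            # definition of t_df

print(stats.kstest(t_built, stats.t(df=df).cdf))  # empirical law matches t_df
```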

Failure Mode

The exact Chi-squared distribution of $S^2$ and the independence of $\bar X_n$ from $S^2$ are special properties of the Normal distribution. For non-Normal i.i.d. samples, $S^2$ is asymptotically Normal with a variance involving the fourth central moment, and it is only asymptotically uncorrelated with $\bar X_n$, not independent in finite samples. The exact $t$-distribution argument therefore fails outside the Normal model.

Pearson Chi-Squared Goodness of Fit

Theorem

Pearson Chi-Squared Goodness of Fit

Statement

Let $O_1,\dots,O_k$ be the observed counts of $n$ i.i.d. observations falling into $k$ cells, and let $E_i = np_i$ be the expected counts under the null hypothesis $p_1,\dots,p_k$. Define the Pearson statistic

$$X^2 = \sum_{i=1}^k \frac{(O_i - E_i)^2}{E_i}.$$

Under the null, $X^2 \to_d \chi^2_{k-1}$ as $n \to \infty$. The test rejects the null at level $\alpha$ when $X^2$ exceeds the $1-\alpha$ quantile of $\chi^2_{k-1}$.
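In SciPy this test is scipy.stats.chisquare; a minimal sketch with hypothetical counts and a fully specified null (the probabilities here are illustrative, not from the source):

```python
import numpy as np
from scipy import stats

# Hypothetical: n = 120 observations over k = 3 cells, null p = (0.5, 0.3, 0.2).
observed = np.array([52, 41, 27])
p_null = np.array([0.5, 0.3, 0.2])
expected = observed.sum() * p_null       # E_i = n p_i

stat, pvalue = stats.chisquare(observed, f_exp=expected)
print(stat, pvalue)                      # reference distribution: chi2(k-1)
```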

Intuition

The observed counts $(O_1,\dots,O_k)$ are multinomial under the null. The standardized cell residuals $(O_i - E_i)/\sqrt{E_i}$ are asymptotically Normal with covariance matrix $I - \sqrt p\,\sqrt p^{\top}$, a projection onto the orthogonal complement of $\sqrt p = (\sqrt{p_1},\dots,\sqrt{p_k})$. The Pearson statistic is the squared Euclidean norm of this random vector, which is the squared length of a standard Normal projected onto a $(k-1)$-dimensional subspace, hence $\chi^2_{k-1}$.

Proof Sketch

The vector of cell counts $(O_1,\dots,O_k)$ is multinomial. By the multivariate central limit theorem,

$$\sqrt n\left(\frac{O_i}{n} - p_i\right)_{i=1}^k \to_d \mathcal{N}(\mathbf 0, \Sigma), \qquad \Sigma_{ij} = p_i(\delta_{ij} - p_j).$$

Standardize component-wise: $Z_i = (O_i - np_i)/\sqrt{np_i}$. The covariance of $(Z_1,\dots,Z_k)$ converges to $I - \sqrt p\,\sqrt p^{\top}$, a rank-$(k-1)$ projection matrix. The sum of squares $\sum_i Z_i^2 = X^2$ is the squared norm of a multivariate Normal supported on a $(k-1)$-dimensional subspace, hence $\chi^2_{k-1}$ by Cochran's theorem.

Why It Matters

This is the first asymptotic test in classical statistics and remains the most commonly used test for categorical data. Applications: testing whether a die is fair, whether observed nucleotide frequencies match a genome model, whether an empirical distribution matches a hypothesized one (after binning). The degrees-of-freedom rule is "number of cells minus one" because the cell counts are constrained to sum to $n$, removing one degree of freedom.

Failure Mode

The asymptotic Chi-squared approximation requires each expected count $E_i = np_i$ to be at least roughly 5; this small-sample rule of thumb is borrowed from Cochran. With cells of expected count below 5, the approximation is poor and exact tests (Fisher's exact test, Monte Carlo permutation) should replace it. Estimating cell probabilities from the data (rather than fixing them under the null) consumes additional degrees of freedom: the asymptotic distribution becomes $\chi^2_{k-1-r}$, where $r$ is the number of parameters estimated. This is the case in the next theorem.
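For a sparse $2\times 2$ table, SciPy's Fisher exact test is the standard replacement; a sketch with hypothetical counts:

```python
import numpy as np
from scipy import stats

# Expected counts here are well below 5, so the chi2 approximation is poor.
table = np.array([[3, 9],
                  [7, 2]])

odds_ratio, pvalue = stats.fisher_exact(table)
print(odds_ratio, pvalue)                # exact p-value, no asymptotics
```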

Chi-Squared Test of Independence

Theorem

Pearson Chi-Squared Test of Independence

Statement

For an $r \times c$ contingency table with observed counts $O_{ij}$, row totals $R_i$, and column totals $C_j$, let $E_{ij} = R_i C_j / n$ be the expected counts under independence. The Pearson statistic

$$X^2 = \sum_{i=1}^r \sum_{j=1}^c \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

satisfies $X^2 \to_d \chi^2_{(r-1)(c-1)}$ under the null hypothesis of independence as $n \to \infty$. The test rejects at level $\alpha$ when $X^2$ exceeds the $1-\alpha$ quantile of $\chi^2_{(r-1)(c-1)}$.

Intuition

Under independence, $p_{ij} = p_{i\cdot} p_{\cdot j}$. The MLEs of the row and column marginals consume $(r - 1) + (c - 1)$ degrees of freedom (each set of marginals must sum to one, removing one free parameter from each). The total degrees of freedom are $rc - 1$ (for the multinomial constraint) minus $(r - 1) + (c - 1)$ (for the estimated parameters), giving $rc - 1 - (r-1) - (c-1) = (r-1)(c-1)$.

Proof Sketch

Under the null, the cell probabilities have the product form $p_{ij} = p_{i\cdot} p_{\cdot j}$ with $r + c - 2$ free parameters. Plugging in the MLEs $\hat p_{i\cdot} = R_i/n$ and $\hat p_{\cdot j} = C_j/n$ gives the estimated expected counts $E_{ij} = R_i C_j/n$. The Pearson statistic with estimated expected counts is asymptotically Chi-squared with $rc - 1 - (r + c - 2) = (r-1)(c-1)$ degrees of freedom, by the general "lost degrees of freedom" result for Pearson Chi-squared with estimated parameters.
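SciPy implements the whole procedure in scipy.stats.chi2_contingency, which forms $E_{ij} = R_i C_j/n$ and the $(r-1)(c-1)$ degrees of freedom automatically; a sketch with a hypothetical $2 \times 3$ table:

```python
import numpy as np
from scipy import stats

table = np.array([[30, 45, 25],
                  [20, 35, 45]])         # hypothetical observed counts

chi2, pvalue, dof, expected = stats.chi2_contingency(table, correction=False)
print(chi2, pvalue, dof)                 # dof = (2-1)(3-1) = 2
print(expected)                          # the R_i C_j / n matrix
```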

Why It Matters

This is the canonical test for association in cross-classified data: whether smoking status is independent of lung-cancer status, whether browser is independent of geography, whether treatment is independent of response. The test is asymptotic; for sparse tables with expected cell counts below 5, prefer Fisher's exact test (for $2\times 2$) or a permutation test (for larger tables). When the null is rejected, follow up with the standardized residuals $(O_{ij} - E_{ij})/\sqrt{E_{ij}}$ to locate the cells driving the rejection, as in the sketch below.
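Continuing the sketch above, the standardized residuals fall out of the same call; cells with $|(O_{ij}-E_{ij})/\sqrt{E_{ij}}|$ much larger than about 2 are the ones to inspect:

```python
import numpy as np
from scipy import stats

table = np.array([[30, 45, 25],
                  [20, 35, 45]])         # hypothetical observed counts

_, _, _, expected = stats.chi2_contingency(table, correction=False)
residuals = (table - expected) / np.sqrt(expected)
print(np.round(residuals, 2))            # signed cell-level contributions
```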

Failure Mode

The Chi-squared test treats rows and columns symmetrically. If the data are not i.i.d. (e.g., repeated measures on the same subject across columns), the asymptotic distribution is wrong and the test is invalid. For paired binary data on the same subjects, use McNemar's test. For repeated measures with more than two columns, use Cochran's Q or a mixed model.

When to Use Chi-Squared Tests

Scenario | Test | Degrees of freedom
Hypothesized cell probabilities, no estimation | Pearson GOF | $k - 1$
Hypothesized cell probabilities, $r$ parameters estimated | Pearson GOF | $k - 1 - r$
Independence in $r \times c$ contingency table | Pearson Chi-squared independence | $(r-1)(c-1)$
Homogeneity across $g$ groups in a $k$-cell table | Pearson Chi-squared homogeneity | $(g-1)(k-1)$
Likelihood-ratio version of any of the above | $G$-test | same df as Pearson
Sparse table or any cell with $E_{ij} < 5$ | Fisher's exact or permutation test | not Chi-squared

The $G$-test uses the statistic $G = 2\sum O \log(O/E)$, which is asymptotically equivalent to the Pearson statistic but is the natural likelihood-ratio version. See likelihood-ratio, Wald, and score tests for the connection.
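Both statistics are instances of SciPy's power-divergence family; a sketch comparing them on the same hypothetical counts:

```python
import numpy as np
from scipy import stats

observed = np.array([52, 41, 27])
expected = observed.sum() * np.array([0.5, 0.3, 0.2])

# lambda_ selects the member of the power-divergence family.
print(stats.power_divergence(observed, expected, lambda_="pearson"))
print(stats.power_divergence(observed, expected, lambda_="log-likelihood"))
# Same chi2(k-1) reference; the two numbers differ slightly in finite samples.
```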

Common Confusions

Watch Out

Chi-squared distribution versus Chi-squared concentration

The Chi-squared distribution is the exact distribution of $\sum_i Z_i^2$ for standard Normals. Chi-squared concentration refers to the Laurent-Massart sub-Gamma tail bounds for Chi-squared random variables, which are finite-sample inequalities, not statements about the law. See chi-squared concentration for the bound; this page is about the law.

Watch Out

Degrees of freedom counts free parameters, not cells

The Pearson Chi-squared GOF statistic has $k - 1$ degrees of freedom when the cell probabilities are fully specified, but $k - 1 - r$ degrees when $r$ parameters were estimated from the data. Software defaults often assume the simpler case; check that the degrees of freedom match the number of parameters you actually fixed under the null, as in the sketch below.
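In scipy.stats.chisquare the adjustment is the ddof argument, which subtracts the estimated-parameter count from the default $k - 1$; a sketch with hypothetical counts and $r = 1$:

```python
import numpy as np
from scipy import stats

observed = np.array([18, 25, 32, 25])            # k = 4 hypothetical counts
expected = np.array([20.0, 27.0, 30.0, 23.0])    # from a fitted model, r = 1

print(stats.chisquare(observed, f_exp=expected, ddof=0))  # df = k-1 = 3 (ignores estimation)
print(stats.chisquare(observed, f_exp=expected, ddof=1))  # df = k-1-r = 2
```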

Watch Out

Pearson statistic versus likelihood-ratio G-statistic

The Pearson statistic $\sum (O - E)^2/E$ and the $G$-statistic $2\sum O \log(O/E)$ are asymptotically equivalent and share the same Chi-squared reference distribution. They are not the same number in finite samples: the $G$-statistic is the likelihood-ratio test, the Pearson statistic is the score test. For modest cell counts they typically agree to within a few percent; for very sparse data they can differ noticeably, and neither matches the exact reference well.

Watch Out

Two-sided versus one-sided

The Pearson Chi-squared test is intrinsically two-sided in the cell-residual sense: both positive and negative deviations contribute to $X^2$. The reference distribution is used one-sided (right tail only). A left-tail rejection region for $X^2$ would test for "too good a fit", which is occasionally done to detect data fabrication, but it is not the standard direction.

Exercises

ExerciseCore

Problem

A six-sided die is rolled 60 times with results $(O_1,\dots,O_6) = (8, 11, 9, 12, 7, 13)$. Test the null hypothesis that the die is fair at level 0.05.

ExerciseCore

Problem

A clinical trial cross-classifies 200 patients by treatment (drug, placebo) and outcome (improved, not improved). The $2\times 2$ table has $O_{11} = 60$ (drug, improved), $O_{12} = 40$ (drug, not improved), $O_{21} = 45$ (placebo, improved), $O_{22} = 55$ (placebo, not improved). Test independence at level 0.05.

ExerciseAdvanced

Problem

Let $X_1,\dots,X_{20}$ be i.i.d. $\mathcal{N}(\mu, 4)$. Construct a 95% confidence interval for $\sigma^2 = 4$ based on the sample variance, and verify that it covers $4$ when the observed sample variance is $S^2 = 4.2$.

ExerciseAdvanced

Problem

A genetic linkage experiment cross-classifies 300 offspring of a cross into four phenotype classes with predicted Mendelian ratios 9:3:3:1. The observed counts are $(O_1, O_2, O_3, O_4) = (160, 62, 55, 23)$. Test the Mendelian prediction at level 0.05.

References

Canonical:

  • Casella and Berger, Statistical Inference (2002), Chapter 5 (Section 5.3 on sample variance distribution), Chapter 8 (Section 8.3 on Pearson Chi-squared), Chapter 10 (asymptotic Chi-squared distribution of the LRT).
  • Lehmann and Romano, Testing Statistical Hypotheses (2005), Chapter 14 (large-sample tests, Chi-squared asymptotics).
  • Agresti, Categorical Data Analysis (2013), Chapter 2 (Pearson Chi-squared and likelihood-ratio tests for contingency tables).

Foundational papers:

  • Pearson, "On the criterion that a given system of deviations..." (Philosophical Magazine, 1900), the original Chi-squared goodness-of-fit derivation.
  • Fisher, "On the interpretation of $\chi^2$ from contingency tables..." (J. R. Statist. Soc., 1922), the corrected degrees-of-freedom argument.

Sample-variance derivation:

  • Cochran, "The distribution of quadratic forms in a normal system..." (Proc. Cambridge Philos. Soc., 1934), the orthogonal-decomposition argument.

Last reviewed: May 11, 2026
