

Chi-Squared Distribution and Tests

The Chi-squared distribution as sum of squared standard Normals and as the sampling distribution of the scaled sample variance, plus the two Pearson Chi-squared tests: goodness of fit for cell counts and independence in contingency tables.


Why This Matters

The Chi-squared distribution serves two distinct roles. As a sampling distribution, it is the law of the sum of $k$ independent squared standard Normals, equivalently the Gamma with shape $k/2$ and rate $1/2$. As a test-statistic distribution, it is the asymptotic null distribution of the Pearson Chi-squared statistic for cell counts and contingency tables.

Both uses rely on the same fact: a quadratic form in Normal random variables, normalized appropriately, is Chi-squared. For the sample variance, the relevant quadratic form is $\sum_i (X_i - \bar X)^2$. For Pearson Chi-squared, it is $\sum_i (O_i - E_i)^2/E_i$, which is asymptotically a quadratic form in standardized Normal summands by the central limit theorem applied to multinomial cell counts. This page derives both: the exact sampling distribution for the Normal model and the asymptotic null distribution for the multinomial.

The Chi-Squared Distribution

Definition

Chi-Squared Distribution

A random variable $X$ has a Chi-squared distribution with $k$ degrees of freedom, $k \in \{1, 2, \dots\}$, if its density is

$$f_X(x) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\, x^{k/2-1} e^{-x/2}, \qquad x > 0.$$

Equivalently, $X = \sum_{i=1}^k Z_i^2$ where $Z_1,\dots,Z_k$ are i.i.d. standard Normal. The mean is $k$ and the variance is $2k$.

In the Gamma family, $\chi^2_k = \operatorname{Gamma}(k/2,\, 1/2)$ (shape $k/2$, rate $1/2$). The Chi-squared is the half-integer-shape slice of the Gamma; every Chi-squared identity follows from a Gamma identity.
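A quick numerical check of this identification, as a sketch assuming NumPy and SciPy are available. Note that SciPy's gamma is parameterized by shape and scale, so rate $1/2$ becomes scale $2$:

```python
import numpy as np
from scipy import stats

# chi2(k) should coincide with Gamma(shape = k/2, rate = 1/2);
# SciPy uses shape/scale, so scale = 1/rate = 2.
k = 7
x = np.linspace(0.1, 30, 200)
chi2_pdf = stats.chi2.pdf(x, df=k)
gamma_pdf = stats.gamma.pdf(x, a=k / 2, scale=2.0)

print(np.max(np.abs(chi2_pdf - gamma_pdf)))  # ~1e-16: the densities agree
```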

Theorem

Sum of Squared Normals

Statement

If $Z_1,\dots,Z_k$ are i.i.d. $\mathcal{N}(0,1)$, then $\sum_{i=1}^k Z_i^2 \sim \chi^2_k$. More generally, for $a \neq 0$, $aZ_1$ is Normal with variance $a^2$, so $(aZ_1)^2/a^2 = Z_1^2 \sim \chi^2_1$; summing $k$ such terms gives the result.

Intuition

Squaring a standard Normal discards its sign and gives a positive random variable with density proportional to $e^{-z/2}/\sqrt z$, which is the $\chi^2_1$ density. Summing $k$ independent squared Normals adds the shape parameters (Gamma additivity).

Proof Sketch

Compute the density of $Y = Z_1^2$ for $Z_1 \sim \mathcal{N}(0,1)$ by the change-of-variables formula: for $y > 0$,

$$f_Y(y) = \frac{\varphi(\sqrt y) + \varphi(-\sqrt y)}{2\sqrt y} = \frac{\varphi(\sqrt y)}{\sqrt y} = \frac{e^{-y/2}}{\sqrt{2\pi y}}.$$

This matches the $\chi^2_1$ density. Sum over $k$ independent copies and apply Gamma additivity with common rate $1/2$.
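A Monte Carlo sanity check of the theorem, as a sketch assuming NumPy and SciPy: simulate sums of $k$ squared standard Normals and compare the empirical law to $\chi^2_k$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
k, n_sims = 5, 100_000

# Sum k squared standard Normals per simulation.
samples = (rng.standard_normal((n_sims, k)) ** 2).sum(axis=1)

print(samples.mean(), samples.var())  # close to k = 5 and 2k = 10
# Kolmogorov-Smirnov distance to chi2(k) should be negligible.
print(stats.kstest(samples, stats.chi2(df=k).cdf))
```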

Why It Matters

Every classical sampling distribution involving sums of squared deviations from a Normal model is Chi-squared. The sample variance of an i.i.d. Normal sample satisfies $(n-1)S^2/\sigma^2 \sim \chi^2_{n-1}$. The residual sum of squares in linear regression with Normal errors is Chi-squared with degrees of freedom equal to $n$ minus the number of regression coefficients.

Failure Mode

The result requires independence and unit-variance Normality. For non-Normal samples, $\sum_i Z_i^2$ is not Chi-squared; finite-sample tests that assume a Chi-squared law for quadratic forms of non-Normal data are misspecified. The asymptotic version (Pearson Chi-squared below) makes a weaker assumption and recovers the Chi-squared in the limit.

Sample Variance as Chi-Squared

Theorem

Sample Variance Distribution for Normal Data

Statement

Let $X_1,\dots,X_n$ be i.i.d. $\mathcal{N}(\mu,\sigma^2)$ and let $S^2 = \frac{1}{n-1}\sum_i (X_i - \bar X_n)^2$ be the unbiased sample variance. Then

$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1},$$

and the sample mean $\bar X_n$ is independent of $S^2$.

Intuition

The sample mean is the projection of the sample vector onto the all-ones direction. The deviations $X_i - \bar X_n$ live in the orthogonal hyperplane, a space of dimension $n - 1$. Their squared norm, divided by $\sigma^2$, is the squared length of an $(n-1)$-dimensional standard Normal vector, which is $\chi^2_{n-1}$. The independence of mean and variance is exactly this orthogonality.

Proof Sketch

This is the orthogonal-decomposition argument from the normal distribution page. Apply an orthogonal change of basis with first axis along the all-ones direction. The transformed sample is standard Normal in all $n$ coordinates; the first coordinate captures $\bar X_n$ and the remaining $n - 1$ coordinates capture the deviations. The squared norm of $n - 1$ independent standard Normals is $\chi^2_{n-1}$, and the first coordinate is independent of the rest. Multiplying back by $\sigma^2$ gives the stated scaling.
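A simulation sketch of the theorem (assuming NumPy and SciPy): the scaled sample variance should follow $\chi^2_{n-1}$, and the sample mean and sample variance should be uncorrelated.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, mu, sigma2, n_sims = 10, 3.0, 4.0, 50_000
data = rng.normal(mu, np.sqrt(sigma2), size=(n_sims, n))

xbar = data.mean(axis=1)
s2 = data.var(axis=1, ddof=1)            # unbiased sample variance
scaled = (n - 1) * s2 / sigma2

print(scaled.mean(), scaled.var())       # close to n-1 = 9 and 2(n-1) = 18
print(stats.kstest(scaled, stats.chi2(df=n - 1).cdf))
print(np.corrcoef(xbar, s2)[0, 1])       # close to 0, reflecting independence
```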

Why It Matters

This is the building block of the Student-$t$ and $F$ sampling distributions. The $t$-statistic $(\bar X_n - \mu)/(S/\sqrt n)$ is the ratio of a standard Normal to the square root of an independent Chi-squared divided by its degrees of freedom, so it is exactly $t_{n-1}$. The $F$ statistic in the F distribution and ANOVA is a ratio of two independent scaled sample variances, so it is exactly $F_{d_1,d_2}$.
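The $t$-construction can be checked the same way, as a simulation sketch: build $t_{n-1}$ from an independent standard Normal and a $\chi^2_{n-1}$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
df, n_sims = 9, 100_000

z = rng.standard_normal(n_sims)          # Normal numerator
v = rng.chisquare(df, n_sims)            # independent chi2(df)
t_built = z / np.sqrt(v / df)            # definition of t_df

print(stats.kstest(t_built, stats.t(df=df).cdf))  # empirical law matches t_df
```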

Failure Mode

The exact Chi-squared distribution of $S^2$ and the independence of $\bar X_n$ from $S^2$ are special properties of the Normal distribution. For non-Normal i.i.d. samples, $S^2$ is asymptotically Normal with a variance involving the fourth central moment, and it is only asymptotically uncorrelated with $\bar X_n$, not independent in finite samples. The exact $t$-distribution argument therefore fails outside the Normal model.

Pearson Chi-Squared Goodness of Fit

Theorem

Pearson Chi-Squared Goodness of Fit

Statement

Let $O_1,\dots,O_k$ be the observed counts of $n$ i.i.d. observations falling into $k$ cells, and let $E_i = np_i$ be the expected counts under the null hypothesis $p_1,\dots,p_k$. Define the Pearson statistic

$$X^2 = \sum_{i=1}^k \frac{(O_i - E_i)^2}{E_i}.$$

Under the null, $X^2 \to_d \chi^2_{k-1}$ as $n \to \infty$. The test rejects the null at level $\alpha$ when $X^2$ exceeds the $1-\alpha$ quantile of $\chi^2_{k-1}$.
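In SciPy this test is scipy.stats.chisquare; a minimal sketch with hypothetical counts and a fully specified null (the probabilities here are illustrative, not from the source):

```python
import numpy as np
from scipy import stats

# Hypothetical: n = 120 observations over k = 3 cells, null p = (0.5, 0.3, 0.2).
observed = np.array([52, 41, 27])
p_null = np.array([0.5, 0.3, 0.2])
expected = observed.sum() * p_null       # E_i = n p_i

stat, pvalue = stats.chisquare(observed, f_exp=expected)
print(stat, pvalue)                      # reference distribution: chi2(k-1)
```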

Intuition

The observed counts $(O_1,\dots,O_k)$ are multinomial under the null. The standardized cell residuals $(O_i - E_i)/\sqrt{E_i}$ are asymptotically Normal with covariance matrix $I - \sqrt p\,\sqrt p^{\top}$, a projection onto the orthogonal complement of $\sqrt p = (\sqrt{p_1},\dots,\sqrt{p_k})$. The Pearson statistic is the squared Euclidean norm of this random vector, which is the squared length of a standard Normal projected onto a $(k-1)$-dimensional subspace, hence $\chi^2_{k-1}$.

Proof Sketch

The vector of cell counts $(O_1,\dots,O_k)$ is multinomial. By the multivariate central limit theorem,

$$\sqrt n\left(\frac{O_i}{n} - p_i\right)_{i=1}^k \to_d \mathcal{N}(\mathbf 0, \Sigma), \qquad \Sigma_{ij} = p_i(\delta_{ij} - p_j).$$

Standardize component-wise: $Z_i = (O_i - np_i)/\sqrt{np_i}$. The covariance of $(Z_1,\dots,Z_k)$ converges to $I - \sqrt p\,\sqrt p^{\top}$, a rank-$(k-1)$ projection matrix. The sum of squares $\sum_i Z_i^2 = X^2$ is the squared norm of a multivariate Normal supported on a $(k-1)$-dimensional subspace, hence $\chi^2_{k-1}$ by Cochran's theorem.

Why It Matters

This is the first asymptotic test in classical statistics and remains the most commonly used test for categorical data. Applications: testing whether a die is fair, whether observed nucleotide frequencies match a genome model, whether an empirical distribution matches a hypothesized one (after binning). The degrees-of-freedom rule is "number of cells minus one" because the cell counts are constrained to sum to $n$, removing one degree of freedom.

Failure Mode

The asymptotic Chi-squared approximation requires each expected count $E_i = np_i$ to be at least roughly 5; this small-sample rule of thumb is borrowed from Cochran. With cells of expected count below 5, the approximation is poor and exact tests (Fisher's exact test, Monte Carlo permutation) should replace it. Estimating cell probabilities from the data (rather than fixing them under the null) consumes additional degrees of freedom: the asymptotic distribution becomes $\chi^2_{k-1-r}$, where $r$ is the number of parameters estimated. This is the case in the next theorem.
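For a sparse $2\times 2$ table, SciPy's Fisher exact test is the standard replacement; a sketch with hypothetical counts:

```python
import numpy as np
from scipy import stats

# Expected counts here are well below 5, so the chi2 approximation is poor.
table = np.array([[3, 9],
                  [7, 2]])

odds_ratio, pvalue = stats.fisher_exact(table)
print(odds_ratio, pvalue)                # exact p-value, no asymptotics
```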

Chi-Squared Test of Independence

Theorem

Pearson Chi-Squared Test of Independence

Statement

For an $r \times c$ contingency table with observed counts $O_{ij}$, row totals $R_i$, and column totals $C_j$, let $E_{ij} = R_i C_j / n$ be the expected counts under independence. The Pearson statistic

$$X^2 = \sum_{i=1}^r \sum_{j=1}^c \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

satisfies $X^2 \to_d \chi^2_{(r-1)(c-1)}$ under the null hypothesis of independence as $n \to \infty$. The test rejects at level $\alpha$ when $X^2$ exceeds the $1-\alpha$ quantile of $\chi^2_{(r-1)(c-1)}$.

Intuition

Under independence, $p_{ij} = p_{i\cdot} p_{\cdot j}$. The MLEs of the row and column marginals consume $(r - 1) + (c - 1)$ degrees of freedom (each set of marginals must sum to one, removing one free parameter from each). The total degrees of freedom are $rc - 1$ (for the multinomial constraint) minus $(r - 1) + (c - 1)$ (for the estimated parameters), giving $rc - 1 - (r-1) - (c-1) = (r-1)(c-1)$.

Proof Sketch

Under the null, the cell probabilities have the product form $p_{ij} = p_{i\cdot} p_{\cdot j}$ with $r + c - 2$ free parameters. Plugging in the MLEs $\hat p_{i\cdot} = R_i/n$ and $\hat p_{\cdot j} = C_j/n$ gives the estimated expected counts $E_{ij} = R_i C_j/n$. The Pearson statistic with estimated expected counts is asymptotically Chi-squared with $rc - 1 - (r + c - 2) = (r-1)(c-1)$ degrees of freedom, by the general "lost degrees of freedom" result for Pearson Chi-squared with estimated parameters.
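SciPy implements the whole procedure in scipy.stats.chi2_contingency, which forms $E_{ij} = R_i C_j/n$ and the $(r-1)(c-1)$ degrees of freedom automatically; a sketch with a hypothetical $2 \times 3$ table:

```python
import numpy as np
from scipy import stats

table = np.array([[30, 45, 25],
                  [20, 35, 45]])         # hypothetical observed counts

chi2, pvalue, dof, expected = stats.chi2_contingency(table, correction=False)
print(chi2, pvalue, dof)                 # dof = (2-1)(3-1) = 2
print(expected)                          # the R_i C_j / n matrix
```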

Why It Matters

This is the canonical test for association in cross-classified data: whether smoking status is independent of lung-cancer status, whether browser is independent of geography, whether treatment is independent of response. The test is asymptotic; for sparse tables with expected cell counts below 5, prefer Fisher's exact test (for $2\times 2$) or a permutation test (for larger tables). When the null is rejected, follow up with the standardized residuals $(O_{ij} - E_{ij})/\sqrt{E_{ij}}$ to locate the cells driving the rejection, as in the sketch below.
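Continuing the sketch above, the standardized residuals fall out of the same call; cells with $|(O_{ij}-E_{ij})/\sqrt{E_{ij}}|$ much larger than about 2 are the ones to inspect:

```python
import numpy as np
from scipy import stats

table = np.array([[30, 45, 25],
                  [20, 35, 45]])         # hypothetical observed counts

_, _, _, expected = stats.chi2_contingency(table, correction=False)
residuals = (table - expected) / np.sqrt(expected)
print(np.round(residuals, 2))            # signed cell-level contributions
```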

Failure Mode

The Chi-squared test treats rows and columns symmetrically. If the data are not i.i.d. (e.g., repeated measures on the same subject across columns), the asymptotic distribution is wrong and the test is invalid. For paired binary data on the same subjects, use McNemar's test. For repeated measures with more than two columns, use Cochran's Q or a mixed model.

When to Use Chi-Squared Tests

Scenario | Test | Degrees of freedom
Hypothesized cell probabilities, no estimation | Pearson GOF | $k - 1$
Hypothesized cell probabilities, $r$ parameters estimated | Pearson GOF | $k - 1 - r$
Independence in $r \times c$ contingency table | Pearson Chi-squared independence | $(r-1)(c-1)$
Homogeneity across $g$ groups in a $k$-cell table | Pearson Chi-squared homogeneity | $(g-1)(k-1)$
Likelihood-ratio version of any of the above | $G$-test | same df as Pearson
Sparse table or any cell with $E_{ij} < 5$ | Fisher's exact or permutation test | not Chi-squared

The $G$-test uses the statistic $G = 2\sum O \log(O/E)$, which is asymptotically equivalent to the Pearson statistic but is the natural likelihood-ratio version. See likelihood-ratio, Wald, and score tests for the connection.
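Both statistics are instances of SciPy's power-divergence family; a sketch comparing them on the same hypothetical counts:

```python
import numpy as np
from scipy import stats

observed = np.array([52, 41, 27])
expected = observed.sum() * np.array([0.5, 0.3, 0.2])

# lambda_ selects the member of the power-divergence family.
print(stats.power_divergence(observed, expected, lambda_="pearson"))
print(stats.power_divergence(observed, expected, lambda_="log-likelihood"))
# Same chi2(k-1) reference; the two numbers differ slightly in finite samples.
```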

Common Confusions

Watch Out

Chi-squared distribution versus Chi-squared concentration

The Chi-squared distribution is the exact distribution of $\sum_i Z_i^2$ for standard Normals. Chi-squared concentration refers to the Laurent-Massart sub-Gamma tail bounds for Chi-squared random variables, which are finite-sample inequalities, not statements about the law. See chi-squared concentration for the bound; this page is about the law.

Watch Out

Degrees of freedom counts free parameters, not cells

The Pearson Chi-squared GOF statistic has $k - 1$ degrees of freedom when the cell probabilities are fully specified, but $k - 1 - r$ degrees when $r$ parameters were estimated from the data. Software defaults often assume the simpler case; check that the degrees of freedom match the number of parameters you actually fixed under the null, as in the sketch below.
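In scipy.stats.chisquare the adjustment is the ddof argument, which subtracts the estimated-parameter count from the default $k - 1$; a sketch with hypothetical counts and $r = 1$:

```python
import numpy as np
from scipy import stats

observed = np.array([18, 25, 32, 25])            # k = 4 hypothetical counts
expected = np.array([20.0, 27.0, 30.0, 23.0])    # from a fitted model, r = 1

print(stats.chisquare(observed, f_exp=expected, ddof=0))  # df = k-1 = 3 (ignores estimation)
print(stats.chisquare(observed, f_exp=expected, ddof=1))  # df = k-1-r = 2
```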

Watch Out

Pearson statistic versus likelihood-ratio G-statistic

The Pearson statistic $\sum (O - E)^2/E$ and the $G$-statistic $2\sum O \log(O/E)$ are asymptotically equivalent and share the same Chi-squared reference distribution. They are not the same number in finite samples: the $G$-statistic is the likelihood-ratio test, the Pearson statistic is the score test. For modest cell counts they typically agree to within a few percent; for very sparse data they can differ noticeably, and neither matches the exact reference well.

Watch Out

Two-sided versus one-sided

The Pearson Chi-squared test is intrinsically two-sided in the cell-residual sense: both positive and negative deviations contribute to $X^2$. The reference distribution is used one-sided (right tail only). A left-tail rejection region for $X^2$ would test for "too good a fit", which is occasionally done to detect data fabrication, but it is not the standard direction.

Exercises

ExerciseCore

Problem

A six-sided die is rolled 60 times with results $(O_1,\dots,O_6) = (8, 11, 9, 12, 7, 13)$. Test the null hypothesis that the die is fair at level 0.05.

ExerciseCore

Problem

A clinical trial cross-classifies 200 patients by treatment (drug, placebo) and outcome (improved, not improved). The $2\times 2$ table has $O_{11} = 60$ (drug, improved), $O_{12} = 40$ (drug, not improved), $O_{21} = 45$ (placebo, improved), $O_{22} = 55$ (placebo, not improved). Test independence at level 0.05.

ExerciseAdvanced

Problem

Let $X_1,\dots,X_{20}$ be i.i.d. $\mathcal{N}(\mu, 4)$. Construct a 95% confidence interval for $\sigma^2 = 4$ based on the sample variance, and verify that it covers $4$ when the observed sample variance is $S^2 = 4.2$.

ExerciseAdvanced

Problem

A genetic linkage experiment cross-classifies 300 offspring of a cross into four phenotype classes with predicted Mendelian ratios 9:3:3:1. The observed counts are $(O_1, O_2, O_3, O_4) = (160, 62, 55, 23)$. Test the Mendelian prediction at level 0.05.

References

Canonical:

  • Casella and Berger, Statistical Inference (2002), Chapter 5 (Section 5.3 on sample variance distribution), Chapter 8 (Section 8.3 on Pearson Chi-squared), Chapter 10 (asymptotic Chi-squared distribution of the LRT).
  • Lehmann and Romano, Testing Statistical Hypotheses (2005), Chapter 14 (large-sample tests, Chi-squared asymptotics).
  • Agresti, Categorical Data Analysis (2013), Chapter 2 (Pearson Chi-squared and likelihood-ratio tests for contingency tables).

Foundational papers:

  • Pearson, "On the criterion that a given system of deviations..." (Philosophical Magazine, 1900), the original Chi-squared goodness-of-fit derivation.
  • Fisher, "On the interpretation of $\chi^2$ from contingency tables..." (J. R. Statist. Soc., 1922), the corrected degrees-of-freedom argument.

Sample-variance derivation:

  • Cochran, "The distribution of quadratic forms in a normal system..." (Proc. Cambridge Philos. Soc., 1934), the orthogonal-decomposition argument.

Last reviewed: May 11, 2026
