Statistical Estimation
Chi-Squared Distribution and Tests
The Chi-squared distribution as a sum of squared standard Normals and as the sampling distribution of the scaled sample variance, plus the two Pearson Chi-squared tests: goodness of fit for cell counts and independence in contingency tables.
Why This Matters
The Chi-squared distribution serves two distinct roles. As a sampling distribution, it is the law of the sum of $k$ independent squared standard Normals, equivalently the Gamma with shape $k/2$ and rate $1/2$. As a test statistic distribution, it is the asymptotic null distribution of the Pearson Chi-squared statistic for cell counts and contingency tables.
Both uses rely on the same fact: a quadratic form in Normal random variables, normalized appropriately, is Chi-squared. For the sample variance, the relevant quadratic form is $\sum_{i=1}^{n}(X_i - \bar X)^2/\sigma^2$. For Pearson Chi-squared, the relevant quadratic form is $\sum_{j}(O_j - E_j)^2/E_j$, which is asymptotically a quadratic form in standardized Normal summands by the central limit theorem applied to multinomial cell counts. This page derives both: the exact sampling distribution for the Normal model and the asymptotic null distribution for the multinomial.
The Chi-Squared Distribution
Chi-Squared Distribution
A random variable $X$ has a Chi-squared distribution with $k$ degrees of freedom, $X \sim \chi^2_k$, if its density is
$$f(x) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\, x^{k/2 - 1} e^{-x/2}, \qquad x > 0.$$
Equivalently, $X \overset{d}{=} Z_1^2 + \cdots + Z_k^2$ where $Z_1, \dots, Z_k$ are i.i.d. standard Normal. The mean is $k$ and the variance is $2k$.
In the Gamma family, $\chi^2_k = \mathrm{Gamma}(\text{shape} = k/2,\ \text{rate} = 1/2)$. The Chi-squared is the half-integer-shape slice of the Gamma; every Chi-squared identity follows from a Gamma identity.
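As a quick numerical check of this identity, here is a minimal sketch assuming numpy and scipy are available; note that scipy parametrizes the Gamma by shape and scale, so rate $1/2$ corresponds to scale $2$.

```python
import numpy as np
from scipy import stats

k = 7                                    # any degrees of freedom
x = np.linspace(0.1, 30, 200)
chi2_pdf = stats.chi2.pdf(x, df=k)
gamma_pdf = stats.gamma.pdf(x, a=k / 2, scale=2)  # shape k/2, rate 1/2
assert np.allclose(chi2_pdf, gamma_pdf)           # densities coincide
```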
Sum of Squared Normals
Statement
If $Z_1, \dots, Z_k$ are i.i.d. $N(0,1)$, then $\sum_{i=1}^{k} Z_i^2 \sim \chi^2_k$. For $X_i \sim N(0, \sigma^2)$, $X_i/\sigma$ is Normal with variance $1$, so $X_i^2/\sigma^2 \sim \chi^2_1$; summing $k$ such terms gives the result.
Intuition
Squaring a standard Normal kills its sign and gives a positive random variable with density proportional to $x^{-1/2} e^{-x/2}$, which is the $\mathrm{Gamma}(1/2, 1/2)$ density. Summing independent squared Normals adds the shape parameters (Gamma additivity), giving shape $k/2$ after $k$ terms.
Proof Sketch
Compute the density of $Y = Z^2$ for $Z \sim N(0,1)$ by the change-of-variables formula: for $y > 0$, $f_Y(y) = \frac{1}{\sqrt{2\pi y}}\, e^{-y/2}$. This matches the $\mathrm{Gamma}(1/2, 1/2)$ density. Sum over $k$ independent copies and apply Gamma additivity with common rate $1/2$.
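A simulation sketch of the theorem (assuming numpy and scipy; the seed, $k$, and sample size below are arbitrary illustrative choices): sums of $k$ squared standard Normals should be statistically indistinguishable from $\chi^2_k$ draws.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
k, n_sims = 5, 100_000
sums = (rng.standard_normal((n_sims, k)) ** 2).sum(axis=1)

print("empirical mean:", sums.mean(), " theory:", k)      # mean k
print("empirical var: ", sums.var(), " theory:", 2 * k)   # variance 2k
# Kolmogorov-Smirnov test against the chi2(k) CDF; a large p-value means
# no detectable mismatch between the simulated sums and the chi2 law.
print(stats.kstest(sums, stats.chi2(df=k).cdf))
```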
Why It Matters
Every classical sampling distribution involving sums of squared deviations from a Normal model is Chi-squared. The sample variance $S^2$ of an i.i.d. Normal sample has $(n-1)S^2/\sigma^2 \sim \chi^2_{n-1}$. The residual sum of squares in linear regression with Normal errors, scaled by $\sigma^2$, is Chi-squared with degrees of freedom equal to $n$ minus the number of regression coefficients.
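To illustrate the regression claim, a simulation sketch under assumed settings (the design, coefficients, and noise level below are hypothetical): the scaled residual sum of squares should follow $\chi^2_{n-p}$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p, sigma = 50, 3, 2.0
X = rng.standard_normal((n, p))          # fixed design matrix
beta = np.array([1.0, -2.0, 0.5])

rss = np.empty(10_000)
for i in range(rss.size):
    y = X @ beta + sigma * rng.standard_normal(n)
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta_hat
    rss[i] = resid @ resid               # residual sum of squares

scaled = rss / sigma**2
print("mean:", scaled.mean(), " theory:", n - p)      # df = n - p
print(stats.kstest(scaled, stats.chi2(df=n - p).cdf))
```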
Failure Mode
The result requires independence and unit-variance Normality. For non-Normal samples, $\sum_{i=1}^{n} X_i^2/\sigma^2$ is not Chi-squared; finite-sample tests that assume a Chi-squared law for quadratic forms of non-Normal data are misspecified. The asymptotic version (Pearson Chi-squared below) makes a weaker assumption and recovers the Chi-squared in the limit.
Sample Variance as Chi-Squared
Sample Variance Distribution for Normal Data
Statement
Let $X_1, \dots, X_n$ be i.i.d. $N(\mu, \sigma^2)$ and let $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar X)^2$ be the unbiased sample variance. Then $$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1},$$ and the sample mean $\bar X$ is independent of $S^2$.
Intuition
The sample mean is the projection of the sample vector onto the all-ones direction. The deviations $X_i - \bar X$ live in the orthogonal hyperplane, a space of dimension $n-1$. Their squared norm, scaled by $1/\sigma^2$, is the squared length of an $(n-1)$-dimensional standard Normal vector, which is $\chi^2_{n-1}$. The independence of mean and variance is the orthogonality.
Proof Sketch
This is the orthogonal-decomposition argument from the Normal distribution page. Apply an orthogonal change of basis with the first axis along the all-ones direction. The transformed sample is standard Normal in all coordinates; the first coordinate captures $\sqrt{n}\,(\bar X - \mu)/\sigma$ and the remaining $n-1$ coordinates capture the deviations. The squared norm of $n-1$ independent standard Normals is $\chi^2_{n-1}$, and the first coordinate is independent of the rest. Multiplying back by $\sigma^2$ gives the stated scaling.
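The statement is easy to check by simulation; a minimal sketch assuming numpy and scipy, with arbitrary choices of $n$, $\mu$, and $\sigma$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, mu, sigma, n_sims = 10, 3.0, 1.5, 100_000
x = mu + sigma * rng.standard_normal((n_sims, n))

means = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)               # unbiased sample variance
scaled = (n - 1) * s2 / sigma**2

print(stats.kstest(scaled, stats.chi2(df=n - 1).cdf))    # matches chi2(n-1)
print("corr(mean, S^2):", np.corrcoef(means, s2)[0, 1])  # approximately 0
```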
Why It Matters
This is the building block of the Student-t and F sampling distributions. The t-statistic is a ratio of a Normal and a root-Chi-squared (over its degrees of freedom), so it is exactly $t_{n-1}$-distributed. The F statistic in F-distribution and ANOVA is a ratio of two scaled sample variances, so it is exactly F-distributed with the corresponding numerator and denominator degrees of freedom.
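For instance, the one-sample t-statistic can be checked against the Student-t law directly; a simulation sketch under assumed settings:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, mu, sigma, n_sims = 8, 0.0, 2.0, 100_000
x = mu + sigma * rng.standard_normal((n_sims, n))

# sqrt(n) (xbar - mu) / S: Normal numerator over root-chi-squared denominator
t_stats = np.sqrt(n) * (x.mean(axis=1) - mu) / x.std(axis=1, ddof=1)
print(stats.kstest(t_stats, stats.t(df=n - 1).cdf))      # exactly t_{n-1}
```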
Failure Mode
The exact Chi-squared distribution of $(n-1)S^2/\sigma^2$ and the independence of $\bar X$ from $S^2$ are special properties of the Normal distribution. For non-Normal i.i.d. samples, $S^2$ is asymptotically Normal with a variance involving the fourth central moment, and $\bar X$ is only asymptotically uncorrelated with $S^2$, not independent in finite samples. The exact t-distribution argument therefore fails outside the Normal model.
Pearson Chi-Squared Goodness of Fit
Pearson Chi-Squared Goodness of Fit
Statement
Let $O_1, \dots, O_k$ be the observed counts of $n$ i.i.d. observations falling into $k$ cells, and let $E_j = n p_j$ be the expected counts under the null hypothesis $H_0: p = (p_1, \dots, p_k)$. Define the Pearson statistic $$X^2 = \sum_{j=1}^{k} \frac{(O_j - E_j)^2}{E_j}.$$ Under the null, $X^2 \xrightarrow{d} \chi^2_{k-1}$ as $n \to \infty$. The test rejects the null at level $\alpha$ when $X^2$ exceeds the $1-\alpha$ quantile of $\chi^2_{k-1}$.
Intuition
The observed counts are multinomial under the null. The standardized cell residuals $(O_j - E_j)/\sqrt{E_j}$ are asymptotically Normal with covariance matrix $I - \sqrt{p}\,\sqrt{p}^{\top}$ (a projection onto the orthogonal complement of $\sqrt{p}$). The Pearson statistic is the squared Euclidean norm of this random vector, which is the squared length of a standard Normal projected onto a $(k-1)$-dimensional subspace, hence $\chi^2_{k-1}$.
Proof Sketch
The vector of cell counts $O = (O_1, \dots, O_k)$ is $\mathrm{Multinomial}(n, p)$. By the multivariate central limit theorem, $\frac{1}{\sqrt{n}}(O - np) \xrightarrow{d} N(0, \Sigma)$ where $\Sigma = \mathrm{diag}(p) - p\,p^{\top}$. Standardize component-wise: $R_j = (O_j - n p_j)/\sqrt{n p_j}$. The covariance of $R$ converges to $I - \sqrt{p}\,\sqrt{p}^{\top}$, a rank-$(k-1)$ projection matrix. The sum of squares $\sum_j R_j^2$ is the squared norm of a multivariate Normal restricted to a $(k-1)$-dimensional subspace, hence $\chi^2_{k-1}$ by Cochran's theorem.
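In practice the statistic is one line of arithmetic; a sketch with made-up counts, assuming scipy (whose `chisquare` implements exactly this test):

```python
import numpy as np
from scipy import stats

observed = np.array([18, 22, 29, 31])        # hypothetical cell counts
p0 = np.array([0.25, 0.25, 0.25, 0.25])      # null cell probabilities
expected = observed.sum() * p0               # E_j = n p_j

x2 = ((observed - expected) ** 2 / expected).sum()
pval = stats.chi2.sf(x2, df=len(observed) - 1)    # right-tail p-value, df = k-1
print(x2, pval)
print(stats.chisquare(observed, f_exp=expected))  # same statistic and p-value
```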
Why It Matters
This is the first asymptotic test in classical statistics and remains the most commonly used test for categorical data. Applications: testing whether a die is fair, whether observed nucleotide frequencies match a genome model, whether an empirical distribution matches a hypothesized one (after binning). The degrees-of-freedom rule is "number of cells minus one" ($k-1$) because the cell counts are constrained to sum to $n$, removing one degree of freedom.
Failure Mode
The asymptotic Chi-squared approximation requires each expected count to be at least roughly 5; the small-sample rule of thumb is borrowed from Cochran. With cells of expected count below 5, the asymptotic approximation is poor and exact tests (Fisher's exact test, Monte Carlo permutation) should replace it. Estimating cell probabilities from the data (rather than fixing them under the null) consumes additional degrees of freedom: the asymptotic distribution becomes $\chi^2_{k-1-m}$, where $m$ is the number of parameters estimated. This is the case in the next theorem.
Chi-Squared Test of Independence
Pearson Chi-Squared Test of Independence
Statement
For an $r \times c$ contingency table with observed counts $O_{ij}$, row totals $O_{i\cdot}$, and column totals $O_{\cdot j}$, let $E_{ij} = O_{i\cdot} O_{\cdot j}/n$ be the expected counts under independence. The Pearson statistic $$X^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$ satisfies $X^2 \xrightarrow{d} \chi^2_{(r-1)(c-1)}$ under the null hypothesis of independence as $n \to \infty$. The test rejects at level $\alpha$ when $X^2$ exceeds the $1-\alpha$ quantile of $\chi^2_{(r-1)(c-1)}$.
Intuition
Under independence, $p_{ij} = p_{i\cdot}\, p_{\cdot j}$. The MLE of the row and column marginals consumes $(r-1) + (c-1)$ degrees of freedom (the marginals must each sum to one, removing one constraint each). The total degrees of freedom are $rc - 1$ (for the multinomial constraint) minus $(r-1) + (c-1)$ (for the estimated parameters), giving $(r-1)(c-1)$.
Proof Sketch
Under the null, the cell probabilities have the product form $p_{ij} = p_{i\cdot}\, p_{\cdot j}$ with $(r-1) + (c-1)$ free parameters. Plugging in the MLEs $\hat p_{i\cdot} = O_{i\cdot}/n$ and $\hat p_{\cdot j} = O_{\cdot j}/n$ gives the estimated expected counts $\hat E_{ij} = O_{i\cdot} O_{\cdot j}/n$. The Pearson statistic with estimated expected counts is asymptotically Chi-squared with $rc - 1 - (r-1) - (c-1) = (r-1)(c-1)$ degrees of freedom, by the general "lost degrees of freedom" result for Pearson Chi-squared with estimated parameters.
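A sketch with a hypothetical $2 \times 3$ table, assuming scipy: `chi2_contingency` forms $\hat E_{ij} = O_{i\cdot} O_{\cdot j}/n$ and refers $X^2$ to $\chi^2_{(r-1)(c-1)}$.

```python
import numpy as np
from scipy import stats

table = np.array([[30, 15, 12],
                  [20, 25, 18]])             # made-up observed counts O_ij

# correction=False disables the Yates continuity correction so the result
# is the plain Pearson statistic described above.
x2, pval, dof, expected = stats.chi2_contingency(table, correction=False)
print("X^2 =", x2, " df =", dof, " p =", pval)   # df = (2-1)(3-1) = 2
print("estimated expected counts:\n", expected)
```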
Why It Matters
This is the canonical test for association in cross-classified data: whether smoking status is independent of lung-cancer status, whether browser is independent of geography, whether treatment is independent of response. The test is asymptotic; for sparse tables with expected cell counts below 5, prefer Fisher's exact test (for $2 \times 2$ tables) or a permutation test (for larger tables). When the null is rejected, follow up with standardized residuals to locate the cells driving the rejection.
Failure Mode
The Chi-squared test treats rows and columns symmetrically. If the data are not i.i.d. (e.g., repeated measures on the same subject across columns), the asymptotic distribution is wrong and the test is invalid. For paired binary data on the same subjects, use McNemar's test. For repeated measures with more than two columns, use Cochran's Q or a mixed model.
When to Use Chi-Squared Tests
| Scenario | Test | Degrees of freedom |
|---|---|---|
| Hypothesized cell probabilities, no estimation | Pearson GOF | $k-1$ |
| Hypothesized cell probabilities, $m$ parameters estimated | Pearson GOF | $k-1-m$ |
| Independence in $r \times c$ contingency table | Pearson chi-squared independence | $(r-1)(c-1)$ |
| Homogeneity across $r$ groups in a $c$-cell table | Pearson chi-squared homogeneity | $(r-1)(c-1)$ |
| Likelihood-ratio version of any of the above | $G$-test | same df as Pearson |
| Sparse table or any cell with $E_j < 5$ | Fisher's exact or permutation test | not Chi-squared |
The $G$-test uses the statistic $G = 2\sum_j O_j \log(O_j/E_j)$, which is asymptotically equivalent to the Pearson statistic but is the natural likelihood-ratio version. See likelihood-ratio, Wald, and score tests for the connection.
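A sketch comparing the two statistics on the same made-up counts, assuming scipy (whose `power_divergence` computes both):

```python
import numpy as np
from scipy import stats

observed = np.array([18, 22, 29, 31])
expected = np.full(4, observed.sum() / 4)    # uniform null
df = len(observed) - 1

g = 2 * (observed * np.log(observed / expected)).sum()   # G statistic
x2 = ((observed - expected) ** 2 / expected).sum()       # Pearson statistic
print("G  =", g, " p =", stats.chi2.sf(g, df))
print("X2 =", x2, " p =", stats.chi2.sf(x2, df))

# scipy's power_divergence family includes both tests:
print(stats.power_divergence(observed, f_exp=expected, lambda_="log-likelihood"))
print(stats.power_divergence(observed, f_exp=expected, lambda_="pearson"))
```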
Common Confusions
Chi-squared distribution versus Chi-squared concentration
The Chi-squared distribution is the exact distribution of $\sum_{i=1}^{k} Z_i^2$ for i.i.d. standard Normals $Z_i$. Chi-squared concentration refers to the Laurent-Massart sub-Gamma tail bounds for Chi-squared random variables, which are finite-sample inequalities, not statements about the law. See chi-squared concentration for the bound; this page is about the law.
Degrees of freedom counts free parameters, not cells
The Pearson Chi-squared GOF statistic has $k-1$ degrees of freedom when the cell probabilities are fully specified, but $k-1-m$ degrees when $m$ parameters were estimated from the data. Software defaults often assume the simpler case; check that the degrees of freedom match the number of parameters you actually fixed under the null.
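A sketch of the adjustment with made-up count data: testing a Poisson fit whose mean is estimated from the same data costs one degree of freedom, so $\text{df} = k - 1 - 1$. The grouped-data mean estimate below (treating the "$\geq 3$" cell as exactly 3) is a deliberate simplification.

```python
import numpy as np
from scipy import stats

counts = np.array([35, 40, 18, 7])           # intervals with 0, 1, 2, >=3 events
n = counts.sum()
lam = (counts * np.array([0, 1, 2, 3])).sum() / n   # crude estimate of the mean

p = stats.poisson.pmf(np.arange(3), lam)     # P(0), P(1), P(2) under the fit
p = np.append(p, 1 - p.sum())                # lump the >=3 tail into one cell
expected = n * p

x2 = ((counts - expected) ** 2 / expected).sum()
df = len(counts) - 1 - 1                     # k - 1, minus 1 estimated parameter
print("X^2 =", x2, " df =", df, " p =", stats.chi2.sf(x2, df))
```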
Pearson statistic versus likelihood-ratio G-statistic
The Pearson statistic and the $G$-statistic are asymptotically equivalent and have the same Chi-squared reference distribution. They are not the same number in finite samples: the $G$-statistic is the likelihood-ratio test, the Pearson statistic is the score test. For modest cell counts they typically agree to within a few percent; for very sparse data they can differ noticeably and neither matches the exact reference well.
Two-sided versus one-sided
The Pearson Chi-squared test is intrinsically two-sided in the cell-residual sense: both positive and negative deviations contribute to $X^2$. The rejection region is one-sided in the tail of the Chi-squared (right tail only). A test with a left-tail rejection region for $X^2$ would be testing for "too good a fit", which is occasionally done to detect data fabrication, but it is not the standard direction.
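For completeness, the left-tail computation is a one-liner; the numbers below are illustrative only.

```python
from scipy import stats

x2, df = 0.3, 9                    # hypothetical: tiny X^2 on 9 df
left_p = stats.chi2.cdf(x2, df)    # P(chi2_df <= X^2), the "too good" tail
print("left-tail p =", left_p)     # very small => deviations suspiciously small
```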
Exercises
Problem
A six-sided die is rolled 60 times, yielding observed counts $O_1, \dots, O_6$ for the six faces. Test the null hypothesis that the die is fair at level 0.05.
Problem
A clinical trial cross-classifies 200 patients by treatment (drug, placebo) and outcome (improved, not improved). The $2 \times 2$ table has counts $O_{11}$ (drug, improved), $O_{12}$ (drug, not improved), $O_{21}$ (placebo, improved), $O_{22}$ (placebo, not improved). Test independence at level 0.05.
Problem
Let $X_1, \dots, X_n$ be i.i.d. $N(\mu, \sigma^2)$. Construct a 95% confidence interval for $\sigma^2$ based on the sample variance and the quantiles of $\chi^2_{n-1}$, and verify whether it covers $\sigma^2$ for a symbolic sample variance value $s^2$.
Problem
A genetic linkage experiment cross-classifies 300 offspring of a cross into four phenotype classes with predicted Mendelian ratios 9:3:3:1. Observed counts are $O_1, O_2, O_3, O_4$. Test the Mendelian prediction at level 0.05.
References
Canonical:
- Casella and Berger, Statistical Inference (2002), Chapter 5 (Section 5.3 on sample variance distribution), Chapter 8 (Section 8.3 on Pearson Chi-squared), Chapter 10 (asymptotic Chi-squared distribution of the LRT).
- Lehmann and Romano, Testing Statistical Hypotheses (2005), Chapter 14 (large-sample tests, Chi-squared asymptotics).
- Agresti, Categorical Data Analysis (2013), Chapter 2 (Pearson Chi-squared and likelihood-ratio tests for contingency tables).
Foundational papers:
- Pearson, "On the criterion that a given system of deviations..." (Philosophical Magazine, 1900), the original Chi-squared goodness-of-fit derivation.
- Fisher, "On the interpretation of $\chi^2$ from contingency tables..." (J. R. Statist. Soc., 1922), the corrected degrees-of-freedom argument.
Sample-variance derivation:
- Cochran, "The distribution of quadratic forms in a normal system..." (Proc. Cambridge Philos. Soc., 1934), the orthogonal-decomposition argument.
Last reviewed: May 11, 2026