Statistics
Analysis of Variance
One-way ANOVA decomposes the total sum of squares into a between-group component and a within-group component. Under iid normal data with equal group variances, the ratio of mean squares has an F distribution and gives an exact test of the equal-means null hypothesis. The Welch correction handles unequal variances. Two-way ANOVA partitions further into main effects and interaction. Post-hoc procedures (Tukey HSD, Bonferroni, Scheffe) correct for the multiple-comparison problem that naive pairwise t-tests ignore.
Why This Matters
The one-way ANOVA $F$-statistic is the first multi-group comparison every student of statistics learns, and it is the engine inside fixed-effects regression, mixed-effects models, designed experiments, randomized trials, and quality-control inspection. The construction is short: decompose the total variability into a between-group piece (signal) and a within-group piece (noise), form the ratio, and read off the test statistic.
ANOVA also shows where the variance-stabilizing transformations and the multivariate normal pay off in applications: the $F$ distribution under the null is exact for iid normal data with equal variances, and approximate otherwise. The same machinery generalizes to two-way layouts with interaction, to repeated-measures designs, and to all of the post-hoc multiple-comparison machinery.
One-Way ANOVA: Setup and Decomposition
Consider $k$ groups with $n_i$ observations in group $i$, total $N = \sum_{i=1}^{k} n_i$. Write $Y_{ij} = \mu_i + \varepsilon_{ij}$ where $\mu_i$ is the group mean and $\varepsilon_{ij}$ are iid noise. Define the group means $\bar{Y}_{i\cdot} = \frac{1}{n_i}\sum_{j=1}^{n_i} Y_{ij}$ and the grand mean $\bar{Y}_{\cdot\cdot} = \frac{1}{N}\sum_{i=1}^{k}\sum_{j=1}^{n_i} Y_{ij}$.
One-Way ANOVA Sum-of-Squares Decomposition
Statement
The total sum of squares decomposes as
$$\mathrm{SS}_{\text{total}} = \sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(Y_{ij}-\bar{Y}_{\cdot\cdot}\right)^2 = \underbrace{\sum_{i=1}^{k} n_i\left(\bar{Y}_{i\cdot}-\bar{Y}_{\cdot\cdot}\right)^2}_{\mathrm{SS}_{\text{between}}} + \underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(Y_{ij}-\bar{Y}_{i\cdot}\right)^2}_{\mathrm{SS}_{\text{within}}}.$$
The decomposition is purely algebraic and does not require any distributional assumption.
Proof Sketch
Add and subtract $\bar{Y}_{i\cdot}$ inside the squared deviation: $Y_{ij}-\bar{Y}_{\cdot\cdot} = (Y_{ij}-\bar{Y}_{i\cdot}) + (\bar{Y}_{i\cdot}-\bar{Y}_{\cdot\cdot})$. Expand the square. The cross-term sums to zero because $\sum_{j=1}^{n_i}(Y_{ij}-\bar{Y}_{i\cdot}) = 0$ inside each group, leaving the two stated sums of squares.
Why It Matters
The decomposition is the geometric heart of ANOVA: $\mathrm{SS}_{\text{between}}$ is the squared distance from the group-mean fit to the grand-mean fit, and $\mathrm{SS}_{\text{within}}$ is the squared distance from the data to the group-mean fit. Under iid normal data with equal variances, these two squared distances are independent and have known chi-squared distributions; their ratio gives the $F$ statistic.
Failure Mode
The decomposition is exact. The downstream $F$-distribution claim is what fails under non-normal data or unequal variances; the decomposition itself holds for any partition of the data.
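The algebraic identity is easy to verify numerically. A minimal sketch with synthetic data (the group sizes and means here are arbitrary; no distributional assumption is needed):

```python
import numpy as np

rng = np.random.default_rng(0)
# Three groups of different sizes; the identity is pure algebra.
groups = [rng.normal(loc=m, scale=1.0, size=n)
          for m, n in [(0.0, 8), (0.5, 12), (1.0, 10)]]

y = np.concatenate(groups)
grand_mean = y.mean()

ss_total = np.sum((y - grand_mean) ** 2)
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(np.sum((g - g.mean()) ** 2) for g in groups)

# SS_total = SS_between + SS_within, exactly (up to float rounding).
assert np.isclose(ss_total, ss_between + ss_within)
```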
The F-Statistic and Its Distribution
The standard one-way ANOVA test is for the null hypothesis $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$.
F Distribution of the One-Way ANOVA Statistic Under the Null
Statement
Under $H_0$ and the iid $\mathcal{N}(\mu, \sigma^2)$ assumption,
$$F = \frac{\mathrm{SS}_{\text{between}}/(k-1)}{\mathrm{SS}_{\text{within}}/(N-k)} \sim F_{k-1,\,N-k}.$$
The two sums of squares satisfy $\mathrm{SS}_{\text{between}}/\sigma^2 \sim \chi^2_{k-1}$ and $\mathrm{SS}_{\text{within}}/\sigma^2 \sim \chi^2_{N-k}$, and they are independent.
Intuition
Project the data onto the subspace of vectors that are constant within each group (this gives the group means) and onto its orthogonal complement (this gives the within-group residuals). Under joint normality with equal variances, the two projections are jointly normal with zero cross-covariance, hence independent (see multivariate normal). The squared norms are independent chi-squared with degrees of freedom equal to the dimensions of the two subspaces.
Proof Sketch
Under $H_0$, $Y_{ij} \sim \mathcal{N}(\mu, \sigma^2)$ independently. Consider the data vector $Y \in \mathbb{R}^N$. Let $V_1$ be the subspace of $\mathbb{R}^N$ where each group has its own constant value; let $V_0 \subset V_1$ be the subspace where all entries are the same constant. Then $\mathrm{SS}_{\text{between}} = \lVert P_{V_1}Y - P_{V_0}Y\rVert^2$ and $\mathrm{SS}_{\text{within}} = \lVert Y - P_{V_1}Y\rVert^2$. The subspace $V_1 \ominus V_0$ has dimension $k-1$ and $V_1^{\perp}$ has dimension $N-k$.
Under $H_0$, $Y \sim \mathcal{N}(\mu\mathbf{1}, \sigma^2 I_N)$. Projecting an isotropic Gaussian onto orthogonal subspaces gives independent Gaussians of the lower dimensions, and squared norms become scaled chi-squareds: $\mathrm{SS}_{\text{between}} \sim \sigma^2\chi^2_{k-1}$ and $\mathrm{SS}_{\text{within}} \sim \sigma^2\chi^2_{N-k}$ independently. The ratio of independent scaled chi-squareds divided by their degrees of freedom is the $F$ distribution by definition.
Why It Matters
This is one of the cleanest exact-distribution results in classical statistics. Under the stated assumptions, the test has level exactly $\alpha$ for any finite sample size $N$. The same construction underlies the $F$ tests in linear regression, ANCOVA, repeated-measures, and split-plot designs; they differ only in the choice of nested subspaces.
Failure Mode
Three assumptions can break. (1) Non-normality. The test tolerates mild non-normality when the sample sizes are equal, but its level and power degrade when sample sizes are unequal. (2) Unequal variances (heteroscedasticity). The statistic no longer has an $F$ distribution; use the Welch correction below. (3) Non-independence (e.g., repeated measurements, clustered data). The decomposition itself is fine, but the chi-squared degrees of freedom are wrong; use a mixed model.
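The statistic above can be computed from the sums of squares directly and cross-checked against SciPy's `scipy.stats.f_oneway`, which implements the same test. A minimal sketch with synthetic data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
groups = [rng.normal(0.0, 1.0, size=n) for n in (10, 12, 15)]
k = len(groups)
N = sum(len(g) for g in groups)

y = np.concatenate(groups)
grand_mean = y.mean()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(np.sum((g - g.mean()) ** 2) for g in groups)

# F = MS_between / MS_within with (k-1, N-k) degrees of freedom.
f_manual = (ss_between / (k - 1)) / (ss_within / (N - k))
p_manual = stats.f.sf(f_manual, k - 1, N - k)

f_scipy, p_scipy = stats.f_oneway(*groups)
assert np.isclose(f_manual, f_scipy) and np.isclose(p_manual, p_scipy)
```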
Unequal Variances: The Welch Correction
When the assumption of equal variances across groups fails (the Behrens-Fisher problem), the ratio MS-between / MS-within no longer has an $F$ distribution. Welch (1951) proposed a correction that approximates the null distribution as $F_{k-1,\,\nu}$ for a data-driven degrees-of-freedom adjustment $\nu$.
Welch ANOVA Statistic
Statement
Define weights $w_i = n_i/s_i^2$, where $s_i^2$ is the within-group sample variance, $W = \sum_{i=1}^{k} w_i$, and the weighted grand mean $\tilde{Y} = \frac{1}{W}\sum_{i=1}^{k} w_i \bar{Y}_{i\cdot}$. The Welch statistic is
$$F_W = \frac{\frac{1}{k-1}\sum_{i=1}^{k} w_i\left(\bar{Y}_{i\cdot}-\tilde{Y}\right)^2}{1 + \frac{2(k-2)}{k^2-1}\Lambda}, \qquad \Lambda = \sum_{i=1}^{k}\frac{(1-w_i/W)^2}{n_i-1}.$$
Under the equal-means null and approximate normality, $F_W$ is approximately $F_{k-1,\,\nu}$ where $\nu = \dfrac{k^2-1}{3\Lambda}$.
Intuition
The naive equal-variance pooled estimate of $\sigma^2$ is replaced by a weighted average that gives more weight to groups with smaller variance. The degrees of freedom are then adjusted to match the first two moments of the resulting distribution.
Why It Matters
Welch's correction is the default ANOVA in most statistical software (e.g., oneway.test in R) when variances are not assumed equal. For two groups, the same idea gives Welch's $t$-test, which is now the recommended replacement for the equal-variance two-sample $t$-test in nearly all applied contexts.
Failure Mode
The Welch approximation is good when every group size is at least moderate. For very small group sizes the approximation can be poor, and a permutation or bootstrap alternative is safer.
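SciPy does not ship a dedicated Welch ANOVA, so the formulas in the Statement can be implemented directly. A sketch under the stated assumptions (the function name `welch_anova` is ours, not a library API); as a sanity check, for $k = 2$ the statistic must reduce to the square of Welch's $t$:

```python
import numpy as np
from scipy import stats

def welch_anova(groups):
    """Welch's F statistic, its df pair, and p-value; groups: list of 1-D arrays."""
    k = len(groups)
    n = np.array([len(g) for g in groups])
    means = np.array([g.mean() for g in groups])
    s2 = np.array([g.var(ddof=1) for g in groups])
    w = n / s2                              # weight = precision of each group mean
    W = w.sum()
    y_tilde = (w * means).sum() / W         # variance-weighted grand mean
    lam = np.sum((1 - w / W) ** 2 / (n - 1))
    f_w = ((w * (means - y_tilde) ** 2).sum() / (k - 1)) \
          / (1 + 2 * (k - 2) / (k**2 - 1) * lam)
    nu = (k**2 - 1) / (3 * lam)             # denominator degrees of freedom
    return f_w, k - 1, nu, stats.f.sf(f_w, k - 1, nu)

rng = np.random.default_rng(2)
a = rng.normal(0.0, 1.0, size=15)
b = rng.normal(0.3, 3.0, size=40)           # deliberately unequal variance and size

# For k = 2, Welch ANOVA reduces to Welch's t-test: F = t^2, same p-value.
f_w, _, nu, p_w = welch_anova([a, b])
t, p_t = stats.ttest_ind(a, b, equal_var=False)
assert np.isclose(f_w, t**2) and np.isclose(p_w, p_t)
```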
Two-Way ANOVA: Main Effects and Interaction
For a two-factor design with factor $A$ at $a$ levels and factor $B$ at $b$ levels, the model with $n$ replicates per cell is
$$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}, \qquad \varepsilon_{ijk} \overset{\text{iid}}{\sim} \mathcal{N}(0, \sigma^2),$$
with sum-to-zero constraints $\sum_i \alpha_i = 0$, $\sum_j \beta_j = 0$, and $\sum_i (\alpha\beta)_{ij} = \sum_j (\alpha\beta)_{ij} = 0$ for identifiability.
Two-Way ANOVA Sum-of-Squares Decomposition
Statement
With $N = abn$ observations, the total sum of squares decomposes orthogonally into four components:
$$\mathrm{SS}_{\text{total}} = \mathrm{SS}_A + \mathrm{SS}_B + \mathrm{SS}_{AB} + \mathrm{SS}_{\text{within}},$$
where
$$\mathrm{SS}_A = bn\sum_{i=1}^{a}\left(\bar{Y}_{i\cdot\cdot}-\bar{Y}_{\cdot\cdot\cdot}\right)^2, \qquad \mathrm{SS}_B = an\sum_{j=1}^{b}\left(\bar{Y}_{\cdot j\cdot}-\bar{Y}_{\cdot\cdot\cdot}\right)^2,$$
$$\mathrm{SS}_{AB} = n\sum_{i,j}\left(\bar{Y}_{ij\cdot}-\bar{Y}_{i\cdot\cdot}-\bar{Y}_{\cdot j\cdot}+\bar{Y}_{\cdot\cdot\cdot}\right)^2, \qquad \mathrm{SS}_{\text{within}} = \sum_{i,j,k}\left(Y_{ijk}-\bar{Y}_{ij\cdot}\right)^2.$$
Under $H_0$ for each of the three effects, the corresponding $F$-statistic has an exact $F$ distribution with degrees of freedom $(d, ab(n-1))$, where $d = a-1$ for $A$, $d = b-1$ for $B$, and $d = (a-1)(b-1)$ for the interaction.
Intuition
The four components are orthogonal projections onto four nested subspaces. The dimension counts give the degrees of freedom. Under jointly normal data, the chi-squared distributions are independent (the multivariate normal independence property again) and each ratio of mean squares is $F$-distributed.
Why It Matters
This is the design-of-experiments core. The interaction sum of squares is the part of the variability that is not explained by the main effects acting additively; a significant interaction means the effect of one factor depends on the level of the other. Without checking the interaction, additive main-effects conclusions can be misleading.
Failure Mode
Unbalanced designs (unequal $n_{ij}$ per cell) break the orthogonality of the decomposition. Type I, II, and III sums of squares (sequential, hierarchical, partial) become different and the choice depends on the scientific question. Modern software defaults vary; specify the type explicitly in any reported analysis.
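The balanced-design orthogonality can be checked numerically. A minimal sketch with a synthetic balanced layout (levels and replicate counts chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(3)
a_lv, b_lv, n = 3, 4, 5                    # a levels, b levels, n replicates per cell
y = rng.normal(size=(a_lv, b_lv, n))       # balanced layout: Y[i, j, k]

gm = y.mean()                              # grand mean
mi = y.mean(axis=(1, 2))                   # factor-A level means
mj = y.mean(axis=(0, 2))                   # factor-B level means
mij = y.mean(axis=2)                       # cell means

ss_a = b_lv * n * np.sum((mi - gm) ** 2)
ss_b = a_lv * n * np.sum((mj - gm) ** 2)
ss_ab = n * np.sum((mij - mi[:, None] - mj[None, :] + gm) ** 2)
ss_within = np.sum((y - mij[:, :, None]) ** 2)
ss_total = np.sum((y - gm) ** 2)

# In a balanced design the four components are orthogonal and sum exactly.
assert np.isclose(ss_total, ss_a + ss_b + ss_ab + ss_within)
```

With unequal cell counts the four pieces no longer add up to the total, which is exactly why the Type I/II/III distinction arises.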
Post-Hoc Comparisons: Why Pairwise t-Tests Are Wrong
A significant ANOVA $F$ test rejects $H_0$ but does not identify which group means differ. Running all $\binom{k}{2}$ pairwise $t$-tests at level $\alpha$ each inflates the family-wise error rate (FWER) to approximately $1-(1-\alpha)^{\binom{k}{2}}$; for example, with $k=5$ groups and $\alpha = 0.05$ there are 10 comparisons and the approximate FWER already exceeds $0.40$. Post-hoc procedures fix this.
Bonferroni. Run each of the $m = \binom{k}{2}$ pairwise tests at level $\alpha/m$. The FWER is at most $\alpha$ by the union bound. Bonferroni is conservative (the bound is loose when tests are positively correlated), and most useful when the number of comparisons is small.
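The arithmetic behind the inflation and the Bonferroni fix, as a quick sketch (using $k=5$ and $\alpha=0.05$ purely for illustration):

```python
k, alpha = 5, 0.05
m = k * (k - 1) // 2                 # 10 pairwise comparisons among 5 groups

# If the m tests were independent, each at level alpha, the probability of
# at least one false rejection under the global null would be:
fwer_uncorrected = 1 - (1 - alpha) ** m
# Bonferroni instead runs each test at alpha/m; the union bound then caps
# the family-wise error rate at alpha regardless of dependence.
per_test_level = alpha / m

print(m, round(fwer_uncorrected, 3), per_test_level)  # → 10 0.401 0.005
```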
Tukey HSD. Use the studentized-range distribution of the maximum standardized difference among group means. The HSD critical value comes from the distribution of $q = \dfrac{\max_i \bar{Y}_{i\cdot} - \min_i \bar{Y}_{i\cdot}}{\sqrt{\mathrm{MS}_{\text{within}}/n}}$ under the equal-means null with equal sample sizes $n$. Confidence intervals have simultaneous coverage over all pairs. Tukey is the standard choice for all-pairs comparisons with balanced data; it is sharper than Bonferroni in this setting.
Scheffe. Adjust each contrast (any linear combination $\sum_i c_i \mu_i$ with $\sum_i c_i = 0$) using the critical value $\sqrt{(k-1)\,F_{k-1,\,N-k;\,\alpha}}$. Scheffe gives simultaneous coverage for the infinite family of all contrasts, not just pairwise ones; it is the right tool for data-driven contrast exploration but is overly conservative for restricted families.
The choice among these is determined by the family of comparisons of interest. Pre-specified small families: Bonferroni or Holm. All pairwise: Tukey. Open-ended contrast search: Scheffe.
Common Confusions
A significant F does not say which groups differ
The ANOVA $F$ test answers the omnibus question "are any of the group means different?" not "which pairs of means differ?" The post-hoc procedures answer the second question; running them only after the omnibus $F$ test rejects ("protected" testing) is one common approach. Reporting only the $F$ test result and informally describing which means look biggest is not statistics.
ANOVA does not require the groups to be ordered
Group labels in ANOVA are nominal: there is no ordering and no notion of distance between groups. If the levels are ordered (dose levels, age bins), a regression with the level as a numeric predictor or a trend test is usually more informative than the ANOVA $F$.
Equal-variance is the assumption most often violated
ANOVA tolerates mild non-normality reasonably well, especially with balanced and large samples. It is far more sensitive to unequal variances combined with unequal sample sizes. When in doubt, use Welch.
ANOVA is a special case of linear regression
The one-way ANOVA $F$ statistic equals the $F$ statistic from regressing $Y$ on dummy variables encoding group membership. Two-way ANOVA is the same as regressing on dummies for both factors plus their product terms. Modern statistical packages use the regression formulation as the unifying frame; ANOVA tables are a presentation choice, not a separate methodology.
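The equivalence is easy to confirm numerically: regress on an intercept plus $k-1$ group dummies, form the regression $F$ for the dummy coefficients, and compare with `f_oneway`. A sketch with synthetic data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
groups = [rng.normal(m, 1.0, size=12) for m in (0.0, 0.4, 1.0)]
y = np.concatenate(groups)
labels = np.repeat(np.arange(3), 12)

# Design matrix: intercept plus k-1 dummy columns for group membership.
X = np.column_stack([np.ones_like(y)]
                    + [(labels == g).astype(float) for g in (1, 2)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
rss = np.sum((y - X @ beta) ** 2)          # full-model residual SS = SS_within
tss = np.sum((y - y.mean()) ** 2)          # intercept-only residual SS = SS_total

k, N = 3, len(y)
f_reg = ((tss - rss) / (k - 1)) / (rss / (N - k))  # regression F for the dummies

f_anova, _ = stats.f_oneway(*groups)
assert np.isclose(f_reg, f_anova)
```

The dummy regression fits each group's mean exactly, so its residual sum of squares is the within-group sum of squares, and the two $F$ statistics coincide term by term.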
Exercises
Problem
Three groups of $n$ observations each are drawn iid from normal distributions with common variance. Given the three group means $\bar{Y}_1, \bar{Y}_2, \bar{Y}_3$ and within-group sample variances $s_1^2, s_2^2, s_3^2$, compute the $F$ statistic and state its degrees of freedom under the equal-means null.
Problem
For $k$ groups, find the Bonferroni-corrected per-comparison significance level for all $\binom{k}{2}$ pairwise comparisons such that the family-wise error rate is at most $\alpha$. How much smaller is this than $\alpha$?
Problem
Show that for $k = 2$ groups with $n_1$ and $n_2$ observations, the ANOVA $F$ statistic equals the square of the two-sample equal-variance $t$ statistic, with matching $p$-values.
References
Canonical:
- Casella and Berger, Statistical Inference (2002), 2nd edition, Chapter 11
- Lehmann and Romano, Testing Statistical Hypotheses (2005), 3rd edition, Chapter 7
- Scheffe, The Analysis of Variance (1959). The original monograph; still the deepest reference on the geometry.
Foundational papers:
- Fisher, "Statistical Methods for Research Workers" (1925) introduced the technique; Chapter 7.
- Welch, "On the comparison of several mean values: an alternative approach" (Biometrika, 1951), volume 38, pages 330-336
- Tukey, "The problem of multiple comparisons" (unpublished manuscript, 1953; later in The Collected Works of John W. Tukey, Volume VIII)
- Bonferroni, "Teoria statistica delle classi e calcolo delle probabilità" (1936). The union-bound correction.
Applied references:
- Box, Hunter, and Hunter, Statistics for Experimenters (2005), 2nd edition. The design-of-experiments perspective.
- Hastie, Tibshirani, and Friedman, The Elements of Statistical Learning (2009), 2nd edition, Section 3.2 (ANOVA-as-regression).
- Hsu, Multiple Comparisons: Theory and Methods (1996). The post-hoc-procedure reference.
Next Topics
- Linear regression: ANOVA as a special case of linear-model inference with categorical predictors.
- Hypothesis testing for ML: the broader testing framework, including the $F$ test's role inside it.
- Variance-stabilizing transformations: the preprocessing step that makes ANOVA assumptions reasonable for count or proportion data.
- Bootstrap methods: the resampling alternative when normality or independence fails.
Last reviewed: May 12, 2026
Canonical graph
Required before and derived from this topic
These links come from prerequisite edges in the curriculum graph. Editorial suggestions are shown here only when the target page also cites this page as a prerequisite.
Required prerequisites
- Expectation, Variance, Covariance, and Moments (layer 0A, tier 1)
- Central Limit Theorem (layer 0B, tier 1)
- The Multivariate Normal Distribution (layer 0B, tier 1)
- Linear Regression (layer 1, tier 1)
- Hypothesis Testing for ML (layer 2, tier 2)
Derived topics
No published topic currently declares this as a prerequisite.