
Analysis of Variance

One-way ANOVA decomposes the total sum of squares into a between-group component and a within-group component. Under iid normal data with equal group variances, the ratio of mean squares has an $F$ distribution and gives an exact test of the equal-means null hypothesis. The Welch correction handles unequal variances. Two-way ANOVA partitions further into main effects and interaction. Post-hoc procedures (Tukey HSD, Bonferroni, Scheffé) correct for the multiple-comparison problem that naive pairwise $t$-tests ignore.


Why This Matters

The one-way ANOVA $F$-statistic is the first multi-group comparison every student of statistics learns, and it is the engine inside fixed-effects regression, mixed-effects models, designed experiments, randomized trials, and quality-control inspection. The construction is short: decompose the total variability into a between-group piece (signal) and a within-group piece (noise), form the ratio, and read off the test statistic.

ANOVA also shows where variance-stabilizing transformations and the multivariate normal pay off in applications: the $F$ distribution under the null is exact for iid normal data with equal variances, and approximate otherwise. The same machinery generalizes to two-way layouts with interaction, to repeated-measures designs, and to all of the post-hoc multiple-comparison machinery.

One-Way ANOVA: Setup and Decomposition

Consider $K \geq 2$ groups with $n_k$ observations in group $k$, total $N = \sum_k n_k$. Write
$$Y_{kj} = \mu_k + \varepsilon_{kj}, \quad k = 1, \ldots, K,\; j = 1, \ldots, n_k,$$
where $\mu_k$ is the group mean and $\varepsilon_{kj}$ are iid noise. Define the group means and the grand mean
$$\bar Y_k = \frac{1}{n_k}\sum_{j=1}^{n_k} Y_{kj}, \quad \bar Y = \frac{1}{N}\sum_{k=1}^K\sum_{j=1}^{n_k} Y_{kj}.$$

Theorem

One-Way ANOVA Sum-of-Squares Decomposition

Statement

The total sum of squares decomposes as
$$\underbrace{\sum_{k=1}^K \sum_{j=1}^{n_k} (Y_{kj} - \bar Y)^2}_{\text{SS}_{\text{tot}}} \;=\; \underbrace{\sum_{k=1}^K n_k (\bar Y_k - \bar Y)^2}_{\text{SS}_{\text{between}}} \;+\; \underbrace{\sum_{k=1}^K \sum_{j=1}^{n_k} (Y_{kj} - \bar Y_k)^2}_{\text{SS}_{\text{within}}}.$$
The decomposition is purely algebraic and does not require any distributional assumption.

Proof Sketch

Add and subtract $\bar Y_k$ inside the squared deviation: $Y_{kj} - \bar Y = (Y_{kj} - \bar Y_k) + (\bar Y_k - \bar Y)$. Expand the square. The cross-term sums to zero because $\sum_j (Y_{kj} - \bar Y_k) = 0$ inside each group, leaving the two stated sums of squares.

Why It Matters

The decomposition is the geometric heart of ANOVA: $\text{SS}_{\text{between}}$ is the squared distance from the group-mean fit to the grand-mean fit, and $\text{SS}_{\text{within}}$ is the squared distance from the data to the group-mean fit. Under iid normal data with equal variances, these two squared distances are independent and have known chi-squared distributions; their ratio gives the $F$ statistic.

Failure Mode

The decomposition is exact. The downstream $F$-distribution claim is what fails under non-normal data or unequal variances; the decomposition itself holds for any partition of the data.
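The identity can be checked numerically in a few lines. The sketch below uses arbitrary simulated data (the means, scales, and group sizes are purely illustrative), since the decomposition holds for any numbers:

```python
import numpy as np

rng = np.random.default_rng(0)
# Three groups with different means and sizes; any data works --
# the decomposition is algebraic, not distributional.
groups = [rng.normal(loc=m, scale=2.0, size=n)
          for m, n in [(5.0, 8), (6.0, 12), (9.0, 10)]]

y = np.concatenate(groups)
grand_mean = y.mean()

ss_tot = ((y - grand_mean) ** 2).sum()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# SS_tot = SS_between + SS_within, up to floating-point error
assert np.isclose(ss_tot, ss_between + ss_within)
```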

The F-Statistic and Its Distribution

The standard one-way ANOVA test is for the null hypothesis $H_0 : \mu_1 = \mu_2 = \cdots = \mu_K$.

Theorem

F Distribution of the One-Way ANOVA Statistic Under the Null

Statement

Under $H_0 : \mu_1 = \cdots = \mu_K = \mu$ and the iid $N(\mu, \sigma^2)$ assumption,
$$F = \frac{\text{SS}_{\text{between}} / (K - 1)}{\text{SS}_{\text{within}} / (N - K)} = \frac{\text{MS}_{\text{between}}}{\text{MS}_{\text{within}}} \;\sim\; F_{K - 1,\, N - K}.$$
The two sums of squares satisfy
$$\frac{\text{SS}_{\text{between}}}{\sigma^2} \sim \chi^2_{K - 1}, \quad \frac{\text{SS}_{\text{within}}}{\sigma^2} \sim \chi^2_{N - K},$$
and they are independent.

Intuition

Project the data onto the subspace of vectors that are constant within each group (this gives the group means) and onto its orthogonal complement (this gives the within-group residuals). Under joint normality with equal variances, the two projections are jointly normal with zero cross-covariance, hence independent (see multivariate normal). The squared norms are independent chi-squared with degrees of freedom equal to the dimensions of the two subspaces.

Proof Sketch

Under $H_0$, $Y_{kj} \sim N(\mu, \sigma^2)$ independently. Consider the data vector $\mathbf{Y} \in \mathbb{R}^N$. Let $V_1$ be the subspace of $\mathbb{R}^N$ where each group has its own constant value; let $V_0 \subset V_1$ be the subspace where all entries are the same constant. Then $\text{SS}_{\text{within}} = \|\mathbf{Y} - P_{V_1}\mathbf{Y}\|^2$ and $\text{SS}_{\text{between}} = \|P_{V_1}\mathbf{Y} - P_{V_0}\mathbf{Y}\|^2$. The subspace $V_1$ has dimension $K$ and $V_0$ has dimension $1$.

Under $H_0$, $\mathbf{Y} - \mu\mathbf{1} \sim N_N(0, \sigma^2 I_N)$. Projecting an isotropic Gaussian onto orthogonal subspaces gives independent Gaussians of the lower dimensions, and squared norms become scaled chi-squareds:
$$\frac{\text{SS}_{\text{within}}}{\sigma^2} \sim \chi^2_{N - K}, \quad \frac{\text{SS}_{\text{between}}}{\sigma^2} \sim \chi^2_{K - 1},$$
independently. The ratio of independent scaled chi-squareds, each divided by its degrees of freedom, has the $F$ distribution by definition.

Why It Matters

This is one of the cleanest exact-distribution results in classical statistics. Under the stated assumptions, the test has level exactly $\alpha$ for any finite sample size $N$. The same construction underlies the $F$ tests in linear regression, ANCOVA, repeated-measures, and split-plot designs; they differ only in the choice of nested subspaces.

Failure Mode

Three assumptions can break. (1) Non-normality. The $F$ test tolerates mild non-normality when the sample sizes are equal, but its level and power degrade when sample sizes are unequal. (2) Unequal variances (heteroscedasticity). The $F$ statistic no longer has an $F$ distribution; use the Welch correction below. (3) Non-independence (e.g., repeated measurements, clustered data). The decomposition itself is fine, but the chi-squared degrees of freedom are wrong; use a mixed model.
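The construction above can be sketched directly, assuming NumPy and SciPy are available; the simulated groups are illustrative. The hand-built statistic and $p$-value are checked against scipy.stats.f_oneway:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
K, n = 4, 15                      # 4 groups, 15 observations each
N = K * n
groups = [rng.normal(loc=0.0, scale=1.0, size=n) for _ in range(K)]

y = np.concatenate(groups)
grand_mean = y.mean()
ss_between = sum(n * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F = MS_between / MS_within with (K-1, N-K) degrees of freedom
F = (ss_between / (K - 1)) / (ss_within / (N - K))
p = stats.f.sf(F, K - 1, N - K)   # upper-tail probability of F_{K-1, N-K}

# scipy's one-way ANOVA should agree with the hand computation
F_ref, p_ref = stats.f_oneway(*groups)
assert np.isclose(F, F_ref) and np.isclose(p, p_ref)
```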

Unequal Variances: The Welch Correction

When the assumption of equal variances across groups fails (the Behrens-Fisher problem), the ratio MS-between / MS-within no longer has an $F$ distribution. Welch (1951) proposed a correction that approximates the null distribution as $F_{K - 1,\, \nu^*}$ for a data-driven degrees-of-freedom adjustment $\nu^*$.

Theorem

Welch ANOVA Statistic

Statement

Define weights $w_k = n_k / S_k^2$ where $S_k^2$ is the within-group sample variance, $w_\cdot = \sum_k w_k$, and the weighted grand mean $\tilde Y = \sum_k w_k \bar Y_k / w_\cdot$. The Welch statistic is
$$F^* = \frac{\sum_k w_k (\bar Y_k - \tilde Y)^2 / (K - 1)}{1 + \frac{2(K - 2)}{K^2 - 1} \sum_k \frac{(1 - w_k / w_\cdot)^2}{n_k - 1}}.$$
Under the equal-means null and approximate normality, $F^*$ is approximately $F_{K - 1,\, \nu^*}$ where
$$\nu^* = \frac{K^2 - 1}{3 \sum_k \frac{(1 - w_k / w_\cdot)^2}{n_k - 1}}.$$

Intuition

The naive equal-variance pooled estimate of $\sigma^2$ is replaced by a weighted average that gives more weight to groups with smaller variance. The degrees of freedom are then adjusted to match the first two moments of the resulting distribution.

Why It Matters

Welch's correction is the default ANOVA in most statistical software (e.g., oneway.test in R) when variances are not assumed equal. For two groups, the same idea gives Welch's $t$-test, which is now the recommended replacement for the equal-variance two-sample $t$-test in nearly all applied contexts.

Failure Mode

The Welch approximation is good when sample sizes are at least moderate (each $n_k \geq 5$ is a reasonable rule of thumb). For very small group sizes the approximation can be poor, and a permutation or bootstrap alternative is safer.
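SciPy's f_oneway does not apply the Welch correction, so the sketch below implements the stated formulas directly (the function name welch_anova and the simulated unequal-variance data are this example's own):

```python
import numpy as np
from scipy import stats

def welch_anova(groups):
    """Welch's F* statistic, approximate df nu*, and p-value.

    Implements the formulas stated above for a list of 1-D samples.
    """
    K = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    m = np.array([np.mean(g) for g in groups])
    s2 = np.array([np.var(g, ddof=1) for g in groups])

    w = n / s2                            # w_k = n_k / S_k^2
    w_dot = w.sum()
    y_tilde = (w * m).sum() / w_dot       # weighted grand mean

    a = ((1 - w / w_dot) ** 2 / (n - 1)).sum()
    num = (w * (m - y_tilde) ** 2).sum() / (K - 1)
    den = 1 + 2 * (K - 2) / (K ** 2 - 1) * a
    f_star = num / den
    nu_star = (K ** 2 - 1) / (3 * a)
    return f_star, nu_star, stats.f.sf(f_star, K - 1, nu_star)

rng = np.random.default_rng(2)
# Unequal variances and unequal sizes: the classic Behrens-Fisher setting
groups = [rng.normal(0, 1, 10), rng.normal(0, 3, 25), rng.normal(0, 5, 40)]
f_star, nu_star, p = welch_anova(groups)
```

As a sanity check, for $K = 2$ the statistic reduces to the square of Welch's two-sample $t$, which can be compared against scipy.stats.ttest_ind with equal_var=False.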

Two-Way ANOVA: Main Effects and Interaction

For a two-factor design with factor $A$ at $I$ levels and factor $B$ at $J$ levels, the model with $n$ replicates per cell is
$$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}, \quad i = 1, \ldots, I,\; j = 1, \ldots, J,\; k = 1, \ldots, n,$$
with sum-to-zero constraints $\sum_i \alpha_i = 0$, $\sum_j \beta_j = 0$, and $\sum_i (\alpha\beta)_{ij} = \sum_j (\alpha\beta)_{ij} = 0$ for identifiability.

Theorem

Two-Way ANOVA Sum-of-Squares Decomposition

Statement

With $N = IJn$ observations, the total sum of squares decomposes orthogonally into four components:
$$\text{SS}_{\text{tot}} \;=\; \text{SS}_A + \text{SS}_B + \text{SS}_{AB} + \text{SS}_E,$$
where
$$\text{SS}_A = Jn \sum_i (\bar Y_{i \cdot \cdot} - \bar Y)^2, \quad \text{SS}_B = In \sum_j (\bar Y_{\cdot j \cdot} - \bar Y)^2,$$
$$\text{SS}_{AB} = n \sum_{i, j}(\bar Y_{ij \cdot} - \bar Y_{i \cdot \cdot} - \bar Y_{\cdot j \cdot} + \bar Y)^2, \quad \text{SS}_E = \sum_{i, j, k}(Y_{ijk} - \bar Y_{ij \cdot})^2.$$
Under $H_0$ for each of the three effects, the corresponding $F$-statistic
$$F_{\text{effect}} = \frac{\text{SS}_{\text{effect}} / \text{df}_{\text{effect}}}{\text{SS}_E / (IJ(n - 1))}$$
has an exact $F$ distribution with degrees of freedom $(\text{df}_{\text{effect}},\, IJ(n - 1))$, where $\text{df}_A = I - 1$, $\text{df}_B = J - 1$, $\text{df}_{AB} = (I - 1)(J - 1)$.

Intuition

The four components are orthogonal projections onto four nested subspaces. The dimension counts give the degrees of freedom. Under jointly normal data, the chi-squared distributions are independent (the multivariate normal independence property again) and each ratio is $F$-distributed.

Why It Matters

This is the design-of-experiments core. The interaction sum of squares $\text{SS}_{AB}$ is the part of the variability that is not explained by the main effects acting additively; a significant interaction means the effect of one factor depends on the level of the other. Without checking the interaction, additive main-effects conclusions can be misleading.

Failure Mode

Unbalanced designs (unequal $n$ per cell) break the orthogonality of the decomposition. Type I, II, and III sums of squares (sequential, hierarchical, partial) become different, and the choice depends on the scientific question. Modern software defaults vary; specify the type explicitly in any reported analysis.
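For a balanced layout the four sums of squares can be computed by direct averaging. A minimal NumPy sketch, with illustrative effect sizes and cell counts, verifies the orthogonal decomposition:

```python
import numpy as np

rng = np.random.default_rng(3)
I, J, n = 3, 4, 5                     # levels of A, levels of B, replicates
# Balanced layout: Y[i, j, k] is replicate k in cell (i, j)
Y = (rng.normal(size=(I, J, n))
     + np.array([0.0, 1.0, 2.0])[:, None, None]        # main effect of A
     + np.array([0.0, 0.5, 1.0, 1.5])[None, :, None])  # main effect of B

grand = Y.mean()
a_mean = Y.mean(axis=(1, 2))          # row means    \bar Y_{i..}
b_mean = Y.mean(axis=(0, 2))          # column means \bar Y_{.j.}
cell_mean = Y.mean(axis=2)            # cell means   \bar Y_{ij.}

ss_a = J * n * ((a_mean - grand) ** 2).sum()
ss_b = I * n * ((b_mean - grand) ** 2).sum()
ss_ab = n * ((cell_mean - a_mean[:, None] - b_mean[None, :] + grand) ** 2).sum()
ss_e = ((Y - cell_mean[:, :, None]) ** 2).sum()
ss_tot = ((Y - grand) ** 2).sum()

# Balanced design: the four pieces add up exactly
assert np.isclose(ss_tot, ss_a + ss_b + ss_ab + ss_e)
```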

Post-Hoc Comparisons: Why Pairwise t-Tests Are Wrong

A significant ANOVA $F$ test rejects $H_0 : \mu_1 = \cdots = \mu_K$ but does not identify which group means differ. Running $\binom{K}{2}$ pairwise $t$-tests at level $\alpha$ each inflates the family-wise error rate (FWER) to approximately $1 - (1 - \alpha)^{\binom{K}{2}}$, which for $K = 5$ and $\alpha = 0.05$ already exceeds $0.4$. Post-hoc procedures fix this.

Bonferroni. Run each of the $m = \binom{K}{2}$ pairwise tests at level $\alpha / m$. The FWER is at most $\alpha$ by the union bound. Bonferroni is conservative (the bound is loose when tests are positively correlated), and most useful when the number of comparisons is small.

Tukey HSD. Use the studentized-range distribution of the maximum standardized difference among $K$ group means. The HSD critical value $q_{\alpha; K, N - K}$ comes from the distribution of $\max_{i, j} (\bar Y_i - \bar Y_j) / \sqrt{\text{MS}_E / n}$ under the equal-means null with equal sample sizes. Confidence intervals $\bar Y_i - \bar Y_j \pm q_{\alpha; K, N - K} \sqrt{\text{MS}_E / n}$ have simultaneous coverage $1 - \alpha$ over all $\binom{K}{2}$ pairs. Tukey is the standard choice for all-pairs comparisons with balanced data; it is sharper than Bonferroni in this setting.

Scheffé. Adjust each contrast (any linear combination $\sum_k c_k \bar Y_k$ with $\sum_k c_k = 0$) using the critical value $\sqrt{(K - 1) F_{\alpha; K - 1, N - K}}$. Scheffé gives simultaneous coverage for the infinite family of all contrasts, not just pairwise ones; it is the right tool for data-driven contrast exploration but is overly conservative for restricted families.

The choice among these is determined by the family of comparisons of interest. Pre-specified small families: Bonferroni or Holm. All pairwise: Tukey. Open-ended contrast search: Scheffé.
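The FWER arithmetic and the Bonferroni rule can be sketched in a few lines. Note one simplification: this uses plain pairwise $t$-tests with per-pair variance estimates rather than the pooled $\text{MS}_E$, and the simulated null data are illustrative:

```python
import numpy as np
from itertools import combinations
from scipy import stats

alpha, K = 0.05, 5
m = K * (K - 1) // 2                  # number of pairwise comparisons
naive_fwer = 1 - (1 - alpha) ** m     # approximate FWER of uncorrected tests
print(naive_fwer)                     # exceeds 0.4 for K = 5, alpha = 0.05

rng = np.random.default_rng(4)
groups = [rng.normal(0, 1, 12) for _ in range(K)]  # null: all means equal

# Bonferroni: each pairwise test at level alpha / m keeps FWER <= alpha
for i, j in combinations(range(K), 2):
    _, p = stats.ttest_ind(groups[i], groups[j])
    if p < alpha / m:
        print(f"groups {i} and {j} differ (p = {p:.4f})")
```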

Common Confusions

Watch Out

A significant F does not say which groups differ

The ANOVA $F$ test answers the omnibus question "are any of the group means different?", not "which pairs of means differ?" The post-hoc procedures answer the second question; running them only after $F$ rejects ("protected" testing) is one common approach. Reporting only the $F$ test result and informally describing which means look biggest is not a valid inference.

Watch Out

ANOVA does not require the groups to be ordered

Group labels in ANOVA are nominal: there is no ordering and no notion of distance between groups. If the levels are ordered (dose levels, age bins), a regression with the level as a numeric predictor or a trend test is usually more informative than the ANOVA $F$ test.

Watch Out

Equal-variance is the assumption most often violated

ANOVA tolerates mild non-normality reasonably well, especially with balanced and large samples. It is far more sensitive to unequal variances combined with unequal sample sizes. When in doubt, use Welch.

Watch Out

ANOVA is a special case of linear regression

The one-way ANOVA $F$ statistic equals the $F$ statistic from regressing $Y$ on $K - 1$ dummy variables encoding group membership. Two-way ANOVA is the same as regressing on dummies for both factors plus their product terms. Modern statistical packages use the regression formulation as the unifying frame; ANOVA tables are a presentation choice, not a separate methodology.
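The equivalence can be verified directly: regress $y$ on an intercept plus $K - 1$ dummies via least squares and form the regression $F$ from the residual sums of squares. A sketch with simulated data (group means and sizes are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
K, n = 3, 20
N = K * n
groups = [rng.normal(loc=m, size=n) for m in (0.0, 0.5, 1.0)]
y = np.concatenate(groups)
labels = np.repeat(np.arange(K), n)

# Design matrix: intercept plus K-1 dummies (group 0 as baseline)
X = np.column_stack([np.ones(N)] + [(labels == k).astype(float)
                                    for k in range(1, K)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

rss = (resid ** 2).sum()              # full-model residual SS
tss = ((y - y.mean()) ** 2).sum()     # intercept-only residual SS
F_reg = ((tss - rss) / (K - 1)) / (rss / (N - K))

# The regression F equals the one-way ANOVA F
F_anova, _ = stats.f_oneway(*groups)
assert np.isclose(F_reg, F_anova)
```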

Exercises

ExerciseCore

Problem

Three groups of $n_1 = n_2 = n_3 = 10$ observations are drawn iid from $N(\mu_k, \sigma^2)$. The group means are $\bar Y_1 = 5$, $\bar Y_2 = 6$, $\bar Y_3 = 8$, and the within-group sample variances are $S_1^2 = S_2^2 = S_3^2 = 4$. Compute the $F$ statistic and state its degrees of freedom under the equal-means null.

ExerciseCore

Problem

For $K = 4$ groups, find the Bonferroni-corrected significance level for pairwise comparisons such that the family-wise error rate is at most $0.05$. How much smaller is this than $0.05$?

ExerciseAdvanced

Problem

Show that for $K = 2$ groups with $n_1 = n_2 = n$, the ANOVA $F$ statistic equals the square of the two-sample equal-variance $t$ statistic, with $F_{1, 2n - 2}$ matching $t_{2n - 2}^2$.

References

Canonical:

  • Casella and Berger, Statistical Inference (2002), 2nd edition, Chapter 11
  • Lehmann and Romano, Testing Statistical Hypotheses (2005), 3rd edition, Chapter 7
  • Scheffé, The Analysis of Variance (1959). The original monograph; still the deepest reference on the geometry.

Foundational papers:

  • Fisher, "Statistical Methods for Research Workers" (1925) introduced the technique; Chapter 7.
  • Welch, "On the comparison of several mean values: an alternative approach" (Biometrika, 1951), volume 38, pages 330-336
  • Tukey, "The problem of multiple comparisons" (unpublished manuscript, 1953; later in The Collected Works of John W. Tukey, Volume VIII)
  • Bonferroni, "Teoria statistica delle classi e calcolo delle probabilità" (1936). The union-bound correction.

Applied references:

  • Box, Hunter, and Hunter, Statistics for Experimenters (2005), 2nd edition. The design-of-experiments perspective.
  • Hastie, Tibshirani, and Friedman, The Elements of Statistical Learning (2009), 2nd edition, Section 3.2 (ANOVA-as-regression).
  • Hsu, Multiple Comparisons: Theory and Methods (1996). The post-hoc-procedure reference.


Last reviewed: May 12, 2026
