Statistical Estimation
Student-t Distribution and t-Test
The Student-t distribution as a ratio of a standard Normal and a root-Chi-squared, and the one-sample, two-sample, and paired t-tests it powers: exact null distribution under Normality, Welch correction for unequal variances, and large-sample equivalence to the Wald z-test.
Prerequisites
Why This Matters
The Student-t distribution is the exact sampling distribution of the standardized sample mean from an i.i.d. Normal sample with unknown variance. That single fact powers the most-used parametric test in statistics: the t-test. Four flavors of the t-test all rest on the same exact sampling-distribution result, with different choices of numerator and denominator:
- One-sample t-test. Compares a sample mean to a hypothesized value. The statistic is $T = \dfrac{\bar X - \mu_0}{S/\sqrt{n}}$, exactly $t_{n-1}$ under Normality and the null.
- Two-sample t-test, equal variances. Compares two independent sample means with a pooled variance estimate. The statistic is $T = \dfrac{\bar X - \bar Y}{S_p\sqrt{1/n_1 + 1/n_2}}$, exactly $t_{n_1+n_2-2}$ under common-variance Normality and the null.
- Welch's t-test. Compares two independent sample means without the equal-variance assumption. The statistic is $T = \dfrac{\bar X - \bar Y}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$, approximately $t_{\hat\nu}$ with Welch-Satterthwaite degrees of freedom $\hat\nu$.
- Paired t-test. Reduces a paired comparison to a one-sample t-test on the differences.
All four are exact (or near-exact) under Normality. The large-sample behavior is the Wald z-test: as the degrees of freedom grow, $t_\nu \to N(0,1)$ in distribution, and the t-test merges into the asymptotic normal test that the central limit theorem produces.
The Student-t Distribution
Student-t Distribution
A random variable $T$ has a Student-t distribution with $\nu > 0$ degrees of freedom, written $T \sim t_\nu$, if its density is
\[
f_\nu(t) = \frac{\Gamma\!\big(\tfrac{\nu+1}{2}\big)}{\sqrt{\nu\pi}\,\Gamma\!\big(\tfrac{\nu}{2}\big)} \left(1 + \frac{t^2}{\nu}\right)^{-(\nu+1)/2}, \qquad t \in \mathbb{R}.
\]
The distribution is symmetric about zero. For $\nu > 1$ the mean is zero. For $\nu > 2$ the variance is $\nu/(\nu - 2)$. The MGF is infinite for every nonzero argument; the distribution is heavy-tailed, with density decaying polynomially like $|t|^{-(\nu+1)}$.
The parameter $\nu$ controls tail weight. Small $\nu$ gives very heavy tails (Cauchy at $\nu = 1$, infinite variance for $\nu \le 2$). As $\nu \to \infty$ the Student-t converges to the standard Normal; for $\nu \gtrsim 30$ the two are nearly indistinguishable in the body of the distribution.
Student-t Construction
Student-t as Ratio of Normal and Root Chi-squared
Statement
Let $Z \sim N(0,1)$ and $V \sim \chi^2_\nu$ be independent. Then
\[
T = \frac{Z}{\sqrt{V/\nu}} \sim t_\nu.
\]
Intuition
$Z$ is the source of unit-variance Normal noise. $V/\nu$ is an empirical estimate of unit variance (since $\mathbb{E}[V/\nu] = 1$); it converges to $1$ as $\nu \to \infty$. Dividing $Z$ by a noisy estimate of the scale inflates the tails by a polynomial amount. The noisier the scale estimate (smaller $\nu$), the heavier the tails of $T$.
Proof Sketch
The joint density of $(Z, V)$ factors by independence: $f(z, v) = \phi(z)\, g_\nu(v)$, where $\phi$ is the standard Normal density and $g_\nu$ is the $\chi^2_\nu$ density. Change variables to $(t, v)$ with $t = z/\sqrt{v/\nu}$ and Jacobian $\sqrt{v/\nu}$. After substitution, the joint density of $(t, v)$ is proportional to $v^{(\nu+1)/2 - 1} \exp\!\big(-\tfrac{v}{2}(1 + t^2/\nu)\big)$. Integrating over $v$ uses the Gamma normalizing constant and yields the stated density.
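The construction lends itself to a quick Monte Carlo check (a sketch using NumPy and SciPy; the choice $\nu = 5$, the sample size, and the seed are arbitrary): build the ratio from independent draws and compare its empirical distribution to the exact $t_5$ CDF.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
nu = 5
n = 100_000

# T = Z / sqrt(V / nu) from independent N(0,1) and chi-squared_nu draws.
z = rng.standard_normal(n)
v = rng.chisquare(nu, size=n)
t_samples = z / np.sqrt(v / nu)

# Kolmogorov-Smirnov distance to the exact t_nu CDF should be near zero.
ks = stats.kstest(t_samples, stats.t(df=nu).cdf)
print(f"KS distance: {ks.statistic:.4f}")
```

With this many draws the KS distance is on the order of a few thousandths, consistent with the ratio being exactly $t_\nu$-distributed.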
Why It Matters
The standardized sample mean of an i.i.d. Normal sample has numerator $Z = \sqrt{n}(\bar X - \mu)/\sigma$, which is standard Normal, and denominator $S/\sigma = \sqrt{V/(n-1)}$, where $V = (n-1)S^2/\sigma^2$ is $\chi^2_{n-1}$ (i.e., a Chi-squared divided by its degrees of freedom). Independence of $\bar X$ and $S^2$ for Normal samples (see normal distribution) is what makes the standardized statistic exactly $t_{n-1}$ rather than an arbitrary ratio.
Failure Mode
Independence of $Z$ and $V$ is essential. In the t-test the relevant $Z$ is $\sqrt{n}(\bar X - \mu_0)/\sigma$ and the relevant $V$ is $(n-1)S^2/\sigma^2$; their independence is a consequence of Cochran's theorem applied to Normal samples. For non-Normal samples, $\bar X$ and $S^2$ are asymptotically uncorrelated but not independent, so the t-distribution is exact only under Normality. Outside Normality, the test is asymptotic, and its accuracy in moderate samples depends on tail weight and skewness.
One-Sample t-Test
One-Sample t-Test
Statement
To test $H_0: \mu = \mu_0$ against $H_1: \mu \neq \mu_0$, use
\[
T = \frac{\bar X - \mu_0}{S/\sqrt{n}}.
\]
Under $H_0$, $T \sim t_{n-1}$ exactly. The two-sided test rejects at level $\alpha$ when $|T| > t_{n-1,\,1-\alpha/2}$, the $1-\alpha/2$ quantile of $t_{n-1}$. The one-sided tests against $\mu > \mu_0$ or $\mu < \mu_0$ use the corresponding tail.
Intuition
$T$ is a standardized sample mean. Under the null, the numerator $\bar X - \mu_0$ has standard error $\sigma/\sqrt{n}$, so $\sqrt{n}(\bar X - \mu_0)/\sigma$ is standard Normal. The denominator divides by the sample estimate $S$ instead of the true $\sigma$. The ratio is Normal divided by the root of a normalized Chi-squared, which is exactly Student-t with $n - 1$ degrees of freedom.
Proof Sketch
Under Normality, $\sqrt{n}(\bar X - \mu_0)/\sigma \sim N(0,1)$ and $(n-1)S^2/\sigma^2 \sim \chi^2_{n-1}$, with the two independent (see normal distribution). The statistic is
\[
T = \frac{\sqrt{n}(\bar X - \mu_0)/\sigma}{\sqrt{\dfrac{(n-1)S^2/\sigma^2}{n-1}}},
\]
which is $t_{n-1}$ by the construction theorem.
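The statement can be verified numerically by computing the statistic by hand and comparing it against SciPy's implementation (a minimal sketch; the simulated data, sample size, and seed are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=50.4, scale=1.2, size=20)  # hypothetical Normal sample
mu0 = 50.0

# Hand-rolled statistic: T = (xbar - mu0) / (S / sqrt(n)), referred to t_{n-1}.
n = len(x)
t_stat = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))
p_val = 2 * stats.t.sf(abs(t_stat), df=n - 1)

# SciPy's one-sample t-test should agree to rounding error.
res = stats.ttest_1samp(x, popmean=mu0)
print(t_stat, p_val)
```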
Why It Matters
The one-sample t-test is the basic parametric test for a sample mean. It is the test you reach for when you want to know whether a sample mean differs from a fixed reference value. The 95% confidence interval $\bar X \pm t_{n-1,\,0.975}\, S/\sqrt{n}$ is the inverted test region. Both the test and the interval are exact under Normality and asymptotically valid (with size $\alpha$ and correct coverage) under any distribution with finite variance, by the central limit theorem combined with the asymptotic equivalence of $t_{n-1}$ and the standard Normal as $n \to \infty$.
Failure Mode
The exact $t_{n-1}$ distribution requires Normal data. With heavy-tailed data, the t-statistic has heavier tails than $t_{n-1}$ predicts, and rejection rates are inflated above the nominal level. With skewed data and small samples, the test is biased in the direction of the longer tail. Permutation tests (see permutation tests) are the distribution-free alternative.
Two-Sample t-Test, Equal Variances
Two-Sample t-Test with Common Variance
Statement
To test $H_0: \mu_X = \mu_Y$, define the pooled variance
\[
S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}
\]
and the statistic
\[
T = \frac{\bar X - \bar Y}{S_p \sqrt{1/n_1 + 1/n_2}}.
\]
Under $H_0$ and common variance, $T \sim t_{n_1 + n_2 - 2}$ exactly.
Intuition
The pooled variance averages the two sample variances, weighted by their degrees of freedom. Under the common-variance assumption, $\mathrm{Var}(\bar X - \bar Y) = \sigma^2(1/n_1 + 1/n_2)$, and the difference of sample means is independent of $S_p^2$. The standardized difference is therefore Normal-over-root-Chi-squared with the pooled $n_1 + n_2 - 2$ degrees of freedom.
Proof Sketch
Under common variance and $H_0$, $\bar X - \bar Y \sim N\!\big(0, \sigma^2(1/n_1 + 1/n_2)\big)$. The two sample variances scaled by $\sigma^2$ are independent Chi-squareds with $n_1 - 1$ and $n_2 - 1$ degrees of freedom. Their df-weighted sum scaled by $\sigma^2$, namely $(n_1 + n_2 - 2)S_p^2/\sigma^2$, is $\chi^2_{n_1+n_2-2}$ by Chi-squared additivity. The numerator of $T$ standardized by $\sigma\sqrt{1/n_1 + 1/n_2}$ is standard Normal; the denominator is $S_p/\sigma$, a root of a normalized Chi-squared. Independence of mean and variance for Normal samples extends to the pooled estimate.
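The pooled construction can be checked against SciPy's equal-variance test (a sketch; the simulated groups, sizes, and seed are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, size=12)
y = rng.normal(0.5, 1.0, size=15)
n1, n2 = len(x), len(y)

# Pooled variance: df-weighted average of the two sample variances.
sp2 = ((n1 - 1) * x.var(ddof=1) + (n2 - 1) * y.var(ddof=1)) / (n1 + n2 - 2)
t_stat = (x.mean() - y.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
p_val = 2 * stats.t.sf(abs(t_stat), df=n1 + n2 - 2)

# SciPy's pooled (equal_var=True) test should agree to rounding error.
res = stats.ttest_ind(x, y, equal_var=True)
```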
Why It Matters
This is the canonical parametric test for "did treatment A change the mean compared to treatment B" under the simplifying assumption that the two groups share a common variance. The t-distribution it rests on is the one Gosset introduced under the "Student" pseudonym (Biometrika, 1908). The same procedure gives a confidence interval for $\mu_X - \mu_Y$ by inversion.
Failure Mode
The common-variance assumption matters. With unequal variances and unequal sample sizes, the pooled t-test has the wrong size: rejection rates can be much higher or lower than nominal, depending on which group has more variance and more data. The Welch t-test below is the right replacement and is the default in modern software.
Welch's t-Test
Welch t-Test for Unequal Variances
Statement
The Welch statistic is
\[
T = \frac{\bar X - \bar Y}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}
\]
and is approximately $t_{\hat\nu}$ under the null with Welch-Satterthwaite degrees of freedom
\[
\hat\nu = \frac{\big(S_1^2/n_1 + S_2^2/n_2\big)^2}{\dfrac{(S_1^2/n_1)^2}{n_1 - 1} + \dfrac{(S_2^2/n_2)^2}{n_2 - 1}}.
\]
Intuition
The denominator uses each group's own variance estimate. The price for not pooling is that the squared denominator is not a multiple of a single Chi-squared, so the statistic is not exactly t-distributed. The Welch-Satterthwaite approximation matches the first two moments of the squared denominator to those of a scaled Chi-squared; the resulting $\hat\nu$ is the moment-matched degrees of freedom. The approximation is accurate even at moderate sample sizes when the variance ratio is far from one.
Proof Sketch
The denominator squared is a linear combination of two independent scaled Chi-squareds, since $(n_i - 1)S_i^2/\sigma_i^2 \sim \chi^2_{n_i - 1}$ for $i = 1, 2$. Satterthwaite's approximation matches its first two moments to those of $c\,\chi^2_\nu$ for some constant $c$. The matching gives
\[
\nu = \frac{\big(\sigma_1^2/n_1 + \sigma_2^2/n_2\big)^2}{\dfrac{(\sigma_1^2/n_1)^2}{n_1 - 1} + \dfrac{(\sigma_2^2/n_2)^2}{n_2 - 1}}
\]
and the stated formula for $\hat\nu$ (with sample variances substituted for population variances).
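The moment-matched degrees of freedom can be computed by hand and checked against SciPy's Welch test (a sketch; the group sizes and standard deviations are arbitrary, chosen so the variance ratio is far from one):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(0.0, 3.0, size=10)   # small group, large variance
y = rng.normal(0.0, 1.0, size=40)   # large group, small variance

v1, v2 = x.var(ddof=1) / len(x), y.var(ddof=1) / len(y)
t_stat = (x.mean() - y.mean()) / np.sqrt(v1 + v2)
# Welch-Satterthwaite degrees of freedom: generally a non-integer.
df = (v1 + v2) ** 2 / (v1**2 / (len(x) - 1) + v2**2 / (len(y) - 1))
p_val = 2 * stats.t.sf(abs(t_stat), df=df)

# SciPy's Welch (equal_var=False) test should agree to rounding error.
res = stats.ttest_ind(x, y, equal_var=False)
```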
Why It Matters
Welch's test is the default two-sample t-test in R (t.test(...) without var.equal = TRUE), SciPy (stats.ttest_ind(..., equal_var=False)), and most modern statistical software. Use it unless you have a specific reason to believe the variances are equal. The cost over the equal-variance pooled test is a fractional reduction in degrees of freedom, which is negligible at moderate sample sizes.
Failure Mode
Welch's test still assumes Normal data within each group, although the approximation degrades more gracefully under non-Normality than the exact pooled test. For heavy-tailed or strongly skewed data, prefer a permutation test or a rank-based test (Mann-Whitney). The Welch-Satterthwaite degrees of freedom $\hat\nu$ is generally a non-integer; this is no obstacle, since the t-distribution is defined for any real $\nu > 0$ and software evaluates its CDF directly.
Paired t-Test
Paired t-Test
Statement
To test $H_0: \mu_D = 0$ for pairs $(X_i, Y_i)$ with differences $D_i = X_i - Y_i$, compute
\[
T = \frac{\bar D}{S_D/\sqrt{n}}.
\]
Under $H_0$ and Normality of the differences, $T \sim t_{n-1}$ exactly.
Intuition
A paired sample reduces to a one-sample t-test on the within-pair differences. The pairing eliminates between-subject variability and increases power compared to a two-sample test on the raw values, provided the pairs are genuinely linked (same subject before and after, matched pairs, twins).
Proof Sketch
The differences $D_1, \dots, D_n$ are i.i.d. Normal under the assumption. Apply the one-sample t-test theorem to $D_1, \dots, D_n$ with null mean zero.
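The reduction is exact in software as well: a paired test and a one-sample test on the differences produce identical statistics (a sketch; the simulated before/after data and seed are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
before = rng.normal(200.0, 30.0, size=8)        # hypothetical baselines
after = before - rng.normal(5.0, 8.0, size=8)   # shares the subject-level baseline

d = after - before
res_paired = stats.ttest_rel(after, before)
res_onesamp = stats.ttest_1samp(d, popmean=0.0)  # identical statistic and p-value
```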
Why It Matters
Pre-versus-post designs, twin studies, matched-pair clinical trials, and within-subject crossover trials all use the paired t-test. The power gain over a two-sample test is substantial when the within-pair correlation is high; the test exploits that correlation by subtracting out the shared subject-level baseline. Ignoring pairing and using a two-sample test on the raw values gives a correct level (the test is still valid) but lower power.
Failure Mode
The Normality assumption on the differences is what is required, not Normality of each marginal. Two heavily skewed marginals can have nearly Normal differences if the pairing captures the right shared structure; conversely, two Normal marginals can have non-Normal differences if the pairing is loose. Plot the differences before assuming Normality; if heavy-tailed, use a sign test or Wilcoxon signed-rank test.
When to Use Each Test
| Setting | Test | Reference distribution | Degrees of freedom |
|---|---|---|---|
| One sample, fixed null mean | One-sample t-test | $t_{n-1}$ | $n-1$ |
| Two independent samples, equal variances | Pooled two-sample t-test | $t_{n_1+n_2-2}$ | $n_1+n_2-2$ |
| Two independent samples, unequal variances | Welch t-test | approximately $t_{\hat\nu}$ | Welch-Satterthwaite $\hat\nu$ |
| Paired or matched samples | Paired t-test on differences | $t_{n-1}$ | $n-1$ |
| Two samples, large $n$, no Normality assumption | Wald z-test | $N(0,1)$ | none |
| Two samples, heavy-tailed or small $n$ | Permutation test | empirical | none |
The Wald z-test is the asymptotic version: replace the t-distribution with the standard Normal. For $\nu \ge 30$, the t and z critical values agree to within about 5%; for $\nu \ge 120$ they agree to within 1%. See likelihood-ratio, Wald, and score tests.
Confidence Intervals
The one-sample t-test inverts to a confidence interval:
\[
\bar X \pm t_{n-1,\,1-\alpha/2}\, \frac{S}{\sqrt{n}}.
\]
The two-sample (pooled) and Welch intervals follow the same pattern: take the estimate plus or minus the t-quantile times the standard error. The standard error is $S_p\sqrt{1/n_1 + 1/n_2}$ in the pooled case and $\sqrt{S_1^2/n_1 + S_2^2/n_2}$ in the Welch case. The paired interval uses $\bar D \pm t_{n-1,\,1-\alpha/2}\, S_D/\sqrt{n}$ with $n-1$ degrees of freedom.
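The interval can be built by hand from the t-quantile and checked against SciPy's interval helper (a sketch; the simulated sample and seed are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.normal(10.0, 2.0, size=25)

# 95% interval: estimate +/- t-quantile * standard error.
n = len(x)
se = x.std(ddof=1) / np.sqrt(n)
tq = stats.t.ppf(0.975, df=n - 1)
lo, hi = x.mean() - tq * se, x.mean() + tq * se

# The same interval straight from the t distribution object.
lo2, hi2 = stats.t.interval(0.95, df=n - 1, loc=x.mean(), scale=se)
```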
Common Confusions
The t-distribution is heavy-tailed even at modest degrees of freedom
For $\nu = 10$, the t-distribution has visibly heavier tails than the Normal: the 97.5% quantile is $2.23$ instead of $1.96$. The "t is approximately Normal" intuition holds only for $\nu \gtrsim 30$ or so. With small samples, use the t-quantile, not the Normal quantile, even when the underlying data look Normal.
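The gap between the two quantiles is easy to see directly (a short SciPy sketch; the grid of $\nu$ values is arbitrary):

```python
from scipy import stats

# Two-sided 5% critical values: t quantile vs. the Normal's 1.960.
z975 = stats.norm.ppf(0.975)
for nu in (5, 10, 30, 100):
    t975 = stats.t.ppf(0.975, df=nu)
    print(f"nu={nu:4d}  t={t975:.3f}  relative gap={t975 / z975 - 1:.1%}")
```

The relative gap shrinks from over 30% at $\nu = 5$ to about 4% at $\nu = 30$ and about 1% at $\nu = 100$.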
Welch versus equal-variance: prefer Welch by default
The cost of using Welch when variances are actually equal is small (a slight loss of power, typically less than 1 percentage point). The cost of using the equal-variance test when variances are unequal can be large: actual rejection rates can be 10% or 20% under a nominal 5% test, depending on the variance ratio and sample-size imbalance.
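The size distortion is easy to exhibit by simulation (a sketch; the sample sizes, standard deviations, and replication count are arbitrary choices that make the effect visible). Both groups share the same mean, so the null is true and every rejection is a false positive:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n1, n2 = 10, 40          # unequal sample sizes ...
sd1, sd2 = 3.0, 1.0      # ... and the small group has the large variance
reps = 20_000

# Simulate many null datasets at once and run both tests row-wise.
x = rng.normal(0.0, sd1, size=(reps, n1))
y = rng.normal(0.0, sd2, size=(reps, n2))

rate_pooled = (stats.ttest_ind(x, y, axis=1, equal_var=True).pvalue < 0.05).mean()
rate_welch = (stats.ttest_ind(x, y, axis=1, equal_var=False).pvalue < 0.05).mean()
print(f"pooled size: {rate_pooled:.3f}   Welch size: {rate_welch:.3f}")
```

In this configuration the pooled test rejects far more often than the nominal 5%, while Welch stays close to it.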
A high p-value is not evidence of equal means
Failing to reject means the data are consistent with equal means. It does not mean the means are equal. The confidence interval is the right summary: a 95% interval centered near zero with narrow width indicates "we have ruled out large differences"; a wide interval indicates "we have ruled out very little".
The t-test assumes the population variance is unknown
If the population variance $\sigma^2$ is known (rare in practice, common in textbook problems), the appropriate test is the z-test with $\sigma$ in the denominator, not the t-test with $S$. The z-test rejects when $|Z| > 1.96$ at the 5% level. The t-test adds an extra source of variability (the sample variance), which is why its critical value is larger.
Exercises
Problem
A new manufacturing process produces parts with target length 50 mm. A sample of $n$ parts gives sample mean $\bar x$ mm and sample standard deviation $s$ mm. Test $H_0: \mu = 50$ against $H_1: \mu \neq 50$ at level 0.05.
Problem
Two independent samples have sizes $n_1$ and $n_2$: the first with mean $\bar x$ and sample variance $s_1^2$, the second with mean $\bar y$ and sample variance $s_2^2$. Use Welch's t-test to test $H_0: \mu_X = \mu_Y$ at level 0.05.
Problem
A paired study measures cholesterol level for 8 patients before and after a diet. The differences (after minus before, in mg/dL) are $d_1, \dots, d_8$. Test whether the diet reduces cholesterol at level 0.05.
Problem
Show that as $\nu \to \infty$, the Welch t-test converges to the Wald z-test, and that the difference between the t-quantile and the z-quantile is $O(1/\nu)$ for fixed quantile level.
References
Canonical:
- Casella and Berger, Statistical Inference (2002), Chapter 5 (Section 5.3 on the Student-t sampling distribution), Chapter 8 (Section 8.2 on the t-test).
- Lehmann and Romano, Testing Statistical Hypotheses (2005), Chapter 5 (UMP-unbiased tests in the Normal family, including the t-test as a UMP-invariant test).
- Bickel and Doksum, Mathematical Statistics, Volume I (2015), Chapter 4 (testing in the Normal model).
Historical:
- Student (W. S. Gosset), "The probable error of a mean" (Biometrika, 1908), the original t-distribution paper.
- Welch, "The generalization of Student's problem when several different population variances are involved" (Biometrika, 1947), Welch-Satterthwaite degrees of freedom.
Rank-based and resampling alternatives:
- Wilcoxon, "Individual comparisons by ranking methods" (Biometrics Bulletin, 1945), the rank-based alternative.
- Davison and Hinkley, Bootstrap Methods and Their Application (1997), Chapter 4 (bootstrap and permutation alternatives to the t-test).
Last reviewed: May 11, 2026
Canonical graph
Required before and derived from this topic
These links come from prerequisite edges in the curriculum graph. Editorial suggestions are shown here only when the target page also cites this page as a prerequisite.
Required prerequisites
- Distributions Atlas (layer 0A · tier 1)
- Normal Distribution (layer 0A · tier 1)
- Central Limit Theorem (layer 0B · tier 1)
- Chi-Squared Distribution and Tests (layer 1 · tier 1)
- Hypothesis Testing for ML (layer 2 · tier 2)
Derived topics
- F-Distribution and ANOVA (layer 1 · tier 1)