
Statistical Foundations

Sample Size Determination

How to compute the number of observations needed to estimate means, proportions, and treatment effects with specified precision and power, including corrections for finite populations and complex designs.


Why This Matters

Before collecting data, you need to know how much to collect. Too little and your estimates are imprecise or your test has no power. Too much and you waste resources. Sample size determination answers the question: what is the minimum $n$ that achieves the precision or power you need?

This is not just a survey statistics problem. A/B tests, clinical trials, and ML evaluation benchmarks all require sample size calculations. If you run an A/B test with too few users, you will fail to detect real effects. If you evaluate a model on too few test points, your accuracy estimate has wide confidence intervals.

Mental Model

Sample size formulas all follow the same logic. You specify how precise you want your estimate to be (the margin of error $e$) or how small an effect you want to detect (the effect size $\delta$). You specify a confidence level or significance level. Then you solve for $n$.

The key tradeoff: a smaller margin of error requires a larger $n$, and $n$ grows as $1/e^2$. Halving your margin of error quadruples your sample size.

Formal Setup

Sample Size for Estimating a Mean

Definition

Sample Size for a Mean

To estimate a population mean $\mu$ with margin of error $e$ at confidence level $1 - \alpha$, using an SRS from an infinite (or very large) population:

$$n = \frac{z_{\alpha/2}^2 \, \sigma^2}{e^2}$$

where $z_{\alpha/2}$ is the upper $\alpha/2$ quantile of the standard normal and $\sigma^2$ is the population variance.

For 95% confidence ($z_{0.025} = 1.96$), this simplifies to $n \approx 4\sigma^2/e^2$.
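As a sanity check, the formula is easy to evaluate with the standard library alone. The sketch below uses an exact normal quantile rather than the rounded 1.96; the values $\sigma = 10$ and $e = 1$ are purely illustrative:

```python
from math import ceil
from statistics import NormalDist

def n_for_mean(sigma: float, e: float, conf: float = 0.95) -> int:
    """Minimum n so that the CI half-width z * sigma / sqrt(n) is at most e."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    return ceil((z * sigma / e) ** 2)

# Illustrative numbers: sigma = 10, margin of error e = 1, 95% confidence.
print(n_for_mean(sigma=10, e=1))    # 385, close to the 4 * sigma^2 / e^2 = 400 rule
print(n_for_mean(sigma=10, e=0.5))  # 1537: halving e roughly quadruples n
```

Rounding is always upward (`ceil`), since rounding down would leave the half-width above $e$.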

The formula comes directly from the requirement that the half-width of the confidence interval for $\mu$ be at most $e$: we need $z_{\alpha/2} \cdot \sigma/\sqrt{n} \leq e$.

Sample Size for Estimating a Proportion

Definition

Sample Size for a Proportion

To estimate a population proportion $p$ with margin of error $e$ at confidence level $1 - \alpha$:

$$n = \frac{z_{\alpha/2}^2 \, p(1-p)}{e^2}$$

Since $p$ is unknown before the study, use the conservative choice $p = 0.5$, which maximizes $p(1-p)$ at $0.25$. This gives the largest (most conservative) sample size:

$$n_{\max} = \frac{z_{\alpha/2}^2}{4e^2}$$

For 95% confidence and $e = 0.03$: $n_{\max} = 1.96^2/(4 \times 0.0009) \approx 1068$.
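That worked example is easy to reproduce. A minimal sketch, defaulting to the conservative $p = 0.5$:

```python
from math import ceil
from statistics import NormalDist

def n_for_proportion(e: float, p: float = 0.5, conf: float = 0.95) -> int:
    """Minimum n to estimate a proportion within +/- e; p = 0.5 is the conservative default."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    return ceil(z ** 2 * p * (1 - p) / e ** 2)

print(n_for_proportion(e=0.03))  # 1068, matching the worked example
```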

Finite Population Correction

Definition

Finite Population Correction (FPC)

When the population size $N$ is not much larger than the sample size, apply the finite population correction:

$$n_{\text{adj}} = \frac{n}{1 + (n - 1)/N}$$

where $n$ is the sample size from the infinite-population formula. The adjusted size is always smaller than $n$. When $N$ is large relative to $n$, the correction is negligible.
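A small sketch of the correction; the inputs ($n = 1068$ from a proportion calculation, population sizes of 10,000 and 1,000,000) are illustrative:

```python
from math import ceil

def fpc_adjust(n: int, N: int) -> int:
    """Adjust an infinite-population sample size n for a finite population of size N."""
    return ceil(n / (1 + (n - 1) / N))

print(fpc_adjust(1068, N=10_000))     # 966  -- noticeable when N is only ~10x n
print(fpc_adjust(1068, N=1_000_000))  # 1067 -- negligible when N >> n
```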

Design Effect Adjustment

For complex sampling designs (stratified, cluster, multi-stage), the effective sample size differs from the nominal sample size. Multiply the SRS-based sample size by the design effect:

$$n_{\text{complex}} = n_{\text{SRS}} \times \text{DEFF}$$

A cluster design with DEFF = 2 requires twice as many observations as SRS to achieve the same precision.
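In a planning calculation, the design-effect adjustment is a final multiplication; the DEFF of 2 below is the illustrative value from the text, applied to a hypothetical SRS size of 1068:

```python
from math import ceil

def n_complex(n_srs: int, deff: float) -> int:
    """Inflate an SRS sample size by the design effect of a complex design."""
    return ceil(n_srs * deff)

print(n_complex(1068, deff=2.0))  # 2136: a clustered survey needs twice the SRS size
```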

Power Analysis

Definition

Power of a Test

The power of a hypothesis test is $1 - \beta$, where $\beta$ is the probability of a Type II error (failing to reject the null when the alternative is true). Power depends on the sample size, the significance level $\alpha$, the effect size, and the population variance.

Sample Size for a Two-Sample t-Test

To detect a difference $\delta = \mu_1 - \mu_2$ between two group means with significance level $\alpha$ and power $1 - \beta$, assuming equal variances $\sigma^2$ and equal group sizes:

$$n_{\text{per group}} = \frac{2\sigma^2(z_{\alpha/2} + z_{\beta})^2}{\delta^2}$$

For significance level 0.05 and power 0.80:

$$n_{\text{per group}} = \frac{2\sigma^2(1.96 + 0.84)^2}{\delta^2} = \frac{15.68 \, \sigma^2}{\delta^2}$$

Main Theorems

Theorem

Minimum Sample Size for Mean Estimation

Statement

Let $Y_1, \ldots, Y_n$ be an SRS from a population with mean $\mu$ and variance $\sigma^2$. For the confidence interval $\bar{Y} \pm z_{\alpha/2} \, \sigma / \sqrt{n}$ to have half-width at most $e$, it is necessary and sufficient that:

$$n \geq \frac{z_{\alpha/2}^2 \, \sigma^2}{e^2}$$

Intuition

The width of the confidence interval shrinks as $1/\sqrt{n}$. To bring the half-width down to $e$, you solve $\sigma/\sqrt{n} = e/z_{\alpha/2}$ for $n$. The result scales linearly with the variance and inversely with the square of the margin of error.

Proof Sketch

The $1-\alpha$ confidence interval for $\mu$ under SRS has half-width $z_{\alpha/2} \, \sigma/\sqrt{n}$. Set this equal to $e$ and solve: $z_{\alpha/2} \, \sigma / \sqrt{n} = e$ implies $\sqrt{n} = z_{\alpha/2}\sigma/e$, which implies $n = z_{\alpha/2}^2 \sigma^2/e^2$.

Why It Matters

This is the foundation for all sample size planning. Every other formula (for proportions, for power analysis, for complex designs) is a variation on this basic calculation. The $1/e^2$ dependence is the key quantitative insight: precision is expensive.

Failure Mode

The formula requires knowing $\sigma^2$ before the study. In practice, $\sigma^2$ must be estimated from pilot data, prior studies, or conservative guesses. If $\sigma^2$ is underestimated, the actual margin of error will exceed $e$. The normal approximation also requires $n$ to be large enough for the CLT to apply. For highly skewed populations, larger $n$ is needed.

Why "More Data" Has Diminishing Returns

The standard error decreases as $1/\sqrt{n}$. Going from $n = 100$ to $n = 400$ halves the standard error. Going from $n = 400$ to $n = 1600$ halves it again. Each halving of the standard error requires quadrupling the sample size. At some point, the cost of additional data exceeds the value of the incremental precision.
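The $1/\sqrt{n}$ decay is easy to see numerically; the value $\sigma = 10$ is arbitrary:

```python
from math import sqrt

sigma = 10.0
for n in (100, 400, 1600, 6400):
    # Each quadrupling of n halves the standard error sigma / sqrt(n).
    print(f"n = {n:5d}  SE = {sigma / sqrt(n):.3f}")
```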

This is why sample size determination matters: it tells you the point beyond which additional data is not worth collecting.

Common Confusions

Watch Out

Sample size depends on population variance, not population size

A common misconception is that you need to sample a fixed percentage of the population (e.g., 10%). The sample size formula depends on $\sigma^2$ and $e$, not on $N$ (except through the FPC, which is negligible when $N \gg n$). A poll of 1000 people gives roughly the same precision whether the population is 1 million or 300 million.

Watch Out

Power is not the same as significance

Significance level $\alpha$ controls the false positive rate. Power $1 - \beta$ controls the false negative rate. You can have a "significant" result (small $p$-value) with low power if the effect happened to be large in your sample. Planning for power means ensuring you can detect a specified effect size.

Watch Out

The conservative p = 0.5 is sometimes very conservative

Using $p = 0.5$ for proportion estimation gives the maximum sample size. If you have prior knowledge that $p \approx 0.05$ (a rare event), then $p(1-p) = 0.0475$, which is about 5 times smaller than $0.25$. Using $p = 0.5$ would give you roughly 5 times as many samples as you need. Use prior information when available.
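To see the magnitude of the difference, compare the required $n$ at $e = 0.03$ for $p = 0.5$ versus $p = 0.05$ (a small sketch using the proportion formula above):

```python
from math import ceil
from statistics import NormalDist

def n_for_proportion(e: float, p: float, conf: float = 0.95) -> int:
    """Minimum n to estimate a proportion within +/- e at the given confidence."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil(z ** 2 * p * (1 - p) / e ** 2)

print(n_for_proportion(0.03, p=0.5))   # 1068 with the conservative p = 0.5
print(n_for_proportion(0.03, p=0.05))  # 203 when p is known to be near 0.05
```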

Summary

  • Sample size for means: $n = z^2 \sigma^2 / e^2$. For proportions: $n = z^2 p(1-p)/e^2$
  • Halving the margin of error quadruples the required sample size
  • Power analysis for a two-sample test: $n \propto \sigma^2/\delta^2$
  • Finite population correction matters only when $n/N$ is not small
  • Design effects from complex surveys multiply the required sample size
  • The formula requires prior knowledge of $\sigma^2$ (or $p$), which is the hard part

Exercises

ExerciseCore

Problem

You want to estimate the average income in a city to within 2000 USD at 95% confidence. A pilot study suggests the standard deviation is about 15,000. What sample size do you need? If the city has 50,000 residents, how does the FPC change your answer?

ExerciseAdvanced

Problem

You are designing an A/B test. You expect the control group conversion rate to be $p_0 = 0.10$ and want to detect a lift to $p_1 = 0.12$ with 80% power at $\alpha = 0.05$. How many users per group do you need? What if you want to detect a lift to $p_1 = 0.11$?

References

Canonical:

  • Cochran, Sampling Techniques (1977), Chapter 4
  • Cohen, Statistical Power Analysis for the Behavioral Sciences (1988), Chapters 2, 6

Current:

  • Lohr, Sampling: Design and Analysis (2021), Chapter 2
  • Ryan, Sample Size Determination and Power (2013), Chapters 1-4
  • Casella & Berger, Statistical Inference (2002), Chapters 5-10
  • Lehmann & Casella, Theory of Point Estimation (1998), Chapters 1-6

Last reviewed: April 2026
