Sample Size Determination

Sneiderman, Robby

How to compute the number of observations needed to estimate means, proportions, and treatment effects with specified precision and power, including corrections for finite populations and complex designs.

Prerequisites

Hypothesis Testing for ML, Common Probability Distributions, Survey Sampling Methods

Why This Matters

Before collecting data, you need to know how much to collect. Too little and your estimates are imprecise or your test has low power. Too much and you waste resources. Sample size determination answers the question: what is the minimum $n$ that achieves the precision or power you need?

This is not just a survey statistics problem. A/B tests, clinical trials, and ML evaluation benchmarks all require sample size calculations. If you run an A/B test with too few users, you will likely fail to detect real effects. If you evaluate a model on too few test points, your accuracy estimate has wide confidence intervals.

Mental Model

Sample size formulas all follow the same logic. You specify how precise you want your estimate to be (the margin of error $e$) or how small an effect you want to detect (the effect size $\delta$). You specify a confidence level or significance level. Then you solve for $n$.

The key tradeoff: a smaller margin of error requires a larger $n$, and $n$ grows as $1/e^2$. Halving your margin of error quadruples your sample size.

Formal Setup

Sample Size for Estimating a Mean

Definition

Sample Size for a Mean $n$

To estimate a population mean $\mu$ with margin of error $e$ at confidence level $1 - \alpha$, using an SRS from an infinite (or very large) population:

$n = \frac{z_{\alpha/2}^2 \, \sigma^2}{e^2}$

where $z_{\alpha/2}$ is the upper $\alpha/2$ quantile of the standard normal and $\sigma^2$ is the population variance.

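For 95% confidence ($z_{0.025} = 1.96$, so $z_{0.025}^2 \approx 3.84$), this gives $n \approx 3.84\,\sigma^2/e^2$, often rounded to the rule of thumb $n \approx 4\sigma^2/e^2$.

The formula comes directly from the requirement that the half-width of the confidence interval for $\mu$ be at most $e$: we need $z_{\alpha/2} \cdot \sigma/\sqrt{n} \leq e$. In practice, round the result up to the next integer.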
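The calculation can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `sample_size_mean` and its parameters are made up for this example, and the result is rounded up because $n$ must be an integer.

```python
import math
from statistics import NormalDist

def sample_size_mean(sigma: float, e: float, confidence: float = 0.95) -> int:
    """Smallest integer n with z_{alpha/2} * sigma / sqrt(n) <= e."""
    alpha = 1.0 - confidence
    # Upper alpha/2 quantile of the standard normal (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return math.ceil((z * sigma / e) ** 2)

print(sample_size_mean(sigma=10.0, e=1.0))   # 385
print(sample_size_mean(sigma=10.0, e=0.5))   # 1537, roughly four times as many
```

Halving `e` from 1.0 to 0.5 roughly quadruples the result, which is the $1/e^2$ scaling in action.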

Sample Size for Estimating a Proportion

Definition

Sample Size for a Proportion $n$

To estimate a population proportion $p$ with margin of error $e$ at confidence level $1 - \alpha$:

$n = \frac{z_{\alpha/2}^2 \, p(1-p)}{e^2}$

Since $p$ is unknown before the study, use the conservative choice $p = 0.5$, at which $p(1-p)$ reaches its maximum value of $0.25$. This gives the largest (most conservative) sample size:

$n_{\max} = \frac{z_{\alpha/2}^2}{4e^2}$

For 95% confidence and $e = 0.03$: $n_{\max} = 1.96^2/(4 \times 0.0009) \approx 1067.1$, which rounds up to $n = 1068$.
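As a rough sketch, the same computation in Python looks like this. The helper name `sample_size_proportion` is hypothetical, and the default `p=0.5` encodes the conservative choice described above.

```python
import math
from statistics import NormalDist

def sample_size_proportion(e: float, p: float = 0.5, confidence: float = 0.95) -> int:
    """Smallest integer n with z_{alpha/2} * sqrt(p(1-p)/n) <= e."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # p = 0.5 maximizes p(1-p), so the default is the worst case.
    return math.ceil(z ** 2 * p * (1.0 - p) / e ** 2)

print(sample_size_proportion(e=0.03))  # 1068
```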

Finite Population Correction

Definition

Finite Population Correction (FPC)

When the population size $N$ is not much larger than the sample size, apply the finite population correction:

$n_{\text{adj}} = \frac{n}{1 + (n - 1)/N}$

where $n$ is the sample size from the infinite-population formula. The adjusted value is never larger than $n$ (and strictly smaller whenever $n > 1$). When $N$ is large relative to $n$, the correction is negligible.
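A small Python sketch shows how much the correction matters at different population sizes. The function name `apply_fpc` is an assumption for this example, and rounding up is a conservative choice rather than part of the formula.

```python
import math

def apply_fpc(n: int, N: int) -> int:
    """Finite population correction: n / (1 + (n - 1) / N), rounded up."""
    return math.ceil(n / (1.0 + (n - 1) / N))

# n = 1068 is the infinite-population answer for a proportion at e = 0.03.
print(apply_fpc(1068, 5_000))       # 881: a noticeable reduction
print(apply_fpc(1068, 10_000_000))  # 1068: the correction is negligible
```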

Design Effect Adjustment

For complex sampling designs (stratified, cluster, multi-stage), the effective sample size differs from the nominal sample size. Multiply the SRS-based sample size by the design effect:

$n_{\text{complex}} = n_{\text{SRS}} \times \text{DEFF}$

A cluster design with DEFF = 2 requires twice as many observations as SRS to achieve the same precision.
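The adjustment is a single multiplication, sketched below with hypothetical helper names. The second function uses the usual reading of "effective sample size" as nominal size divided by DEFF, which is an interpretation added here rather than a formula stated above.

```python
import math

def inflate_for_design(n_srs: int, deff: float) -> int:
    """Nominal sample size under a complex design: n_SRS * DEFF, rounded up."""
    return math.ceil(n_srs * deff)

def effective_sample_size(n_nominal: int, deff: float) -> float:
    """SRS-equivalent size of a complex sample: n / DEFF (assumed definition)."""
    return n_nominal / deff

n_complex = inflate_for_design(1068, deff=2.0)
print(n_complex)                              # 2136
print(effective_sample_size(n_complex, 2.0))  # 1068.0
```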

Power Analysis

Definition

Power of a Test

The power of a hypothesis test is $1 - \beta$, where $\beta$ is the probability of a Type II error (failing to reject the null when the alternative is true). Power depends on the sample size, the significance level $\alpha$, the effect size, and the population variance.

Sample Size for a Two-Sample t-Test

To detect a difference $\delta = \mu_1 - \mu_2$ between two group means with two-sided significance level $\alpha$ and power $1 - \beta$, assuming equal variances $\sigma^2$ and equal group sizes, the normal approximation gives:

$n_{\text{per group}} = \frac{2\sigma^2(z_{\alpha/2} + z_{\beta})^2}{\delta^2}$

For significance level 0.05 ($z_{0.025} = 1.96$) and power 0.80 ($z_{0.20} \approx 0.84$, the upper 0.20 quantile of the standard normal):

$n_{\text{per group}} = \frac{2\sigma^2(1.96 + 0.84)^2}{\delta^2} = \frac{15.68 \, \sigma^2}{\delta^2}$
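The formula can be sketched in Python as follows. The function name `n_per_group` is hypothetical, and the code uses normal quantiles exactly as in the formula above, so it is an approximation to a true t-test calculation (which would give a slightly larger $n$ for small samples).

```python
import math
from statistics import NormalDist

def n_per_group(sigma: float, delta: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group n for a two-sided, two-sample comparison (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)               # about 0.84 for power = 0.80
    return math.ceil(2.0 * sigma ** 2 * (z_alpha + z_beta) ** 2 / delta ** 2)

print(n_per_group(sigma=1.0, delta=0.5))   # 63
print(n_per_group(sigma=1.0, delta=0.25))  # 252
```

With unrounded quantiles the constant is about 15.70 rather than 15.68, which is why the results can differ by one from a hand calculation. Halving $\delta$ quadruples the required group size.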

Main Theorems

Theorem

Minimum Sample Size for Mean Estimation

Statement

Let $Y_1, \ldots, Y_n$ be an SRS from a population with mean $\mu$ and variance $\sigma^2$. For the confidence interval $\bar{Y} \pm z_{\alpha/2} \sigma / \sqrt{n}$ to have half-width at most $e$, it is necessary and sufficient that:

$n \geq \frac{z_{\alpha/2}^2 \, \sigma^2}{e^2}$

Intuition

The half-width of the confidence interval shrinks as $1/\sqrt{n}$. To make the half-width equal to $e$, you solve $\sigma/\sqrt{n} = e/z_{\alpha/2}$ for $n$. The result scales linearly with variance and inversely with the square of the margin of error.

Proof Sketch

The $1-\alpha$ confidence interval for $\mu$ under SRS has half-width $z_{\alpha/2} \sigma/\sqrt{n}$. Set this equal to $e$ and solve: $z_{\alpha/2} \sigma / \sqrt{n} = e$ implies $\sqrt{n} = z_{\alpha/2}\sigma/e$, which implies $n = z_{\alpha/2}^2 \sigma^2/e^2$. Because the half-width is strictly decreasing in $n$, it is at most $e$ exactly when $n$ is at least this value.

Why It Matters

This is the foundation for all sample size planning. Every other formula (for proportions, for power analysis, for complex designs) is a variation on this basic calculation. The $1/e^2$ dependence is the key quantitative insight: precision is expensive.

Failure Mode

The formula requires knowing $\sigma^2$ before the study. In practice, $\sigma^2$ must be estimated from pilot data, prior studies, or conservative guesses. If $\sigma^2$ is underestimated, the actual margin of error will exceed $e$. The normal approximation also requires $n$ to be large enough for the CLT to apply. For highly skewed populations, larger $n$ is needed.
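A short worked step makes the cost of underestimating the variance concrete. The symbols $\sigma_0$ (the planning value of the standard deviation) and $e_{\text{actual}}$ (the realized margin of error) are introduced here only for illustration, and rounding of $n$ is ignored. If $n = z_{\alpha/2}^2 \sigma_0^2 / e^2$ but the true standard deviation is $\sigma$, then:

$e_{\text{actual}} = z_{\alpha/2} \, \frac{\sigma}{\sqrt{n}} = z_{\alpha/2} \, \sigma \cdot \frac{e}{z_{\alpha/2} \, \sigma_0} = e \cdot \frac{\sigma}{\sigma_0}$

So if the planning value is 20% too low ($\sigma_0 = 0.8\sigma$), the realized margin of error is $1.25e$, which is 25% wider than intended.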

Why "More Data" Has Diminishing Returns

The standard error decreases as $1/\sqrt{n}$. Going from $n = 100$ to $n = 400$ halves the standard error. Going from $n = 400$ to $n = 1600$ halves it again. Each halving of the standard error requires quadrupling the sample size. At some point, the cost of additional data exceeds the value of the incremental precision.

This is why sample size determination matters: it tells you the point beyond which additional data is not worth collecting.
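The pattern is easy to tabulate. The sketch below assumes $\sigma = 1$ purely for illustration, and `standard_error` is a hypothetical helper name.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean under SRS: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 1.0  # assumed value, chosen only to make the table readable
for n in (100, 400, 1600, 6400):
    print(n, standard_error(sigma, n))
# 100 0.1
# 400 0.05
# 1600 0.025
# 6400 0.0125
```

Each row costs four times as much data as the previous one and buys only a factor of two in precision.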

Common Confusions

Watch Out

Sample size depends on population variance, not population size

A common misconception is that you need to sample a fixed percentage of the population (e.g., 10%). The sample size formula depends on $\sigma^2$ and $e$, not on $N$ (except through the FPC, which is negligible when $N \gg n$). A poll of 1000 people gives roughly the same precision whether the population is 1 million or 300 million.
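The poll claim can be checked numerically. The sketch below assumes the worst case $p = 0.5$ and applies the standard textbook finite-population factor $\sqrt{(N-n)/(N-1)}$ to the standard error; that factor and the function name `margin_of_error` are not taken from the text above.

```python
import math
from statistics import NormalDist

def margin_of_error(n: int, N: int, p: float = 0.5, confidence: float = 0.95) -> float:
    """Margin of error for a proportion from an SRS of size n out of N."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    fpc = math.sqrt((N - n) / (N - 1))  # finite-population factor on the standard error
    return z * math.sqrt(p * (1.0 - p) / n) * fpc

# Same poll of 1000 people, very different population sizes.
for N in (10_000, 1_000_000, 300_000_000):
    print(N, round(margin_of_error(1000, N), 4))
```

For populations of 1 million and 300 million the margin is about 0.031 in both cases. Only when the population shrinks toward the sample size (here 10,000) does the margin visibly tighten.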

Watch Out

Power is not the same as significance

Significance level $\alpha$ controls the false positive rate. Power $1 - \beta$ is the probability of detecting a true effect of a given size, so it is the complement of the false negative rate $\beta$. You can have a "significant" result (small $p$-value) from a low-powered study if the effect happened to look large in your sample. Planning for power means ensuring you can detect a specified effect size.

Watch Out

The conservative p = 0.5 is sometimes very conservative

Using $p = 0.5$ for proportion estimation gives the maximum sample size. If you have prior knowledge that $p \approx 0.05$ (a rare event), then $p(1-p) = 0.0475$, which is about 5.3 times smaller than 0.25. Using $p = 0.5$ would call for about 5.3 times as many samples as you need. Use prior information when available.
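To see the gap in concrete numbers, here is a minimal sketch comparing the two planning values. The margin $e = 0.01$ and the helper name `sample_size_proportion` are assumptions chosen for this example.

```python
import math
from statistics import NormalDist

def sample_size_proportion(e: float, p: float = 0.5, confidence: float = 0.95) -> int:
    """Smallest integer n with z_{alpha/2} * sqrt(p(1-p)/n) <= e."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return math.ceil(z ** 2 * p * (1.0 - p) / e ** 2)

n_conservative = sample_size_proportion(e=0.01, p=0.5)
n_informed = sample_size_proportion(e=0.01, p=0.05)
print(n_conservative)  # 9604
print(n_informed)      # 1825
print(round(n_conservative / n_informed, 2))
```

The ratio comes out near 5.3, matching $0.25 / 0.0475$.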

Summary

Sample size for means: $n = z^2 \sigma^2 / e^2$. For proportions: $n = z^2 p(1-p)/e^2$
Halving the margin of error quadruples the required sample size
Power analysis for a two-sample test: $n \propto \sigma^2/\delta^2$
Finite population correction matters only when $n/N$ is not small
Design effects from complex surveys multiply the required sample size
The formula requires prior knowledge of $\sigma^2$ (or $p$ ), which is the hard part

Exercises

ExerciseCore

Problem

You want to estimate the average income in a city to within 2000 USD at 95% confidence. A pilot study suggests the standard deviation is about 15,000 USD. What sample size do you need? If the city has 50,000 residents, how does the FPC change your answer?

ExerciseAdvanced

Problem

You are designing an A/B test. You expect the control group conversion rate to be $p_0 = 0.10$ and want to detect a lift to $p_1 = 0.12$ with 80% power at $\alpha = 0.05$. How many users per group do you need? What if you want to detect a lift to $p_1 = 0.11$?

References

Canonical:

Cochran, Sampling Techniques (1977), Chapter 4
Cohen, Statistical Power Analysis for the Behavioral Sciences (1988), Chapters 2, 6

Current:

Lohr, Sampling: Design and Analysis (2021), Chapter 2
Ryan, Sample Size Determination and Power (2013), Chapters 1-4
Howard, Ramdas, McAuliffe, Sekhon, "Time-uniform, nonparametric, nonasymptotic confidence sequences," Annals of Statistics 49 (2021), 1055-1080

Next Topics

Survey sampling methods: the designs that generate the data
Types of bias in statistics: what goes wrong when sampling is non-random

Last reviewed: April 14, 2026
