
Foundations

Distributions Atlas

A connection map for the parametric families used in statistical inference and ML. Lists each family with its support, parameterization, and the transformations that move between families: sum, mixture, conjugacy, limiting case, and ratio constructions.

Core · Tier 1 · Stable · Core spine · ~35 min

Why This Matters

The named distributions are not a list. They are a graph. A Bernoulli trial summed $n$ times is a Binomial; a Binomial with $n$ large and mean held fixed is a Poisson; the time between Poisson events is an Exponential; the sum of $k$ independent Exponentials with the same rate is a Gamma; a Gamma with shape $k/2$ and rate $1/2$ is a Chi-squared with $k$ degrees of freedom; a standard Normal divided by a square-rooted Chi-squared is a Student-t; the ratio of two scaled Chi-squareds is an F.

Knowing these edges turns a long memorization task into a short one. The sampling distribution of the standardized sample mean is Student-t because the numerator is Normal, the denominator squared is Chi-squared, and they are independent. The Pearson Chi-squared statistic has its name because under the null hypothesis it converges to a sum of squared Normals. The F statistic in analysis of variance is a ratio of two variance estimates, and each variance estimate is a scaled Chi-squared. Each new test follows from the same construction: identify the building blocks, identify the transformation, read off the limiting distribution.

This page is the atlas. It states each connection precisely once and links to the per-distribution page that proves it.

The Atlas

Discrete families

| Family | Notation | Support | Mean | Variance |
| --- | --- | --- | --- | --- |
| Bernoulli | $\operatorname{Bern}(p)$ | $\{0,1\}$ | $p$ | $p(1-p)$ |
| Binomial | $\operatorname{Bin}(n,p)$ | $\{0,1,\dots,n\}$ | $np$ | $np(1-p)$ |
| Geometric | $\operatorname{Geom}(p)$ | $\{1,2,\dots\}$ | $1/p$ | $(1-p)/p^2$ |
| Negative Binomial | $\operatorname{NB}(r,p)$ | $\{r,r+1,\dots\}$ | $r/p$ | $r(1-p)/p^2$ |
| Poisson | $\operatorname{Pois}(\lambda)$ | $\{0,1,2,\dots\}$ | $\lambda$ | $\lambda$ |
| Hypergeometric | $\operatorname{HG}(N,K,n)$ | $\{0,\dots,\min(K,n)\}$ | $nK/N$ | $n(K/N)(1-K/N)(N-n)/(N-1)$ |
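The closed-form moments in this table can be checked directly from the pmf. As a sanity check, here is a minimal pure-Python sketch for the Binomial row; the parameters $n=12$, $p=0.3$ are arbitrary illustrations, not from the text:

```python
import math

def binomial_pmf(n, p, k):
    # Exact pmf of Bin(n, p) at k.
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def binomial_moments(n, p):
    # Mean and variance computed by brute force from the pmf,
    # to compare against the closed forms np and np(1-p).
    mean = sum(k * binomial_pmf(n, p, k) for k in range(n + 1))
    var = sum((k - mean)**2 * binomial_pmf(n, p, k) for k in range(n + 1))
    return mean, var

mean, var = binomial_moments(12, 0.3)
# Matches np = 3.6 and np(1-p) = 2.52 up to float rounding.
```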

Continuous families

| Family | Notation | Support | Mean | Variance |
| --- | --- | --- | --- | --- |
| Uniform | $\operatorname{Unif}(a,b)$ | $[a,b]$ | $(a+b)/2$ | $(b-a)^2/12$ |
| Normal | $\mathcal{N}(\mu,\sigma^2)$ | $\mathbb{R}$ | $\mu$ | $\sigma^2$ |
| Exponential | $\operatorname{Exp}(\lambda)$ | $[0,\infty)$ | $1/\lambda$ | $1/\lambda^2$ |
| Gamma (rate) | $\operatorname{Gamma}(\alpha,\beta)$ | $[0,\infty)$ | $\alpha/\beta$ | $\alpha/\beta^2$ |
| Beta | $\operatorname{Beta}(\alpha,\beta)$ | $[0,1]$ | $\alpha/(\alpha+\beta)$ | $\alpha\beta/[(\alpha+\beta)^2(\alpha+\beta+1)]$ |
| Chi-squared | $\chi^2_k$ | $[0,\infty)$ | $k$ | $2k$ |
| Student-t | $t_\nu$ | $\mathbb{R}$ | $0$ for $\nu>1$ | $\nu/(\nu-2)$ for $\nu>2$ |
| F | $F_{d_1,d_2}$ | $[0,\infty)$ | $d_2/(d_2-2)$ for $d_2>2$ | see Casella-Berger 5.3 |
| Lognormal | $\operatorname{LN}(\mu,\sigma^2)$ | $(0,\infty)$ | $e^{\mu+\sigma^2/2}$ | $(e^{\sigma^2}-1)e^{2\mu+\sigma^2}$ |
| Pareto | $\operatorname{Par}(\alpha,x_m)$ | $[x_m,\infty)$ | $\alpha x_m/(\alpha-1)$ for $\alpha>1$ | see Casella-Berger 3.3 |

The variance entries with degree-of-freedom restrictions ($\nu>2$, $d_2>2$, $\alpha>1$) reflect the fact that those distributions have polynomially decaying tails, so low-order moments only exist past a threshold.
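The continuous rows admit the same kind of spot check. Here is a hedged Monte Carlo sketch of the Lognormal entries; the parameters, sample size, and tolerances are arbitrary illustrations, and the estimates are noisy:

```python
import math
import random

random.seed(0)

mu, sigma, n = 0.5, 0.8, 200_000
# A Lognormal draw is exp of a Normal draw, by definition.
samples = [math.exp(random.gauss(mu, sigma)) for _ in range(n)]

mc_mean = sum(samples) / n
mc_var = sum((x - mc_mean)**2 for x in samples) / (n - 1)

# Closed forms from the table above.
exact_mean = math.exp(mu + sigma**2 / 2)
exact_var = (math.exp(sigma**2) - 1) * math.exp(2 * mu + sigma**2)
```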

The Connection Graph

Each row of this table is a named transformation. The "Direction" column reads from the building block to the result.

| Direction | Construction | Result | Why it matters |
| --- | --- | --- | --- |
| Sum of $n$ i.i.d. $\operatorname{Bern}(p)$ | $S_n = \sum_{i=1}^n X_i$ | $\operatorname{Bin}(n,p)$ | The count of successes in fixed $n$ trials. |
| Sum of $r$ i.i.d. Geometrics | sum of $r$ i.i.d. $\operatorname{Geom}(p)$ | $\operatorname{NB}(r,p)$ | Trials until the $r$-th success. |
| Binomial rare-event limit | $n\to\infty$, $p\to 0$, $np\to\lambda$ | $\operatorname{Pois}(\lambda)$ | Rare counts in large pools; defects, mutations, queue arrivals. |
| Poisson process inter-arrivals | gaps between Poisson event times | $\operatorname{Exp}(\lambda)$ | Time between events under memoryless arrivals. |
| Sum of $k$ i.i.d. $\operatorname{Exp}(\lambda)$ | $T_k = \sum_{i=1}^k Y_i$ | $\operatorname{Gamma}(k,\lambda)$ | Time to the $k$-th event in a Poisson process. |
| Gamma with shape $k/2$, rate $1/2$ | $\operatorname{Gamma}(k/2, 1/2)$ | $\chi^2_k$ | Identification: Chi-squared $=$ a specific Gamma. |
| Standard Normal squared | $Z^2$ for $Z\sim\mathcal{N}(0,1)$ | $\chi^2_1$ | The simplest Chi-squared random variable. |
| Sum of $k$ i.i.d. squared standard Normals | $\sum_{i=1}^k Z_i^2$ | $\chi^2_k$ | The sample variance, up to a scaling factor. |
| Ratio of standard Normal and root-Chi-squared | $Z/\sqrt{V/k}$ with $Z\perp V\sim\chi^2_k$ | $t_k$ | The standardized sample mean has this form. |
| Ratio of two scaled Chi-squareds | $(V_1/d_1)/(V_2/d_2)$ with $V_i\sim\chi^2_{d_i}$ independent | $F_{d_1,d_2}$ | The F statistic for variance comparison and ANOVA. |
| Order statistic of $\operatorname{Unif}(0,1)$ | $U_{(k)}$ from $n$ i.i.d. uniforms | $\operatorname{Beta}(k,n-k+1)$ | Beta arises geometrically before it arises as a prior. |
| Logarithm of Lognormal | $\log X$ for $X\sim\operatorname{LN}(\mu,\sigma^2)$ | $\mathcal{N}(\mu,\sigma^2)$ | Defines the Lognormal. Multiplicative noise becomes additive after the log. |
| Logarithm of Pareto tail | $\log(X/x_m)$ for $X\sim\operatorname{Par}(\alpha,x_m)$ | $\operatorname{Exp}(\alpha)$ | The log-Pareto is an Exponential. Heavy tails become light after a log. |
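Any edge in this table can be exercised by simulation. As an illustration, here is a small pure-Python sketch of two of them, the sum-of-Exponentials edge and the uniform-order-statistic edge; all constants are arbitrary choices, not from the text:

```python
import random

random.seed(1)
reps = 100_000

# Edge: the sum of k i.i.d. Exp(lam) draws should show the
# Gamma(k, lam) moments, mean k/lam and variance k/lam^2.
k, lam = 5, 2.0
sums = [sum(random.expovariate(lam) for _ in range(k)) for _ in range(reps)]
gamma_mean = sum(sums) / reps                                    # expect k/lam = 2.5
gamma_var = sum((s - gamma_mean)**2 for s in sums) / (reps - 1)  # expect k/lam^2 = 1.25

# Edge: the j-th order statistic of m i.i.d. Unif(0,1) draws is
# Beta(j, m-j+1), whose mean is j/(m+1).
m, j = 9, 3
order_stats = [sorted(random.random() for _ in range(m))[j - 1] for _ in range(reps)]
beta_mean = sum(order_stats) / reps                              # expect j/(m+1) = 0.3
```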
Theorem

Sum of i.i.d. Exponentials is Gamma

Statement

If $Y_1,\dots,Y_k$ are independent $\operatorname{Exp}(\lambda)$ random variables, then

$$T_k = Y_1 + \cdots + Y_k \sim \operatorname{Gamma}(k,\lambda),$$

with density

$$f_{T_k}(t) = \frac{\lambda^k t^{k-1} e^{-\lambda t}}{(k-1)!}, \qquad t\ge 0.$$

Intuition

A Poisson process with rate $\lambda$ has Exponential inter-arrival times. The time of the $k$-th event is the sum of the first $k$ gaps. Each gap contributes a factor of $\lambda$ to the density and one power of $t$ to the polynomial term; the factorial is the volume of the ordered-arrival simplex.

Proof Sketch

By induction or by MGF. The MGF of $\operatorname{Exp}(\lambda)$ is $\lambda/(\lambda-s)$ for $s<\lambda$. By independence the MGF of $T_k$ is $[\lambda/(\lambda-s)]^k$, which is the MGF of $\operatorname{Gamma}(k,\lambda)$. MGF uniqueness identifies the distribution.
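The MGF identity in the sketch can also be checked numerically: a Monte Carlo estimate of $\mathbb{E}[e^{sT_k}]$ should sit near $[\lambda/(\lambda-s)]^k$. A hedged sketch with arbitrary constants (note that $s$ must stay below $\lambda$ for the MGF to exist):

```python
import math
import random

random.seed(2)

lam, k, s, reps = 3.0, 4, 1.0, 200_000   # s < lam, so the MGF exists

# Monte Carlo estimate of E[exp(s * T_k)] with T_k a sum of k Exp(lam) draws.
mgf_mc = sum(
    math.exp(s * sum(random.expovariate(lam) for _ in range(k)))
    for _ in range(reps)
) / reps

# Closed form: the Gamma(k, lam) MGF, [lam / (lam - s)]^k = 1.5**4.
mgf_exact = (lam / (lam - s)) ** k
```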

Why It Matters

Every Chi-squared random variable is a sum of squared Normals, hence a Gamma with half-integer shape. Every F is a ratio of two Chi-squareds, hence built from Gammas. The Gamma family is the load-bearing wall behind half of the classical sampling distributions.

Failure Mode

The result requires independence and a common rate. Sums of Exponentials with different rates do not give a Gamma; they give a hypoexponential distribution, which has a different density.

Theorem

Ratio of Normal and Root Chi-squared is Student-t

Statement

Let $Z\sim\mathcal{N}(0,1)$ and $V\sim\chi^2_k$ be independent. Then

$$T = \frac{Z}{\sqrt{V/k}} \sim t_k,$$

with density

$$f_T(t) = \frac{\Gamma((k+1)/2)}{\sqrt{k\pi}\,\Gamma(k/2)}\left(1+\frac{t^2}{k}\right)^{-(k+1)/2}.$$

Intuition

The numerator is the unit-variance source of randomness. The denominator is a scale estimate, normalized so that $V/k\to 1$ as $k\to\infty$. Dividing $Z$ by a noisy estimate of the unit scale inflates the tails by a polynomial amount; the heavier the noise (smaller $k$), the heavier the tails.

Proof Sketch

The joint density of $(Z,V)$ factors by independence. Change variables to $T = Z/\sqrt{V/k}$ and $V$, then integrate out $V$. Once collected, the integrand is a Gamma density in $V$, so the integral evaluates by the Gamma normalizing constant.

Why It Matters

The sample mean $\bar X$ of an i.i.d. Normal sample has numerator $(\bar X-\mu)/(\sigma/\sqrt n)$ that is standard Normal, and denominator $S/\sigma$ with $S^2$ proportional to a Chi-squared. The standardized statistic $(\bar X-\mu)/(S/\sqrt n)$ is therefore exactly $t_{n-1}$. This is the one-sample t-test in one line.
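The claim is easy to stress-test by simulation: standardize many Normal samples with their own sample standard deviation, and the resulting statistics should show the $t_{n-1}$ moments, in particular variance $(n-1)/(n-3)$. A hedged pure-Python sketch with arbitrary constants:

```python
import math
import random

random.seed(3)

mu, sigma, n, reps = 1.0, 2.0, 10, 100_000
tstats = []
for _ in range(reps):
    x = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(x) / n
    s2 = sum((xi - xbar)**2 for xi in x) / (n - 1)   # unbiased sample variance
    tstats.append((xbar - mu) / math.sqrt(s2 / n))   # the one-sample t statistic

t_mean = sum(tstats) / reps                # t_{n-1} has mean 0
t_var = sum(t * t for t in tstats) / reps  # expect (n-1)/(n-3) = 9/7
```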

Failure Mode

Independence of $Z$ and $V$ is essential. In the t-test the relevant $Z$ is built from $\bar X-\mu$ and the relevant $V$ is built from $\sum(X_i-\bar X)^2$; their independence is a special fact about Normal samples and Cochran's theorem, not a generic phenomenon.

Conjugate Priors

A prior on $\theta$ is conjugate to a likelihood $p(x\mid\theta)$ when the posterior is in the same parametric family as the prior. For the named families the conjugate pairs are short:

| Likelihood | Conjugate prior | Posterior update |
| --- | --- | --- |
| $\operatorname{Bern}(p)$ or $\operatorname{Bin}(n,p)$ | $\operatorname{Beta}(\alpha,\beta)$ | $\alpha\leftarrow\alpha+\text{successes}$, $\beta\leftarrow\beta+\text{failures}$ |
| $\operatorname{Cat}(p_1,\dots,p_K)$ or $\operatorname{Mult}(n,p)$ | $\operatorname{Dir}(\alpha_1,\dots,\alpha_K)$ | $\alpha_k\leftarrow\alpha_k+\text{count of category }k$ |
| $\operatorname{Pois}(\lambda)$ | $\operatorname{Gamma}(\alpha,\beta)$ | $\alpha\leftarrow\alpha+\sum x_i$, $\beta\leftarrow\beta+n$ |
| $\operatorname{Exp}(\lambda)$ | $\operatorname{Gamma}(\alpha,\beta)$ | $\alpha\leftarrow\alpha+n$, $\beta\leftarrow\beta+\sum x_i$ |
| $\mathcal{N}(\mu,\sigma^2)$ with $\sigma^2$ known | $\mathcal{N}(\mu_0,\tau_0^2)$ | precision-weighted average of $\mu_0$ and $\bar x$ |
| $\mathcal{N}(\mu,\sigma^2)$ with $\mu$ known | $\operatorname{InvGamma}(\alpha,\beta)$ | $\alpha\leftarrow\alpha+n/2$, $\beta\leftarrow\beta+\tfrac12\sum(x_i-\mu)^2$ |
| $\mathcal{N}(\mu,\sigma^2)$ both unknown | Normal-Inverse-Gamma | joint update on $(\mu,\sigma^2)$ |

The Bernoulli-Beta and Poisson-Gamma pairs are derived in their respective pages; the Normal pairs are derived in Bayesian estimation.
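The Bernoulli-Beta row is one line of code. A minimal sketch of the conjugate update; the flat $\operatorname{Beta}(1,1)$ prior, the true $p$, and the sample size are illustrative choices:

```python
import random

random.seed(4)

def beta_bernoulli_update(alpha, beta, data):
    # Conjugacy: successes are added to alpha, failures to beta,
    # and the posterior is again a Beta.
    successes = sum(data)
    return alpha + successes, beta + len(data) - successes

true_p = 0.7
data = [1 if random.random() < true_p else 0 for _ in range(500)]
alpha, beta = beta_bernoulli_update(1.0, 1.0, data)

posterior_mean = alpha / (alpha + beta)   # should sit near true_p
```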

Three Recurring Tricks

Trick 1: Sum identifies via MGF

If $X$ and $Y$ are independent and you want the law of $X+Y$, compute the MGF $M_{X+Y}(s) = M_X(s)\,M_Y(s)$ and match it to a known MGF. This is how sum-of-Exponentials-is-Gamma is proved, how sum-of-Normals-is-Normal is proved, and how Binomial additivity is proved. The MGF table in moment generating functions is the lookup index.

Trick 2: Limit identifies via characteristic-function convergence

If you want the limiting law of a sequence $X_n$, compute the characteristic function $\varphi_{X_n}(t)$ and take the limit. This is how Binomial-to-Poisson is proved (Poisson's theorem), how the central limit theorem is proved, and how $t_\nu\to\mathcal{N}(0,1)$ as $\nu\to\infty$ is proved.
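The Binomial-to-Poisson convergence can be seen directly on the pmfs: the total variation distance between $\operatorname{Bin}(n,\lambda/n)$ and $\operatorname{Pois}(\lambda)$ shrinks as $n$ grows. A deterministic sketch; $\lambda = 3$ and the two values of $n$ are arbitrary:

```python
import math

def binom_pmf(n, p, k):
    # Log space avoids overflow in the binomial coefficient for large n.
    logc = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(logc + k * math.log(p) + (n - k) * math.log(1 - p))

def poisson_pmf(lam, k):
    # Also computed in log space to avoid overflow for large k.
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def tv_distance(n, lam):
    # Total variation between Bin(n, lam/n) and Pois(lam) over {0,...,n};
    # the Poisson mass above n is negligible when lam << n.
    p = lam / n
    return 0.5 * sum(abs(binom_pmf(n, p, k) - poisson_pmf(lam, k))
                     for k in range(n + 1))

tv_small_n = tv_distance(20, 3.0)
tv_large_n = tv_distance(2000, 3.0)
# The distance shrinks as n grows with np held fixed at lam.
```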

Trick 3: Transformation identifies via change of variables

If $Y = g(X)$ for a smooth invertible $g$, then $f_Y(y) = f_X(g^{-1}(y))\,|(g^{-1})'(y)|$. This is how the Lognormal is derived from the Normal, how the Pareto-log-Exponential connection is verified, how the Student-t density is computed from the joint $(Z,V)$ density, and how the F density is computed from the joint $(V_1,V_2)$ density.
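The Pareto-log-Exponential connection makes a compact check of this trick: sample a Pareto by inverse-CDF, take logs, and the $\operatorname{Exp}(\alpha)$ moments should appear. A hedged sketch; $\alpha$, $x_m$, and the sample size are arbitrary:

```python
import math
import random

random.seed(5)

alpha, x_m, reps = 2.5, 1.0, 200_000

# Inverse-CDF sampling: if U ~ Unif(0,1], then x_m * U**(-1/alpha)
# is Par(alpha, x_m).
pareto = [x_m * (1.0 - random.random()) ** (-1.0 / alpha) for _ in range(reps)]

# Change of variables: log(X / x_m) should be Exp(alpha),
# with mean 1/alpha = 0.4 and variance 1/alpha**2 = 0.16.
logs = [math.log(x / x_m) for x in pareto]
log_mean = sum(logs) / reps
log_var = sum((v - log_mean)**2 for v in logs) / (reps - 1)
```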

Two Common Confusions

Watch Out

Rate versus scale parameterization

Exponential and Gamma both have two conventions. Rate uses $\lambda$, with density $\lambda e^{-\lambda x}$ and mean $1/\lambda$. Scale uses $\theta = 1/\lambda$, with density $e^{-x/\theta}/\theta$ and mean $\theta$. SciPy and many engineering texts default to scale; mathematical-statistics texts default to rate. Always check which parameterization a software library uses before plugging in. The pages in this atlas use rate by default and note where scale is more natural.
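A two-line check makes the convention concrete: with $\theta = 1/\lambda$ the two densities agree pointwise. A minimal sketch; the SciPy reference in the comment is to `scipy.stats.expon`, which takes a `scale` argument:

```python
import math

def exp_density_rate(x, lam):
    # Rate convention: f(x) = lam * exp(-lam * x), mean 1/lam.
    return lam * math.exp(-lam * x)

def exp_density_scale(x, theta):
    # Scale convention: f(x) = exp(-x / theta) / theta, mean theta.
    return math.exp(-x / theta) / theta

# Same distribution once theta = 1/lam; e.g. a rate-2 Exponential is
# scipy.stats.expon(scale=0.5) in SciPy's scale convention.
lam = 2.0
same = all(
    math.isclose(exp_density_rate(x, lam), exp_density_scale(x, 1.0 / lam))
    for x in (0.1, 0.5, 1.0, 3.0)
)
```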

Watch Out

Chi-squared as a sampling distribution versus a concentration class

The Chi-squared distribution in this atlas is the exact sampling distribution of $\sum Z_i^2$. The phrase "Chi-squared concentration" elsewhere on the site refers to the Laurent-Massart sub-Gamma tail bounds for Chi-squared random variables; those are finite-sample inequalities, not a description of the law. See chi-squared concentration for the bound; see chi-squared distribution and tests for the law and its uses.

How to Use This Atlas

Pick a target distribution; read down the connection-graph table; follow the edges to the building blocks. Each distribution page derives its connections from this atlas and links back here.

Exercises

Exercise (Core)

Problem

Let $X_1,\dots,X_n$ be i.i.d. $\operatorname{Exp}(\lambda)$ with $n=10$ and $\lambda = 2$. Identify the distribution of $S=\sum_{i=1}^{10} X_i$ and compute $\mathbb{E}[S]$ and $\operatorname{Var}(S)$.

Exercise (Core)

Problem

A Poisson process with rate $\lambda = 3$ events per minute is observed. Let $Y$ be the time, in minutes, of the third event. Identify the distribution of $Y$ and compute $\mathbb{P}(Y > 2)$ in terms of an incomplete Gamma function.

Exercise (Advanced)

Problem

Let $Z\sim\mathcal{N}(0,1)$ and $V\sim\chi^2_k$ be independent, and let $T = Z/\sqrt{V/k}$. Show that as $k\to\infty$, the distribution of $T$ converges to $\mathcal{N}(0,1)$.

References

Canonical:

  • Casella and Berger, Statistical Inference (2002), Chapters 3 and 5.
  • Bickel and Doksum, Mathematical Statistics, Volume I (2015), Chapter 1.
  • Johnson, Kotz, and Balakrishnan, Continuous Univariate Distributions, Volumes 1 and 2 (1994 and 1995).
  • Johnson, Kemp, and Kotz, Univariate Discrete Distributions (2005).

Probability foundations:

  • Blitzstein and Hwang, Introduction to Probability (2019), Chapters 3 through 8.
  • Durrett, Probability: Theory and Examples (2019), Chapters 2 and 3.
  • Grimmett and Stirzaker, Probability and Random Processes (2020), Chapters 3 through 6.

Bayesian framing:

  • Gelman et al., Bayesian Data Analysis (2013), Chapter 2.
  • Robert, The Bayesian Choice (2007), Chapter 3.

Last reviewed: May 11, 2026
