
Foundations

Distributions Atlas

A connection map for the parametric families used in statistical inference and ML. Lists each family with its support, parameterization, and the transformations that move between families: sum, mixture, conjugacy, limiting case, and ratio constructions.

Core · Tier 1 · Stable · Core spine · ~35 min

Why This Matters

The named distributions are not a list. They are a graph. A Bernoulli trial summed $n$ times is a Binomial; a Binomial with $n$ large and mean held fixed is a Poisson; the time between Poisson events is an Exponential; the sum of $k$ independent Exponentials with the same rate is a Gamma; a Gamma with shape $k/2$ and rate $1/2$ is a Chi-squared with $k$ degrees of freedom; a standard Normal divided by a square-rooted Chi-squared is a Student-t; the ratio of two scaled Chi-squareds is an F.

Knowing these edges turns a long memorization task into a short one. The sampling distribution of the standardized sample mean is Student-t because the numerator is Normal, the denominator squared is Chi-squared, and they are independent. The Pearson Chi-squared statistic has its name because under the null hypothesis it converges to a sum of squared Normals. The F statistic in analysis of variance is a ratio of two variance estimates, and each variance estimate is a scaled Chi-squared. Each new test follows from the same construction: identify the building blocks, identify the transformation, read off the limiting distribution.

This page is the atlas. It states each connection precisely once and links to the per-distribution page that proves it.

The Atlas

Discrete families

| Family | Notation | Support | Mean | Variance |
| --- | --- | --- | --- | --- |
| Bernoulli | $\operatorname{Bern}(p)$ | $\{0,1\}$ | $p$ | $p(1-p)$ |
| Binomial | $\operatorname{Bin}(n,p)$ | $\{0,1,\dots,n\}$ | $np$ | $np(1-p)$ |
| Geometric | $\operatorname{Geom}(p)$ | $\{1,2,\dots\}$ | $1/p$ | $(1-p)/p^2$ |
| Negative Binomial | $\operatorname{NB}(r,p)$ | $\{r,r+1,\dots\}$ | $r/p$ | $r(1-p)/p^2$ |
| Poisson | $\operatorname{Pois}(\lambda)$ | $\{0,1,2,\dots\}$ | $\lambda$ | $\lambda$ |
| Hypergeometric | $\operatorname{HG}(N,K,n)$ | $\{0,\dots,\min(K,n)\}$ | $nK/N$ | $n(K/N)(1-K/N)(N-n)/(N-1)$ |
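The closed-form moments in this table can be checked directly from the pmf. As a sanity check, here is a minimal pure-Python sketch for the Binomial row; the parameters $n=12$, $p=0.3$ are arbitrary illustrations, not from the text:

```python
import math

def binomial_pmf(n, p, k):
    # Exact pmf of Bin(n, p) at k.
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def binomial_moments(n, p):
    # Mean and variance computed by brute force from the pmf,
    # to compare against the closed forms np and np(1-p).
    mean = sum(k * binomial_pmf(n, p, k) for k in range(n + 1))
    var = sum((k - mean)**2 * binomial_pmf(n, p, k) for k in range(n + 1))
    return mean, var

mean, var = binomial_moments(12, 0.3)
# Matches np = 3.6 and np(1-p) = 2.52 up to float rounding.
```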

Continuous families

| Family | Notation | Support | Mean | Variance |
| --- | --- | --- | --- | --- |
| Uniform | $\operatorname{Unif}(a,b)$ | $[a,b]$ | $(a+b)/2$ | $(b-a)^2/12$ |
| Normal | $\mathcal{N}(\mu,\sigma^2)$ | $\mathbb{R}$ | $\mu$ | $\sigma^2$ |
| Exponential | $\operatorname{Exp}(\lambda)$ | $[0,\infty)$ | $1/\lambda$ | $1/\lambda^2$ |
| Gamma (rate) | $\operatorname{Gamma}(\alpha,\beta)$ | $[0,\infty)$ | $\alpha/\beta$ | $\alpha/\beta^2$ |
| Beta | $\operatorname{Beta}(\alpha,\beta)$ | $[0,1]$ | $\alpha/(\alpha+\beta)$ | $\alpha\beta/[(\alpha+\beta)^2(\alpha+\beta+1)]$ |
| Chi-squared | $\chi^2_k$ | $[0,\infty)$ | $k$ | $2k$ |
| Student-t | $t_\nu$ | $\mathbb{R}$ | $0$ for $\nu>1$ | $\nu/(\nu-2)$ for $\nu>2$ |
| F | $F_{d_1,d_2}$ | $[0,\infty)$ | $d_2/(d_2-2)$ for $d_2>2$ | see Casella-Berger 5.3 |
| Lognormal | $\operatorname{LN}(\mu,\sigma^2)$ | $(0,\infty)$ | $e^{\mu+\sigma^2/2}$ | $(e^{\sigma^2}-1)e^{2\mu+\sigma^2}$ |
| Pareto | $\operatorname{Par}(\alpha,x_m)$ | $[x_m,\infty)$ | $\alpha x_m/(\alpha-1)$ for $\alpha>1$ | see Casella-Berger 3.3 |

The variance entries with degree-of-freedom restrictions ($\nu>2$, $d_2>2$, $\alpha>1$) reflect the fact that those distributions have polynomially decaying tails, so low-order moments only exist past a threshold.
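The continuous rows admit the same kind of spot check. Here is a hedged Monte Carlo sketch of the Lognormal entries; the parameters, sample size, and tolerances are arbitrary illustrations, and the estimates are noisy:

```python
import math
import random

random.seed(0)

mu, sigma, n = 0.5, 0.8, 200_000
# A Lognormal draw is exp of a Normal draw, by definition.
samples = [math.exp(random.gauss(mu, sigma)) for _ in range(n)]

mc_mean = sum(samples) / n
mc_var = sum((x - mc_mean)**2 for x in samples) / (n - 1)

# Closed forms from the table above.
exact_mean = math.exp(mu + sigma**2 / 2)
exact_var = (math.exp(sigma**2) - 1) * math.exp(2 * mu + sigma**2)
```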

The Connection Graph

Each row of this table is a named transformation. The "Direction" column reads from the building block to the result.

| Direction | Construction | Result | Why it matters |
| --- | --- | --- | --- |
| Sum of $n$ i.i.d. $\operatorname{Bern}(p)$ | $S_n = \sum_{i=1}^n X_i$ | $\operatorname{Bin}(n,p)$ | The count of successes in fixed $n$ trials. |
| Sum of $r$ i.i.d. Geometrics | sum of $r$ i.i.d. $\operatorname{Geom}(p)$ | $\operatorname{NB}(r,p)$ | Trials until the $r$-th success. |
| Binomial rare-event limit | $n\to\infty$, $p\to 0$, $np\to\lambda$ | $\operatorname{Pois}(\lambda)$ | Rare counts in large pools; defects, mutations, queue arrivals. |
| Poisson process inter-arrivals | gaps between Poisson event times | $\operatorname{Exp}(\lambda)$ | Time between events under memoryless arrivals. |
| Sum of $k$ i.i.d. $\operatorname{Exp}(\lambda)$ | $T_k = \sum_{i=1}^k Y_i$ | $\operatorname{Gamma}(k,\lambda)$ | Time to the $k$-th event in a Poisson process. |
| Gamma with shape $k/2$, rate $1/2$ | $\operatorname{Gamma}(k/2, 1/2)$ | $\chi^2_k$ | Identification: Chi-squared $=$ a specific Gamma. |
| Standard Normal squared | $Z^2$ for $Z\sim\mathcal{N}(0,1)$ | $\chi^2_1$ | The simplest Chi-squared random variable. |
| Sum of $k$ i.i.d. squared standard Normals | $\sum_{i=1}^k Z_i^2$ | $\chi^2_k$ | The sample variance, up to a scaling factor. |
| Ratio of standard Normal and root-Chi-squared | $Z/\sqrt{V/k}$ with $Z\perp V\sim\chi^2_k$ | $t_k$ | The standardized sample mean has this form. |
| Ratio of two scaled Chi-squareds | $(V_1/d_1)/(V_2/d_2)$ with $V_i\sim\chi^2_{d_i}$ independent | $F_{d_1,d_2}$ | The F statistic for variance comparison and ANOVA. |
| Order statistic of $\operatorname{Unif}(0,1)$ | $U_{(k)}$ from $n$ i.i.d. uniforms | $\operatorname{Beta}(k,n-k+1)$ | Beta arises geometrically before it arises as a prior. |
| Logarithm of Lognormal | $\log X$ for $X\sim\operatorname{LN}(\mu,\sigma^2)$ | $\mathcal{N}(\mu,\sigma^2)$ | Defines the Lognormal. Multiplicative noise becomes additive after the log. |
| Logarithm of Pareto tail | $\log(X/x_m)$ for $X\sim\operatorname{Par}(\alpha,x_m)$ | $\operatorname{Exp}(\alpha)$ | The log-Pareto is an Exponential. Heavy tails become light after a log. |
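Any edge in this table can be exercised by simulation. As an illustration, here is a small pure-Python sketch of two of them, the sum-of-Exponentials edge and the uniform-order-statistic edge; all constants are arbitrary choices, not from the text:

```python
import random

random.seed(1)
reps = 100_000

# Edge: the sum of k i.i.d. Exp(lam) draws should show the
# Gamma(k, lam) moments, mean k/lam and variance k/lam^2.
k, lam = 5, 2.0
sums = [sum(random.expovariate(lam) for _ in range(k)) for _ in range(reps)]
gamma_mean = sum(sums) / reps                                    # expect k/lam = 2.5
gamma_var = sum((s - gamma_mean)**2 for s in sums) / (reps - 1)  # expect k/lam^2 = 1.25

# Edge: the j-th order statistic of m i.i.d. Unif(0,1) draws is
# Beta(j, m-j+1), whose mean is j/(m+1).
m, j = 9, 3
order_stats = [sorted(random.random() for _ in range(m))[j - 1] for _ in range(reps)]
beta_mean = sum(order_stats) / reps                              # expect j/(m+1) = 0.3
```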
Theorem

Sum of i.i.d. Exponentials is Gamma

Statement

If $Y_1,\dots,Y_k$ are independent $\operatorname{Exp}(\lambda)$ random variables, then

$$T_k = Y_1 + \cdots + Y_k \sim \operatorname{Gamma}(k,\lambda),$$

with density

$$f_{T_k}(t) = \frac{\lambda^k t^{k-1} e^{-\lambda t}}{(k-1)!}, \qquad t\ge 0.$$

Intuition

A Poisson process with rate $\lambda$ has Exponential inter-arrival times. The time of the $k$-th event is the sum of the first $k$ gaps. Each gap contributes a factor of $\lambda$ to the density and one power of $t$ to the polynomial term; the factorial is the volume of the ordered-arrival simplex.

Proof Sketch

By induction or by MGF. The MGF of $\operatorname{Exp}(\lambda)$ is $\lambda/(\lambda-s)$ for $s<\lambda$. By independence the MGF of $T_k$ is $[\lambda/(\lambda-s)]^k$, which is the MGF of $\operatorname{Gamma}(k,\lambda)$. MGF uniqueness identifies the distribution.
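The MGF identity in the sketch can also be checked numerically: a Monte Carlo estimate of $\mathbb{E}[e^{sT_k}]$ should sit near $[\lambda/(\lambda-s)]^k$. A hedged sketch with arbitrary constants (note that $s$ must stay below $\lambda$ for the MGF to exist):

```python
import math
import random

random.seed(2)

lam, k, s, reps = 3.0, 4, 1.0, 200_000   # s < lam, so the MGF exists

# Monte Carlo estimate of E[exp(s * T_k)] with T_k a sum of k Exp(lam) draws.
mgf_mc = sum(
    math.exp(s * sum(random.expovariate(lam) for _ in range(k)))
    for _ in range(reps)
) / reps

# Closed form: the Gamma(k, lam) MGF, [lam / (lam - s)]^k = 1.5**4.
mgf_exact = (lam / (lam - s)) ** k
```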

Why It Matters

Every Chi-squared random variable is a sum of squared Normals, hence a Gamma with half-integer shape. Every F is a ratio of two Chi-squareds, hence built from Gammas. The Gamma family is the load-bearing wall behind half of the classical sampling distributions.

Failure Mode

The result requires independence and a common rate. Sums of Exponentials with different rates do not give a Gamma; they give a hypoexponential distribution, which has a different density.

Theorem

Ratio of Normal and Root Chi-squared is Student-t

Statement

Let $Z\sim\mathcal{N}(0,1)$ and $V\sim\chi^2_k$ be independent. Then

$$T = \frac{Z}{\sqrt{V/k}} \sim t_k,$$

with density

$$f_T(t) = \frac{\Gamma((k+1)/2)}{\sqrt{k\pi}\,\Gamma(k/2)}\left(1+\frac{t^2}{k}\right)^{-(k+1)/2}.$$

Intuition

The numerator is the unit-variance source of randomness. The denominator is a scale estimate, normalized so that $V/k\to 1$ as $k\to\infty$. Dividing $Z$ by a noisy estimate of the unit scale inflates the tails by a polynomial amount; the heavier the noise (smaller $k$), the heavier the tails.

Proof Sketch

The joint density of $(Z,V)$ factors by independence. Change variables to $T = Z/\sqrt{V/k}$ and $V$, then integrate out $V$. Once collected, the integrand is a Gamma density in $V$, so the integral evaluates by the Gamma normalizing constant.

Why It Matters

The sample mean $\bar X$ of an i.i.d. Normal sample has numerator $(\bar X-\mu)/(\sigma/\sqrt n)$ that is standard Normal, and denominator $S/\sigma$ with $S^2$ proportional to a Chi-squared. The standardized statistic $(\bar X-\mu)/(S/\sqrt n)$ is therefore exactly $t_{n-1}$. This is the one-sample t-test in one line.
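The claim is easy to stress-test by simulation: standardize many Normal samples with their own sample standard deviation, and the resulting statistics should show the $t_{n-1}$ moments, in particular variance $(n-1)/(n-3)$. A hedged pure-Python sketch with arbitrary constants:

```python
import math
import random

random.seed(3)

mu, sigma, n, reps = 1.0, 2.0, 10, 100_000
tstats = []
for _ in range(reps):
    x = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(x) / n
    s2 = sum((xi - xbar)**2 for xi in x) / (n - 1)   # unbiased sample variance
    tstats.append((xbar - mu) / math.sqrt(s2 / n))   # the one-sample t statistic

t_mean = sum(tstats) / reps                # t_{n-1} has mean 0
t_var = sum(t * t for t in tstats) / reps  # expect (n-1)/(n-3) = 9/7
```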

Failure Mode

Independence of $Z$ and $V$ is essential. In the t-test the relevant $Z$ is built from $\bar X-\mu$ and the relevant $V$ is built from $\sum(X_i-\bar X)^2$; their independence is a special fact about Normal samples and Cochran's theorem, not a generic phenomenon.

Conjugate Priors

A prior on $\theta$ is conjugate to a likelihood $p(x\mid\theta)$ when the posterior is in the same parametric family as the prior. For the named families the conjugate pairs are short:

| Likelihood | Conjugate prior | Posterior update |
| --- | --- | --- |
| $\operatorname{Bern}(p)$ or $\operatorname{Bin}(n,p)$ | $\operatorname{Beta}(\alpha,\beta)$ | $\alpha\leftarrow\alpha+\text{successes}$, $\beta\leftarrow\beta+\text{failures}$ |
| $\operatorname{Cat}(p_1,\dots,p_K)$ or $\operatorname{Mult}(n,p)$ | $\operatorname{Dir}(\alpha_1,\dots,\alpha_K)$ | $\alpha_k\leftarrow\alpha_k+\text{count of category }k$ |
| $\operatorname{Pois}(\lambda)$ | $\operatorname{Gamma}(\alpha,\beta)$ | $\alpha\leftarrow\alpha+\sum x_i$, $\beta\leftarrow\beta+n$ |
| $\operatorname{Exp}(\lambda)$ | $\operatorname{Gamma}(\alpha,\beta)$ | $\alpha\leftarrow\alpha+n$, $\beta\leftarrow\beta+\sum x_i$ |
| $\mathcal{N}(\mu,\sigma^2)$ with $\sigma^2$ known | $\mathcal{N}(\mu_0,\tau_0^2)$ | precision-weighted average of $\mu_0$ and $\bar x$ |
| $\mathcal{N}(\mu,\sigma^2)$ with $\mu$ known | $\operatorname{InvGamma}(\alpha,\beta)$ | $\alpha\leftarrow\alpha+n/2$, $\beta\leftarrow\beta+\tfrac12\sum(x_i-\mu)^2$ |
| $\mathcal{N}(\mu,\sigma^2)$ both unknown | Normal-Inverse-Gamma | joint update on $(\mu,\sigma^2)$ |

The Bernoulli-Beta and Poisson-Gamma pairs are derived in their respective pages; the Normal pairs are derived in Bayesian estimation.
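The Bernoulli-Beta row is one line of code. A minimal sketch of the conjugate update; the flat $\operatorname{Beta}(1,1)$ prior, the true $p$, and the sample size are illustrative choices:

```python
import random

random.seed(4)

def beta_bernoulli_update(alpha, beta, data):
    # Conjugacy: successes are added to alpha, failures to beta,
    # and the posterior is again a Beta.
    successes = sum(data)
    return alpha + successes, beta + len(data) - successes

true_p = 0.7
data = [1 if random.random() < true_p else 0 for _ in range(500)]
alpha, beta = beta_bernoulli_update(1.0, 1.0, data)

posterior_mean = alpha / (alpha + beta)   # should sit near true_p
```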

Three Recurring Tricks

Trick 1: Sum identifies via MGF

If $X$ and $Y$ are independent and you want the law of $X+Y$, compute the MGF $M_{X+Y}(s) = M_X(s)\,M_Y(s)$ and match it to a known MGF. This is how sum-of-Exponentials-is-Gamma is proved, how sum-of-Normals-is-Normal is proved, and how Binomial additivity is proved. The MGF table in moment generating functions is the lookup index.

Trick 2: Limit identifies via characteristic-function convergence

If you want the limiting law of a sequence $X_n$, compute the characteristic function $\varphi_{X_n}(t)$ and take the limit. This is how Binomial-to-Poisson is proved (Poisson's theorem), how the central limit theorem is proved, and how $t_\nu\to\mathcal{N}(0,1)$ as $\nu\to\infty$ is proved.
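The Binomial-to-Poisson convergence can be seen directly on the pmfs: the total variation distance between $\operatorname{Bin}(n,\lambda/n)$ and $\operatorname{Pois}(\lambda)$ shrinks as $n$ grows. A deterministic sketch; $\lambda = 3$ and the two values of $n$ are arbitrary:

```python
import math

def binom_pmf(n, p, k):
    # Log space avoids overflow in the binomial coefficient for large n.
    logc = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(logc + k * math.log(p) + (n - k) * math.log(1 - p))

def poisson_pmf(lam, k):
    # Also computed in log space to avoid overflow for large k.
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def tv_distance(n, lam):
    # Total variation between Bin(n, lam/n) and Pois(lam) over {0,...,n};
    # the Poisson mass above n is negligible when lam << n.
    p = lam / n
    return 0.5 * sum(abs(binom_pmf(n, p, k) - poisson_pmf(lam, k))
                     for k in range(n + 1))

tv_small_n = tv_distance(20, 3.0)
tv_large_n = tv_distance(2000, 3.0)
# The distance shrinks as n grows with np held fixed at lam.
```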

Trick 3: Transformation identifies via change of variables

If $Y = g(X)$ for a smooth invertible $g$, then $f_Y(y) = f_X(g^{-1}(y))\,|(g^{-1})'(y)|$. This is how the Lognormal is derived from the Normal, how the Pareto-log-Exponential connection is verified, how the Student-t density is computed from the joint $(Z,V)$ density, and how the F density is computed from the joint $(V_1,V_2)$ density.
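The Pareto-log-Exponential connection makes a compact check of this trick: sample a Pareto by inverse-CDF, take logs, and the $\operatorname{Exp}(\alpha)$ moments should appear. A hedged sketch; $\alpha$, $x_m$, and the sample size are arbitrary:

```python
import math
import random

random.seed(5)

alpha, x_m, reps = 2.5, 1.0, 200_000

# Inverse-CDF sampling: if U ~ Unif(0,1], then x_m * U**(-1/alpha)
# is Par(alpha, x_m).
pareto = [x_m * (1.0 - random.random()) ** (-1.0 / alpha) for _ in range(reps)]

# Change of variables: log(X / x_m) should be Exp(alpha),
# with mean 1/alpha = 0.4 and variance 1/alpha**2 = 0.16.
logs = [math.log(x / x_m) for x in pareto]
log_mean = sum(logs) / reps
log_var = sum((v - log_mean)**2 for v in logs) / (reps - 1)
```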

Two Common Confusions

Watch Out

Rate versus scale parameterization

Exponential and Gamma both have two conventions. Rate uses $\lambda$, with density $\lambda e^{-\lambda x}$ and mean $1/\lambda$. Scale uses $\theta = 1/\lambda$, with density $e^{-x/\theta}/\theta$ and mean $\theta$. SciPy and many engineering texts default to scale; mathematical-statistics texts default to rate. Always check which parameterization a software library uses before plugging in. The pages in this atlas use rate by default and note where scale is more natural.
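A two-line check makes the convention concrete: with $\theta = 1/\lambda$ the two densities agree pointwise. A minimal sketch; the SciPy reference in the comment is to `scipy.stats.expon`, which takes a `scale` argument:

```python
import math

def exp_density_rate(x, lam):
    # Rate convention: f(x) = lam * exp(-lam * x), mean 1/lam.
    return lam * math.exp(-lam * x)

def exp_density_scale(x, theta):
    # Scale convention: f(x) = exp(-x / theta) / theta, mean theta.
    return math.exp(-x / theta) / theta

# Same distribution once theta = 1/lam; e.g. a rate-2 Exponential is
# scipy.stats.expon(scale=0.5) in SciPy's scale convention.
lam = 2.0
same = all(
    math.isclose(exp_density_rate(x, lam), exp_density_scale(x, 1.0 / lam))
    for x in (0.1, 0.5, 1.0, 3.0)
)
```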

Watch Out

Chi-squared as a sampling distribution versus a concentration class

The Chi-squared distribution in this atlas is the exact sampling distribution of $\sum Z_i^2$. The phrase "Chi-squared concentration" elsewhere on the site refers to the Laurent-Massart sub-Gamma tail bounds for Chi-squared random variables; those are finite-sample inequalities, not a description of the law. See chi-squared concentration for the bound; see chi-squared distribution and tests for the law and its uses.

How to Use This Atlas

Pick a target distribution; read down the connection-graph table; follow the edges to the building blocks. Each distribution page derives its connections from this atlas and links back here.

Exercises

Exercise (Core)

Problem

Let $X_1,\dots,X_n$ be i.i.d. $\operatorname{Exp}(\lambda)$ with $n=10$ and $\lambda = 2$. Identify the distribution of $S=\sum_{i=1}^{10} X_i$ and compute $\mathbb{E}[S]$ and $\operatorname{Var}(S)$.

Exercise (Core)

Problem

A Poisson process with rate $\lambda = 3$ events per minute is observed. Let $Y$ be the time, in minutes, of the third event. Identify the distribution of $Y$ and compute $\mathbb{P}(Y > 2)$ in terms of an incomplete Gamma function.

Exercise (Advanced)

Problem

Let $Z\sim\mathcal{N}(0,1)$ and $V\sim\chi^2_k$ be independent, and let $T = Z/\sqrt{V/k}$. Show that as $k\to\infty$, the distribution of $T$ converges to $\mathcal{N}(0,1)$.

References

Canonical:

  • Casella and Berger, Statistical Inference (2002), Chapters 3 and 5.
  • Bickel and Doksum, Mathematical Statistics, Volume I (2015), Chapter 1.
  • Johnson, Kotz, and Balakrishnan, Continuous Univariate Distributions, Volumes 1 and 2 (1994 and 1995).
  • Johnson, Kemp, and Kotz, Univariate Discrete Distributions (2005).

Probability foundations:

  • Blitzstein and Hwang, Introduction to Probability (2019), Chapters 3 through 8.
  • Durrett, Probability: Theory and Examples (2019), Chapters 2 and 3.
  • Grimmett and Stirzaker, Probability and Random Processes (2020), Chapters 3 through 6.

Bayesian framing:

  • Gelman et al., Bayesian Data Analysis (2013), Chapter 2.
  • Robert, The Bayesian Choice (2007), Chapter 3.

Last reviewed: May 11, 2026
