

Gamma Distribution

The Gamma distribution as the sum of independent Exponentials and as a flexible nonnegative density: shape and rate, density and MGF, conjugacy for Poisson and Exponential likelihoods, Chi-squared as a special case, MLE without closed form.

Core · Tier 1 · Stable · Core spine · ~55 min

Why This Matters

The Gamma family is the parametric family of waiting times for the $k$-th event in a Poisson process and, equivalently, the family of sums of independent Exponential random variables. It is the natural extension of the Exponential distribution when memorylessness is too restrictive and you want to model a hazard rate that changes monotonically with time. Two specific reasons to learn it now:

  1. The Chi-squared distribution is a Gamma with shape $k/2$ and rate $1/2$. Every result about the Chi-squared sample variance, the F statistic, and the Pearson Chi-squared test starts from a Gamma identity.
  2. The Gamma is the conjugate prior for the Poisson rate and for the Exponential rate. Bayesian models for count and waiting-time data use a Gamma prior and produce a Gamma posterior.

The Gamma has two common parameterizations (shape and rate, or shape and scale). Both are correct; both are unavoidable in practice. This page uses shape $\alpha$ and rate $\beta$.

Definition

Definition

Gamma Distribution

A random variable $X$ has a Gamma distribution with shape $\alpha > 0$ and rate $\beta > 0$ if its density is

$$f_X(x) = \frac{\beta^\alpha}{\Gamma(\alpha)}\,x^{\alpha-1}\,e^{-\beta x},\qquad x > 0,$$

where $\Gamma(\alpha) = \int_0^\infty t^{\alpha-1}e^{-t}\,dt$ is the Gamma function.

The scale parameterization uses $\theta = 1/\beta$ and writes the density as $x^{\alpha-1}e^{-x/\theta}/[\theta^\alpha\Gamma(\alpha)]$. With shape and rate, $\mathbb{E}[X] = \alpha/\beta$ and $\operatorname{Var}(X) = \alpha/\beta^2$.

The shape $\alpha$ controls the "polynomial multiplier" of the density. When $\alpha = 1$ the multiplier $x^0 = 1$ is constant, and the Gamma reduces to the Exponential. When $\alpha < 1$ the density is unbounded at zero; when $\alpha > 1$ the density has a single mode at $(\alpha - 1)/\beta$. When $\alpha$ is a positive integer the Gamma is sometimes called the Erlang distribution.
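A minimal numerical check of these facts, assuming SciPy is available. Note that `scipy.stats.gamma` is parameterized by shape `a` and `scale`, so the rate $\beta$ enters as `scale=1/beta`:

```python
# Check mean, variance, and mode for Gamma(shape=3, rate=2) against the
# closed forms above. SciPy uses shape/scale, so rate beta -> scale = 1/beta.
import numpy as np
from scipy import stats

alpha, beta = 3.0, 2.0
X = stats.gamma(a=alpha, scale=1 / beta)

print(X.mean(), alpha / beta)      # both 1.5
print(X.var(), alpha / beta**2)    # both 0.75

# The density should peak at (alpha - 1) / beta = 1.0 since alpha > 1.
xs = np.linspace(0.01, 5, 100_000)
print(xs[np.argmax(X.pdf(xs))])    # ~1.0
```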

MGF and Moments

Theorem

Gamma MGF

Statement

For $X\sim\operatorname{Gamma}(\alpha,\beta)$ and $s < \beta$,

$$M_X(s) = \mathbb{E}[e^{sX}] = \left(\frac{\beta}{\beta-s}\right)^\alpha.$$

For $s\ge\beta$ the MGF is infinite.

Intuition

This is the MGF of the Exponential raised to the $\alpha$-th power. When $\alpha$ is a positive integer, the $\alpha$-th power is exactly the MGF of a sum of $\alpha$ independent Exponentials, which is the integer-shape case of the Gamma.

Proof Sketch

$$M_X(s) = \int_0^\infty e^{sx}\frac{\beta^\alpha}{\Gamma(\alpha)}x^{\alpha-1}e^{-\beta x}\,dx = \frac{\beta^\alpha}{\Gamma(\alpha)}\int_0^\infty x^{\alpha-1}e^{-(\beta-s)x}\,dx.$$

Substitute $u = (\beta-s)x$ for $s<\beta$:

$$M_X(s) = \frac{\beta^\alpha}{\Gamma(\alpha)(\beta-s)^\alpha}\int_0^\infty u^{\alpha-1}e^{-u}\,du = \left(\frac{\beta}{\beta-s}\right)^\alpha,$$

since the remaining integral is $\Gamma(\alpha)$.

Why It Matters

Differentiating gives $\mathbb{E}[X] = \alpha/\beta$ and $\operatorname{Var}(X) = \alpha/\beta^2$. As $\alpha\to\infty$ with $\alpha/\beta = \mu$ fixed (so $\beta\to\infty$), the Gamma becomes increasingly concentrated around $\mu$ and approaches a Normal distribution by the central limit theorem applied to the sum-of-Exponentials representation. The Gamma is sub-exponential but not sub-Gaussian; its tail decays at rate $e^{-\beta x}$ multiplied by a polynomial.

Failure Mode

The MGF is finite only on the half-line $s<\beta$. The Gamma inherits the sub-exponential tail of the Exponential, not the sub-Gaussian tail of the Normal. Confusing the two leads to overconfident concentration bounds.
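A quick Monte Carlo check of the MGF formula, assuming NumPy is available; the specific $\alpha$, $\beta$, $s$ values below are illustrative, with $s$ deliberately chosen inside the half-line $s < \beta$:

```python
# Compare the empirical E[e^{sX}] to (beta/(beta - s))^alpha for s < beta.
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, s = 2.5, 3.0, 1.0           # s < beta, so the MGF is finite

x = rng.gamma(shape=alpha, scale=1 / beta, size=1_000_000)
print(np.exp(s * x).mean())              # Monte Carlo estimate
print((beta / (beta - s)) ** alpha)      # closed form: (3/2)^2.5 ~ 2.756
```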

Additivity Under Independent Sums

Theorem

Gamma Additivity

Statement

If $X\sim\operatorname{Gamma}(\alpha_1,\beta)$ and $Y\sim\operatorname{Gamma}(\alpha_2,\beta)$ are independent, then

$$X + Y\sim\operatorname{Gamma}(\alpha_1+\alpha_2,\beta).$$

Intuition

Independent waits for $\alpha_1$ events and then $\alpha_2$ events in the same Poisson process add to a wait for $\alpha_1+\alpha_2$ events. The rate must be common; the shape parameters add.

Proof Sketch

By independence, $M_{X+Y}(s) = M_X(s)M_Y(s) = [\beta/(\beta-s)]^{\alpha_1}\cdot[\beta/(\beta-s)]^{\alpha_2} = [\beta/(\beta-s)]^{\alpha_1+\alpha_2}$, the MGF of $\operatorname{Gamma}(\alpha_1+\alpha_2,\beta)$. MGF uniqueness identifies the law.

Why It Matters

This is the rule that turns any sum of independent Gammas with the same rate into another Gamma. The two most important applications are: a sum of $k$ i.i.d. $\operatorname{Exp}(\lambda)$ variables is $\operatorname{Gamma}(k,\lambda)$; a sum of $k$ independent Chi-squareds with degrees $d_1,\dots,d_k$ is $\operatorname{Gamma}((d_1+\cdots+d_k)/2,\ 1/2) = \chi^2_{d_1+\cdots+d_k}$.

Failure Mode

Additivity requires a common rate. The sum of Gammas with different rates is not a Gamma; it is a hypoexponential or generalized-Erlang distribution with a more complex density. Shape parameters add only when the rate parameters match.
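Additivity is easy to verify by simulation. A sketch assuming SciPy, with illustrative shape values and a common rate:

```python
# X ~ Gamma(a1, beta) plus an independent Y ~ Gamma(a2, beta) should be
# indistinguishable from Gamma(a1 + a2, beta). KS test as a sanity check.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a1, a2, beta, n = 1.7, 2.3, 0.5, 100_000

s = rng.gamma(a1, 1 / beta, n) + rng.gamma(a2, 1 / beta, n)
print(stats.kstest(s, stats.gamma(a=a1 + a2, scale=1 / beta).cdf))
# A large p-value is consistent with the Gamma(4.0, rate 0.5) claim.
```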

Chi-squared Is a Specific Gamma

Theorem

Chi-squared as a Gamma

Statement

$\chi^2_k = \operatorname{Gamma}(k/2, 1/2)$ as parametric families.

Intuition

The Chi-squared density is the Gamma density with the specific shape $\alpha = k/2$ and rate $\beta = 1/2$. The half-integer shape, together with the fixed rate $1/2$, is all that distinguishes Chi-squared from generic Gammas.

Proof Sketch

The Chi-squared($k$) density is

$$f(x) = \frac{1}{2^{k/2}\Gamma(k/2)}x^{k/2-1}e^{-x/2},\qquad x>0.$$

Substituting $\alpha = k/2$ and $\beta = 1/2$ into the Gamma density gives the same expression. The two families coincide for every $k\in\{1,2,\dots\}$.

Why It Matters

Every Chi-squared identity is a Gamma identity in disguise. The additivity of independent Chi-squareds with $d_1, d_2$ degrees giving $\chi^2_{d_1+d_2}$ is just Gamma additivity with common rate $1/2$. The Poisson-to-Chi-squared bridge for goodness-of-fit testing is the same Gamma calculation. Computing Chi-squared quantiles in software typically calls a Gamma routine under the hood.

Failure Mode

The identification works only for the rate parameterization with $\beta = 1/2$. With a scale parameterization, the same family is $\operatorname{Gamma}(\text{shape}=k/2, \text{scale}=2)$. Using the wrong scale gives a Chi-squared with the wrong degrees of freedom and breaks every downstream computation.
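Both the identification and the scale pitfall are visible in one comparison; a sketch assuming SciPy, where rate $1/2$ becomes `scale=2` in SciPy's convention:

```python
# chi2(k) quantiles should equal Gamma(shape=k/2, scale=2) quantiles.
import numpy as np
from scipy import stats

k = 7
q = np.array([0.01, 0.25, 0.50, 0.75, 0.99])
print(stats.chi2.ppf(q, df=k))
print(stats.gamma.ppf(q, a=k / 2, scale=2.0))   # identical rows
```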

Conjugate Prior for the Poisson Rate

Theorem

Gamma-Poisson Conjugacy

Statement

Let $X_1,\dots,X_n\sim\operatorname{Pois}(\lambda)$ be i.i.d., and let the prior be $\lambda\sim\operatorname{Gamma}(\alpha_0,\beta_0)$. Then the posterior is

$$\lambda\mid X_1,\dots,X_n\sim\operatorname{Gamma}\!\left(\alpha_0 + \sum_{i=1}^n X_i,\ \beta_0 + n\right).$$

Intuition

A Gamma prior contributes $\alpha_0 - 1$ pseudo-events in pseudo-time $\beta_0$. Observing $\sum X_i$ events in real time $n$ adds to both. The posterior is Gamma with shape equal to total events plus pseudo-events plus one, and rate equal to total time plus pseudo-time.

Proof Sketch

The likelihood for $n$ i.i.d. Poisson observations is

$$L(\lambda) = \prod_{i=1}^n \frac{\lambda^{X_i}e^{-\lambda}}{X_i!}\propto \lambda^{\sum X_i}e^{-n\lambda}.$$

The Gamma prior density is proportional to $\lambda^{\alpha_0-1}e^{-\beta_0\lambda}$. Their product is

$$\lambda^{\alpha_0+\sum X_i - 1}\,e^{-(\beta_0+n)\lambda},$$

which is the kernel of $\operatorname{Gamma}(\alpha_0+\sum X_i,\ \beta_0+n)$.

Why It Matters

The conjugate-prior update is the cleanest case in Bayesian inference: prior and posterior are in the same family, with parameters that have a transparent "events and time" interpretation. The same conjugacy applies to the Exponential likelihood (there the shape grows by the number of observations and the rate by the total observed waiting time), and to the rate of any other Poisson-process-derived count model. See bayesian estimation for the broader pattern.

Failure Mode

Conjugacy is fragile. The Gamma is the conjugate prior only for the Poisson rate or the Exponential rate. Reparameterizing to the inverse rate (the mean Exponential waiting time $1/\lambda$) gives a different conjugate family (an Inverse Gamma). The conjugate prior is a property of a parameterization, not of the family.
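In the rate parameterization the update is one line of arithmetic. The sketch below, assuming SciPy and using illustrative counts and hyperparameters (not the exercise data), also checks the conjugate answer against a brute-force grid posterior:

```python
# Gamma-Poisson conjugate update, verified against a numeric grid posterior.
import numpy as np
from scipy import stats

alpha0, beta0 = 1.5, 1.0                      # illustrative prior
counts = np.array([2, 0, 3, 1, 2])            # illustrative data
n, total = len(counts), counts.sum()

# Conjugate posterior: Gamma(alpha0 + sum x_i, beta0 + n).
post = stats.gamma(a=alpha0 + total, scale=1 / (beta0 + n))
print(post.mean())                            # (1.5 + 8) / (1.0 + 5) ~ 1.583

# Brute force: normalize prior * likelihood on a lambda grid.
lam = np.linspace(1e-4, 10, 20_000)
log_unnorm = (stats.gamma.logpdf(lam, a=alpha0, scale=1 / beta0)
              + stats.poisson.logpmf(counts[:, None], lam).sum(axis=0))
w = np.exp(log_unnorm - log_unnorm.max())
print((lam * w).sum() / w.sum())              # matches the conjugate mean
```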

Maximum Likelihood Estimation

The MLE for the Gamma has no closed form. Given an i.i.d. sample $X_1,\dots,X_n$, the log-likelihood is

$$\ell(\alpha,\beta) = n\alpha\log\beta - n\log\Gamma(\alpha) + (\alpha-1)\sum\log X_i - \beta\sum X_i.$$

The score equations are

$$\frac{\partial\ell}{\partial\alpha} = n\log\beta - n\psi(\alpha) + \sum\log X_i = 0,\qquad \frac{\partial\ell}{\partial\beta} = \frac{n\alpha}{\beta} - \sum X_i = 0,$$

where $\psi$ is the digamma function. The second equation gives $\hat\beta = \hat\alpha/\bar X_n$. Substituting into the first gives

$$\log\hat\alpha - \psi(\hat\alpha) = \log\bar X_n - \overline{\log X},$$

where $\overline{\log X} = (1/n)\sum\log X_i$. This must be solved numerically (Newton iteration starting from the method-of-moments estimator $\hat\alpha_{\text{MoM}} = \bar X_n^2/\text{(sample variance)}$ works well). For the special case $\alpha = 1$ (Exponential), the score equation collapses and $\hat\beta = 1/\bar X_n$ in closed form.
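A sketch of that numerical solve, assuming SciPy's `digamma` and `polygamma`; the demo parameters are illustrative:

```python
# Solve log(alpha) - psi(alpha) = log(xbar) - mean(log x) by Newton's method,
# starting from the method-of-moments estimate; then beta_hat = alpha_hat/xbar.
import numpy as np
from scipy.special import digamma, polygamma

def gamma_mle(x, tol=1e-10, max_iter=50):
    xbar = x.mean()
    c = np.log(xbar) - np.log(x).mean()   # RHS of the profiled score equation
    a = xbar**2 / x.var()                 # MoM starting point
    for _ in range(max_iter):
        f = np.log(a) - digamma(a) - c    # profiled score in alpha
        fp = 1 / a - polygamma(1, a)      # its derivative (uses trigamma)
        step = f / fp
        a -= step
        if abs(step) < tol:
            break
    return a, a / xbar                    # (alpha_hat, beta_hat)

rng = np.random.default_rng(2)
x = rng.gamma(shape=2.5, scale=1 / 1.5, size=50_000)
print(gamma_mle(x))                       # ~ (2.5, 1.5)
```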

The Fisher information matrix at $(\alpha,\beta)$ per observation is

$$I(\alpha,\beta) = \begin{pmatrix} \psi'(\alpha) & -1/\beta \\ -1/\beta & \alpha/\beta^2 \end{pmatrix},$$

where $\psi' = \mathrm{d}\psi/\mathrm{d}\alpha$ is the trigamma function. The asymptotic variance of the MLE is the inverse of $n$ times this matrix; see maximum likelihood estimation for the general result.
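To turn the matrix into approximate standard errors, invert $nI(\alpha,\beta)$; a small sketch with illustrative values:

```python
# Approximate standard errors of the MLE from the inverse of n * I(alpha, beta).
import numpy as np
from scipy.special import polygamma

alpha, beta, n = 2.5, 1.5, 50_000
I = np.array([[polygamma(1, alpha), -1 / beta],
              [-1 / beta, alpha / beta**2]])
cov = np.linalg.inv(n * I)
print(np.sqrt(np.diag(cov)))   # asymptotic SEs for (alpha_hat, beta_hat)
```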

Method of Moments (Closed Form)

The method-of-moments estimator has a closed form:

$$\hat\alpha_{\text{MoM}} = \frac{\bar X_n^2}{\hat\sigma^2_n},\qquad \hat\beta_{\text{MoM}} = \frac{\bar X_n}{\hat\sigma^2_n},$$

where $\hat\sigma^2_n = (1/n)\sum(X_i-\bar X_n)^2$. The estimators come from matching the sample mean to $\alpha/\beta$ and the sample variance to $\alpha/\beta^2$ and solving. MoM is consistent but inefficient: its asymptotic variance is larger than the inverse Fisher information except at $\alpha = 1$, where the two coincide. See method of moments for the general framework.
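The closed form translates directly into code; a sketch assuming NumPy, where `np.var` uses the $1/n$ divisor and so matches the $\hat\sigma^2_n$ above:

```python
# Method-of-moments estimates from the first two sample moments.
import numpy as np

def gamma_mom(x):
    xbar, s2 = x.mean(), x.var()      # np.var divides by n, matching the text
    return xbar**2 / s2, xbar / s2    # (alpha_hat, beta_hat)

rng = np.random.default_rng(3)
x = rng.gamma(shape=2.5, scale=1 / 1.5, size=50_000)
print(gamma_mom(x))                   # ~ (2.5, 1.5), noisier than the MLE
```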

When Each Parameterization Is Convenient

| Setting | Use shape and rate | Use shape and scale |
| --- | --- | --- |
| Bayesian inference for Poisson rate | Yes (additive update on rate) | No |
| Poisson process waiting time | Yes (rate matches process rate $\lambda$) | No |
| Chi-squared identification | Yes (rate $\beta = 1/2$) | Awkward (scale $\theta = 2$) |
| SciPy `gamma.rvs(a=..., scale=...)` | No (SciPy uses scale) | Yes |
| Survival-analysis hazard interpretation | Mixed | Yes (scale matches characteristic lifetime) |

The shape-and-rate convention is the math convention; the shape-and-scale convention is the engineering convention. Both appear in Casella-Berger depending on the chapter.
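The SciPy row in the table is the one that bites in practice: a rate-parameterized model must invert $\beta$ before calling SciPy. A minimal sketch:

```python
# SciPy is shape/scale, so Gamma(alpha, rate=beta) enters as scale = 1/beta.
from scipy import stats

alpha, beta = 3.0, 2.0
samples = stats.gamma.rvs(a=alpha, scale=1 / beta, size=5, random_state=0)
print(samples)
print(stats.gamma.mean(a=alpha, scale=1 / beta))   # alpha / beta = 1.5
# Passing scale=beta by mistake would silently model Gamma(3, rate=1/2).
```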

Common Confusions

Watch Out

The shape parameter is not the number of events

For integer shape $k$, $\operatorname{Gamma}(k,\lambda)$ is the time of the $k$-th event in a rate-$\lambda$ Poisson process. For non-integer shape, the Gamma is still a valid distribution but has no "number of events" interpretation; the shape is a continuous extension of the event index, not a count.
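The integer-shape interpretation can be checked directly by simulating arrival times; a sketch assuming SciPy, with illustrative $k$ and $\lambda$:

```python
# The k-th arrival time of a rate-lambda Poisson process is the sum of k
# Exponential(lambda) inter-arrival times, i.e. Gamma(k, lambda).
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
k, lam, n = 4, 2.0, 100_000
arrival_k = rng.exponential(1 / lam, size=(n, k)).sum(axis=1)
print(stats.kstest(arrival_k, stats.gamma(a=k, scale=1 / lam).cdf))
```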

Watch Out

The Gamma distribution is not the Gamma function

The Gamma function $\Gamma(\alpha)$ is a deterministic special function used as a normalizing constant in the density. The Gamma distribution is a probability distribution. They share a name because $\Gamma(\alpha)$ appears in the density, not because the function is itself a random variable.

Watch Out

Sum of Gammas with different rates is not a Gamma

Additivity requires the rate parameters to be equal. A sum of independent $\operatorname{Gamma}(\alpha_1,\beta_1)$ and $\operatorname{Gamma}(\alpha_2,\beta_2)$ with $\beta_1\ne\beta_2$ is a hypoexponential or generalized Erlang distribution, not a Gamma. The MGF of the sum is the product of the MGFs, but the product does not have Gamma form unless the rates match.

Exercises

Exercise · Core

Problem

Let $X\sim\operatorname{Gamma}(3,2)$. Compute $\mathbb{E}[X]$, $\operatorname{Var}(X)$, and the mode.

Exercise · Core

Problem

Let $X_1,X_2,X_3$ be independent $\operatorname{Exp}(\lambda)$ with $\lambda = 1$. Identify the distribution of $S = X_1 + X_2 + X_3$ and compute $\mathbb{P}(S > 4)$ in terms of an incomplete Gamma function.

Exercise · Advanced

Problem

A telescope counts photons from a faint source over five non-overlapping one-second intervals. Prior to the experiment, you believe the source rate $\lambda$ is approximately one photon per second, so you assign the prior $\lambda\sim\operatorname{Gamma}(2,2)$. You observe counts $(3, 1, 4, 2, 0)$. Compute the posterior distribution of $\lambda$ and the posterior mean.

Exercise · Advanced

Problem

Let $V_1\sim\chi^2_5$ and $V_2\sim\chi^2_8$ be independent. Identify the distribution of $V_1 + V_2$ and explain via the Gamma additivity result.

References

Canonical:

  • Casella and Berger, Statistical Inference (2002), Chapter 3 (Section 3.3 on Gamma and related distributions), Chapter 7 (MLE for the Gamma).
  • Lehmann and Casella, Theory of Point Estimation (1998), Chapter 1 (sufficiency for the Gamma family).
  • Bickel and Doksum, Mathematical Statistics, Volume I (2015), Chapter 1 (Section 1.6 on conjugate families).

Bayesian framing:

  • Gelman et al., Bayesian Data Analysis (2013), Chapter 2 (Section 2.6 on conjugate priors and the Gamma-Poisson update).
  • Robert, The Bayesian Choice (2007), Chapter 3.

Special functions and computation:

  • Abramowitz and Stegun, Handbook of Mathematical Functions (1972), Chapter 6 (Gamma and digamma functions).
  • Press, Teukolsky, Vetterling, and Flannery, Numerical Recipes (2007), Chapter 6 (incomplete Gamma function evaluation).

Last reviewed: May 11, 2026
