

Gamma Distribution

The Gamma distribution as the sum of independent Exponentials and as a flexible nonnegative density: shape and rate, density and MGF, conjugacy for Poisson and Exponential likelihoods, Chi-squared as a special case, MLE without closed form.

Core · Tier 1 · Stable · Core spine · ~55 min

Why This Matters

The Gamma family is the parametric family of waiting times for the $k$-th event in a Poisson process and, equivalently, the family of sums of independent Exponential random variables. It is the natural extension of the Exponential distribution when memorylessness is too restrictive and you want to model a hazard rate that changes monotonically with time. Two specific reasons to learn it now:

  1. The Chi-squared distribution is a Gamma with shape $k/2$ and rate $1/2$. Every result about the Chi-squared sample variance, the F statistic, and the Pearson Chi-squared test starts from a Gamma identity.
  2. The Gamma is the conjugate prior for the Poisson rate and for the Exponential rate. Bayesian models for count and waiting-time data use a Gamma prior and produce a Gamma posterior.

The Gamma has two common parameterizations (shape and rate, or shape and scale). Both are correct; both are unavoidable in practice. This page uses shape $\alpha$ and rate $\beta$.

Definition

Definition

Gamma Distribution

A random variable $X$ has a Gamma distribution with shape $\alpha > 0$ and rate $\beta > 0$ if its density is

$$f_X(x) = \frac{\beta^\alpha}{\Gamma(\alpha)}\,x^{\alpha-1}\,e^{-\beta x},\qquad x > 0,$$

where $\Gamma(\alpha) = \int_0^\infty t^{\alpha-1}e^{-t}\,dt$ is the Gamma function.

The scale parameterization uses $\theta = 1/\beta$ and writes the density as $x^{\alpha-1}e^{-x/\theta}/[\theta^\alpha\Gamma(\alpha)]$. With shape and rate, $\mathbb{E}[X] = \alpha/\beta$ and $\operatorname{Var}(X) = \alpha/\beta^2$.

The shape $\alpha$ controls the "polynomial multiplier" of the density. When $\alpha = 1$ the multiplier $x^0 = 1$ is constant, and the Gamma reduces to the Exponential. When $\alpha < 1$ the density is unbounded at zero; when $\alpha > 1$ the density has a single mode at $(\alpha - 1)/\beta$. When $\alpha$ is a positive integer the Gamma is sometimes called the Erlang distribution.
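A minimal numerical check of these facts, assuming SciPy is available. Note that `scipy.stats.gamma` is parameterized by shape `a` and `scale`, so the rate $\beta$ enters as `scale=1/beta`:

```python
# Check mean, variance, and mode for Gamma(shape=3, rate=2) against the
# closed forms above. SciPy uses shape/scale, so rate beta -> scale = 1/beta.
import numpy as np
from scipy import stats

alpha, beta = 3.0, 2.0
X = stats.gamma(a=alpha, scale=1 / beta)

print(X.mean(), alpha / beta)      # both 1.5
print(X.var(), alpha / beta**2)    # both 0.75

# The density should peak at (alpha - 1) / beta = 1.0 since alpha > 1.
xs = np.linspace(0.01, 5, 100_000)
print(xs[np.argmax(X.pdf(xs))])    # ~1.0
```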

MGF and Moments

Theorem

Gamma MGF

Statement

For $X\sim\operatorname{Gamma}(\alpha,\beta)$ and $s < \beta$,

$$M_X(s) = \mathbb{E}[e^{sX}] = \left(\frac{\beta}{\beta-s}\right)^\alpha.$$

For $s\ge\beta$ the MGF is infinite.

Intuition

This is the MGF of the Exponential raised to the $\alpha$-th power. When $\alpha$ is a positive integer, the $\alpha$-th power is exactly the MGF of a sum of $\alpha$ independent Exponentials, which is the integer-shape case of the Gamma.

Proof Sketch

$$M_X(s) = \int_0^\infty e^{sx}\frac{\beta^\alpha}{\Gamma(\alpha)}x^{\alpha-1}e^{-\beta x}\,dx = \frac{\beta^\alpha}{\Gamma(\alpha)}\int_0^\infty x^{\alpha-1}e^{-(\beta-s)x}\,dx.$$

Substitute $u = (\beta-s)x$ for $s<\beta$:

$$M_X(s) = \frac{\beta^\alpha}{\Gamma(\alpha)(\beta-s)^\alpha}\int_0^\infty u^{\alpha-1}e^{-u}\,du = \left(\frac{\beta}{\beta-s}\right)^\alpha,$$

since the remaining integral is $\Gamma(\alpha)$.

Why It Matters

Differentiating gives $\mathbb{E}[X] = \alpha/\beta$ and $\operatorname{Var}(X) = \alpha/\beta^2$. As $\alpha\to\infty$ with $\alpha/\beta = \mu$ fixed (so $\beta\to\infty$), the Gamma becomes increasingly concentrated around $\mu$ and approaches a Normal distribution by the central limit theorem applied to the sum-of-Exponentials representation. The Gamma is sub-exponential but not sub-Gaussian; its tail decays at rate $e^{-\beta x}$ multiplied by a polynomial.

Failure Mode

The MGF is finite only on the half-line $s<\beta$. The Gamma inherits the sub-exponential tail of the Exponential, not the sub-Gaussian tail of the Normal. Confusing the two leads to overconfident concentration bounds.
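A quick Monte Carlo check of the MGF formula, assuming NumPy is available; the specific $\alpha$, $\beta$, $s$ values below are illustrative, with $s$ deliberately chosen inside the half-line $s < \beta$:

```python
# Compare the empirical E[e^{sX}] to (beta/(beta - s))^alpha for s < beta.
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, s = 2.5, 3.0, 1.0           # s < beta, so the MGF is finite

x = rng.gamma(shape=alpha, scale=1 / beta, size=1_000_000)
print(np.exp(s * x).mean())              # Monte Carlo estimate
print((beta / (beta - s)) ** alpha)      # closed form: (3/2)^2.5 ~ 2.756
```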

Additivity Under Independent Sums

Theorem

Gamma Additivity

Statement

If $X\sim\operatorname{Gamma}(\alpha_1,\beta)$ and $Y\sim\operatorname{Gamma}(\alpha_2,\beta)$ are independent, then

$$X + Y\sim\operatorname{Gamma}(\alpha_1+\alpha_2,\beta).$$

Intuition

Independent waits for $\alpha_1$ events and then $\alpha_2$ events in the same Poisson process add to a wait for $\alpha_1+\alpha_2$ events. The rate must be common; the shape parameters add.

Proof Sketch

By independence, $M_{X+Y}(s) = M_X(s)M_Y(s) = [\beta/(\beta-s)]^{\alpha_1}\cdot[\beta/(\beta-s)]^{\alpha_2} = [\beta/(\beta-s)]^{\alpha_1+\alpha_2}$, the MGF of $\operatorname{Gamma}(\alpha_1+\alpha_2,\beta)$. MGF uniqueness identifies the law.

Why It Matters

This is the rule that turns any sum of independent Gammas with the same rate into another Gamma. The two most important applications are: a sum of $k$ i.i.d. $\operatorname{Exp}(\lambda)$ variables is $\operatorname{Gamma}(k,\lambda)$; a sum of $k$ independent Chi-squareds with degrees $d_1,\dots,d_k$ is $\operatorname{Gamma}((d_1+\cdots+d_k)/2,\ 1/2) = \chi^2_{d_1+\cdots+d_k}$.

Failure Mode

Additivity requires a common rate. The sum of Gammas with different rates is not a Gamma; it is a hypoexponential or generalized-Erlang distribution with a more complex density. Shape parameters add only when the rate parameters match.
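Additivity is easy to verify by simulation. A sketch assuming SciPy, with illustrative shape values and a common rate:

```python
# X ~ Gamma(a1, beta) plus an independent Y ~ Gamma(a2, beta) should be
# indistinguishable from Gamma(a1 + a2, beta). KS test as a sanity check.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a1, a2, beta, n = 1.7, 2.3, 0.5, 100_000

s = rng.gamma(a1, 1 / beta, n) + rng.gamma(a2, 1 / beta, n)
print(stats.kstest(s, stats.gamma(a=a1 + a2, scale=1 / beta).cdf))
# A large p-value is consistent with the Gamma(4.0, rate 0.5) claim.
```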

Chi-squared Is a Specific Gamma

Theorem

Chi-squared as a Gamma

Statement

$\chi^2_k = \operatorname{Gamma}(k/2, 1/2)$ as parametric families.

Intuition

The Chi-squared density is the Gamma density with the specific shape $\alpha = k/2$ and rate $\beta = 1/2$. The half-integer shape, together with the fixed rate $1/2$, is all that distinguishes Chi-squared from generic Gammas.

Proof Sketch

The Chi-squared($k$) density is

$$f(x) = \frac{1}{2^{k/2}\Gamma(k/2)}x^{k/2-1}e^{-x/2},\qquad x>0.$$

Substituting $\alpha = k/2$ and $\beta = 1/2$ into the Gamma density gives the same expression. The two families coincide for every $k\in\{1,2,\dots\}$.

Why It Matters

Every Chi-squared identity is a Gamma identity in disguise. The additivity of independent Chi-squareds with $d_1, d_2$ degrees giving $\chi^2_{d_1+d_2}$ is just Gamma additivity with common rate $1/2$. The Poisson-to-Chi-squared bridge for goodness-of-fit testing is the same Gamma calculation. Computing Chi-squared quantiles in software typically calls a Gamma routine under the hood.

Failure Mode

The identification works only for the rate parameterization with $\beta = 1/2$. With a scale parameterization, the same family is $\operatorname{Gamma}(\text{shape}=k/2, \text{scale}=2)$. Using the wrong scale gives a Chi-squared with the wrong degrees of freedom and breaks every downstream computation.
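Both the identification and the scale pitfall are visible in one comparison; a sketch assuming SciPy, where rate $1/2$ becomes `scale=2` in SciPy's convention:

```python
# chi2(k) quantiles should equal Gamma(shape=k/2, scale=2) quantiles.
import numpy as np
from scipy import stats

k = 7
q = np.array([0.01, 0.25, 0.50, 0.75, 0.99])
print(stats.chi2.ppf(q, df=k))
print(stats.gamma.ppf(q, a=k / 2, scale=2.0))   # identical rows
```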

Conjugate Prior for the Poisson Rate

Theorem

Gamma-Poisson Conjugacy

Statement

Let $X_1,\dots,X_n\sim\operatorname{Pois}(\lambda)$ be i.i.d., and let the prior be $\lambda\sim\operatorname{Gamma}(\alpha_0,\beta_0)$. Then the posterior is

$$\lambda\mid X_1,\dots,X_n\sim\operatorname{Gamma}\!\left(\alpha_0 + \sum_{i=1}^n X_i,\ \beta_0 + n\right).$$

Intuition

A Gamma prior contributes $\alpha_0 - 1$ pseudo-events in pseudo-time $\beta_0$. Observing $\sum X_i$ events in real time $n$ adds to both. The posterior is Gamma with shape equal to total events plus pseudo-events plus one, and rate equal to total time plus pseudo-time.

Proof Sketch

The likelihood for $n$ i.i.d. Poisson observations is

$$L(\lambda) = \prod_{i=1}^n \frac{\lambda^{X_i}e^{-\lambda}}{X_i!}\propto \lambda^{\sum X_i}e^{-n\lambda}.$$

The Gamma prior density is proportional to $\lambda^{\alpha_0-1}e^{-\beta_0\lambda}$. Their product is

$$\lambda^{\alpha_0+\sum X_i - 1}\,e^{-(\beta_0+n)\lambda},$$

which is the kernel of $\operatorname{Gamma}(\alpha_0+\sum X_i,\ \beta_0+n)$.

Why It Matters

The conjugate-prior update is the cleanest case in Bayesian inference: prior and posterior are in the same family, with parameters that have a transparent "events and time" interpretation. The same conjugacy applies to the Exponential likelihood (there the shape grows by the number of observations and the rate by the total observed waiting time), and to the rate of any other Poisson-process-derived count model. See bayesian estimation for the broader pattern.

Failure Mode

Conjugacy is fragile. The Gamma is the conjugate prior only for the Poisson rate or the Exponential rate. Reparameterizing to the inverse rate (the mean Exponential waiting time $1/\lambda$) gives a different conjugate family (an Inverse Gamma). The conjugate prior is a property of a parameterization, not of the family.
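In the rate parameterization the update is one line of arithmetic. The sketch below, assuming SciPy and using illustrative counts and hyperparameters (not the exercise data), also checks the conjugate answer against a brute-force grid posterior:

```python
# Gamma-Poisson conjugate update, verified against a numeric grid posterior.
import numpy as np
from scipy import stats

alpha0, beta0 = 1.5, 1.0                      # illustrative prior
counts = np.array([2, 0, 3, 1, 2])            # illustrative data
n, total = len(counts), counts.sum()

# Conjugate posterior: Gamma(alpha0 + sum x_i, beta0 + n).
post = stats.gamma(a=alpha0 + total, scale=1 / (beta0 + n))
print(post.mean())                            # (1.5 + 8) / (1.0 + 5) ~ 1.583

# Brute force: normalize prior * likelihood on a lambda grid.
lam = np.linspace(1e-4, 10, 20_000)
log_unnorm = (stats.gamma.logpdf(lam, a=alpha0, scale=1 / beta0)
              + stats.poisson.logpmf(counts[:, None], lam).sum(axis=0))
w = np.exp(log_unnorm - log_unnorm.max())
print((lam * w).sum() / w.sum())              # matches the conjugate mean
```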

Maximum Likelihood Estimation

The MLE for the Gamma has no closed form. Given an i.i.d. sample $X_1,\dots,X_n$, the log-likelihood is

$$\ell(\alpha,\beta) = n\alpha\log\beta - n\log\Gamma(\alpha) + (\alpha-1)\sum\log X_i - \beta\sum X_i.$$

The score equations are

$$\frac{\partial\ell}{\partial\alpha} = n\log\beta - n\psi(\alpha) + \sum\log X_i = 0,\qquad \frac{\partial\ell}{\partial\beta} = \frac{n\alpha}{\beta} - \sum X_i = 0,$$

where $\psi$ is the digamma function. The second equation gives $\hat\beta = \hat\alpha/\bar X_n$. Substituting into the first gives

$$\log\hat\alpha - \psi(\hat\alpha) = \log\bar X_n - \overline{\log X},$$

where $\overline{\log X} = (1/n)\sum\log X_i$. This must be solved numerically (Newton iteration starting from the method-of-moments estimator $\hat\alpha_{\text{MoM}} = \bar X_n^2/\text{(sample variance)}$ works well). For the special case $\alpha = 1$ (Exponential), the score equation collapses and $\hat\beta = 1/\bar X_n$ in closed form.
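A sketch of that numerical solve, assuming SciPy's `digamma` and `polygamma`; the demo parameters are illustrative:

```python
# Solve log(alpha) - psi(alpha) = log(xbar) - mean(log x) by Newton's method,
# starting from the method-of-moments estimate; then beta_hat = alpha_hat/xbar.
import numpy as np
from scipy.special import digamma, polygamma

def gamma_mle(x, tol=1e-10, max_iter=50):
    xbar = x.mean()
    c = np.log(xbar) - np.log(x).mean()   # RHS of the profiled score equation
    a = xbar**2 / x.var()                 # MoM starting point
    for _ in range(max_iter):
        f = np.log(a) - digamma(a) - c    # profiled score in alpha
        fp = 1 / a - polygamma(1, a)      # its derivative (uses trigamma)
        step = f / fp
        a -= step
        if abs(step) < tol:
            break
    return a, a / xbar                    # (alpha_hat, beta_hat)

rng = np.random.default_rng(2)
x = rng.gamma(shape=2.5, scale=1 / 1.5, size=50_000)
print(gamma_mle(x))                       # ~ (2.5, 1.5)
```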

The Fisher information matrix at $(\alpha,\beta)$ per observation is

$$I(\alpha,\beta) = \begin{pmatrix} \psi'(\alpha) & -1/\beta \\ -1/\beta & \alpha/\beta^2 \end{pmatrix},$$

where $\psi' = \mathrm{d}\psi/\mathrm{d}\alpha$ is the trigamma function. The asymptotic variance of the MLE is the inverse of $n$ times this matrix; see maximum likelihood estimation for the general result.
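To turn the matrix into approximate standard errors, invert $nI(\alpha,\beta)$; a small sketch with illustrative values:

```python
# Approximate standard errors of the MLE from the inverse of n * I(alpha, beta).
import numpy as np
from scipy.special import polygamma

alpha, beta, n = 2.5, 1.5, 50_000
I = np.array([[polygamma(1, alpha), -1 / beta],
              [-1 / beta, alpha / beta**2]])
cov = np.linalg.inv(n * I)
print(np.sqrt(np.diag(cov)))   # asymptotic SEs for (alpha_hat, beta_hat)
```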

Method of Moments (Closed Form)

The method-of-moments estimator has a closed form:

$$\hat\alpha_{\text{MoM}} = \frac{\bar X_n^2}{\hat\sigma^2_n},\qquad \hat\beta_{\text{MoM}} = \frac{\bar X_n}{\hat\sigma^2_n},$$

where $\hat\sigma^2_n = (1/n)\sum(X_i-\bar X_n)^2$. The estimators come from matching the sample mean to $\alpha/\beta$ and the sample variance to $\alpha/\beta^2$ and solving. MoM is consistent but inefficient: its asymptotic variance is larger than the inverse Fisher information except at $\alpha = 1$, where the two coincide. See method of moments for the general framework.
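The closed form translates directly into code; a sketch assuming NumPy, where `np.var` uses the $1/n$ divisor and so matches the $\hat\sigma^2_n$ above:

```python
# Method-of-moments estimates from the first two sample moments.
import numpy as np

def gamma_mom(x):
    xbar, s2 = x.mean(), x.var()      # np.var divides by n, matching the text
    return xbar**2 / s2, xbar / s2    # (alpha_hat, beta_hat)

rng = np.random.default_rng(3)
x = rng.gamma(shape=2.5, scale=1 / 1.5, size=50_000)
print(gamma_mom(x))                   # ~ (2.5, 1.5), noisier than the MLE
```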

When Each Parameterization Is Convenient

| Setting | Use shape and rate | Use shape and scale |
| --- | --- | --- |
| Bayesian inference for Poisson rate | Yes (additive update on rate) | No |
| Poisson process waiting time | Yes (rate matches process rate $\lambda$) | No |
| Chi-squared identification | Yes (rate $\beta = 1/2$) | Awkward (scale $\theta = 2$) |
| SciPy `gamma.rvs(a=..., scale=...)` | No (SciPy uses scale) | Yes |
| Survival-analysis hazard interpretation | Mixed | Yes (scale matches characteristic lifetime) |

The shape-and-rate convention is the math convention; the shape-and-scale convention is the engineering convention. Both appear in Casella-Berger depending on the chapter.
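The SciPy row in the table is the one that bites in practice: a rate-parameterized model must invert $\beta$ before calling SciPy. A minimal sketch:

```python
# SciPy is shape/scale, so Gamma(alpha, rate=beta) enters as scale = 1/beta.
from scipy import stats

alpha, beta = 3.0, 2.0
samples = stats.gamma.rvs(a=alpha, scale=1 / beta, size=5, random_state=0)
print(samples)
print(stats.gamma.mean(a=alpha, scale=1 / beta))   # alpha / beta = 1.5
# Passing scale=beta by mistake would silently model Gamma(3, rate=1/2).
```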

Common Confusions

Watch Out

The shape parameter is not the number of events

For integer shape $k$, $\operatorname{Gamma}(k,\lambda)$ is the time of the $k$-th event in a rate-$\lambda$ Poisson process. For non-integer shape, the Gamma is still a valid distribution but has no "number of events" interpretation; the shape is a continuous extension of the event index, not a count.
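The integer-shape interpretation can be checked directly by simulating arrival times; a sketch assuming SciPy, with illustrative $k$ and $\lambda$:

```python
# The k-th arrival time of a rate-lambda Poisson process is the sum of k
# Exponential(lambda) inter-arrival times, i.e. Gamma(k, lambda).
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
k, lam, n = 4, 2.0, 100_000
arrival_k = rng.exponential(1 / lam, size=(n, k)).sum(axis=1)
print(stats.kstest(arrival_k, stats.gamma(a=k, scale=1 / lam).cdf))
```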

Watch Out

The Gamma distribution is not the Gamma function

The Gamma function $\Gamma(\alpha)$ is a deterministic special function used as a normalizing constant in the density. The Gamma distribution is a probability distribution. They share a name because $\Gamma(\alpha)$ appears in the density, not because the function is itself a random variable.

Watch Out

Sum of Gammas with different rates is not a Gamma

Additivity requires the rate parameters to be equal. A sum of independent $\operatorname{Gamma}(\alpha_1,\beta_1)$ and $\operatorname{Gamma}(\alpha_2,\beta_2)$ with $\beta_1\ne\beta_2$ is a hypoexponential or generalized Erlang distribution, not a Gamma. The MGF of the sum is the product of the MGFs, but the product does not have Gamma form unless the rates match.

Exercises

Exercise · Core

Problem

Let $X\sim\operatorname{Gamma}(3,2)$. Compute $\mathbb{E}[X]$, $\operatorname{Var}(X)$, and the mode.

Exercise · Core

Problem

Let $X_1,X_2,X_3$ be independent $\operatorname{Exp}(\lambda)$ with $\lambda = 1$. Identify the distribution of $S = X_1 + X_2 + X_3$ and compute $\mathbb{P}(S > 4)$ in terms of an incomplete Gamma function.

Exercise · Advanced

Problem

A telescope counts photons from a faint source over five non-overlapping one-second intervals. Prior to the experiment, you believe the source rate $\lambda$ is approximately one photon per second, so you assign the prior $\lambda\sim\operatorname{Gamma}(2,2)$. You observe counts $(3, 1, 4, 2, 0)$. Compute the posterior distribution of $\lambda$ and the posterior mean.

Exercise · Advanced

Problem

Let $V_1\sim\chi^2_5$ and $V_2\sim\chi^2_8$ be independent. Identify the distribution of $V_1 + V_2$ and explain via the Gamma additivity result.

References

Canonical:

  • Casella and Berger, Statistical Inference (2002), Chapter 3 (Section 3.3 on Gamma and related distributions), Chapter 7 (MLE for the Gamma).
  • Lehmann and Casella, Theory of Point Estimation (1998), Chapter 1 (sufficiency for the Gamma family).
  • Bickel and Doksum, Mathematical Statistics, Volume I (2015), Chapter 1 (Section 1.6 on conjugate families).

Bayesian framing:

  • Gelman et al., Bayesian Data Analysis (2013), Chapter 2 (Section 2.6 on conjugate priors and the Gamma-Poisson update).
  • Robert, The Bayesian Choice (2007), Chapter 3.

Special functions and computation:

  • Abramowitz and Stegun, Handbook of Mathematical Functions (1972), Chapter 6 (Gamma and digamma functions).
  • Press, Teukolsky, Vetterling, and Flannery, Numerical Recipes (2007), Chapter 6 (incomplete Gamma function evaluation).

Last reviewed: May 11, 2026
