Foundations
Gamma Distribution
The Gamma distribution as the sum of independent Exponentials and as a flexible nonnegative density: shape and rate, density and MGF, conjugacy for Poisson and Exponential likelihoods, Chi-squared as a special case, MLE without closed form.
Prerequisites
Why This Matters
The Gamma family is the parametric family of waiting times for the $n$-th event in a Poisson process and, equivalently, the family of sums of $n$ independent Exponential random variables. It is the natural extension of the Exponential distribution when memorylessness is too restrictive and you want to model a hazard rate that changes monotonically with time. Two specific reasons to learn it now:
- The Chi-squared distribution $\chi^2_k$ is a Gamma with shape $k/2$ and rate $1/2$. Every result about the Chi-squared sample variance, the F statistic, and the Pearson Chi-squared test starts from a Gamma identity.
- The Gamma is the conjugate prior for the Poisson rate and for the Exponential rate. Bayesian models for count and waiting-time data use a Gamma prior and produce a Gamma posterior.
The Gamma has two common parameterizations (shape and rate, or shape and scale). Both are correct; both are unavoidable in practice. This page uses shape $\alpha$ and rate $\beta$.
Definition
Gamma Distribution
A random variable $X$ has a Gamma distribution with shape $\alpha > 0$ and rate $\beta > 0$, written $X \sim \mathrm{Gamma}(\alpha, \beta)$, if its density is

$$f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)}\, x^{\alpha - 1} e^{-\beta x}, \qquad x > 0,$$

where $\Gamma(\alpha) = \int_0^\infty t^{\alpha - 1} e^{-t}\,dt$ is the Gamma function.
The scale parameterization uses $\theta = 1/\beta$ and writes the density as $f(x) = \frac{1}{\Gamma(\alpha)\,\theta^{\alpha}}\, x^{\alpha - 1} e^{-x/\theta}$. With shape and rate, $\mathbb{E}[X] = \alpha/\beta$ and $\mathrm{Var}(X) = \alpha/\beta^2$.
The shape $\alpha$ controls the "polynomial multiplier" $x^{\alpha - 1}$ of the density. When $\alpha = 1$ the multiplier is constant, and the Gamma reduces to the Exponential$(\beta)$. When $\alpha < 1$ the density is unbounded at zero; when $\alpha > 1$ the density has a single mode at $(\alpha - 1)/\beta$. When $\alpha$ is a positive integer the Gamma is sometimes called the Erlang distribution.
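These three shape regimes can be checked numerically. A minimal sketch using SciPy (note that SciPy parameterizes by shape `a` and `scale` = $1/\beta$, so the rate enters as its reciprocal):

```python
# Sketch: verify the shape-dependent behavior of the Gamma density.
# SciPy uses shape `a` and `scale` = 1/rate, so rate beta enters as scale=1/beta.
from scipy.optimize import minimize_scalar
from scipy.stats import gamma

beta = 2.0

# alpha = 1: Gamma reduces to Exponential(beta); density near 0 equals beta.
assert abs(gamma.pdf(1e-9, a=1.0, scale=1 / beta) - beta) < 1e-6

# alpha < 1: density is unbounded near zero.
assert gamma.pdf(1e-6, a=0.5, scale=1 / beta) > gamma.pdf(1.0, a=0.5, scale=1 / beta)

# alpha > 1: single interior mode at (alpha - 1) / beta.
alpha = 3.0
res = minimize_scalar(lambda x: -gamma.pdf(x, a=alpha, scale=1 / beta),
                      bounds=(1e-9, 10.0), method="bounded")
assert abs(res.x - (alpha - 1) / beta) < 1e-4
```

The numerical maximizer lands on $(\alpha - 1)/\beta = 1.0$ for $\alpha = 3$, $\beta = 2$, matching the mode formula.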
MGF and Moments
Gamma MGF
Statement
For $t < \beta$,

$$M_X(t) = \mathbb{E}\!\left[e^{tX}\right] = \left(\frac{\beta}{\beta - t}\right)^{\alpha}.$$

For $t \ge \beta$ the MGF is infinite.
Intuition
This is the MGF of the Exponential$(\beta)$, namely $\beta/(\beta - t)$, raised to the $\alpha$-th power. When $\alpha$ is a positive integer $n$, the $n$-th power is exactly the MGF of a sum of $n$ independent Exponential$(\beta)$ random variables, which is the integer-shape case of the Gamma.
Proof Sketch
Substitute the density and collect the exponent:

$$M_X(t) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \int_0^\infty x^{\alpha - 1} e^{-(\beta - t)x}\,dx = \left(\frac{\beta}{\beta - t}\right)^{\alpha},$$

since for $t < \beta$ the remaining integral is $\Gamma(\alpha)/(\beta - t)^{\alpha}$.
Why It Matters
Differentiating $M_X$ at $t = 0$ gives $\mathbb{E}[X] = \alpha/\beta$ and $\mathrm{Var}(X) = \alpha/\beta^2$. As $\alpha \to \infty$ with $\beta$ fixed, the Gamma becomes increasingly concentrated around its mean $\alpha/\beta$ relative to its spread and approaches a Normal distribution, by the central limit theorem applied to the sum-of-Exponentials representation. The Gamma is sub-exponential but not sub-Gaussian; its tail decays at rate $e^{-\beta x}$ multiplied by a polynomial in $x$.
Failure Mode
The MGF is finite only on the half-line $t < \beta$. The Gamma inherits the sub-exponential tail of the Exponential, not the sub-Gaussian tail of the Normal. Confusing the two leads to overconfident concentration bounds.
Additivity Under Independent Sums
Gamma Additivity
Statement
If $X \sim \mathrm{Gamma}(\alpha_1, \beta)$ and $Y \sim \mathrm{Gamma}(\alpha_2, \beta)$ are independent, then

$$X + Y \sim \mathrm{Gamma}(\alpha_1 + \alpha_2,\, \beta).$$
Intuition
Independent waits for $\alpha_1$ events and then $\alpha_2$ more events in the same Poisson process add to a wait for $\alpha_1 + \alpha_2$ events. The rate must be common; the shape parameters add.
Proof Sketch
By independence, $M_{X+Y}(t) = M_X(t)\,M_Y(t) = \left(\frac{\beta}{\beta - t}\right)^{\alpha_1 + \alpha_2}$, the MGF of $\mathrm{Gamma}(\alpha_1 + \alpha_2, \beta)$. MGF uniqueness identifies the law.
Why It Matters
This is the rule that turns any sum of independent Gammas with the same rate into another Gamma. The two most important applications are: a sum of $n$ i.i.d. $\mathrm{Exponential}(\beta)$ variables is $\mathrm{Gamma}(n, \beta)$; a sum of independent Chi-squareds with degrees $k_1, \dots, k_m$ is $\chi^2_{k_1 + \cdots + k_m}$.
Failure Mode
Additivity requires a common rate. The sum of Gammas with different rates is not a Gamma; it is a hypoexponential or generalized-Erlang distribution with a more complex density. The shape parameters add only when the rate parameters match.
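The additivity claim can be verified deterministically by convolving the two densities and comparing with the claimed Gamma density. A sketch assuming SciPy:

```python
# Sketch: check Gamma additivity by numerical convolution of densities.
# The convolution of Gamma(2, beta) and Gamma(3, beta) should equal
# the Gamma(5, beta) density pointwise.
from scipy import integrate
from scipy.stats import gamma

beta = 1.5
f1 = lambda x: gamma.pdf(x, a=2.0, scale=1 / beta)
f2 = lambda x: gamma.pdf(x, a=3.0, scale=1 / beta)

for z in [0.5, 2.0, 5.0]:
    # density of X + Y at z: integral of f1(x) f2(z - x) over x in (0, z)
    conv, _ = integrate.quad(lambda x: f1(x) * f2(z - x), 0, z)
    assert abs(conv - gamma.pdf(z, a=5.0, scale=1 / beta)) < 1e-7
```

Repeating the same convolution with two different rates would not reproduce any Gamma density, which is the failure mode above.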
Chi-squared Is a Specific Gamma
Chi-squared as a Gamma
Statement
$\chi^2_k = \mathrm{Gamma}\!\left(\tfrac{k}{2}, \tfrac{1}{2}\right)$ as parametric families, for every degrees of freedom $k$.
Intuition
The Chi-squared density is the Gamma density with the specific shape $k/2$ and rate $1/2$. The half-integer shape is the only thing distinguishing Chi-squared from generic Gammas.
Proof Sketch
The Chi-squared($k$) density is

$$f(x) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\, x^{k/2 - 1} e^{-x/2}, \qquad x > 0.$$

Substituting $\alpha = k/2$ and $\beta = 1/2$ into the Gamma density gives the same expression. The two families coincide for every $k$.
Why It Matters
Every Chi-squared identity is a Gamma identity in disguise. The additivity of independent Chi-squareds with degrees $k_1, \dots, k_m$ giving $\chi^2_{k_1 + \cdots + k_m}$ is just Gamma additivity with common rate $1/2$. The Poisson-to-Chi-squared bridge for goodness-of-fit testing is the same Gamma calculation. Computing Chi-squared quantiles in software typically calls a Gamma routine under the hood.
Failure Mode
The identification works only for the rate parameterization with $\beta = 1/2$. With a scale parameterization, the same family is $\mathrm{Gamma}(k/2,\, \theta = 2)$. Pulling the wrong scale gives a Chi-squared with the wrong degrees of freedom and breaks every downstream computation.
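The identification, including the scale-$2$ convention, can be checked pointwise in SciPy (which uses the scale parameterization, so $\chi^2_k$ corresponds to `a=k/2, scale=2`):

```python
# Sketch: the Chi-squared(k) density and CDF equal those of
# Gamma(shape=k/2, rate=1/2), i.e. SciPy scale = 1/rate = 2.
import numpy as np
from scipy.stats import chi2, gamma

xs = np.linspace(0.1, 20.0, 200)
for k in [1, 2, 5, 10]:
    assert np.allclose(chi2.pdf(xs, df=k), gamma.pdf(xs, a=k / 2, scale=2.0))
    assert np.allclose(chi2.cdf(xs, df=k), gamma.cdf(xs, a=k / 2, scale=2.0))
```

Using `scale=0.5` here (the rate where the scale belongs) would shift every quantile, which is exactly the failure mode described above.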
Conjugate Prior for the Poisson Rate
Gamma-Poisson Conjugacy
Statement
Let $X_1, \dots, X_n \mid \lambda$ be i.i.d. $\mathrm{Poisson}(\lambda)$, and let the prior be $\lambda \sim \mathrm{Gamma}(a, b)$. Then the posterior is

$$\lambda \mid X_1, \dots, X_n \sim \mathrm{Gamma}\!\left(a + \sum_{i=1}^n X_i,\; b + n\right).$$
Intuition
A $\mathrm{Gamma}(a, b)$ prior acts like $a$ pseudo-events observed in pseudo-time $b$. Observing $\sum_i X_i$ events over $n$ unit-time intervals adds to both. The posterior is Gamma with shape equal to pseudo-events plus observed events, and rate equal to pseudo-time plus observed time.
Proof Sketch
The likelihood for $n$ i.i.d. Poisson observations is

$$L(\lambda) \propto \lambda^{\sum_i x_i}\, e^{-n\lambda}.$$

The Gamma prior density is proportional to $\lambda^{a - 1} e^{-b\lambda}$. Their product is $\lambda^{a + \sum_i x_i - 1}\, e^{-(b + n)\lambda}$, which is the kernel of $\mathrm{Gamma}\!\left(a + \sum_i x_i,\; b + n\right)$.
Why It Matters
The conjugate-prior update is the cleanest case in Bayesian inference: prior and posterior are in the same family, with parameters that have a transparent "events and time" interpretation. The same conjugacy applies to the Exponential likelihood (with the same posterior form), and to the rate of any other Poisson-process-derived count model. See bayesian estimation for the broader pattern.
Failure Mode
Conjugacy is fragile. The Gamma is the conjugate prior only for the Poisson rate or the Exponential rate. Reparameterizing to the inverse rate (e.g. the mean Exponential waiting time) gives a different conjugate family (an Inverse Gamma). The conjugate prior is a property of a parameterization, not of the family.
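The update rule is two additions, so it fits in a few lines. A minimal sketch of the shape/rate update (the function name `posterior` and the example counts are illustrative, not from the source):

```python
# Sketch: Gamma-Poisson conjugate update in the shape/rate parameterization.
# Prior Gamma(a, b); n unit-time Poisson counts observed.
from scipy.stats import gamma

def posterior(a, b, counts):
    """Return posterior (shape, rate) after observing Poisson counts."""
    return a + sum(counts), b + len(counts)

a_post, b_post = posterior(a=2.0, b=1.0, counts=[3, 1, 4, 0, 2])
assert (a_post, b_post) == (12.0, 6.0)

# Posterior mean is shape/rate; SciPy needs scale = 1/rate.
post = gamma(a=a_post, scale=1 / b_post)
assert abs(post.mean() - a_post / b_post) < 1e-12   # posterior mean = 2.0
```

The posterior mean $a'/b' = (a + \sum_i x_i)/(b + n)$ is a weighted blend of the prior mean $a/b$ and the sample mean, with weights controlled by the pseudo-time $b$.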
Maximum Likelihood Estimation
The MLE for the Gamma has no closed form. Given an i.i.d. sample $x_1, \dots, x_n$, the log-likelihood is

$$\ell(\alpha, \beta) = n\alpha \log \beta - n \log \Gamma(\alpha) + (\alpha - 1) \sum_{i=1}^n \log x_i - \beta \sum_{i=1}^n x_i.$$
The score equations are

$$\frac{\partial \ell}{\partial \alpha} = n \log \beta - n\,\psi(\alpha) + \sum_{i=1}^n \log x_i = 0, \qquad \frac{\partial \ell}{\partial \beta} = \frac{n\alpha}{\beta} - \sum_{i=1}^n x_i = 0,$$

where $\psi = \Gamma'/\Gamma$ is the digamma function. The second equation gives $\hat\beta = \hat\alpha/\bar{x}$. Substituting into the first gives

$$\log \hat\alpha - \psi(\hat\alpha) = \log \bar{x} - \overline{\log x},$$

where $\overline{\log x} = \frac{1}{n} \sum_i \log x_i$. This must be solved numerically (Newton iteration starting from the method-of-moments estimator works well). For the special case $\alpha = 1$ (Exponential), the score equation collapses and $\hat\beta = 1/\bar{x}$ in closed form.
The Fisher information matrix per observation at $(\alpha, \beta)$ is

$$I(\alpha, \beta) = \begin{pmatrix} \psi'(\alpha) & -1/\beta \\ -1/\beta & \alpha/\beta^2 \end{pmatrix},$$

where $\psi'$ is the trigamma function. The asymptotic variance of the MLE is the inverse of $n$ times this matrix; see maximum likelihood estimation for the general result.
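The Newton iteration on the profile equation is short enough to sketch directly; this is one way to implement the numerical solve described above, using SciPy's digamma and trigamma (the function name `gamma_mle` is illustrative):

```python
# Sketch: Gamma MLE via Newton iteration on the profile score
#   log(alpha) - psi(alpha) = log(xbar) - mean(log x),
# started from the method-of-moments estimate.
import numpy as np
from scipy.special import digamma, polygamma

def gamma_mle(x, iters=50):
    xbar = x.mean()
    s = np.log(xbar) - np.log(x).mean()      # s > 0 by Jensen's inequality
    alpha = xbar**2 / x.var()                # method-of-moments start
    for _ in range(iters):
        g = np.log(alpha) - digamma(alpha) - s       # profile score residual
        gprime = 1 / alpha - polygamma(1, alpha)     # derivative (trigamma)
        alpha -= g / gprime                          # Newton step
    return alpha, alpha / xbar               # (shape, rate): beta_hat = alpha_hat/xbar

rng = np.random.default_rng(1)
x = rng.gamma(shape=3.0, scale=1 / 2.0, size=50_000)   # true alpha=3, rate beta=2
alpha_hat, beta_hat = gamma_mle(x)
assert abs(alpha_hat - 3.0) < 0.1 and abs(beta_hat - 2.0) < 0.1
```

The iteration converges in a handful of steps because the profile score is smooth and monotone in $\alpha$; the MoM start keeps Newton in the basin of convergence.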
Method of Moments (Closed Form)
The method-of-moments estimator has a closed form:

$$\hat\alpha_{\mathrm{MM}} = \frac{\bar{x}^2}{s^2}, \qquad \hat\beta_{\mathrm{MM}} = \frac{\bar{x}}{s^2},$$

where $s^2 = \frac{1}{n} \sum_i (x_i - \bar{x})^2$. The estimators come from matching the sample mean to $\alpha/\beta$ and the sample variance to $\alpha/\beta^2$ and solving. MoM is consistent but inefficient: its asymptotic variance is larger than the inverse Fisher information except in special cases where the two coincide. See method of moments for the general framework.
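The closed-form estimator is two ratios of sample moments. A minimal sketch (the function name `gamma_mom` is illustrative):

```python
# Sketch: closed-form method-of-moments estimates for (alpha, beta),
# from matching mean = alpha/beta and variance = alpha/beta**2.
import numpy as np

def gamma_mom(x):
    xbar, s2 = x.mean(), x.var()
    return xbar**2 / s2, xbar / s2           # (alpha_hat, beta_hat)

rng = np.random.default_rng(2)
x = rng.gamma(shape=2.0, scale=1 / 3.0, size=100_000)   # true alpha=2, rate beta=3
a_hat, b_hat = gamma_mom(x)
assert abs(a_hat - 2.0) < 0.1 and abs(b_hat - 3.0) < 0.15
```

In practice these values serve as the starting point for the Newton solve of the MLE equations.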
When Each Parameterization Is Convenient
| Setting | Use shape and rate | Use shape and scale |
|---|---|---|
| Bayesian inference for Poisson rate | Yes (additive update on rate) | No |
| Poisson process waiting time | Yes (rate matches process rate $\lambda$) | No |
| Chi-squared identification | Yes (rate $1/2$) | Awkward (scale $2$) |
| SciPy `gamma.rvs(a=..., scale=...)` | No (SciPy uses scale) | Yes |
| Survival-analysis hazard interpretation | Mixed | Yes (scale matches characteristic lifetime) |
The shape-and-rate convention is the math convention; the shape-and-scale convention is the engineering convention. Both appear in Casella-Berger depending on the chapter.
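Converting between the two conventions is a single reciprocal, but getting it backwards changes the distribution silently. A minimal sketch of the conversion and the failure mode, assuming SciPy:

```python
# Sketch: shape/rate (math convention) vs. shape/scale (SciPy convention).
# SciPy's `a` is the shape and `scale` = 1/rate.
from scipy.stats import gamma

alpha, beta = 4.0, 2.5          # shape, rate
theta = 1.0 / beta              # scale

dist = gamma(a=alpha, scale=theta)
assert abs(dist.mean() - alpha / beta) < 1e-12     # E[X] = alpha/beta
assert abs(dist.var() - alpha / beta**2) < 1e-12   # Var(X) = alpha/beta^2

# Passing the rate where SciPy expects the scale silently gives a
# distribution with mean alpha*beta instead of alpha/beta.
wrong = gamma(a=alpha, scale=beta)
assert abs(wrong.mean() - alpha * beta) < 1e-12
```

No error is raised in the mistaken call, which is why the convention row in the table above matters.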
Common Confusions
The shape parameter is not the number of events
For integer shape $n$, $\mathrm{Gamma}(n, \beta)$ is the time of the $n$-th event in a rate-$\beta$ Poisson process. For non-integer shape, the Gamma is still a valid distribution but has no "number of events" interpretation; the shape is a continuously extended index, not a count.
The Gamma distribution is not the Gamma function
The Gamma function $\Gamma(\alpha)$ is a deterministic special function used as a normalizing constant in the density. The Gamma distribution is a probability distribution. They share a name because $\Gamma(\alpha)$ appears in the density, not because the function is itself a random variable.
Sum of Gammas with different rates is not a Gamma
Additivity requires the rate parameters to be equal. A sum of independent $\mathrm{Gamma}(\alpha_1, \beta_1)$ and $\mathrm{Gamma}(\alpha_2, \beta_2)$ with $\beta_1 \ne \beta_2$ is a hypoexponential or generalized Erlang distribution, not a Gamma. The MGF of the sum is still the product of the MGFs, but the product does not have Gamma form unless the rates match.
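One way to see the failure concretely: cumulants of independent sums add, and a Gamma$(\alpha, \beta)$ has cumulants $\kappa_1 = \alpha/\beta$, $\kappa_2 = \alpha/\beta^2$, $\kappa_3 = 2\alpha/\beta^3$. Matching a candidate Gamma to the sum's mean and variance then forces a wrong third cumulant. A sketch with illustrative parameters:

```python
# Sketch: a sum of independent Gammas with different rates is not a Gamma.
# Fit a candidate Gamma to the sum's mean and variance, then show the
# third cumulants disagree.
a1, b1 = 2.0, 1.0
a2, b2 = 3.0, 4.0

# Cumulants of independent sums add; Gamma(a, b) has k1=a/b, k2=a/b^2, k3=2a/b^3.
mean = a1 / b1 + a2 / b2
var = a1 / b1**2 + a2 / b2**2
third = 2 * a1 / b1**3 + 2 * a2 / b2**3

# Candidate Gamma(a*, b*) matching the sum's mean and variance:
b_star = mean / var
a_star = mean * b_star
third_star = 2 * a_star / b_star**3

assert abs(third - third_star) > 1e-3   # third cumulants differ: not a Gamma
```

With a common rate the same calculation matches all cumulants exactly, recovering the additivity result.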
Exercises
Problem
Let $X \sim \mathrm{Gamma}(\alpha, \beta)$ with $\alpha > 1$. Compute $\mathbb{E}[X]$, $\mathrm{Var}(X)$, and the mode.
Problem
Let $X_1, \dots, X_n$ be independent with $X_i \sim \mathrm{Exponential}(\beta)$. Identify the distribution of $S_n = \sum_{i=1}^n X_i$ and compute $P(S_n > t)$ in terms of an incomplete Gamma function.
Problem
A telescope counts photons from a faint source over five non-overlapping one-second intervals. Prior to the experiment, you believe the source rate is approximately one photon per second, so you assign a prior $\lambda \sim \mathrm{Gamma}(a, b)$ with prior mean $a/b = 1$. You observe counts $x_1, \dots, x_5$. Compute the posterior distribution of $\lambda$ and the posterior mean.
Problem
Let $X \sim \chi^2_m$ and $Y \sim \chi^2_n$ be independent. Identify the distribution of $X + Y$ and explain via the Gamma additivity result.
References
Canonical:
- Casella and Berger, Statistical Inference (2002), Chapter 3 (Section 3.3 on Gamma and related distributions), Chapter 7 (MLE for the Gamma).
- Lehmann and Casella, Theory of Point Estimation (1998), Chapter 1 (sufficiency for the Gamma family).
- Bickel and Doksum, Mathematical Statistics, Volume I (2015), Chapter 1 (Section 1.6 on conjugate families).
Bayesian framing:
- Gelman et al., Bayesian Data Analysis (2013), Chapter 2 (Section 2.6 on conjugate priors and the Gamma-Poisson update).
- Robert, The Bayesian Choice (2007), Chapter 3.
Special functions and computation:
- Abramowitz and Stegun, Handbook of Mathematical Functions (1972), Chapter 6 (Gamma and digamma functions).
- Press, Teukolsky, Vetterling, and Flannery, Numerical Recipes (2007), Chapter 6 (incomplete Gamma function evaluation).
Last reviewed: May 11, 2026
Canonical graph
Required before and derived from this topic
These links come from prerequisite edges in the curriculum graph. Editorial suggestions are shown here only when the target page also cites this page as a prerequisite.
Required prerequisites
- Common Probability Distributions (layer 0A, tier 1)
- Distributions Atlas (layer 0A, tier 1)
- Exponential Distribution (layer 0A, tier 1)
- Exponential Function Properties (layer 0A, tier 1)
Derived topics
- Beta Distribution (layer 0A, tier 1)
- Chi-Squared Distribution and Tests (layer 1, tier 1)