

Tweedie Distribution

The Tweedie distribution is the one-parameter subfamily of the exponential dispersion model (EDM) family characterized by a power variance function $V(\mu) = \mu^p$. Special cases recover the Normal ($p = 0$), Poisson ($p = 1$), Gamma ($p = 2$), and Inverse Gaussian ($p = 3$). The intermediate range $1 < p < 2$ produces a compound Poisson-Gamma distribution with a point mass at zero and a continuous positive part, which is the canonical model for aggregate insurance losses. The page covers the EDM construction, the four special-case identifications, and the compound Poisson-Gamma representation; the applied actuarial treatment lives on ActuaryPath.


Why This Matters

The Tweedie family is the answer to one question: which exponential dispersion model distributions have variance proportional to a power of the mean? The answer, due to Tweedie (1984), is a one-parameter family indexed by the power $p$. Setting $p$ to specific integers recovers four standard distributions: Normal ($p = 0$), Poisson ($p = 1$), Gamma ($p = 2$), Inverse Gaussian ($p = 3$). The intermediate regime $1 < p < 2$ is the surprising one. It produces a distribution with a point mass at zero plus a continuous, right-skewed positive part, and it is the standard model for aggregate insurance claim amounts, rainfall totals, and other quantities that are often exactly zero and otherwise positive and skewed.

The Tweedie family also gives the variance function for generalized linear models with power-law variance, which is why it appears in modern actuarial software, environmental statistics, and high-frequency-zero count modeling.

Background: Exponential Dispersion Models

An exponential dispersion model (EDM) has density (or PMF) of the form

$$f(y; \theta, \phi) = a(y, \phi) \exp\!\left( \frac{y \theta - b(\theta)}{\phi} \right),$$

where $\theta$ is the natural parameter, $\phi > 0$ is the dispersion parameter, $b$ is the cumulant function, and $a$ is a normalization. The mean and variance are

$$\mu = b'(\theta), \quad \text{Var}(Y) = \phi b''(\theta) = \phi V(\mu),$$

where the variance function $V(\mu) = b''((b')^{-1}(\mu))$ encodes the mean-variance relationship of the family. The EDM family includes Normal, Poisson, Gamma, Binomial, and Inverse Gaussian; each is determined by its variance function.
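The two identities $\mu = b'(\theta)$ and $\text{Var}(Y) = \phi b''(\theta)$ can be checked numerically. The sketch below does so for the Poisson case, assuming the cumulant function $b(\theta) = e^\theta$ with $\phi = 1$; the finite-difference helpers and the choice $\theta = \log 3$ are illustrative assumptions, not part of the page.

```python
import math

def derivative(f, x, h=1e-5):
    """Central finite-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def second_derivative(f, x, h=1e-4):
    """Central finite-difference approximation to f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

# Poisson cumulant function: b(theta) = exp(theta), dispersion phi = 1.
b_poisson = math.exp
theta = math.log(3.0)  # natural parameter chosen so that the mean is 3

mu = derivative(b_poisson, theta)          # mu = b'(theta)
var = second_derivative(b_poisson, theta)  # Var(Y) = phi * b''(theta), phi = 1

print(round(mu, 4), round(var, 4))
```

Both numbers agree up to finite-difference error, which is the Poisson mean-variance relationship $V(\mu) = \mu$.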

The Tweedie Variance Function

Theorem

Tweedie Family as Power-Variance EDMs

Statement

For each $p \in (-\infty, 0] \cup [1, \infty)$, there is an EDM with variance function $V(\mu) = \mu^p$. The corresponding distribution is called the Tweedie distribution with power $p$, written $Y \sim \text{Tweedie}(\mu, \phi, p)$. The mean is $\mu$ and the variance is $\phi \mu^p$. Special cases identify standard families:

| $p$ | Tweedie distribution | Variance function |
|---|---|---|
| $0$ | Normal $N(\mu, \phi)$ | $V(\mu) = 1$ |
| $1$ | Poisson (with $\phi = 1$); over-dispersed Poisson otherwise | $V(\mu) = \mu$ |
| $(1, 2)$ | Compound Poisson-Gamma (point mass at zero plus positive continuous part) | $V(\mu) = \mu^p$ |
| $2$ | Gamma | $V(\mu) = \mu^2$ |
| $3$ | Inverse Gaussian | $V(\mu) = \mu^3$ |
| $p > 2$, $p \neq 3$ | Generated by positive stable distributions (heavy-tailed, continuous on $(0, \infty)$) | $V(\mu) = \mu^p$ |

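The gap $p \in (0, 1)$ is excluded: no Tweedie distribution exists for those powers because the candidate cumulant function is not the cumulant function of any non-degenerate probability distribution (it would force zero variance at the boundary point $\theta = 0$ of the parameter space).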
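The case analysis in the statement can be written as a small lookup. The function below is a hypothetical helper for illustration, not an API of any package; the labels for $p < 0$ and for $p > 2$, $p \neq 3$ are placeholder descriptions rather than standard names.

```python
def tweedie_family(p):
    """Name the Tweedie member for power p; raise if p falls in the gap (0, 1)."""
    if 0 < p < 1:
        raise ValueError(f"no Tweedie distribution exists for p = {p}")
    if p == 0:
        return "Normal"
    if p == 1:
        return "Poisson (over-dispersed unless phi = 1)"
    if 1 < p < 2:
        return "compound Poisson-Gamma"
    if p == 2:
        return "Gamma"
    if p == 3:
        return "Inverse Gaussian"
    if p > 2:
        return "positive continuous, heavy-tailed"
    return "admissible power below zero"

for power in (0, 1, 1.5, 2, 3):
    print(power, tweedie_family(power))
```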

Intuition

The variance function $V(\mu) = \mu^p$ encodes how the spread of the data grows with the mean. Constant variance ($p = 0$) is Normal. Variance equal to the mean ($p = 1$) is Poisson. Variance proportional to the squared mean ($p = 2$) is Gamma. Every power outside the gap $(0, 1)$ is achievable, with the surprising compound case appearing in $(1, 2)$.

Proof Sketch

The cumulant function for Tweedie with power $p \notin \{1, 2\}$ is, up to constants,

$$b(\theta) = \frac{1}{2 - p}\left[ (1 - p) \theta \right]^{(2 - p)/(1 - p)},$$

defined on the appropriate domain of $\theta$ so $b''$ is positive. Computing $\mu = b'(\theta) = [(1 - p)\theta]^{1/(1 - p)}$ and inverting gives $\theta = \mu^{1 - p}/(1 - p)$, then $V(\mu) = b''((b')^{-1}(\mu)) = [(1 - p)\theta]^{p/(1 - p)} = \mu^p$ by direct calculation. At $p = 0$ the formula gives the Normal cumulant function $b(\theta) = \theta^2/2$ directly. The cases $p = 1$ and $p = 2$, where the formula is undefined, are continuous limits requiring separate treatment and recover the Poisson and Gamma cumulant functions, $b(\theta) = e^\theta$ and $b(\theta) = -\log(-\theta)$, respectively. See Jorgensen (1987, 1997) for the existence proof of the EDM for each admissible $p$.
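The direct calculation can be sanity-checked by finite differences. The sketch below assumes the cumulant function $b(\theta) = [(1 - p)\theta]^{(2 - p)/(1 - p)}/(2 - p)$ and the inverse map $\theta = \mu^{1 - p}/(1 - p)$ from the proof sketch; the function names and the test point $\mu = 2$ are illustrative choices.

```python
def tweedie_cumulant(theta, p):
    """b(theta) = [(1 - p) * theta]^((2 - p) / (1 - p)) / (2 - p), for p not in {1, 2}."""
    return ((1 - p) * theta) ** ((2 - p) / (1 - p)) / (2 - p)

def natural_parameter(mu, p):
    """theta = mu^(1 - p) / (1 - p), the inverse of mu = b'(theta)."""
    return mu ** (1 - p) / (1 - p)

def check_power_variance(mu, p, h=1e-4):
    """Return finite-difference estimates of (b'(theta), b''(theta)) at theta(mu)."""
    theta = natural_parameter(mu, p)
    b = lambda t: tweedie_cumulant(t, p)
    first = (b(theta + h) - b(theta - h)) / (2 * h)
    second = (b(theta + h) - 2 * b(theta) + b(theta - h)) / (h * h)
    return first, second

for p in (0.0, 1.5, 3.0):
    mean, variance_fn = check_power_variance(mu=2.0, p=p)
    # Columns: p, b'(theta) (should be mu), b''(theta), and mu^p for comparison.
    print(p, round(mean, 3), round(variance_fn, 3), round(2.0 ** p, 3))
```

For each power the second and fourth printed columns should agree with $\mu$ and with each other up to finite-difference error.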

Why It Matters

The Tweedie family is closed under several operations relevant in GLM fitting: scaling, summing iid copies, and translating the mean via a link function. For fixed $p$, the MLE of the regression coefficients in a Tweedie GLM can be computed by IRLS with the variance function $\mu^p$ plugged into the standard machinery. Modern actuarial-pricing software (the R packages tweedie and statmod, Python GLM packages) implements exactly this.

Failure Mode

The dispersion $\phi$ and power $p$ are both estimated from data in applied GLMs; identifying them simultaneously requires either a profile likelihood over $p$ or a saddlepoint approximation. With small samples the joint MLE can be unstable, particularly near the boundaries $p = 1$ (Poisson) and $p = 2$ (Gamma). Practical workflow: fix $p$ on a grid, fit the GLM for each, compare via profile log-likelihood.

The Compound Poisson-Gamma Regime: $1 < p < 2$

Theorem

Compound Poisson-Gamma Representation in the Range 1 to 2

Statement

For $p \in (1, 2)$, $\text{Tweedie}(\mu, \phi, p)$ is the distribution of

$$Y = \sum_{i=1}^N Z_i,$$

where $N \sim \text{Poisson}(\lambda)$ and the $Z_i$ are iid $\text{Gamma}(\alpha, \beta)$ (shape $\alpha$, rate $\beta$), independent of $N$, with parameters

$$\lambda = \frac{\mu^{2 - p}}{\phi (2 - p)}, \quad \alpha = \frac{2 - p}{p - 1}, \quad \beta = \frac{1}{\phi (p - 1) \mu^{p - 1}}.$$

In particular, $P(Y = 0) = e^{-\lambda}$ is strictly positive, and conditional on $Y > 0$, $Y$ has a continuous density on $(0, \infty)$.
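The parameter map can be checked against the Tweedie moments using the standard compound Poisson identities $E[Y] = \lambda E[Z]$ and $\text{Var}(Y) = \lambda E[Z^2]$. The sketch below assumes $\beta$ is a rate parameter, so $E[Z] = \alpha/\beta$ and $E[Z^2] = \alpha(\alpha + 1)/\beta^2$; the function name and the numeric values are illustrative.

```python
def compound_poisson_gamma_params(mu, phi, p):
    """Map Tweedie (mu, phi, p) with 1 < p < 2 to (lambda, alpha, beta)."""
    if not 1 < p < 2:
        raise ValueError("the compound Poisson-Gamma form requires 1 < p < 2")
    lam = mu ** (2 - p) / (phi * (2 - p))       # Poisson rate
    alpha = (2 - p) / (p - 1)                   # Gamma shape
    beta = 1 / (phi * (p - 1) * mu ** (p - 1))  # Gamma rate
    return lam, alpha, beta

mu, phi, p = 2.0, 1.5, 1.6
lam, alpha, beta = compound_poisson_gamma_params(mu, phi, p)

mean = lam * alpha / beta                          # lambda * E[Z]
variance = lam * alpha * (alpha + 1) / beta ** 2   # lambda * E[Z^2]

print(mean, mu)                # both equal the Tweedie mean
print(variance, phi * mu ** p) # both equal phi * mu^p
```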

Intuition

Think of an insurance portfolio in a one-year period. Each policyholder either has zero claims (with positive probability) or has a positive number of claims, each of positive Gamma-distributed size. The total annual claim amount YY is exactly zero on policies with no claims and Gamma-sum-shaped on policies with at least one. The Tweedie distribution captures both regimes in a single parametric family.
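The portfolio picture can be simulated directly. The following is a minimal sketch using only the standard library; it assumes the parameter map $\lambda = \mu^{2-p}/(\phi(2-p))$, $\alpha = (2-p)/(p-1)$, $\beta = 1/(\phi(p-1)\mu^{p-1})$ with $\beta$ a rate, and the helper names, seed, and parameter values are hypothetical.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw a Poisson(lam) count by Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def sample_annual_loss(mu, phi, p, rng):
    """One policy-year: a Poisson number of claims, each with a Gamma-distributed size."""
    lam = mu ** (2 - p) / (phi * (2 - p))
    alpha = (2 - p) / (p - 1)
    beta = 1 / (phi * (p - 1) * mu ** (p - 1))
    n_claims = sample_poisson(lam, rng)
    # random.gammavariate takes (shape, scale), so pass scale = 1 / rate.
    return sum(rng.gammavariate(alpha, 1 / beta) for _ in range(n_claims))

rng = random.Random(0)
mu, phi, p = 2.0, 1.5, 1.6
losses = [sample_annual_loss(mu, phi, p, rng) for _ in range(20000)]

zero_share = sum(1 for y in losses if y == 0) / len(losses)
sample_mean = sum(losses) / len(losses)
expected_zero_share = math.exp(-mu ** (2 - p) / (phi * (2 - p)))

print(round(zero_share, 3), round(expected_zero_share, 3), round(sample_mean, 3))
```

The simulated share of exactly-zero policies should be close to $e^{-\lambda}$, and the sample mean close to $\mu$, up to Monte Carlo error.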

Proof Sketch

Compute the MGF of $Y = \sum_{i=1}^N Z_i$ by conditioning on $N$:

$$E[e^{tY}] = E[E[e^{tY} \mid N]] = E\!\left[ \left( \frac{\beta}{\beta - t}\right)^{\alpha N} \right] = e^{\lambda(M_Z(t) - 1)},$$

where $M_Z(t) = (\beta/(\beta - t))^\alpha$ is the Gamma MGF, valid for $t < \beta$. Matching the cumulant generating function $\log E[e^{tY}] = \lambda(M_Z(t) - 1)$ to that of the Tweedie EDM, $[b(\theta + \phi t) - b(\theta)]/\phi$ with $b$ the cumulant function identified in the proof above, gives the stated parameter map. The boundary cases $p = 1$ (Poisson, with $\alpha \to \infty$ and the Gamma collapsing to a point mass at $\phi$, which is $1$ when $\phi = 1$) and $p = 2$ (pure Gamma with $\lambda \to \infty$, no zero mass) are continuity arguments.
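The matching step can be verified numerically at a single point. The sketch below assumes the EDM cumulant generating function $[b(\theta + \phi t) - b(\theta)]/\phi$ with the Tweedie cumulant $b(\theta) = [(1-p)\theta]^{(2-p)/(1-p)}/(2-p)$ and $\theta = \mu^{1-p}/(1-p)$; the parameter values and the choice $t = 0.3$ are illustrative.

```python
mu, phi, p = 2.0, 1.5, 1.6

# Compound Poisson-Gamma parameters from the stated map (beta is a rate).
lam = mu ** (2 - p) / (phi * (2 - p))
alpha = (2 - p) / (p - 1)
beta = 1 / (phi * (p - 1) * mu ** (p - 1))

def b(theta):
    """Tweedie cumulant function for p not in {1, 2}."""
    return ((1 - p) * theta) ** ((2 - p) / (1 - p)) / (2 - p)

theta = mu ** (1 - p) / (1 - p)  # natural parameter corresponding to mean mu
t = 0.3                          # any t < beta keeps the Gamma MGF finite

compound_cgf = lam * ((beta / (beta - t)) ** alpha - 1)  # lambda * (M_Z(t) - 1)
edm_cgf = (b(theta + phi * t) - b(theta)) / phi          # EDM form of log E[exp(tY)]

print(compound_cgf, edm_cgf)
```

The two printed values agree to floating-point precision, which is the identity the proof sketch relies on.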

Why It Matters

This is why the Tweedie distribution is the workhorse tool for any quantity that is exactly zero with positive probability and otherwise positive and right-skewed: aggregate insurance claims by policy, rainfall by day in a dry region, health-care expenditure by individual in a year. The compound Poisson-Gamma decomposition makes the model interpretable (claim frequency, claim size, both as separate parameters) while keeping the unified Tweedie GLM tractable.

Failure Mode

The density on $(0, \infty)$ does not have a closed-form expression for general $p \in (1, 2)$ and must be computed by series expansion or saddlepoint approximation. Statistical software handles this; do not try to write the density yourself for production use.
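For illustration only (not production use), one series that follows directly from the compound representation is the Poisson mixture $f(y) = \sum_{n \ge 1} P(N = n)\, g_{n\alpha, \beta}(y)$, where $g_{n\alpha, \beta}$ is the Gamma density with shape $n\alpha$ and rate $\beta$, since a sum of $n$ iid Gamma variables with a common rate is again Gamma. The sketch below truncates that sum; the function name, the truncation at 200 terms, and the integration grid are assumptions chosen for a small $\lambda$, and this is not the algorithm used by any particular package.

```python
import math

def tweedie_cpg_density(y, mu, phi, p, max_terms=200):
    """Continuous-part density at y > 0 as a truncated Poisson mixture of Gamma densities."""
    lam = mu ** (2 - p) / (phi * (2 - p))
    alpha = (2 - p) / (p - 1)
    beta = 1 / (phi * (p - 1) * mu ** (p - 1))
    total = 0.0
    for n in range(1, max_terms + 1):
        # log P(N = n) for N ~ Poisson(lam)
        log_weight = -lam + n * math.log(lam) - math.lgamma(n + 1)
        # log density of Gamma(shape = n * alpha, rate = beta) at y
        shape = n * alpha
        log_gamma_pdf = (shape * math.log(beta) + (shape - 1) * math.log(y)
                         - beta * y - math.lgamma(shape))
        total += math.exp(log_weight + log_gamma_pdf)
    return total

# Sanity check: zero mass plus the integral of the continuous part should be about 1.
mu, phi, p = 2.0, 1.0, 1.3
zero_mass = math.exp(-mu ** (2 - p) / (phi * (2 - p)))
step = 0.02
continuous_mass = sum(
    tweedie_cpg_density((k + 0.5) * step, mu, phi, p) * step  # midpoint rule on (0, 30)
    for k in range(1500)
)
total_mass = zero_mass + continuous_mass
print(round(total_mass, 4))
```

The number of terms needed grows with $\lambda$, and for $p$ close to 2 the density is unbounded near zero, which is part of why dedicated series or saddlepoint routines are preferred in practice.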

Common Confusions

Watch Out

Tweedie distribution is not the Tweedie formula

The Tweedie distribution (this page) is the EDM family with power variance function. The Tweedie formula, sometimes also called the Robbins-Tweedie formula, is an empirical-Bayes identity that expresses the posterior mean as $E[\mu \mid Y] = Y + \sigma^2 \nabla \log f_Y(Y)$ for a Gaussian likelihood with prior $\mu \sim g$. The two share only the surname: the formula appears in score matching and denoising diffusion (see score matching line 310 for the formula in that context), whereas the distribution appears in GLMs and actuarial pricing.

Watch Out

The gap p in 0 to 1 is mathematical, not a software bug

The Tweedie family has no member for $p \in (0, 1)$. This is a genuine restriction of the EDM structure, not an implementation choice in tweedie or similar packages. Asking for a Tweedie GLM with $p = 0.5$ will produce a software error or silently fall back to a nearby admissible power; check the package documentation.

Watch Out

The dispersion phi and the power p are both parameters

A Tweedie GLM has two parameters beyond the mean structure: the dispersion $\phi$ (analogous to variance in a Gaussian GLM) and the power $p$. Both must be estimated or fixed. In actuarial pricing, $p$ is often profiled over $\{1.1, 1.2, \ldots, 1.9\}$ and chosen by AIC.
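The grid search over $p$ can be sketched for an intercept-only model. Everything below is an illustrative assumption rather than a package workflow: the data are simulated from the compound Poisson-Gamma representation, $\mu$ is estimated by the sample mean, $\phi$ is replaced by a Pearson moment estimate as a stand-in for its MLE, and the log-likelihood uses a truncated Poisson mixture of Gamma densities.

```python
import math
import random

def cpg_params(mu, phi, p):
    """Compound Poisson-Gamma parameters (lambda, shape alpha, rate beta)."""
    lam = mu ** (2 - p) / (phi * (2 - p))
    alpha = (2 - p) / (p - 1)
    beta = 1 / (phi * (p - 1) * mu ** (p - 1))
    return lam, alpha, beta

def log_likelihood(ys, mu, phi, p, max_terms=100):
    """Log-likelihood of ys: log e^{-lambda} at zero, truncated mixture series elsewhere."""
    lam, alpha, beta = cpg_params(mu, phi, p)
    total = 0.0
    for y in ys:
        if y == 0:
            total += -lam
            continue
        logs = []
        for n in range(1, max_terms + 1):
            shape = n * alpha
            logs.append(-lam + n * math.log(lam) - math.lgamma(n + 1)
                        + shape * math.log(beta) + (shape - 1) * math.log(y)
                        - beta * y - math.lgamma(shape))
        peak = max(logs)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(v - peak) for v in logs))
    return total

def simulate(mu, phi, p, size, rng):
    """Draw from the compound Poisson-Gamma representation."""
    lam, alpha, beta = cpg_params(mu, phi, p)
    draws = []
    for _ in range(size):
        # Poisson count = arrivals of a unit-rate process on [0, lam].
        count, clock = 0, rng.expovariate(1.0)
        while clock < lam:
            count += 1
            clock += rng.expovariate(1.0)
        draws.append(sum(rng.gammavariate(alpha, 1 / beta) for _ in range(count)))
    return draws

rng = random.Random(1)
ys = simulate(mu=2.0, phi=1.5, p=1.6, size=200, rng=rng)
mu_hat = sum(ys) / len(ys)
pearson = sum((y - mu_hat) ** 2 for y in ys) / (len(ys) - 1)

profile = {}
for k in range(1, 10):
    p = 1 + k / 10
    phi_hat = pearson / mu_hat ** p  # moment estimate, not the MLE of phi
    profile[p] = log_likelihood(ys, mu_hat, phi_hat, p)

best_p = max(profile, key=profile.get)
print(round(best_p, 1))
```

Because every grid point has the same number of parameters, ranking by AIC and ranking by log-likelihood coincide here. With only 200 observations the selected power can land away from the simulating value, which is the instability the failure-mode note describes.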

Watch Out

Compound Poisson-Gamma is not zero-inflated Gamma

A zero-inflated Gamma is a mixture: a Bernoulli draw for "zero or positive", then a Gamma if positive. A compound Poisson-Gamma has a structural reason for the zero mass: the underlying claim count is Poisson, so zero claims occur with probability $e^{-\lambda}$. The two models are different and produce different inferences when claims-per-policy can exceed one; the Tweedie distribution corresponds to the second.

Exercises

ExerciseCore

Problem

Verify that the Tweedie variance function $V(\mu) = \mu^p$ specializes correctly to the Normal, Poisson, and Gamma cases.

ExerciseAdvanced

Problem

For $Y \sim \text{Tweedie}(\mu, \phi, p)$ with $p \in (1, 2)$, compute $P(Y = 0)$ in terms of $\mu, \phi, p$ using the compound Poisson-Gamma representation.

References

Canonical:

  • Tweedie, "An index which distinguishes between some important exponential families" (in Statistics: Applications and New Directions, Proceedings of the Indian Statistical Institute Golden Jubilee International Conference, 1984), pages 579-604
  • Jorgensen, "Exponential dispersion models" (Journal of the Royal Statistical Society, Series B, 1987), volume 49, pages 127-162
  • Jorgensen, The Theory of Dispersion Models (1997). The monograph treatment.
  • Smyth, "Regression analysis of quantity data with exact zeros" (Proceedings of the Second Australia-Japan Workshop on Stochastic Models, 1996), pages 572-580. The GLM treatment of the $1 < p < 2$ case.

Actuarial applications:

  • Frees, Derrig, and Meyers (eds.), Predictive Modeling Applications in Actuarial Science, Volume 1 (2014), Chapters 5 and 8
  • See also the ActuaryPath topic page on compound Poisson and Tweedie for the applied pricing treatment.


Last reviewed: May 12, 2026
