Scale, Location, and Shape Parameters

Three roles a parameter can play in a distribution family: location shifts the support, scale stretches it, and shape changes the form. Conventions vary by source (rate vs scale, especially in Exponential and Gamma), and the group structure of location-scale families is what makes standardization and pivot quantities work.


Why This Matters

A reader who has met the Normal distribution as $\mathcal N(\mu, \sigma^2)$ has already met the first two parameter roles, even if no textbook labeled them. The Normal has a location parameter $\mu$ that slides the bell along the real line, a scale parameter $\sigma$ that stretches it, and no shape parameter (a property it shares with other pure location-scale families such as the Cauchy and the Logistic). The Gamma family adds the third role: $k$ is a shape parameter, and changing it does something neither shifting nor stretching can reproduce.

Naming the roles matters for two reasons. First, the same one-letter Greek symbol means different things in different books. Casella and Berger write the Exponential as $\mathrm{Exp}(\beta)$ where $\beta$ is the scale (mean $\beta$, density $\beta^{-1} e^{-x/\beta}$). Klugman, Panjer, and Willmot, and most actuarial sources, write the same distribution with $\theta$ for the scale. A statistics-track student reading a textbook that uses $\lambda$ for the rate ($\mathrm{Exp}(\lambda)$ with density $\lambda e^{-\lambda x}$) sees the same family with $\beta = 1/\lambda$. Errors creep in when rate and scale are conflated. Second, the group structure of location-scale families is what makes standardization $(X - \mu)/\sigma$ behave like a universal recipe, and what makes pivot statistics and t-tests work.

Quick Version

| Role | Effect on the density | Example |
| --- | --- | --- |
| Location $\mu$ | Shifts the density horizontally: $f(x; \mu) = f_0(x - \mu)$ | Normal mean, Cauchy median, Uniform midpoint |
| Scale $\sigma$ | Stretches the density: $f(x; \sigma) = \sigma^{-1} f_0(x / \sigma)$ | Normal SD, Exponential scale, Cauchy spread |
| Shape | Changes the form, not just location or scale | Gamma $k$, Weibull $k$, Pareto $\alpha$, $t_\nu$ degrees of freedom |

A distribution can have any subset. The Exponential has scale only. The Cauchy has location and scale. The Gamma and Weibull have shape and scale. The Generalized Pareto has all three.

Core Definitions

Definition

Location Parameter

A parameter $\mu \in \mathbb{R}$ is a location parameter of a family $\{f(x; \mu) : \mu \in \mathbb{R}\}$ iff $f(x; \mu) = f_0(x - \mu)$ for some base density $f_0$. Equivalently, if $X_0$ has density $f_0$, then $X = X_0 + \mu$ has density $f(\cdot; \mu)$.

The CDF translates: $F(x; \mu) = F_0(x - \mu)$. Quantiles shift by $\mu$. The mean (when it exists) shifts by $\mu$; the variance, skewness, and kurtosis are unchanged.

Definition

Scale Parameter

A parameter $\sigma > 0$ is a scale parameter of a family $\{f(x; \sigma) : \sigma > 0\}$ iff $f(x; \sigma) = \sigma^{-1} f_0(x / \sigma)$ for some base density $f_0$. Equivalently, if $X_0$ has density $f_0$, then $X = \sigma X_0$ has density $f(\cdot; \sigma)$.

The CDF stretches: $F(x; \sigma) = F_0(x / \sigma)$. The median and every other quantile scale by $\sigma$, as does the mean when it is finite; the variance (when finite) scales by $\sigma^2$. The standardized moments (skewness, kurtosis) are unchanged because they are scale-invariant by construction.
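The moment rules in the two definitions above can be checked on any fixed sample, since shifting and scaling act on the data pointwise. A minimal sketch, assuming an Exp(1) base sample chosen purely for illustration (the names `base`, `shifted`, `scaled`, and `skewness` are hypothetical helpers, not part of the source):

```python
import random
import statistics

random.seed(0)
# Draws from an arbitrary base distribution (Exp(1) here, for illustration only).
base = [random.expovariate(1.0) for _ in range(10_000)]

mu, sigma = 5.0, 3.0
shifted = [x + mu for x in base]    # location transform: X = X0 + mu
scaled = [sigma * x for x in base]  # scale transform:    X = sigma * X0


def skewness(xs):
    """Sample skewness: third central moment divided by the cubed SD."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean([(x - m) ** 3 for x in xs]) / s ** 3


# Location: the mean moves by mu, the variance does not move.
mean_shift = statistics.fmean(shifted) - statistics.fmean(base)
var_ratio_shift = statistics.pvariance(shifted) / statistics.pvariance(base)

# Scale: the mean is multiplied by sigma, the variance by sigma**2,
# and the skewness (a standardized moment) is unchanged.
mean_ratio_scale = statistics.fmean(scaled) / statistics.fmean(base)
var_ratio_scale = statistics.pvariance(scaled) / statistics.pvariance(base)
skew_gap_scale = skewness(scaled) - skewness(base)
```

Up to floating-point rounding, `mean_shift` equals `mu`, `var_ratio_shift` equals 1, `mean_ratio_scale` equals `sigma`, `var_ratio_scale` equals `sigma**2`, and `skew_gap_scale` is zero. These are identities of the transformation, not approximations that improve with sample size.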

Definition

Shape Parameter

A parameter is a shape parameter iff changing it alters the family in a way that cannot be reproduced by any combination of location shift and scale rescaling. Two members of a family with different shape values are not affine transformations of each other.

Standardized moments (skewness, kurtosis, higher cumulant ratios) are functions of shape only. The Gamma family with shape $k$ has skewness $2/\sqrt{k}$ regardless of scale; the Student-t with $\nu$ degrees of freedom has kurtosis $3 + 6/(\nu - 4)$ for $\nu > 4$, also a shape-only function.
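The Gamma claim can be verified from the raw moments $E[X^n] = \theta^n \, \Gamma(k+n)/\Gamma(k)$ of the shape-scale parameterization. A short sketch (the function names are illustrative, not from the source):

```python
import math


def gamma_raw_moment(n, k, theta):
    """E[X^n] for X ~ Gamma(shape k, scale theta)."""
    return theta ** n * math.gamma(k + n) / math.gamma(k)


def gamma_skewness(k, theta):
    """Skewness computed from the first three raw moments."""
    m1 = gamma_raw_moment(1, k, theta)
    m2 = gamma_raw_moment(2, k, theta)
    m3 = gamma_raw_moment(3, k, theta)
    variance = m2 - m1 ** 2
    third_central = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    return third_central / variance ** 1.5


# Same shape, very different scales: the skewness should agree with 2 / sqrt(k).
k = 4.0
skew_small_scale = gamma_skewness(k, theta=0.5)
skew_large_scale = gamma_skewness(k, theta=10.0)
target = 2.0 / math.sqrt(k)
```

Both calls return the same value as `target`, because $\theta$ cancels between the third central moment ($2k\theta^3$) and the cubed standard deviation ($k^{3/2}\theta^3$).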

Definition

Location-Scale Family

A two-parameter family $\{f(x; \mu, \sigma) : \mu \in \mathbb{R}, \sigma > 0\}$ is a location-scale family generated by base density $f_0$ iff

$$f(x; \mu, \sigma) = \sigma^{-1} f_0\!\left(\frac{x - \mu}{\sigma}\right).$$

Equivalently, if $X_0 \sim f_0$, then $X = \sigma X_0 + \mu$ has density $f(\cdot; \mu, \sigma)$.

Examples: Normal, Cauchy, Uniform on $[a, b]$ (parameterized as midpoint $\pm$ half-width), Logistic, Laplace, Student-t with fixed $\nu$. The Exponential is scale only (no location shift keeps the support on $[0, \infty)$). The Gamma is shape-scale, not location-scale.
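The definition translates directly into a higher-order function that turns any base density into a member of its location-scale family. A minimal sketch, where `location_scale_pdf` and the two base densities are hypothetical names introduced for illustration:

```python
import math
from statistics import NormalDist


def location_scale_pdf(base_pdf, mu, sigma):
    """Return x -> sigma^{-1} * f0((x - mu) / sigma) for a base density f0."""
    if sigma <= 0:
        raise ValueError("scale parameter must be positive")

    def pdf(x):
        return base_pdf((x - mu) / sigma) / sigma

    return pdf


def std_normal_pdf(z):
    """Base density of the Normal family."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_laplace_pdf(z):
    """Base density of the Laplace family."""
    return 0.5 * math.exp(-abs(z))


normal_pdf = location_scale_pdf(std_normal_pdf, mu=1.0, sigma=2.0)
laplace_pdf = location_scale_pdf(std_laplace_pdf, mu=1.0, sigma=2.0)

# Cross-check against the standard library's Normal density.
reference = NormalDist(mu=1.0, sigma=2.0).pdf(0.3)
gap = abs(normal_pdf(0.3) - reference)
```

The same wrapper produces the Normal and the Laplace family from their respective base densities; only `base_pdf` changes. No such wrapper exists for the Gamma, because no single base density generates members with different $k$.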

The Group Structure

Location-scale families are closed under affine transformations of the random variable. This is what makes standardization universal.

Proposition

Affine Closure of Location-Scale Families

Statement

Let $X \sim f(\cdot; \mu, \sigma)$ where $f(x; \mu, \sigma) = \sigma^{-1} f_0((x - \mu)/\sigma)$. For any $a \in \mathbb{R}$ and $b > 0$, the affine transformation $Y = a + b X$ satisfies $Y \sim f(\cdot; a + b\mu, b\sigma)$. The family is therefore closed under positive affine maps.

In particular, the standardized variable $Z = (X - \mu)/\sigma$ has density $f_0$, which does not depend on $\mu$ or $\sigma$.

Intuition

Shifting and stretching a member of a location-scale family produces another member of the same family. The base density $f_0$ generates the family under this group action: every member of the family is a translate-and-stretch of $f_0$, and standardizing any member brings it back to $f_0$. This is what licenses the universal recipe "standardize first, then look up the tail probability in a table."

Proof Sketch

Change of variables. If $Y = a + b X$, then $X = (Y - a)/b$ and $|dX/dY| = 1/b$. So the density of $Y$ is

$$f_Y(y) = b^{-1} f\big(b^{-1}(y - a); \mu, \sigma\big) = b^{-1} \sigma^{-1} f_0\!\left(\frac{b^{-1}(y - a) - \mu}{\sigma}\right) = (b\sigma)^{-1} f_0\!\left(\frac{y - (a + b\mu)}{b\sigma}\right).$$

This is $f(\cdot; a + b\mu, b\sigma)$.
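The change-of-variables identity can be checked numerically on a grid. The sketch below assumes a Logistic base density and arbitrary illustrative values for $\mu, \sigma, a, b$; all function names are hypothetical:

```python
import math


def logistic_base_pdf(z):
    """Standard Logistic density, written with exp(-|z|) for numerical stability."""
    e = math.exp(-abs(z))
    return e / (1.0 + e) ** 2


def family_pdf(x, mu, sigma):
    """Member of the location-scale family generated by the Logistic base."""
    return logistic_base_pdf((x - mu) / sigma) / sigma


def pdf_of_affine_image(y, a, b, mu, sigma):
    """Density of Y = a + b X for b > 0, via f_Y(y) = f_X((y - a) / b) / b."""
    return family_pdf((y - a) / b, mu, sigma) / b


mu, sigma = 1.5, 0.7
a, b = -2.0, 3.0
grid = [-10.0 + 0.5 * i for i in range(41)]  # y from -10 to 10

# Largest discrepancy between the transformed density and the family
# member with parameters (a + b*mu, b*sigma).
max_gap = max(
    abs(pdf_of_affine_image(y, a, b, mu, sigma) - family_pdf(y, a + b * mu, b * sigma))
    for y in grid
)
```

`max_gap` is zero up to rounding error, matching the statement that $Y \sim f(\cdot; a + b\mu, b\sigma)$.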

Why It Matters

Standardization $(X - \mu)/\sigma$ produces a pivot quantity: a function of data and parameters whose distribution does not depend on the parameters. Pivots are what confidence intervals are built from. The classical t-statistic is a pivot for the location-scale Normal family. The same construction does not work for shape families (Gamma, Weibull) because there is no fixed base member to standardize against.
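One way to see the pivot property concretely: write each observation as $X_i = \mu + \sigma Z_i$ with the same standard Normal draws $Z_i$, and the t-statistic comes out identical for every $(\mu, \sigma)$. A small sketch under that construction (the helper `t_statistic` and the parameter values are illustrative assumptions):

```python
import math
import random
import statistics


def t_statistic(xs, mu):
    """Classical one-sample t-statistic: sqrt(n) * (mean - mu) / sample SD."""
    n = len(xs)
    return (statistics.fmean(xs) - mu) / (statistics.stdev(xs) / math.sqrt(n))


random.seed(1)
z = [random.gauss(0.0, 1.0) for _ in range(12)]  # shared standard Normal draws

t_base = t_statistic(z, 0.0)

# Rebuild the sample under different (mu, sigma) and recompute the statistic.
settings = [(0.0, 1.0), (10.0, 0.1), (-3.0, 50.0)]
t_values = [
    t_statistic([mu + sigma * zi for zi in z], mu)
    for mu, sigma in settings
]
```

Every entry of `t_values` equals `t_base` up to floating-point error: $\mu$ cancels in the numerator and $\sigma$ cancels between numerator and denominator, so the statistic is a function of the $Z_i$ alone.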

Failure Mode

The closure property as stated requires the affine map to have positive slope ($b > 0$). For symmetric base densities ($f_0(-x) = f_0(x)$), a sign flip $b < 0$ still keeps you in the family, with scale $|b|\sigma$. For asymmetric base densities (like the Gumbel), a sign flip lands you outside.

Parameter Conventions: Rate vs Scale

The single most common source of confusion. Different texts parameterize the same distribution differently, and the parameter symbol can mean either the rate or its reciprocal.

| Distribution | Rate convention | Scale convention | Relation |
| --- | --- | --- | --- |
| Exponential | $\mathrm{Exp}(\lambda)$, density $\lambda e^{-\lambda x}$, mean $1/\lambda$ | $\mathrm{Exp}(\theta)$ or $\mathrm{Exp}(\beta)$, density $\theta^{-1} e^{-x/\theta}$, mean $\theta$ | $\theta = 1/\lambda$ |
| Gamma | $\mathrm{Gamma}(\alpha, \beta)$ shape-rate, mean $\alpha/\beta$ | $\mathrm{Gamma}(k, \theta)$ shape-scale, mean $k\theta$ | $\theta = 1/\beta$, $k = \alpha$ |
| Inverse Gamma | rate $\beta$ | scale $\theta$ | $\theta = 1/\beta$ |
| Weibull | (rare) | $\mathrm{Weibull}(k, \theta)$ shape-scale, density $(k/\theta)(x/\theta)^{k-1} e^{-(x/\theta)^k}$ | scale convention dominates |

Watch Out

Rate and scale are reciprocals. Read the density before you trust the symbol.

A textbook that writes "let $X \sim \mathrm{Exp}(\lambda)$" is using either rate or scale depending on the book. Casella-Berger uses scale $\beta$ with mean $\beta$. Klugman uses scale $\theta$ with mean $\theta$. Many introductory statistics texts use rate $\lambda$ with mean $1/\lambda$. The density tells you which: if the parameter multiplies $x$ inside the exponent (and appears as the coefficient out front), it is the rate; if it divides $x$ inside the exponent (and its reciprocal appears out front), it is the scale.
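Software libraries mix the two conventions as well. In Python's standard library, `random.expovariate(lambd)` takes the rate, while `random.gammavariate(alpha, beta)` takes shape and scale, so its `beta` is the $\theta$ of the table above rather than the rate $\beta$. A hedged sketch of thin wrappers that make the convention explicit (the wrapper names are hypothetical):

```python
import random
import statistics


def sample_exponential_by_scale(scale, n, rng):
    """Draw n Exponential variates with mean `scale`.

    random.expovariate expects the RATE, so convert scale -> rate first.
    """
    rate = 1.0 / scale
    return [rng.expovariate(rate) for _ in range(n)]


def sample_gamma_by_rate(shape, rate, n, rng):
    """Draw n Gamma variates given shape and RATE (mean = shape / rate).

    random.gammavariate expects shape and SCALE, so convert rate -> scale.
    """
    scale = 1.0 / rate
    return [rng.gammavariate(shape, scale) for _ in range(n)]


rng = random.Random(42)
exp_draws = sample_exponential_by_scale(scale=2.0, n=100_000, rng=rng)
gamma_draws = sample_gamma_by_rate(shape=3.0, rate=0.5, n=100_000, rng=rng)

exp_mean = statistics.fmean(exp_draws)      # should be near the scale, 2
gamma_mean = statistics.fmean(gamma_draws)  # should be near shape / rate = 6
```

Checking a sample mean against the intended mean, as done here, is a cheap way to confirm which convention a sampler uses before trusting the symbol in its signature.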

Watch Out

Inverse Gamma scale is the Gamma rate, not the Gamma scale

The inverse-Gamma distribution $\mathrm{InvGamma}(\alpha, \theta)$ is the distribution of $1/X$ where $X \sim \mathrm{Gamma}(\alpha, 1/\theta)$ in the shape-scale parameterization. The scale parameter $\theta$ of $\mathrm{InvGamma}$ is the scale of the reciprocated variable $1/X$. It equals the rate of the underlying Gamma (the reciprocal of the Gamma scale), not the Gamma scale itself. Easy to misread.
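The relation between the two parameterizations can be checked through the change-of-variables formula $f_{1/X}(y) = f_X(1/y)/y^2$. The sketch below assumes the inverse-Gamma density $\theta^\alpha y^{-\alpha-1} e^{-\theta/y}/\Gamma(\alpha)$ with scale $\theta$; function names and test values are illustrative:

```python
import math


def gamma_pdf(x, shape, scale):
    """Gamma density in the shape-scale parameterization."""
    return x ** (shape - 1) * math.exp(-x / scale) / (math.gamma(shape) * scale ** shape)


def inv_gamma_pdf(y, alpha, theta):
    """Inverse-Gamma density with shape alpha and scale theta."""
    return theta ** alpha * y ** (-alpha - 1) * math.exp(-theta / y) / math.gamma(alpha)


def pdf_of_reciprocal(y, shape, scale):
    """Density of 1/X for X ~ Gamma(shape, scale): f_X(1/y) / y^2."""
    return gamma_pdf(1.0 / y, shape, scale) / y ** 2


alpha, theta = 3.0, 2.0
ys = [0.2, 0.5, 1.0, 2.0, 5.0]

# Gamma scale 1/theta (i.e. Gamma rate theta) reproduces InvGamma(alpha, theta).
gap_with_reciprocal_scale = max(
    abs(pdf_of_reciprocal(y, alpha, 1.0 / theta) - inv_gamma_pdf(y, alpha, theta))
    for y in ys
)

# Reusing theta directly as the Gamma scale does not.
gap_with_same_scale = max(
    abs(pdf_of_reciprocal(y, alpha, theta) - inv_gamma_pdf(y, alpha, theta))
    for y in ys
)
```

`gap_with_reciprocal_scale` vanishes up to rounding, while `gap_with_same_scale` is visibly nonzero, which is the mismatch a reader gets by carrying the Gamma scale over unchanged.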

Shape-Only Quantities

Standardized moments are functions of shape parameters only. Two members of the same family with different location or scale share the same skewness, kurtosis, and higher standardized cumulants.

| Distribution | Skewness | Excess kurtosis | Notes |
| --- | --- | --- | --- |
| Normal | $0$ | $0$ | Shape-free; all higher cumulants vanish |
| Exponential | $2$ | $6$ | Scale-only family; skewness and kurtosis are constants |
| Gamma$(k, \theta)$ | $2/\sqrt{k}$ | $6/k$ | Both depend on shape $k$ only |
| Student-$t_\nu$ | $0$ for $\nu > 3$ | $6/(\nu - 4)$ for $\nu > 4$ | Shape is $\nu$; location-scale extensions add $\mu, \sigma$ |
| Weibull$(k, \theta)$ | function of $k$ | function of $k$ | Both depend on shape $k$ only |

This is the practical version of the rule "shape parameters carry the genuinely new information." Once you have computed a standardized moment, you have computed a function of shape.
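The "function of $k$" entries for the Weibull can be made concrete with its raw moments $E[X^n] = \theta^n \, \Gamma(1 + n/k)$. A minimal sketch (function names are illustrative):

```python
import math


def weibull_raw_moment(n, k, theta):
    """E[X^n] for X ~ Weibull(shape k, scale theta)."""
    return theta ** n * math.gamma(1.0 + n / k)


def weibull_skew_and_excess_kurtosis(k, theta):
    """Skewness and excess kurtosis from the first four raw moments."""
    m1, m2, m3, m4 = (weibull_raw_moment(n, k, theta) for n in (1, 2, 3, 4))
    variance = m2 - m1 ** 2
    third_central = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    fourth_central = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return third_central / variance ** 1.5, fourth_central / variance ** 2 - 3.0


# k = 1 is the Exponential, so this should reproduce the table row (2, 6).
exp_case = weibull_skew_and_excess_kurtosis(k=1.0, theta=3.0)

# Same shape, different scales: the two results should coincide.
narrow = weibull_skew_and_excess_kurtosis(k=2.5, theta=1.0)
wide = weibull_skew_and_excess_kurtosis(k=2.5, theta=40.0)
```

The scale $\theta$ enters every central moment of order $n$ as a factor $\theta^n$, so it cancels in both ratios and only $k$ survives.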

The Cauchy Case

The Cauchy distribution $\mathrm{Cauchy}(x_0, \gamma)$ is a location-scale family. It has location $x_0$ (the median, since the mean does not exist) and scale $\gamma$ (the half-width at half-maximum, since the variance does not exist). Standardization $(X - x_0)/\gamma$ produces a standard Cauchy with density $1/(\pi(1 + z^2))$.

The lesson: location and scale parameters can exist even when the corresponding moment (mean for location, variance for scale) does not. Treat them as parameters of the family, not as moments of the variable.
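Because the Cauchy has closed-form density and quantile functions, its parameters can be read off from quantities that always exist. A small sketch with illustrative values $x_0 = 3$, $\gamma = 2$ (function names are hypothetical):

```python
import math


def cauchy_pdf(x, x0, gamma):
    """Cauchy density with location x0 and scale gamma."""
    z = (x - x0) / gamma
    return 1.0 / (math.pi * gamma * (1.0 + z * z))


def cauchy_quantile(p, x0, gamma):
    """Inverse CDF: x0 + gamma * tan(pi * (p - 1/2))."""
    return x0 + gamma * math.tan(math.pi * (p - 0.5))


x0, gamma = 3.0, 2.0

median = cauchy_quantile(0.5, x0, gamma)
q1 = cauchy_quantile(0.25, x0, gamma)
q3 = cauchy_quantile(0.75, x0, gamma)
half_iqr = (q3 - q1) / 2.0  # half the interquartile range

# Density one scale unit from the peak, relative to the peak itself.
half_max_ratio = cauchy_pdf(x0 + gamma, x0, gamma) / cauchy_pdf(x0, x0, gamma)
```

Here `median` recovers $x_0$, `half_iqr` recovers $\gamma$, and `half_max_ratio` is one half, which is the half-width-at-half-maximum reading of the scale. None of these quantities requires a mean or a variance.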

Examples by Distribution

  • Normal $\mathcal N(\mu, \sigma^2)$: location $\mu$, scale $\sigma$, no shape.
  • Exponential $\mathrm{Exp}(\theta)$: scale $\theta$, no location, no shape. Constant skewness $2$ and excess kurtosis $6$.
  • Gamma $\mathrm{Gamma}(k, \theta)$: shape $k$, scale $\theta$, no location. Setting $k = 1$ recovers Exponential. Sum of $k$ i.i.d. $\mathrm{Exp}(\theta)$ when $k$ is a positive integer.
  • Weibull $\mathrm{Weibull}(k, \theta)$: shape $k$, scale $\theta$. Setting $k = 1$ recovers Exponential; $k = 2$ recovers Rayleigh.
  • Lognormal: location and scale parameters live on the log scale ($\mu, \sigma$ are the mean and SD of $\log X$, not of $X$). Sometimes called "log-location" and "log-scale" to avoid confusion.
  • Pareto $\mathrm{Pareto}(x_m, \alpha)$: scale $x_m$ (the minimum), shape $\alpha$ (tail index). Heavy tail controlled by shape; $k$-th moment exists iff $\alpha > k$.
  • Generalized Pareto $\mathrm{GPD}(\mu, \sigma, \xi)$: location $\mu$, scale $\sigma$, shape $\xi$. The shape $\xi$ controls tail behavior: $\xi = 0$ is exponential tail, $\xi > 0$ is heavy tail, $\xi < 0$ has a bounded right endpoint.
  • Student-$t_\nu$: shape $\nu$ (degrees of freedom). Location-scale extension $\mu + \sigma T_\nu$ adds the two remaining roles; $\nu$ controls tail weight.
  • Cauchy $\mathrm{Cauchy}(x_0, \gamma)$: location $x_0$, scale $\gamma$. The $\nu = 1$ case of Student-$t$.
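The Lognormal entry in the list above is the one most often misread, so a quick numerical illustration may help. The sketch uses the standard closed forms $E[X] = e^{\mu + \sigma^2/2}$, $\mathrm{SD}(X) = E[X]\sqrt{e^{\sigma^2} - 1}$, and median $e^{\mu}$; the function name and parameter values are illustrative:

```python
import math


def lognormal_mean_sd(mu, sigma):
    """Mean and SD of X when log X ~ Normal(mu, sigma^2)."""
    mean = math.exp(mu + sigma ** 2 / 2.0)
    sd = mean * math.sqrt(math.exp(sigma ** 2) - 1.0)
    return mean, sd


mu, sigma = 0.0, 1.0  # parameters on the log scale
mean_x, sd_x = lognormal_mean_sd(mu, sigma)
median_x = math.exp(mu)
```

With $\mu = 0$ and $\sigma = 1$ on the log scale, the mean of $X$ is about 1.65 and its SD about 2.16, neither of which equals $\mu$ or $\sigma$. On the original scale, $e^{\mu}$ acts as a scale parameter (it is the median) and $\sigma$ acts as a shape parameter.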

Common Confusions

Watch Out

Scale is not standard deviation in general

The Normal is the special case where the scale parameter $\sigma$ equals the standard deviation. For most other distributions, the scale parameter does not equal the SD. The Exponential with scale $\theta$ has SD $= \theta$ (here they happen to coincide), but the Gamma$(k, \theta)$ has SD $\theta \sqrt{k}$: the SD is the scale times a shape-dependent constant. For the Cauchy, the SD does not exist at all; the scale parameter $\gamma$ is a half-width, not a standard deviation.

Watch Out

Standardization works for location-scale families, not shape families

Subtracting the mean and dividing by the SD always produces a variable with mean 0 and variance 1, provided both exist. But the distribution of the standardized variable depends on the original shape parameter unless the family is location-scale. Standardizing two different Gamma$(k, \theta)$ samples with different $k$ gives two different distributions with mean 0 and variance 1; that is why pivot quantities and tabulated tail tables work for the Normal but not for the Gamma.
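For integer shape $k$ the Gamma survival function has the closed form $P(X > x) = e^{-x/\theta} \sum_{i=0}^{k-1} (x/\theta)^i / i!$, which makes the point easy to check. The sketch below compares the upper tail of the standardized variable at $z = 2$ across shapes and scales; the function names and the choice $z = 2$ are illustrative assumptions:

```python
import math
from statistics import NormalDist


def erlang_survival(x, k, theta):
    """P(X > x) for X ~ Gamma(integer shape k, scale theta)."""
    if x <= 0:
        return 1.0
    r = x / theta
    return math.exp(-r) * sum(r ** i / math.factorial(i) for i in range(k))


def standardized_tail(z, k, theta):
    """P((X - mean) / SD > z) for X ~ Gamma(k, theta)."""
    mean = k * theta
    sd = theta * math.sqrt(k)
    return erlang_survival(mean + z * sd, k, theta)


z = 2.0
tail_k1_small = standardized_tail(z, k=1, theta=1.0)
tail_k1_large = standardized_tail(z, k=1, theta=7.0)  # same shape, other scale
tail_k4 = standardized_tail(z, k=4, theta=1.0)        # other shape
tail_normal = 1.0 - NormalDist().cdf(z)
```

Changing $\theta$ leaves the tail probability untouched (`tail_k1_small` equals `tail_k1_large`, both $e^{-3}$), while changing $k$ moves it (`tail_k4` is smaller), and neither matches the Normal value. A single lookup table would therefore be needed per shape value.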

Watch Out

A parameter symbol does not commit to a role

Greek letters carry no semantic load. The same $\theta$ can be the location for one distribution, the scale for another, the shape for a third. Resolve the role by inspecting how the parameter enters the density, not by assuming the name.

Exercises

Exercise (Core)

Problem

Let $X \sim \mathrm{Exp}(\lambda)$ in the rate parameterization, so the density is $f_X(x) = \lambda e^{-\lambda x}$ for $x \geq 0$. Show that $Y = \lambda X$ has the density of a standard exponential ($\mathrm{Exp}(1)$ in the rate parameterization). Then express the same $Y$ in terms of $\beta = 1/\lambda$ as the standardized version of $X \sim \mathrm{Exp}(\beta)$ in the scale parameterization.

Exercise (Core)

Problem

The Gamma$(k, \theta)$ distribution (shape-scale parameterization) has mean $k\theta$ and variance $k\theta^2$. Compute the coefficient of variation $\mathrm{CV} = \mathrm{SD}/\mathrm{mean}$. Which parameter does it depend on?

Exercise (Advanced)

Problem

Suppose $X \sim \mathrm{Cauchy}(x_0, \gamma)$ with location $x_0$ and scale $\gamma$. Show that $Y = (X - x_0)/\gamma$ has the standard Cauchy density $f_Y(y) = 1/(\pi(1 + y^2))$. Then show that the sample mean $\bar X_n$ of $n$ i.i.d. Cauchy$(x_0, \gamma)$ variables is distributed as $\mathrm{Cauchy}(x_0, \gamma)$: the scale does not shrink with $n$.

References

Canonical:

  • Casella & Berger, Statistical Inference (2nd ed., 2002), Chapter 3.5 (location-scale families)
  • Lehmann & Casella, Theory of Point Estimation (2nd ed., 1998), Chapter 3 (invariance and equivariance under location-scale groups)

Current:

  • Klugman, Panjer & Willmot, Loss Models: From Data to Decisions (5th ed., 2019), Chapters 5-6 (actuarial scale convention and parameter inventory)
  • Johnson, Kotz & Balakrishnan, Continuous Univariate Distributions, Volume 1 (2nd ed., 1994), Chapters 13, 17, 19 (Exponential, Gamma, Weibull parameterization conventions)

Last reviewed: May 13, 2026
