Scale, Location, and Shape Parameters

Sneiderman, Robby

Three roles a parameter can play in a distribution family: location shifts the support, scale stretches it, and shape changes the form. Conventions vary by source (rate vs scale, especially in Exponential and Gamma), and the group structure of location-scale families is what makes standardization and pivot quantities work.

Prerequisites

Common Probability Distributions
Expectation, Variance, Covariance, and Moments

Why This Matters

A reader who has met the Normal distribution as $\mathcal N(\mu, \sigma^2)$ has already met two of the three parameter roles, even if no textbook labeled them. The Normal has a location parameter $\mu$ that slides the bell along the real line, a scale parameter $\sigma$ that stretches it, and (like the Cauchy, Logistic, and Laplace) no shape parameter. The Gamma family adds the third role: $k$ is a shape parameter, and changing it does something neither shifting nor stretching can reproduce.

Naming the roles matters for two reasons. First, the same one-letter Greek symbol means different things in different books. Casella and Berger write the Exponential as $\mathrm{Exp}(\beta)$ where $\beta$ is the scale (mean $\beta$ , density $\beta^{-1} e^{-x/\beta}$ ). Klugman, Panjer, and Willmot, and most actuarial sources, write the same distribution but with $\theta$ for the scale. A statistics-track student reading a textbook that uses $\lambda$ for the rate ( $\mathrm{Exp}(\lambda)$ with density $\lambda e^{-\lambda x}$ ) sees the same family with $\beta = 1/\lambda$ . Errors creep in when rate and scale are conflated. Second, the group structure of location-scale families is what makes standardization $(X - \mu)/\sigma$ behave like a universal recipe, and what makes pivot statistics and t-tests work.

Quick Version

Role	Effect on the density	Example
Location $\mu$	Shifts the density horizontally: $f(x; \mu) = f_0(x - \mu)$	Normal mean, Cauchy median, Uniform midpoint
Scale $\sigma$	Stretches the density: $f(x; \sigma) = \sigma^{-1} f_0(x / \sigma)$	Normal SD, Exponential scale, Cauchy spread
Shape	Changes the form, not just location or scale	Gamma $k$ , Weibull $k$ , Pareto $\alpha$ , $t_\nu$ degrees of freedom

A family can have any subset of these roles. The Exponential has scale only. The Cauchy has location and scale. The Gamma and Weibull have shape and scale. The Generalized Pareto has all three.

Core Definitions

Definition

Location Parameter $\mu$

A parameter $\mu \in \mathbb{R}$ is a location parameter of a family $\{f(x; \mu) : \mu \in \mathbb{R}\}$ iff $f(x; \mu) = f_0(x - \mu)$ for some base density $f_0$ . Equivalently, if $X_0$ has density $f_0$ , then $X = X_0 + \mu$ has density $f(\cdot; \mu)$ .

The CDF translates: $F(x; \mu) = F_0(x - \mu)$ . Quantiles shift by $\mu$ . The mean (when it exists) shifts by $\mu$ ; the variance, skewness, and kurtosis are unchanged.

Definition

Scale Parameter $\sigma$

A parameter $\sigma > 0$ is a scale parameter of a family $\{f(x; \sigma) : \sigma > 0\}$ iff $f(x; \sigma) = \sigma^{-1} f_0(x / \sigma)$ for some base density $f_0$ . Equivalently, if $X_0$ has density $f_0$ , then $X = \sigma X_0$ has density $f(\cdot; \sigma)$ .

The CDF stretches: $F(x; \sigma) = F_0(x / \sigma)$ . The mean and median (when finite) scale by $\sigma$ ; the variance scales by $\sigma^2$ . The standardized moments (skewness, kurtosis) are unchanged because they are scale-invariant by construction.

Definition

Shape Parameter

A parameter is a shape parameter iff changing it alters the family in a way that cannot be reproduced by any combination of location shift and scale rescaling. Two members of a family with different shape values are not affine transformations of each other.

Standardized moments (skewness, kurtosis, higher cumulant ratios) are functions of shape only. The Gamma family with shape $k$ has skewness $2/\sqrt{k}$ regardless of scale; the Student-t with $\nu$ degrees of freedom has kurtosis $3 + 6/(\nu - 4)$ for $\nu > 4$ , also a shape-only function.
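The invariance behind this claim can be written out in one line. The following is a sketch for the $n$-th standardized moment under a positive affine map $Y = a + bX$, assuming the moments involved exist; the symbols $a$, $b$, and $n$ are introduced here only for illustration.

```latex
\frac{\mathbb{E}\big[(Y - \mathbb{E}Y)^n\big]}{\mathrm{SD}(Y)^n}
= \frac{\mathbb{E}\big[b^n (X - \mathbb{E}X)^n\big]}{\big(b\,\mathrm{SD}(X)\big)^n}
= \frac{\mathbb{E}\big[(X - \mathbb{E}X)^n\big]}{\mathrm{SD}(X)^n},
\qquad Y = a + bX,\quad b > 0 .
```

The shift $a$ cancels inside the centered term and the factor $b^n$ cancels between numerator and denominator, so whatever variation remains across a family must come from shape.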

Definition

Location-Scale Family

A two-parameter family $\{f(x; \mu, \sigma) : \mu \in \mathbb{R}, \sigma > 0\}$ is a location-scale family generated by base density $f_0$ iff $f(x; \mu, \sigma) = \sigma^{-1} f_0\!\left(\frac{x - \mu}{\sigma}\right).$ Equivalently, if $X_0 \sim f_0$ , then $X = \sigma X_0 + \mu$ has density $f(\cdot; \mu, \sigma)$ .

Examples: Normal, Cauchy, Uniform on $[a, b]$ (parameterized as midpoint $\pm$ half-width), Logistic, Laplace, Student-t with fixed $\nu$. The Exponential is scale only: any nonzero location shift would move the support off $[0, \infty)$. The Gamma is shape-scale, not location-scale.
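The definition translates directly into code. The sketch below is illustrative only: the helper names `location_scale_pdf`, `std_normal_pdf`, `std_laplace_pdf`, and `integrate` are hypothetical, and the parameter values are arbitrary. It builds a family member from a base density, compares the Normal case against the standard library's `statistics.NormalDist`, and checks numerically that the $\sigma^{-1}$ factor keeps the total mass at 1.

```python
import math
from statistics import NormalDist

def location_scale_pdf(f0, mu, sigma):
    """Return the density x -> f0((x - mu) / sigma) / sigma."""
    if sigma <= 0:
        raise ValueError("scale must be positive")
    return lambda x: f0((x - mu) / sigma) / sigma

def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_laplace_pdf(z):
    return 0.5 * math.exp(-abs(z))

def integrate(f, lo, hi, n=20_000):
    """Midpoint rule; adequate here because the densities decay quickly."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

# Arbitrary illustrative parameters.
normal_pdf = location_scale_pdf(std_normal_pdf, mu=1.5, sigma=2.0)
laplace_pdf = location_scale_pdf(std_laplace_pdf, mu=-3.0, sigma=0.7)

# Agreement with the standard library's Normal density at a test point.
gap = abs(normal_pdf(0.3) - NormalDist(1.5, 2.0).pdf(0.3))

# Total mass of the shifted, stretched Laplace density (should be close to 1).
mass = integrate(laplace_pdf, -43.0, 37.0)

print(gap, mass)
```

Swapping in a different base density changes the family but not the construction, which is the point of the definition.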

The Group Structure

Location-scale families are closed under affine transformations of the random variable. This is what makes standardization universal.

Proposition

Affine Closure of Location-Scale Families

Statement

Let $X \sim f(\cdot; \mu, \sigma)$ where $f(x; \mu, \sigma) = \sigma^{-1} f_0((x - \mu)/\sigma)$ . For any $a \in \mathbb{R}$ and $b > 0$ , the affine transformation $Y = a + b X$ satisfies $Y \sim f(\cdot; a + b\mu, b\sigma)$ . The family is therefore closed under positive affine maps.

In particular, the standardized variable $Z = (X - \mu)/\sigma$ has density $f_0$ and does not depend on $\mu$ or $\sigma$ .

Intuition

Shifting and stretching a member of a location-scale family produces another member of the same family. The base density $f_0$ is the reference member, corresponding to $\mu = 0$, $\sigma = 1$ (the identity of the affine group), and the whole family is its orbit: every member is a translate-and-stretch of $f_0$. This is what licenses the universal recipe "standardize first, then look up the tail probability in a table."

Proof Sketch

Change of variables. If $Y = a + b X$ , then $X = (Y - a)/b$ and $|dX/dY| = 1/b$ . So the density of $Y$ is $f_Y(y) = b^{-1} f(b^{-1}(y - a); \mu, \sigma) = b^{-1} \sigma^{-1} f_0\!\big((b^{-1}(y - a) - \mu)/\sigma\big) = (b\sigma)^{-1} f_0\!\big((y - (a + b\mu))/(b\sigma)\big).$ This is $f(\cdot; a + b\mu, b\sigma)$ .
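The change-of-variables step can be spot-checked numerically. The sketch below uses the Logistic family as an assumed example; the function names and the values of `mu`, `sigma`, `a`, and `b` are illustrative choices, not part of the proposition. It compares the density obtained from the change of variables with the claimed family member at a few points.

```python
import math

def logistic_pdf(x, mu=0.0, sigma=1.0):
    """Logistic density in location-scale form sigma^{-1} f0((x - mu) / sigma)."""
    z = (x - mu) / sigma
    e = math.exp(-abs(z))  # the base density is symmetric; abs() avoids overflow
    return e / (sigma * (1.0 + e) ** 2)

mu, sigma = 1.0, 2.0   # parameters of X (illustrative)
a, b = -4.0, 3.0       # affine map Y = a + b X with b > 0 (illustrative)

def pdf_y_change_of_variables(y):
    # f_Y(y) = b^{-1} f_X((y - a) / b)
    return logistic_pdf((y - a) / b, mu, sigma) / b

def pdf_y_family_member(y):
    # Claimed member of the same family: location a + b*mu, scale b*sigma
    return logistic_pdf(y, a + b * mu, b * sigma)

max_gap = max(
    abs(pdf_y_change_of_variables(y) - pdf_y_family_member(y))
    for y in [-20.0, -5.0, -1.0, 0.0, 2.5, 10.0, 30.0]
)
print(max_gap)  # differs from zero only by floating-point rounding
```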

Why It Matters

Standardization $(X - \mu)/\sigma$ produces a pivot quantity: a function of data and parameters whose distribution does not depend on the parameters. Pivots are what confidence intervals are built from. The classical t-statistic is a pivot for the location-scale Normal family. The same construction does not remove a shape parameter: in the Gamma or Weibull families, dividing by the scale leaves a distribution that still depends on the shape, so there is no single base member to standardize against.
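One way to see the pivot property concretely is to generate Normal samples as $\mu + \sigma z$ from a fixed set of base draws $z$. The sketch below makes that assumption; the helper names, the seed, and the parameter values are hypothetical. Since $\bar x - \mu = \sigma \bar z$ and $s_x = \sigma s_z$, the t-statistic comes out the same for every $(\mu, \sigma)$, up to rounding.

```python
import math
import random
import statistics

def t_statistic(sample, mu):
    """(xbar - mu) / (s / sqrt(n)) using the n-1 sample standard deviation."""
    n = len(sample)
    return (statistics.mean(sample) - mu) / (statistics.stdev(sample) / math.sqrt(n))

rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(12)]  # draws from the base density f0

def t_for(mu, sigma):
    # Same base draws, moved into N(mu, sigma^2) by the affine map.
    x = [mu + sigma * zi for zi in z]
    return t_statistic(x, mu)

t_base = t_for(0.0, 1.0)
t_shifted = t_for(5.0, 3.0)
t_other = t_for(-2.0, 0.5)
print(t_base, t_shifted, t_other)  # equal up to floating-point rounding
```

Because the value depends only on the base draws, its sampling distribution cannot depend on $\mu$ or $\sigma$, which is what makes it usable for confidence intervals.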

Failure Mode

The closure property requires the affine map to be positive ( $b > 0$ ). For symmetric base densities ( $f_0(-x) = f_0(x)$ ), a sign flip $b < 0$ still keeps you in the family. For asymmetric base densities (like a skewed Logistic), a sign flip lands you outside.

Parameter Conventions: Rate vs Scale

This is the single most common source of confusion. Different texts parameterize the same distribution differently, and the parameter symbol can mean either the rate or its reciprocal.

Distribution	Rate convention	Scale convention	Relation
Exponential	$\mathrm{Exp}(\lambda)$ , density $\lambda e^{-\lambda x}$ , mean $1/\lambda$	$\mathrm{Exp}(\theta)$ or $\mathrm{Exp}(\beta)$ , density $\theta^{-1} e^{-x/\theta}$ , mean $\theta$	$\theta = 1/\lambda$
Gamma	$\mathrm{Gamma}(\alpha, \beta)$ rate, mean $\alpha/\beta$	$\mathrm{Gamma}(k, \theta)$ shape-scale, mean $k\theta$	$\theta = 1/\beta$ , $k = \alpha$
Inverse Gamma	rate $\beta$	scale $\theta$	$\theta = 1/\beta$
Weibull	(rare)	$\mathrm{Weibull}(k, \theta)$ shape-scale, density $(k/\theta)(x/\theta)^{k-1} e^{-(x/\theta)^k}$	scale convention dominates

Watch Out

Rate and scale are reciprocals. Read the density before you trust the symbol.

A textbook that writes "let $X \sim \mathrm{Exp}(\lambda)$" is using either rate or scale depending on the book. Casella-Berger uses scale $\beta$ with mean $\beta$. Klugman uses scale $\theta$ with mean $\theta$. Many introductory statistics texts use rate $\lambda$ with mean $1/\lambda$. The density tells you which: if the parameter multiplies $x$ in the exponent (and appears directly as the coefficient out front), it is the rate; if it divides $x$ in the exponent (and appears inverted out front), it is the scale.
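Software has the same ambiguity. The sketch below (function names, seed, and values are illustrative) writes the Exponential density both ways and then samples with Python's `random.expovariate`, whose argument is the rate, that is, 1 divided by the desired mean. Passing a scale where a rate is expected silently produces the wrong mean.

```python
import math
import random

def exp_pdf_rate(x, lam):
    """Rate form: lam * exp(-lam * x) for x >= 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def exp_pdf_scale(x, theta):
    """Scale form: exp(-x / theta) / theta for x >= 0."""
    return math.exp(-x / theta) / theta if x >= 0 else 0.0

lam = 0.5
theta = 1.0 / lam  # same distribution, mean 2

pdf_gap = max(
    abs(exp_pdf_rate(x, lam) - exp_pdf_scale(x, theta))
    for x in [0.0, 0.1, 1.0, 3.0, 10.0]
)

rng = random.Random(1)
n = 200_000
# Correct: pass the rate. The sample mean should be near theta = 2.
mean_right = sum(rng.expovariate(lam) for _ in range(n)) / n
# Mistake: pass the scale as if it were the rate. The mean lands near 1/theta = 0.5.
mean_wrong = sum(rng.expovariate(theta) for _ in range(n)) / n

print(pdf_gap, mean_right, mean_wrong)
```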

Watch Out

Inverse Gamma scale is not the Gamma scale

The inverse-Gamma distribution $\mathrm{InvGamma}(\alpha, \theta)$ is the distribution of $1/X$ where $X \sim \mathrm{Gamma}(\alpha, 1/\theta)$ in the shape-scale convention. The scale parameter $\theta$ of $\mathrm{InvGamma}$ is the scale of the reciprocated variable $1/X$; it equals the reciprocal of the Gamma scale (that is, the Gamma rate), not the Gamma scale itself. This is easy to misread.
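A short simulation can check the reciprocal relationship, reading $\mathrm{Gamma}(\alpha, 1/\theta)$ in the shape-scale convention. The sketch assumes illustrative values $\alpha = 5$, $\theta = 2$ and relies on two facts: Python's `random.gammavariate(alpha, beta)` treats `beta` as the scale, and the inverse-Gamma mean is $\theta/(\alpha - 1)$ for $\alpha > 1$.

```python
import random

alpha, theta = 5.0, 2.0  # InvGamma shape and scale (illustrative values)
rng = random.Random(2)
n = 200_000

# random.gammavariate(alpha, beta) uses beta as the Gamma SCALE,
# so the Gamma scale here is 1/theta (equivalently, the Gamma rate is theta).
draws = [1.0 / rng.gammavariate(alpha, 1.0 / theta) for _ in range(n)]

sample_mean = sum(draws) / n
exact_mean = theta / (alpha - 1.0)  # InvGamma mean, valid for alpha > 1

print(sample_mean, exact_mean)
```

Passing `theta` instead of `1.0 / theta` to `gammavariate` would simulate a different inverse-Gamma, with scale $1/\theta$.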

Shape-Only Quantities

Standardized moments are functions of shape parameters only. Two members of the same family with different location or scale share the same skewness, kurtosis, and higher standardized cumulants.

Distribution	Skewness	Excess kurtosis	Notes
Normal	$0$	$0$	Shape-free; all higher cumulants vanish
Exponential	$2$	$6$	Scale-only family; skewness and kurtosis are constants
Gamma $(k, \theta)$	$2/\sqrt{k}$	$6/k$	Both depend on shape $k$ only
Student- $t_\nu$	$0$ for $\nu > 3$	$6/(\nu - 4)$ for $\nu > 4$	Shape is $\nu$ ; location-scale extensions add $\mu, \sigma$
Weibull $(k, \theta)$	function of $k$	function of $k$	Both depend on shape $k$ only

This is the practical version of the rule "shape parameters carry the genuinely new information." Once you have computed a standardized moment, you have computed a function of shape.
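The "function of $k$" entries for the Weibull can be made explicit from its raw moments $\mathbb{E}[X^n] = \theta^n \Gamma(1 + n/k)$. The sketch below (helper names and parameter values are hypothetical) computes skewness and excess kurtosis from those moments and evaluates them at two different scales with the same shape, plus the $k = 1$ case, which should match the Exponential row.

```python
import math

def weibull_raw_moment(n, k, theta):
    """E[X^n] = theta^n * Gamma(1 + n/k) for Weibull(shape k, scale theta)."""
    return theta ** n * math.gamma(1.0 + n / k)

def weibull_skew_kurt(k, theta):
    """Return (skewness, excess kurtosis) computed from raw moments."""
    m1, m2, m3, m4 = (weibull_raw_moment(n, k, theta) for n in (1, 2, 3, 4))
    var = m2 - m1 ** 2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3                      # third central moment
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4   # fourth central moment
    return mu3 / var ** 1.5, mu4 / var ** 2 - 3.0

# Same shape, very different scales: the standardized moments should agree.
skew_a, kurt_a = weibull_skew_kurt(k=2.5, theta=1.0)
skew_b, kurt_b = weibull_skew_kurt(k=2.5, theta=40.0)

# Shape k = 1 is the Exponential: skewness 2, excess kurtosis 6.
skew_exp, kurt_exp = weibull_skew_kurt(k=1.0, theta=3.0)

print(skew_a, skew_b, kurt_a, kurt_b, skew_exp, kurt_exp)
```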

The Cauchy Case

The Cauchy distribution $\mathrm{Cauchy}(x_0, \gamma)$ is a location-scale family. It has location $x_0$ (the median, since the mean does not exist) and scale $\gamma$ (the half-width at half-maximum, since the variance does not exist). Standardization $(X - x_0)/\gamma$ produces a standard Cauchy with density $1/(\pi(1 + z^2))$ .

The lesson: location and scale parameters can exist even when the corresponding moment (mean for location, variance for scale) does not. Treat them as parameters of the family, not as moments of the variable.
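Quantiles and density heights give a moment-free way to read off both parameters. The sketch below (function names and values are illustrative) uses the Cauchy quantile function $x_0 + \gamma \tan(\pi(p - 1/2))$ to recover $x_0$ as the median and $\gamma$ as half the interquartile range, and checks that the density at $x_0 + \gamma$ is half its peak value.

```python
import math

def cauchy_pdf(x, x0, gamma):
    z = (x - x0) / gamma
    return 1.0 / (math.pi * gamma * (1.0 + z * z))

def cauchy_quantile(p, x0, gamma):
    """Inverse CDF: x0 + gamma * tan(pi * (p - 1/2)) for 0 < p < 1."""
    return x0 + gamma * math.tan(math.pi * (p - 0.5))

x0, gamma = 4.0, 1.5  # illustrative values

median = cauchy_quantile(0.5, x0, gamma)
half_iqr = 0.5 * (cauchy_quantile(0.75, x0, gamma) - cauchy_quantile(0.25, x0, gamma))
# Half-width at half-maximum: the density at x0 + gamma is half the peak.
height_ratio = cauchy_pdf(x0 + gamma, x0, gamma) / cauchy_pdf(x0, x0, gamma)

print(median, half_iqr, height_ratio)
```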

Examples by Distribution

Normal $\mathcal N(\mu, \sigma^2)$ : location $\mu$ , scale $\sigma$ , no shape.
Exponential $\mathrm{Exp}(\theta)$ : scale $\theta$ , no location, no shape. Constant skewness $2$ and excess kurtosis $6$ .
Gamma $\mathrm{Gamma}(k, \theta)$ : shape $k$ , scale $\theta$ , no location. Setting $k = 1$ recovers Exponential. Sum of $k$ i.i.d. $\mathrm{Exp}(\theta)$ when $k$ is a positive integer.
Weibull $\mathrm{Weibull}(k, \theta)$ : shape $k$ , scale $\theta$ . Setting $k = 1$ recovers Exponential; $k = 2$ recovers Rayleigh.
Lognormal: location and scale parameters live on the log scale ( $\mu, \sigma$ are the mean and SD of $\log X$ , not of $X$ ). Sometimes called "log-location" and "log-scale" to avoid confusion.
Pareto $\mathrm{Pareto}(x_m, \alpha)$ : scale $x_m$ (the minimum), shape $\alpha$ (tail index). Heavy tail controlled by shape; $k$ -th moment exists iff $\alpha > k$ .
Generalized Pareto $\mathrm{GPD}(\mu, \sigma, \xi)$ : location $\mu$ , scale $\sigma$ , shape $\xi$ . The shape $\xi$ controls tail behavior: $\xi = 0$ is exponential tail, $\xi > 0$ is heavy tail, $\xi < 0$ has a bounded right endpoint.
Student- $t_\nu$ : shape $\nu$ (degrees of freedom). Location-scale extension $\mu + \sigma T_\nu$ adds the two remaining roles; $\nu$ controls tail weight.
Cauchy $\mathrm{Cauchy}(x_0, \gamma)$ : location $x_0$ , scale $\gamma$ . The $\nu = 1$ case of Student- $t$ .

Common Confusions

Watch Out

Scale is not standard deviation in general

The Normal is the special case where the scale parameter $\sigma$ equals the standard deviation. For most other distributions, the scale parameter does not equal the SD. The Exponential with scale $\theta$ has SD $= \theta$ (here they happen to coincide), but the Gamma $(k, \theta)$ has SD $\theta \sqrt{k}$: the SD is the scale times a shape-dependent constant. For the Cauchy, the SD does not exist at all; the scale parameter $\gamma$ is a half-width, not a standard deviation.

Watch Out

Standardization works for location-scale families, not shape families

Subtracting the mean and dividing by the SD always produces a variable with mean 0 and variance 1, provided both exist. But the distribution of the standardized variable depends on the original shape parameter unless the family is location-scale. Standardizing two different Gamma $(k, \theta)$ samples with different $k$ gives two different distributions with mean 0 and variance 1; that is why pivot quantities and tabulated tail tables work for the Normal but not for the Gamma.
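For integer shape the Gamma survival function has a closed form (the Erlang sum), so the claim can be checked exactly. The sketch below assumes integer $k$ and uses hypothetical helper names; it computes the tail probability of the standardized variable beyond $z = 2$ for two scales with $k = 1$ and for $k = 4$. The scale cancels, but the shape does not.

```python
import math

def erlang_sf(k, x, theta):
    """P(X > x) for Gamma(shape k, scale theta) with positive integer k."""
    u = x / theta
    return math.exp(-u) * sum(u ** j / math.factorial(j) for j in range(k))

def standardized_tail(k, theta, z):
    """P((X - mean) / SD > z) with mean k*theta and SD theta*sqrt(k)."""
    x = k * theta + z * theta * math.sqrt(k)
    return erlang_sf(k, x, theta)

tail_k1 = standardized_tail(1, 1.0, 2.0)            # equals exp(-3)
tail_k1_rescaled = standardized_tail(1, 25.0, 2.0)  # same shape, different scale
tail_k4 = standardized_tail(4, 1.0, 2.0)            # different shape

print(tail_k1, tail_k1_rescaled, tail_k4)
```

A single tail table indexed by $z$ therefore cannot serve the whole Gamma family; one table per shape value would be needed.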

Watch Out

A parameter symbol does not commit to a role

Greek letters carry no semantic load. The same $\theta$ can be the location for one distribution, the scale for another, the shape for a third. Resolve the role by inspecting how the parameter enters the density, not by assuming the name.

Exercises

ExerciseCore

Problem

Let $X \sim \mathrm{Exp}(\lambda)$ in the rate parameterization, so the density is $f_X(x) = \lambda e^{-\lambda x}$ for $x \geq 0$. Show that $Y = \lambda X$ has the density of a standard exponential ($\mathrm{Exp}(1)$ in the rate parameterization). Then express the same $Y$ in terms of $\beta = 1/\lambda$, as the standardized version $Y = X/\beta$ of $X \sim \mathrm{Exp}(\beta)$ in the scale parameterization.

ExerciseCore

Problem

The Gamma $(k, \theta)$ distribution (shape-scale parameterization) has mean $k\theta$ and variance $k\theta^2$ . Compute the coefficient of variation $\mathrm{CV} = \mathrm{SD}/\mathrm{mean}$ . Which parameter does it depend on?

ExerciseAdvanced

Problem

Suppose $X \sim \mathrm{Cauchy}(x_0, \gamma)$ with location $x_0$ and scale $\gamma$. Show that $Y = (X - x_0)/\gamma$ has the standard Cauchy density $f_Y(y) = 1/(\pi(1 + y^2))$. Then show that the sample mean $\bar X_n$ of $n$ i.i.d. Cauchy $(x_0, \gamma)$ variables is distributed as $\mathrm{Cauchy}(x_0, \gamma)$; that is, the scale does not shrink with $n$.

References

Canonical:

Casella & Berger, Statistical Inference (2nd ed., 2002), Section 3.5 (location-scale families)
Lehmann & Casella, Theory of Point Estimation (2nd ed., 1998), Chapter 3 (invariance and equivariance under location-scale groups)

Current:

Klugman, Panjer & Willmot, Loss Models: From Data to Decisions (5th ed., 2019), Chapters 5-6 (actuarial scale convention and parameter inventory)
Johnson, Kotz & Balakrishnan, Continuous Univariate Distributions, Volume 1 (2nd ed., 1994), Chapters 17, 19, 21 (Gamma, Exponential, Weibull parameterization conventions)

Next Topics

Moment generating functions: how location and scale enter the MGF cleanly via $M_{\sigma X + \mu}(t) = e^{t\mu} M_X(\sigma t)$
Method of moments: estimating location, scale, and shape from sample moments
Multivariate distributions atlas: how the location-scale framework generalizes to vector and matrix valued laws

Last reviewed: May 13, 2026
