

Frequentist vs. Bayesian Decision Theory

Both frameworks evaluate decisions through expected loss, but the averaging operation is different: Bayesian decision theory averages over a prior on the parameter, frequentist decision theory averages over data at a fixed parameter. The Bayes rule, the minimax rule, the admissibility class, and the complete-class theorem trace exactly how the two views meet.

Last reviewed: May 3, 2026

What Each Frame Optimizes

Both frameworks share the same machinery: a parameter space $\Theta$, an action space $\mathcal{A}$, a loss function $L: \Theta \times \mathcal{A} \to \mathbb{R}$, and a decision rule $\delta: \mathcal{X} \to \mathcal{A}$ that maps observed data to actions. They differ in how the loss is averaged into a number you can optimize.

Bayesian decision theory places a prior $\pi$ on $\Theta$ and minimizes the Bayes risk — a single scalar, the prior-weighted average of frequentist risk:

$$r(\pi, \delta) = \mathbb{E}_\pi[R(\theta, \delta)] = \mathbb{E}_\pi\!\left[\mathbb{E}_\theta\!\left[L(\theta, \delta(X))\right]\right].$$

The Bayes rule $\delta_\pi$ minimizes this scalar, equivalently minimizing the posterior expected loss pointwise at each $x$:

$$\delta_\pi(x) = \arg\min_a \mathbb{E}_{\theta \mid x}[L(\theta, a)].$$

Frequentist decision theory refuses to put a prior on $\Theta$ and instead studies the entire risk function:

$$\theta \mapsto R(\theta, \delta) = \mathbb{E}_\theta[L(\theta, \delta(X))].$$

Comparison across rules now means comparing functions on $\Theta$, not scalars. Two principal criteria narrow the set of rules: admissibility (no rule dominates $\delta$ uniformly across $\Theta$) and minimax ($\delta$ minimizes $\sup_\theta R(\theta, \delta)$).
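To make the contrast concrete, here is a minimal numerical sketch (the linear shrinkage rule, the prior, and all constants are illustrative assumptions, not taken from the text above): it estimates the frequentist risk curve $\theta \mapsto R(\theta, \delta)$ on a grid, then collapses it to the single Bayes-risk scalar by averaging against a prior.

```python
# Assumed setup for illustration: X ~ N(theta, 1), squared-error loss,
# and the linear shrinkage rule delta(X) = a * X.
import numpy as np

rng = np.random.default_rng(0)
a, tau = 0.8, 2.0          # shrinkage factor and prior std (illustrative choices)

def freq_risk(theta, n_mc):
    """Monte Carlo estimate of R(theta, delta) = E_theta[(a*X - theta)^2]."""
    x = rng.normal(theta, 1.0, size=n_mc)
    return np.mean((a * x - theta) ** 2)

# Frequentist risk: a function of theta.
theta_grid = np.linspace(-4, 4, 9)
print("frequentist risk R(theta, delta) on a grid:")
for t in theta_grid:
    print(f"  theta={t:+.1f}  R~={freq_risk(t, 200_000):.3f}"
          f"   (exact: {a**2 + (a - 1)**2 * t**2:.3f})")

# Bayes risk: average the same risk over the prior theta ~ N(0, tau^2) -> one number.
theta_draws = rng.normal(0.0, tau, size=2_000)
bayes_risk = np.mean([freq_risk(t, 5_000) for t in theta_draws])
print(f"Bayes risk r(pi, delta) ~= {bayes_risk:.3f} "
      f"(exact: {a**2 + (a - 1)**2 * tau**2:.3f})")
```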

Side-by-Side Statement

Definition

Bayes Rule

Given a prior $\pi$ on $\Theta$ and posterior $\pi(\theta \mid x)$, the Bayes rule with respect to $\pi$ is

$$\delta_\pi(x) = \arg\min_a \mathbb{E}_{\theta \mid x}[L(\theta, a)].$$

Under squared-error loss the Bayes rule is the posterior mean. Under absolute-error loss it is the posterior median. Under 0-1 loss it is the posterior mode (MAP).
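A quick numerical confirmation of these three facts, on an assumed discretized posterior (the bimodal shape and the grid are arbitrary choices for illustration): minimizing the posterior expected loss over actions recovers the posterior mean, median, and mode respectively.

```python
import numpy as np

theta = np.linspace(-3, 6, 2001)                      # grid over Theta
post = np.exp(-0.5 * (theta - 1.0) ** 2) \
     + 0.4 * np.exp(-0.5 * ((theta - 3.5) / 0.7) ** 2)
post /= post.sum()                                    # a bimodal, skewed posterior

actions = theta                                       # candidate actions on the same grid

def bayes_action(loss):
    """argmin_a  sum_theta  pi(theta | x) * loss(theta, a)."""
    exp_loss = np.array([np.sum(post * loss(theta, a)) for a in actions])
    return actions[np.argmin(exp_loss)]

sq   = lambda th, a: (th - a) ** 2                    # squared-error loss
abs_ = lambda th, a: np.abs(th - a)                   # absolute-error loss
eps  = (theta[1] - theta[0]) / 2
zo   = lambda th, a: (np.abs(th - a) > eps).astype(float)   # 0-1 loss on the grid

print("squared-error action :", bayes_action(sq),
      " posterior mean  :", np.sum(post * theta))
print("absolute-error action:", bayes_action(abs_),
      " posterior median:", theta[np.searchsorted(np.cumsum(post), 0.5)])
print("0-1 loss action      :", bayes_action(zo),
      " posterior mode  :", theta[np.argmax(post)])
```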

Definition

Admissibility

A decision rule $\delta$ is admissible if no $\delta'$ satisfies $R(\theta, \delta') \le R(\theta, \delta)$ for every $\theta$ with strict inequality somewhere. The admissible rules sit on the lower envelope of the risk functions over $\Theta$.

Definition

Minimax Rule

$\delta^*$ is minimax if $\delta^* = \arg\min_\delta \sup_\theta R(\theta, \delta)$. It pessimizes over $\theta$ instead of averaging against a prior.
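Both criteria can be checked numerically. The sketch below uses an assumed Binomial setup (not drawn from the text above) to compare the exact risk curves of the MLE $X/n$ and the Laplace-smoothed estimator $(X+1)/(n+2)$: neither dominates the other, so admissibility alone does not choose between them, while the minimax criterion compares their worst-case risks.

```python
import numpy as np
from math import comb

n = 10
x = np.arange(n + 1)

def risk(estimate, p):
    """Exact risk E_p[(estimate(X) - p)^2], summing over the Binomial(n, p) pmf."""
    pmf = np.array([comb(n, k) for k in x]) * p**x * (1 - p)**(n - x)
    return np.sum(pmf * (estimate - p) ** 2)

mle     = x / n                 # delta_1(X) = X / n
laplace = (x + 1) / (n + 2)     # delta_2(X) = (X + 1) / (n + 2)

p_grid = np.linspace(0.0, 1.0, 201)
r_mle     = np.array([risk(mle, p) for p in p_grid])
r_laplace = np.array([risk(laplace, p) for p in p_grid])

# Admissibility check: does either rule dominate the other over all p?
print("MLE <= Laplace everywhere?", bool(np.all(r_mle <= r_laplace)))
print("Laplace <= MLE everywhere?", bool(np.all(r_laplace <= r_mle)))

# Minimax comparison: worst-case risk of each rule.
print(f"sup_p R(p, MLE)     = {r_mle.max():.4f}   (theory: 1/(4n) = {1/(4*n):.4f})")
print(f"sup_p R(p, Laplace) = {r_laplace.max():.4f}")
```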

The two viewpoints connect through Wald's complete-class theorem, the saddle-point identity, and admissibility-via-uniqueness — see the next section.

Where the Two Frames Meet

Three landmark results pin down the geometry of the meeting:

  1. Bayes rule $\Rightarrow$ admissible (under uniqueness). If the Bayes rule with respect to $\pi$ is unique up to almost-everywhere equivalence, then it is admissible. A dominator would have Bayes risk no larger than $r(\pi, \delta_\pi)$, so it would itself be Bayes with respect to $\pi$; uniqueness then forces it to coincide with $\delta_\pi$ almost everywhere, contradicting strict domination.

  2. Wald's complete-class theorem (admissible $\Rightarrow$ Bayes-or-limit). Under regularity (compact $\Theta$, continuous loss and risk), every admissible rule is a Bayes rule for some prior, or a limit of Bayes rules. The "limit" clause covers improper-prior boundary cases.

  3. Minimax via least-favorable prior (saddle-point identity). A standard route to a minimax rule: find a prior $\pi^*$ that maximizes the Bayes risk, and take the corresponding Bayes rule $\delta_{\pi^*}$. If

$$r(\pi^*, \delta_{\pi^*}) = \sup_\theta R(\theta, \delta_{\pi^*}),$$

then $\delta_{\pi^*}$ is minimax. The identity expresses the saddle-point equilibrium between the statistician (min-player on rules) and nature (max-player on parameters); see minimax and saddle points for the geometry.

The takeaway: under regularity, the admissible class equals the Bayes-rule-or-limit-of-Bayes-rules class, and minimax rules sit inside this class as Bayes rules against least-favorable priors. The two viewpoints are not adversarial; they parameterize the same decision-theoretic frontier from different sides.
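Result 3 can be checked numerically in a textbook case (the Binomial example below is a standard illustration, assumed here rather than taken from the text): for $X \sim \mathrm{Binomial}(n, p)$ under squared-error loss, the Bayes rule against the $\mathrm{Beta}(\sqrt{n}/2, \sqrt{n}/2)$ prior has constant risk in $p$, so its Bayes risk equals its worst-case risk and the saddle-point identity certifies it as minimax.

```python
import numpy as np
from math import comb, sqrt

n = 10
a = sqrt(n) / 2
x = np.arange(n + 1)
delta = (x + a) / (n + 2 * a)          # Bayes rule under the Beta(a, a) prior

def risk(p):
    """Exact risk E_p[(delta(X) - p)^2] under Binomial(n, p)."""
    pmf = np.array([comb(n, k) for k in x]) * p**x * (1 - p)**(n - x)
    return np.sum(pmf * (delta - p) ** 2)

p_grid = np.linspace(1e-4, 1 - 1e-4, 999)
risk_curve = np.array([risk(p) for p in p_grid])

# Bayes risk: average the risk curve against the (discretized) Beta(a, a) prior.
prior = p_grid ** (a - 1) * (1 - p_grid) ** (a - 1)
prior /= prior.sum()
bayes_risk = np.sum(prior * risk_curve)

print(f"sup_p R(p, delta)        = {risk_curve.max():.6f}")
print(f"Bayes risk r(pi*, delta) = {bayes_risk:.6f}")
print(f"theory 1/(4(sqrt(n)+1)^2) = {1 / (4 * (sqrt(n) + 1) ** 2):.6f}")
# Equality of the first two numbers (up to discretization) certifies the minimax property.
```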

When Each Frame Wins

Bayesian wins when a prior is real and informative. If domain knowledge gives a prior with substantive content (e.g. clinical-trial historical controls, hierarchical pooling across small areas), the Bayes rule incorporates it directly and tightens the decision relative to ignoring it. The cost is honesty about the prior: if it is wrong, the Bayes rule inherits the error.

Frequentist wins when the prior is unavailable or contested. Worst-case (minimax) reasoning bounds the loss without committing to a parameter distribution. This is the default in regulatory settings (drug approval, audit), in quality-control thresholds, and anywhere the practitioner cannot defensibly nominate a prior.

Both views agree on admissibility as a minimum bar. A frequentist would not use an inadmissible rule; a Bayesian would not use a rule that is not Bayes for any prior. Wald's theorem makes these the same constraint under regularity.

Where the Two Frames Disagree (in Practice)

The James-Stein paradox. For estimating the mean of a $d$-dimensional Gaussian under squared-error loss, the sample mean $\bar X$ is the MLE and the default frequentist estimator. In dimensions $d \ge 3$ the James-Stein estimator dominates $\bar X$ uniformly: every $\theta$ has $R(\theta, \delta_{JS}) < R(\theta, \bar X) = d$. The Bayes-rule view recognizes $\delta_{JS}$ as a generalized Bayes rule (or a limit of Bayes rules) under specific empirical-Bayes priors; the frequentist view registers the dominance fact without needing a prior. Both viewpoints converge on the same advice: use shrinkage when $d \ge 3$ if squared-error loss is genuinely the criterion.
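A short Monte Carlo check of the dominance claim (the dimension, the simulation size, and the test points $\theta$ are assumptions for illustration, with a single observation $X \sim N_d(\theta, I_d)$): in $d = 5$ the James-Stein estimator's estimated risk stays below the MLE's risk of $d$ at every $\theta$ tried, including $\theta$ far from the shrinkage target.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_mc = 5, 100_000

def risks_at(theta):
    """Monte Carlo risks of the MLE delta(X) = X and James-Stein at a given theta."""
    x = rng.normal(theta, 1.0, size=(n_mc, d))
    shrink = 1.0 - (d - 2) / np.sum(x**2, axis=1)          # James-Stein factor
    js = shrink[:, None] * x
    r_mle = np.mean(np.sum((x - theta) ** 2, axis=1))       # should be ~ d
    r_js  = np.mean(np.sum((js - theta) ** 2, axis=1))
    return r_mle, r_js

for scale in [0.0, 1.0, 3.0, 10.0]:                         # thetas of growing norm
    theta = np.full(d, scale)
    r_mle, r_js = risks_at(theta)
    print(f"||theta|| = {np.linalg.norm(theta):6.2f}:  "
          f"R(MLE) = {r_mle:.3f},  R(JS) = {r_js:.3f}")
```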

Cost-asymmetric classification. Under asymmetric losses $c_{FN} \gg c_{FP}$ (e.g. medical screening), the Bayes-optimal threshold on the posterior $P(Y = 1 \mid x)$ is $c_{FP} / (c_{FP} + c_{FN}) \ll 0.5$. The frequentist view recovers the same threshold via minimax against a least-favorable prior. The numerical answer agrees; the route differs.
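For concreteness, a tiny sketch with assumed costs ($c_{FP} = 1$, $c_{FN} = 50$, not from the text above): comparing the posterior expected loss of the two actions reproduces the threshold $c_{FP}/(c_{FP}+c_{FN}) \approx 0.02$.

```python
c_fp, c_fn = 1.0, 50.0                      # assumed costs: a miss is 50x worse than a false alarm
threshold = c_fp / (c_fp + c_fn)
print(f"Bayes-optimal threshold on P(Y=1|x): {threshold:.4f}")

def expected_loss(decide_positive, p):
    """Posterior expected loss of one decision, given P(Y=1 | x) = p."""
    return (1 - p) * c_fp if decide_positive else p * c_fn

for p in [0.01, 0.02, 0.05, 0.50]:
    act_positive = expected_loss(True, p) <= expected_loss(False, p)
    print(f"P(Y=1|x) = {p:.2f}: loss(positive) = {expected_loss(True, p):5.2f}, "
          f"loss(negative) = {expected_loss(False, p):5.2f} -> act positive: {act_positive}")
```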

Calibration vs ranking. The Bayesian framework integrates calibration (the posterior is the right object) and ranking (Bayes rule sits on the posterior expected loss) as one operation. The frequentist framework decouples them: AUC measures ranking, proper scoring rules measure calibration, and the two can disagree on a finite sample. The decision-theoretic Bayesian view considers this decoupling artificial.

Numeric Illustration: Bayes Risk vs Frequentist Risk Collapse

Take a normal-mean problem $X \sim N(\theta, 1)$ with squared-error loss and the constant rule $\delta(X) \equiv c$. Frequentist risk:

$$R(\theta, \delta) = (\theta - c)^2,$$

a parabola in $\theta$. Bayes risk under prior $\pi$:

$$r(\pi, \delta) = \mathbb{E}_\pi[(\theta - c)^2] = \mathrm{Var}_\pi(\theta) + (\mathbb{E}_\pi[\theta] - c)^2.$$

Three cases:

| Prior $\pi$ | Bayes risk $r(\pi, \delta)$ | Frequentist risk $R(\theta_0, \delta)$ |
| --- | --- | --- |
| Point mass $\delta_{\theta_0}$ | $(\theta_0 - c)^2$ | $(\theta_0 - c)^2$ |
| $N(0, \tau^2)$ | $\tau^2 + c^2$ | $(\theta_0 - c)^2$ |
| Improper flat | $+\infty$ | $(\theta_0 - c)^2$ |

The collapse is exact in the point-mass case: Bayes risk equals frequentist risk at $\theta_0$. For non-degenerate priors the two diverge generically. This is the precise sense in which Bayesian and frequentist viewpoints differ on what counts as "the risk."
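A Monte Carlo version of the first two rows (the values of $c$, $\theta_0$, and $\tau$ are arbitrary illustrative choices): the point-mass prior reproduces the frequentist risk exactly, while the $N(0, \tau^2)$ prior gives $\tau^2 + c^2$ instead.

```python
# For the constant rule delta(X) = c, the loss (theta - c)^2 does not depend on X,
# so the only averaging left is over the prior on theta.
import numpy as np

rng = np.random.default_rng(2)
c, theta0, tau = 1.0, 2.5, 1.5
n_mc = 1_000_000

freq_risk = (theta0 - c) ** 2
print(f"frequentist risk at theta_0:   {freq_risk:.4f}")

# Point-mass prior at theta_0: Bayes risk collapses to the frequentist risk.
theta_pm = np.full(n_mc, theta0)
print(f"Bayes risk, point-mass prior:  {np.mean((theta_pm - c) ** 2):.4f}")

# N(0, tau^2) prior: Bayes risk is tau^2 + c^2, generally a different number.
theta_n = rng.normal(0.0, tau, size=n_mc)
print(f"Bayes risk, N(0, tau^2) prior: {np.mean((theta_n - c) ** 2):.4f} "
      f"(theory: {tau**2 + c**2:.4f})")
```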

Common Confusions

Watch Out

Bayesian and frequentist decision theory contradict each other.

They do not. Wald's complete-class theorem says, under regularity, the admissible (frequentist) rules and the Bayes (Bayesian) rules describe the same set up to limits. The disagreement is about which point on this shared frontier to pick, not about the frontier itself.

Watch Out

The minimax rule is automatically admissible.

Not always. Under regularity, minimax rules often are admissible (and equal to Bayes rules against a least-favorable prior). But cases exist where a minimax rule is dominated by another rule that matches its worst-case risk. Admissibility and minimax are distinct criteria; one can hold without the other.

Watch Out

Putting a prior on $\Theta$ makes a problem Bayesian.

A Bayesian quantifies epistemic uncertainty in $\theta$ by a probability distribution. A frequentist can analyze a Bayes rule (and often does, via Wald's theorem) without committing to that probabilistic interpretation. The Bayesian commitment is to the epistemology, not just to the use of priors.

Watch Out

The James-Stein estimator is a Bayesian rule and therefore not frequentist.

The James-Stein estimator is a frequentist construction (no prior is invoked) that dominates the MLE everywhere on $\Theta$. It is also a generalized Bayes rule under specific empirical-Bayes priors. The two characterizations coexist; James-Stein illustrates the complete-class theorem rather than supporting one camp.

Diagnostic Quizzes

The two gold sets that anchor this comparison cover both sides of the diptych:

  • Classification + Bayesian decision theory v1 — 12 questions on the Bayesian half: confusion-matrix definitions, ERM-with-zero-one-loss = empirical misclass rate, F1 harmonic mean, ROC point and AUC, Bayes-optimal classifier as posterior mode, Bayes risk floor, strict propriety of log-loss / Brier, cost-sensitive thresholds, ROC vs PR under imbalance.
  • Frequentist decision theory v1 — 12 questions on the frequentist half: frequentist risk vs Bayes risk, admissibility as non-domination, constant-estimator admissibility surprise, minimax functional, Bayes rule = posterior expected loss minimizer, Bayes-vs-frequentist-risk collapse to point mass, Stein paradox in $d \ge 3$, Bayes-rule uniqueness implies admissibility, Wald complete-class theorem, why inadmissibility does not always disqualify, saddle-point identity certifying minimax via least-favorable prior.

Walking both is the cleanest path to the diptych; the frequentist set lists the Bayesian set as a prerequisite.

References

Canonical:

  • Wald, Statistical Decision Functions (1950). The book that founded statistical decision theory and proved the complete-class theorem.
  • Berger, Statistical Decision Theory and Bayesian Analysis (1985). The unified textbook treatment; Chapters 1-4 cover the full diptych.
  • Lehmann & Casella, Theory of Point Estimation (1998), Chapters 1-5. Frequentist canon, with admissibility, minimax, and James-Stein.
  • Robert, The Bayesian Choice (2007). Bayesian canon, with the frequentist comparisons made explicit.

Current:

  • Casella & Berger, Statistical Inference (2002), Chapters 7-8. Standard graduate reference.
  • Murphy, Probabilistic Machine Learning: An Introduction (2022), Chapter 5. Bayesian decision theory, classification metrics, and proper scoring rules.
