Beta: content is under active construction and has not been peer-reviewed. Report errors on GitHub.

Decision Theory

Expected Utility Theory

The axiomatic foundation of rational choice under uncertainty. Von Neumann-Morgenstern utility, the independence axiom, risk aversion from concavity, and where the theory breaks (Allais paradox, prospect theory).

Core · Tier 2 · Stable · ~40 min

Why This Matters

Every ML system makes decisions under uncertainty. A classifier chooses a label. A recommender selects an item. A trading bot decides how much to buy. The question is: what objective should the system optimize?

Expected utility theory provides the axiomatic answer. If your preferences over uncertain outcomes satisfy four reasonable axioms (completeness, transitivity, continuity, independence), then there exists a utility function $u$ such that you weakly prefer lottery $A$ to lottery $B$ if and only if the expected utility of $A$ is at least that of $B$:

$$A \succsim B \iff \mathbb{E}[u(A)] \geq \mathbb{E}[u(B)]$$

This is the Von Neumann-Morgenstern (VNM) theorem. It tells you that rational decision-making under uncertainty is equivalent to maximizing expected utility for some utility function. The shape of $u$ encodes risk preferences: concave $u$ means risk aversion, convex $u$ means risk seeking, linear $u$ means risk neutrality.

Understanding expected utility theory is necessary because:

  • Cross-entropy loss is equivalent to maximizing expected log-utility of the predicted probability
  • Risk-sensitive reinforcement learning modifies the RL objective using non-linear utility functions
  • The Kelly criterion is the special case of expected log-utility
  • Calibration: a well-calibrated model maximizes expected utility for every concave utility function simultaneously
  • Loss function design: the choice of loss function implicitly defines a utility function over prediction errors

Mental Model

You can receive either $50 for certain, or a 50/50 gamble between $0 and $100. Both options have the same expected monetary value ($50). Which do you prefer?

If you prefer the sure $50, you are risk-averse. Your utility function is concave: $u(50) > 0.5 \cdot u(0) + 0.5 \cdot u(100)$, which is Jensen's inequality for concave functions.

If you are indifferent, you are risk-neutral. Your utility function is linear: $u(x) = ax + b$.

If you prefer the gamble, you are risk-seeking. Your utility function is convex.
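
The three cases follow from Jensen's inequality and can be checked directly; a minimal sketch with illustrative utility functions:

```python
import math

# 50/50 gamble between $0 and $100 vs. a sure $50.
outcomes, probs = [0.0, 100.0], [0.5, 0.5]

def expected_utility(u, xs, ps):
    return sum(p * u(x) for x, p in zip(xs, ps))

u_concave = math.sqrt          # risk-averse
u_linear = lambda x: x         # risk-neutral
u_convex = lambda x: x ** 2    # risk-seeking

# Concave: the sure thing beats the gamble (Jensen's inequality).
assert u_concave(50) > expected_utility(u_concave, outcomes, probs)
# Linear: indifferent.
assert u_linear(50) == expected_utility(u_linear, outcomes, probs)
# Convex: the gamble beats the sure thing.
assert u_convex(50) < expected_utility(u_convex, outcomes, probs)
```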

Expected utility theory says: under the VNM axioms, there is always some utility function that represents your preferences, and you act as if you are maximizing its expectation. The theory does not tell you which utility function to use. It tells you that rational preferences can always be described by expected utility maximization.

Core Definitions

Definition

Lottery

A lottery (or prospect) is a probability distribution over outcomes. A simple lottery over outcomes $x_1, \ldots, x_n$ with probabilities $p_1, \ldots, p_n$ is written:

$$L = (x_1, p_1; \, x_2, p_2; \, \ldots; \, x_n, p_n)$$

A compound lottery is a lottery over lotteries. The reduction axiom (implied by the VNM axioms) says that compound lotteries can be reduced to simple lotteries using the law of total probability.

A degenerate lottery $\delta_x$ gives outcome $x$ with probability 1.
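
The reduction of a compound lottery to a simple one follows directly from the law of total probability; a sketch with illustrative lotteries:

```python
from collections import defaultdict

def reduce_compound(compound):
    """Reduce a lottery over simple lotteries to a simple lottery
    via the law of total probability."""
    simple = defaultdict(float)
    for sub_lottery, q in compound:          # q = prob of facing sub_lottery
        for outcome, p in sub_lottery:
            simple[outcome] += q * p
    return dict(simple)

# 50% chance of the lottery ($100 w.p. 0.5, $0 w.p. 0.5),
# 50% chance of the degenerate lottery delta_50.
inner = [(100, 0.5), (0, 0.5)]
delta_50 = [(50, 1.0)]
reduced = reduce_compound([(inner, 0.5), (delta_50, 0.5)])
assert reduced == {100: 0.25, 0: 0.25, 50: 0.5}
```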

Definition

Preference Relation

A preference relation $\succsim$ on the set of lotteries $\mathcal{L}$ is a binary relation where $A \succsim B$ means "lottery $A$ is at least as good as lottery $B$."

Derived relations:

  • Strict preference: $A \succ B$ iff $A \succsim B$ and not $B \succsim A$
  • Indifference: $A \sim B$ iff $A \succsim B$ and $B \succsim A$

Definition

Von Neumann-Morgenstern Axioms

The VNM axioms on preferences $\succsim$ over lotteries are:

Axiom 1 (Completeness): For all lotteries $A, B$: either $A \succsim B$ or $B \succsim A$ (or both).

Axiom 2 (Transitivity): If $A \succsim B$ and $B \succsim C$, then $A \succsim C$.

Axiom 3 (Continuity): If $A \succ B \succ C$, there exists $\lambda \in (0, 1)$ such that $B \sim \lambda A + (1-\lambda) C$.

Axiom 4 (Independence): For all lotteries $A, B, C$ and all $\lambda \in (0, 1)$:

$$A \succsim B \iff \lambda A + (1-\lambda)C \succsim \lambda B + (1-\lambda)C$$

Mixing both options with the same third lottery $C$ does not change the preference ordering.

Definition

Expected Utility Representation

A preference relation $\succsim$ has an expected utility representation if there exists a function $u: \mathcal{X} \to \mathbb{R}$ (the utility function) such that for all lotteries $A, B$:

$$A \succsim B \iff \mathbb{E}_A[u(X)] \geq \mathbb{E}_B[u(X)]$$

where $\mathbb{E}_A$ denotes expectation under lottery $A$. For a simple lottery $L = (x_1, p_1; \ldots; x_n, p_n)$:

$$U(L) = \sum_{i=1}^n p_i \, u(x_i)$$

The utility function $u$ is unique up to positive affine transformations: if $u$ represents $\succsim$, then so does $v(x) = au(x) + b$ for any $a > 0$ and $b \in \mathbb{R}$.
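
The representation and its invariance under positive affine transformations can be checked numerically; a small sketch with an illustrative concave utility:

```python
import math

def U(lottery, u):
    """Expected utility of a simple lottery [(x1, p1), ...]."""
    return sum(p * u(x) for x, p in lottery)

A = [(100, 0.5), (0, 0.5)]
B = [(50, 1.0)]

# Affine transforms v = a*u + b with a > 0 preserve the ranking.
u = math.log1p                  # u(x) = log(1 + x), concave
v = lambda x: 3 * u(x) + 7

assert (U(A, u) >= U(B, u)) == (U(A, v) >= U(B, v))
```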

Definition

Risk Aversion and Certainty Equivalent

An agent is risk-averse if for every non-degenerate lottery $L$ with expected value $\mathbb{E}[L]$:

$$u(\mathbb{E}[L]) > \mathbb{E}[u(L)]$$

By Jensen's inequality, this holds if and only if $u$ is strictly concave.

The certainty equivalent $CE(L)$ of lottery $L$ is the sure amount that the agent considers equally desirable:

$$u(CE(L)) = \mathbb{E}[u(L)]$$

For a risk-averse agent: $CE(L) < \mathbb{E}[L]$. The difference $\pi(L) = \mathbb{E}[L] - CE(L) \geq 0$ is the risk premium: the amount the agent would pay to avoid the risk.

The Arrow-Pratt coefficient of absolute risk aversion is:

$$r(x) = -\frac{u''(x)}{u'(x)}$$

Higher $r(x)$ means greater risk aversion at wealth $x$.
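
These definitions translate directly into code; a sketch using log utility and an illustrative 50/50 gamble:

```python
import math

def certainty_equivalent(lottery, u, u_inv):
    """CE solves u(CE) = E[u(L)]; u_inv is the inverse of u."""
    eu = sum(p * u(x) for x, p in lottery)
    return u_inv(eu)

L = [(150, 0.5), (50, 0.5)]                  # E[L] = 100
ce = certainty_equivalent(L, math.log, math.exp)
premium = 100 - ce                           # risk premium pi(L)

# Under log utility the CE is the geometric mean of the outcomes.
assert abs(ce - math.sqrt(150 * 50)) < 1e-9
assert premium > 0                           # risk-averse: CE < E[L]
```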

Main Theorems

Theorem

Von Neumann-Morgenstern Expected Utility Theorem

Statement

Let $\succsim$ be a preference relation on the set of lotteries over a finite set of outcomes $\mathcal{X}$. If $\succsim$ satisfies completeness, transitivity, continuity, and independence, then there exists a utility function $u: \mathcal{X} \to \mathbb{R}$ such that:

$$A \succsim B \iff \sum_{x} p_A(x) \, u(x) \geq \sum_{x} p_B(x) \, u(x)$$

Moreover, $u$ is unique up to positive affine transformation: if $v$ also represents $\succsim$, then $v(x) = au(x) + b$ for some $a > 0$ and $b \in \mathbb{R}$.

Intuition

The theorem says that any "reasonable" set of preferences (satisfying the four axioms) can be described as maximizing expected utility. You do not need to assume that people maximize expected utility. If they satisfy the axioms, they behave as if they do, whether or not they think in terms of utility functions.

The key axiom is independence. It says: if you prefer apples to oranges, then you also prefer "50% chance of apples, 50% chance of cake" to "50% chance of oranges, 50% chance of cake." Mixing in the same irrelevant alternative (cake) should not reverse your preference between apples and oranges. This axiom is what makes the representation linear in probabilities (expected utility rather than some nonlinear functional).

Proof Sketch

Construction of $u$: Let $A^*$ be the best outcome and $A_*$ the worst (existence guaranteed by completeness and transitivity on a finite set). For each outcome $x$, define $u(x)$ as the unique $\lambda \in [0, 1]$ satisfying:

$$x \sim \lambda A^* + (1 - \lambda) A_*$$

This $\lambda$ exists by continuity and is unique by monotonicity (which follows from the other axioms). Set $u(A^*) = 1$ and $u(A_*) = 0$.

Verification: For any lottery $L = (x_1, p_1; \ldots; x_n, p_n)$:

$$L \sim \sum_i p_i \, [\,u(x_i) A^* + (1 - u(x_i)) A_*\,]$$

by independence (substituting each $x_i$ with its equivalent lottery over $A^*$ and $A_*$). This reduces to:

$$L \sim \Big[\sum_i p_i u(x_i)\Big] A^* + \Big[1 - \sum_i p_i u(x_i)\Big] A_*$$

So $L$ is indifferent to a binary lottery on $A^*$ and $A_*$ with weight $U(L) = \sum_i p_i u(x_i)$. Since preferences over binary lotteries on $A^*$ and $A_*$ are determined by the weight on $A^*$ (by monotonicity), $A \succsim B$ iff $U(A) \geq U(B)$.

Uniqueness: If $v$ also represents $\succsim$, then $v(x) = v(A_*) + [v(A^*) - v(A_*)] \cdot u(x)$, which is a positive affine transformation.

Why It Matters

The VNM theorem is the theoretical foundation for rational decision-making under uncertainty. It justifies:

  • Loss function design: choosing a loss function for ML is equivalent to choosing a utility function over prediction errors. Cross-entropy loss corresponds to log utility of the predicted probability.
  • Risk-sensitive optimization: modifying the RL reward function with a concave utility captures risk aversion without ad hoc modifications.
  • Calibration: a well-calibrated predictor maximizes expected utility for every concave $u$, making calibration the "universal" desirable property.
  • Mechanism design: auction theory and incentive-compatible mechanism design build on expected utility preferences.

The uniqueness up to affine transformation means: the zero point and scale of utility are arbitrary. Only differences in expected utility matter, not absolute values.

Failure Mode

The independence axiom is the most controversial assumption, and real human behavior routinely violates it. The Allais paradox is the classic counterexample (described below). When independence fails, preferences cannot be represented by expected utility, and alternative theories (prospect theory, rank-dependent utility, regret theory) are needed.

Continuity can also fail for extreme outcomes. Some people have lexicographic preferences (e.g., "no probability of death is acceptable") that violate continuity.

Theorem

Risk Aversion Equivalence

Statement

For an expected utility maximizer with utility $u$, the following are equivalent:

  1. The agent is risk-averse: $CE(L) \leq \mathbb{E}[L]$ for all non-degenerate lotteries $L$
  2. $u$ is concave
  3. For any non-degenerate random variable $X$: $u(\mathbb{E}[X]) \geq \mathbb{E}[u(X)]$

The risk premium for a small gamble with variance $\sigma^2$ around wealth $w$ is approximately:

$$\pi \approx \frac{1}{2} r(w) \sigma^2$$

where $r(w) = -u''(w)/u'(w)$ is the Arrow-Pratt coefficient of absolute risk aversion.

Intuition

A concave utility function values gains less than it penalizes equivalent losses. If $u$ is concave, moving from $50 to $60 adds less utility than moving from $50 to $40 removes. So a symmetric gamble around $50 has a negative expected change in utility. The agent would rather keep the sure $50.

The Arrow-Pratt coefficient quantifies the curvature: more curvature means more risk aversion. The risk premium grows with both the curvature $r(w)$ and the variance $\sigma^2$ of the gamble.

Proof Sketch

$(1) \Leftrightarrow (3)$: By definition, the certainty equivalent satisfies $u(CE) = \mathbb{E}[u(X)]$ with $CE \leq \mathbb{E}[X]$. Since $u$ is increasing: $u(CE) \leq u(\mathbb{E}[X])$, giving $\mathbb{E}[u(X)] \leq u(\mathbb{E}[X])$.

$(2) \Leftrightarrow (3)$: This is Jensen's inequality. $u$ concave $\Leftrightarrow$ $u(\mathbb{E}[X]) \geq \mathbb{E}[u(X)]$ for all random variables $X$.

Risk premium approximation: Taylor expand $u$ around $w$ for a mean-zero gamble $\epsilon$ with variance $\sigma^2$: $u(w + \epsilon) \approx u(w) + u'(w)\epsilon + \frac{1}{2}u''(w)\epsilon^2$.

$$\mathbb{E}[u(w + \epsilon)] \approx u(w) + \frac{1}{2}u''(w)\sigma^2$$

$$u(w - \pi) \approx u(w) - u'(w)\pi$$

Setting these equal: $\pi \approx -\frac{u''(w)}{2u'(w)}\sigma^2 = \frac{1}{2}r(w)\sigma^2$.
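
The quality of the approximation can be checked numerically; a sketch with log utility (so $r(w) = 1/w$) and an illustrative small gamble:

```python
import math

# Compare the exact risk premium to the Arrow-Pratt approximation
# pi ~= 0.5 * r(w) * sigma^2 for a small symmetric gamble.
w, eps = 1000.0, 10.0                    # wealth, gamble size (small vs w)
u, u_inv = math.log, math.exp            # log utility: r(w) = 1/w
sigma2 = eps ** 2                        # Var of +/- eps w.p. 1/2 each

eu = 0.5 * u(w + eps) + 0.5 * u(w - eps)
pi_exact = w - u_inv(eu)                 # w - CE
pi_approx = 0.5 * (1.0 / w) * sigma2     # 0.05 here

assert abs(pi_exact - pi_approx) < 0.01  # close for a small gamble
```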

Why It Matters

This theorem connects the abstract concept of risk aversion to the concrete shape of the utility function. In ML:

  • Cross-entropy loss corresponds to $u(p) = \log p$ (log-utility of the predicted probability). This is concave, making the agent risk-averse over probability estimates: it penalizes overconfident wrong predictions more than it rewards confident correct ones.
  • Squared error loss corresponds to $u(e) = -e^2$ (negative of squared error). This is also concave in the prediction, making the agent risk-averse.
  • The choice of loss function in ML implicitly defines a risk attitude over prediction errors.

Failure Mode

The Arrow-Pratt approximation is valid only for small gambles. For large gambles, the full utility function matters, and the local curvature $r(w)$ does not summarize risk attitudes. Constant absolute risk aversion (CARA), $u(x) = -e^{-rx}$, and constant relative risk aversion (CRRA), $u(x) = x^{1-\gamma}/(1-\gamma)$, are the two families where the approximation extends globally.

The Independence Axiom and Its Failures

The independence axiom is the most mathematically powerful and the most empirically fragile of the four VNM axioms. It states: mixing two lotteries with a common third lottery preserves the preference order.

Formally: $A \succsim B$ implies $\lambda A + (1-\lambda)C \succsim \lambda B + (1-\lambda)C$ for all $C$ and $\lambda \in (0,1)$.

This axiom is what makes the utility representation linear in probabilities. Without it, preferences might be described by nonlinear functionals of the probability distribution, such as rank-dependent utility or prospect theory.

The Allais Paradox

Maurice Allais (1953) constructed a pair of choices that most people answer inconsistently with expected utility:

Choice 1: (A) $1 million for certain, or (B) 89% chance of $1 million, 10% chance of $5 million, 1% chance of nothing.

Choice 2: (C) 11% chance of $1 million, 89% chance of nothing, or (D) 10% chance of $5 million, 90% chance of nothing.

Most people choose A over B (preferring certainty) and D over C (preferring the higher expected value). But this is inconsistent with expected utility.

To see why, write each lottery as a mixture with a common 89% component. Let $L'$ be the sub-lottery giving \$5M with probability 10/11 and \$0 with probability 1/11. Then $A = 0.89\,[\$1\text{M}] + 0.11\,[\$1\text{M}]$ and $B = 0.89\,[\$1\text{M}] + 0.11\,L'$, while $C = 0.89\,[\$0] + 0.11\,[\$1\text{M}]$ and $D = 0.89\,[\$0] + 0.11\,L'$. By independence, $A \succ B$ should imply $C \succ D$ (replacing the common 89% component of \$1M with \$0 should not reverse the preference). But most people choose $A \succ B$ and $D \succ C$.

This violation is driven by the "certainty effect": people overweight the difference between certainty and near-certainty relative to changes in probability away from certainty.
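
The contradiction can also be verified by brute force: no increasing utility function over the three outcomes reproduces the modal choice pattern. A sketch (outcomes in millions of dollars):

```python
import random

def eu(lottery, u):
    """Expected utility of [(outcome, prob), ...] under the table u."""
    return sum(p * u[x] for x, p in lottery)

A = [(1, 1.0)]                         # $1M for certain
B = [(1, 0.89), (5, 0.10), (0, 0.01)]
C = [(1, 0.11), (0, 0.89)]
D = [(5, 0.10), (0, 0.90)]

random.seed(0)
for _ in range(10_000):
    # random increasing utility over the outcomes {0, 1, 5}
    vals = sorted(random.random() for _ in range(3))
    u = {0: vals[0], 1: vals[1], 5: vals[2]}
    # no expected-utility agent exhibits the modal Allais pattern
    assert not (eu(A, u) > eu(B, u) and eu(D, u) > eu(C, u))
```

Algebraically: $EU(A) > EU(B)$ reduces to $0.11\,u(1) > 0.10\,u(5) + 0.01\,u(0)$, while $EU(D) > EU(C)$ reduces to the reverse inequality, so the two choices cannot both hold.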

Prospect Theory: The Behavioral Alternative

Kahneman and Tversky (1979) proposed prospect theory as a descriptive alternative to expected utility. Key departures:

  • Reference dependence: utility is defined over gains and losses relative to a reference point, not over total wealth
  • Loss aversion: losses hurt roughly 2x more than equivalent gains feel good. The value function $v(x)$ is steeper for $x < 0$ than for $x > 0$
  • Probability weighting: people overweight small probabilities and underweight large ones. Instead of $\sum p_i u(x_i)$, the functional is $\sum w(p_i) v(x_i)$, where $w$ is a nonlinear probability weighting function
  • Diminishing sensitivity: the value function is concave for gains and convex for losses (risk-averse for gains, risk-seeking for losses)
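
These departures can be sketched in code. The functional forms and parameter values below follow Tversky and Kahneman's later (1992) cumulative prospect theory estimates, used here purely for illustration:

```python
def value(x, alpha=0.88, lam=2.25):
    """Reference-dependent value: concave for gains, convex and
    steeper (loss aversion) for losses."""
    return x ** alpha if x >= 0 else -lam * (-x) ** alpha

def weight(p, gamma=0.61):
    """Inverse-S probability weighting: overweights small p,
    underweights large p."""
    return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1 / gamma)

assert -value(-10) > value(10)      # losses loom larger than gains
assert weight(0.01) > 0.01          # small probabilities overweighted
assert weight(0.99) < 0.99          # large probabilities underweighted
```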

Prospect theory explains the Allais paradox, the equity premium puzzle, and many other systematic deviations from expected utility. It does not have a clean axiomatic foundation comparable to VNM, which limits its theoretical power.

Connections to ML

Cross-Entropy as Log-Utility

The cross-entropy loss for classification with predicted probability $\hat{p}$ and true label $y = 1$ is:

$$\mathcal{L} = -\log \hat{p}$$

This is equivalent to maximizing expected log-utility of the predicted probability: $u(\hat{p}) = \log \hat{p}$. The log function is concave, making the classifier risk-averse over probability estimates. It penalizes confident wrong predictions ($\hat{p} \approx 0$ when $y = 1$) much more harshly than it rewards confident correct ones ($\hat{p} \approx 1$ when $y = 1$).
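
The asymmetry is easy to see numerically; a minimal sketch with illustrative probabilities:

```python
import math

# Log-loss for a positive example (y = 1) as a function of p_hat.
loss = lambda p: -math.log(p)

# The extra penalty for an overconfident wrong prediction dwarfs the
# loss reduction from an equally confident correct one.
assert loss(0.01) - loss(0.5) > loss(0.5) - loss(0.99)
```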

Proper Scoring Rules

A scoring rule $S(p, y)$ is proper if the expected score is maximized when the predicted distribution $p$ equals the true distribution. The log-scoring rule (cross-entropy) is proper. Expected utility theory provides the framework: a scoring rule is proper if and only if its expected-score function over the space of predicted distributions is convex, with the rule itself as a subgradient (the Gneiting-Raftery characterization).
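
Propriety of the log score can be checked on a grid; a sketch for a Bernoulli outcome with an illustrative true probability of 0.7:

```python
import math

def expected_log_score(p_true, p_pred):
    """E over Y ~ Bernoulli(p_true) of the log score log p_pred(Y)."""
    return p_true * math.log(p_pred) + (1 - p_true) * math.log(1 - p_pred)

p_true = 0.7
best = max((p / 100 for p in range(1, 100)),
           key=lambda q: expected_log_score(p_true, q))
assert abs(best - p_true) < 1e-9   # maximized by honest reporting
```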

Risk-Sensitive RL

Standard RL maximizes $\mathbb{E}[\sum_t r_t]$. Risk-sensitive RL maximizes $\mathbb{E}[U(\sum_t r_t)]$ for a non-linear utility $U$. Common choices:

  • CVaR (conditional value at risk): focuses on the worst-case quantile
  • Exponential utility: $U(x) = -e^{-\alpha x}$ for risk parameter $\alpha > 0$
  • Mean-variance: $U = \mathbb{E}[R] - \lambda \operatorname{Var}(R)$

Each choice encodes a different risk attitude and leads to different optimal policies.
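
Each objective can be sketched on a toy sample of episode returns (the numbers are illustrative):

```python
import math

returns = [-10.0, 0.0, 5.0, 5.0, 10.0, 10.0, 10.0, 20.0]  # sampled returns
mean = sum(returns) / len(returns)

def cvar(xs, alpha=0.25):
    """Mean of the worst alpha-fraction of outcomes."""
    k = max(1, int(len(xs) * alpha))
    return sum(sorted(xs)[:k]) / k

def exp_utility_ce(xs, a=0.1):
    """Certainty equivalent under exponential utility U(x) = -exp(-a x)."""
    return -math.log(sum(math.exp(-a * x) for x in xs) / len(xs)) / a

assert cvar(returns) < mean              # CVaR focuses on the left tail
assert exp_utility_ce(returns) < mean    # risk aversion discounts the mean
```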

Canonical Examples

Example

Risk premium for a square-root utility agent

Let $u(x) = \sqrt{x}$ and consider a gamble: 50% chance of $100, 50% chance of $0.

Expected monetary value: $\mathbb{E}[X] = 50$.

Expected utility: $\mathbb{E}[u(X)] = 0.5\sqrt{100} + 0.5\sqrt{0} = 5$.

Certainty equivalent: $u(CE) = 5$, so $CE = 25$.

Risk premium: $\pi = \mathbb{E}[X] - CE = 50 - 25 = 25$.

The agent would accept $25 for certain rather than take the gamble worth $50 in expectation. The risk premium is 50% of the expected value, which is large because $\sqrt{x}$ is strongly concave.

Arrow-Pratt coefficient: $r(x) = -u''(x)/u'(x) = 1/(2x)$. At wealth $w = 50$: $r(50) = 0.01$. The approximation gives $\pi \approx 0.5 \times 0.01 \times 2500 = 12.5$, which underestimates the true premium of 25 because the gamble is large (the approximation is valid only for small gambles).
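
The worked numbers can be verified directly:

```python
import math

u, u_inv = math.sqrt, lambda y: y ** 2

eu = 0.5 * u(100) + 0.5 * u(0)      # expected utility = 5
ce = u_inv(eu)                       # certainty equivalent = 25
premium = 50 - ce                    # risk premium = 25

assert eu == 5.0 and ce == 25.0 and premium == 25.0

# Arrow-Pratt approximation at w = 50 (sigma^2 = 2500) underestimates
# the premium because the gamble is large relative to wealth.
r = 1 / (2 * 50)                     # r(x) = 1/(2x) for sqrt utility
assert 0.5 * r * 2500 == 12.5
```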

Example

Log utility and the Kelly criterion

With $u(x) = \log x$, a gamble that multiplies wealth $W$ by $(1 + fR)$ has expected utility:

$$\mathbb{E}[u(W(1 + fR))] = \log W + \mathbb{E}[\log(1 + fR)]$$

The first term is constant, so maximizing expected utility reduces to maximizing $g(f) = \mathbb{E}[\log(1 + fR)]$, which is exactly the Kelly criterion. Log utility is the unique utility function (up to affine transformation) that leads to myopic optimality: the optimal single-period bet is also the optimal bet for any multi-period horizon.
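
The Kelly maximization can be sketched numerically for an illustrative even-odds bet that wins with probability 0.6:

```python
import math

def g(f, p=0.6):
    """g(f) = E[log(1 + f R)] for R = +1 w.p. p, R = -1 w.p. 1 - p."""
    return p * math.log(1 + f) + (1 - p) * math.log(1 - f)

# Grid search over betting fractions f in [0, 0.999].
best_f = max((f / 1000 for f in range(0, 1000)), key=g)

# Kelly formula for even odds: f* = p - (1 - p) = 0.2
assert abs(best_f - 0.2) < 1e-3
```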

Common Confusions

Watch Out

Utility is not the same as happiness or money

Utility is a mathematical representation of preferences, not a measure of psychological satisfaction. Saying $u(x) = \sqrt{x}$ does not mean the agent "feels" $\sqrt{x}$ units of happiness from $x$ dollars. It means the agent's preferences over lotteries are consistent with expected-$\sqrt{x}$ maximization. The utility function is determined by choices, not by introspection.

Watch Out

Risk aversion is not the same as irrationality

A risk-averse agent who rejects a favorable gamble is not being irrational. Under expected utility theory, risk aversion is perfectly rational: the agent has a concave utility function and is maximizing expected utility. The preference for certainty over equivalent-EV gambles is a consequence of diminishing marginal utility, not of computational error.

Watch Out

Expected utility theory is normative, not descriptive

The VNM theorem says what a rational agent should do, not what people actually do. People systematically violate the independence axiom (Allais paradox), exhibit loss aversion (prospect theory), and overweight small probabilities. Expected utility is a prescriptive benchmark, not a descriptive model of human behavior.

Watch Out

The utility function is not unique

Utility is unique only up to positive affine transformations. $u(x) = \log x$ and $v(x) = 2\log x + 7$ represent exactly the same preferences. You cannot compare utility values across individuals, and statements like "person A has higher utility than person B" are meaningless within VNM theory.

Exercises

ExerciseCore

Problem

An agent has utility function $u(x) = \log x$ and current wealth $w = 1000$. A gamble offers a 60% chance of gaining $500 and a 40% chance of losing $300. Should the agent accept? Compute the certainty equivalent and risk premium.

ExerciseCore

Problem

Show that a risk-neutral agent ($u(x) = ax + b$ for $a > 0$) has $CE(L) = \mathbb{E}[L]$ for every lottery $L$, and that the risk premium is always zero.

ExerciseAdvanced

Problem

Prove that the Allais paradox (preferring A over B and D over C as described above) is inconsistent with any expected utility function. Show the explicit contradiction.

ExerciseAdvanced

Problem

An ML model outputs predicted probabilities $\hat{p}$ for binary classification. The log-loss (cross-entropy) is $\ell = -y\log\hat{p} - (1-y)\log(1-\hat{p})$. Show that minimizing expected log-loss yields $\hat{p}^* = P(Y=1 \mid X)$, and explain why this is equivalent to maximizing expected log-utility of the predicted probability.

References

Canonical:

  • von Neumann & Morgenstern, Theory of Games and Economic Behavior (1944), Chapter 3
  • Mas-Colell, Whinston, & Green, Microeconomic Theory (1995), Chapter 6
  • Kreps, Notes on the Theory of Choice (1988), Chapters 3-5

Current:

  • Gilboa, Theory of Decision under Uncertainty (2009), Chapters 5-8
  • Kahneman & Tversky, "Prospect Theory: An Analysis of Decision under Risk," Econometrica (1979)
  • Murphy, Probabilistic Machine Learning: An Introduction (2022), Chapter 5 (decision theory)

Next Topics

Building on expected utility:

  • Kelly criterion: the special case of expected log-utility for repeated bets
  • Game theory: expected utility as the foundation for strategic decision-making under uncertainty

Last reviewed: April 2026
