Beta: content is under active construction and has not been peer-reviewed. Report errors on GitHub.

Decision Theory

Expected Utility Theory

The axiomatic foundation of rational choice under uncertainty. Von Neumann-Morgenstern utility, the independence axiom, risk aversion from concavity, and where the theory breaks (Allais paradox, prospect theory).

Core · Tier 2 · Stable · ~40 min

Why This Matters

Every ML system makes decisions under uncertainty. A classifier chooses a label. A recommender selects an item. A trading bot decides how much to buy. The question is: what objective should the system optimize?

Expected utility theory provides the axiomatic answer. If your preferences over uncertain outcomes satisfy four reasonable axioms (completeness, transitivity, continuity, independence), then there exists a utility function $u$ such that you weakly prefer lottery $A$ to lottery $B$ if and only if the expected utility of $A$ is at least that of $B$:

$$A \succsim B \iff \mathbb{E}[u(A)] \geq \mathbb{E}[u(B)]$$

This is the Von Neumann-Morgenstern (VNM) theorem. It tells you that rational decision-making under uncertainty is equivalent to maximizing expected utility for some utility function. The shape of $u$ encodes risk preferences: concave $u$ means risk aversion, convex $u$ means risk seeking, linear $u$ means risk neutrality.

Understanding expected utility theory is necessary because:

  • Cross-entropy loss is equivalent to maximizing expected log-utility of the predicted probability
  • Risk-sensitive reinforcement learning modifies the RL objective using non-linear utility functions
  • The Kelly criterion is the special case of expected log-utility
  • Calibration: a well-calibrated model maximizes expected utility for every concave utility function simultaneously
  • Loss function design: the choice of loss function implicitly defines a utility function over prediction errors

Mental Model

You can receive either $50 for certain, or a 50/50 gamble between $0 and $100. Both options have the same expected monetary value ($50). Which do you prefer?

If you prefer the sure $50, you are risk-averse. Your utility function is concave: $u(50) > 0.5 \cdot u(0) + 0.5 \cdot u(100)$, which is Jensen's inequality for concave functions.

If you are indifferent, you are risk-neutral. Your utility function is linear: $u(x) = ax + b$.

If you prefer the gamble, you are risk-seeking. Your utility function is convex.
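
The three cases follow from Jensen's inequality and can be checked directly; a minimal sketch with illustrative utility functions:

```python
import math

# 50/50 gamble between $0 and $100 vs. a sure $50.
outcomes, probs = [0.0, 100.0], [0.5, 0.5]

def expected_utility(u, xs, ps):
    return sum(p * u(x) for x, p in zip(xs, ps))

u_concave = math.sqrt          # risk-averse
u_linear = lambda x: x         # risk-neutral
u_convex = lambda x: x ** 2    # risk-seeking

# Concave: the sure thing beats the gamble (Jensen's inequality).
assert u_concave(50) > expected_utility(u_concave, outcomes, probs)
# Linear: indifferent.
assert u_linear(50) == expected_utility(u_linear, outcomes, probs)
# Convex: the gamble beats the sure thing.
assert u_convex(50) < expected_utility(u_convex, outcomes, probs)
```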

Expected utility theory says: under the VNM axioms, there is always some utility function that represents your preferences, and you act as if you are maximizing its expectation. The theory does not tell you which utility function to use. It tells you that rational preferences can always be described by expected utility maximization.

Core Definitions

Definition

Lottery

A lottery (or prospect) is a probability distribution over outcomes. A simple lottery over outcomes $x_1, \ldots, x_n$ with probabilities $p_1, \ldots, p_n$ is written:

$$L = (x_1, p_1; \, x_2, p_2; \, \ldots; \, x_n, p_n)$$

A compound lottery is a lottery over lotteries. The reduction axiom (implied by the VNM axioms) says that compound lotteries can be reduced to simple lotteries using the law of total probability.

A degenerate lottery $\delta_x$ gives outcome $x$ with probability 1.
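
The reduction of a compound lottery to a simple one follows directly from the law of total probability; a sketch with illustrative lotteries:

```python
from collections import defaultdict

def reduce_compound(compound):
    """Reduce a lottery over simple lotteries to a simple lottery
    via the law of total probability."""
    simple = defaultdict(float)
    for sub_lottery, q in compound:          # q = prob of facing sub_lottery
        for outcome, p in sub_lottery:
            simple[outcome] += q * p
    return dict(simple)

# 50% chance of the lottery ($100 w.p. 0.5, $0 w.p. 0.5),
# 50% chance of the degenerate lottery delta_50.
inner = [(100, 0.5), (0, 0.5)]
delta_50 = [(50, 1.0)]
reduced = reduce_compound([(inner, 0.5), (delta_50, 0.5)])
assert reduced == {100: 0.25, 0: 0.25, 50: 0.5}
```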

Definition

Preference Relation

A preference relation $\succsim$ on the set of lotteries $\mathcal{L}$ is a binary relation where $A \succsim B$ means "lottery $A$ is at least as good as lottery $B$."

Derived relations:

  • Strict preference: $A \succ B$ iff $A \succsim B$ and not $B \succsim A$
  • Indifference: $A \sim B$ iff $A \succsim B$ and $B \succsim A$

Definition

Von Neumann-Morgenstern Axioms

The VNM axioms on preferences $\succsim$ over lotteries are:

Axiom 1 (Completeness): For all lotteries $A, B$: either $A \succsim B$ or $B \succsim A$ (or both).

Axiom 2 (Transitivity): If $A \succsim B$ and $B \succsim C$, then $A \succsim C$.

Axiom 3 (Continuity): If $A \succ B \succ C$, there exists $\lambda \in (0, 1)$ such that $B \sim \lambda A + (1-\lambda) C$.

Axiom 4 (Independence): For all lotteries $A, B, C$ and all $\lambda \in (0, 1)$:

$$A \succsim B \iff \lambda A + (1-\lambda)C \succsim \lambda B + (1-\lambda)C$$

Mixing both options with the same third lottery $C$ does not change the preference ordering.

Definition

Expected Utility Representation

A preference relation $\succsim$ has an expected utility representation if there exists a function $u: \mathcal{X} \to \mathbb{R}$ (the utility function) such that for all lotteries $A, B$:

$$A \succsim B \iff \mathbb{E}_A[u(X)] \geq \mathbb{E}_B[u(X)]$$

where $\mathbb{E}_A$ denotes expectation under lottery $A$. For a simple lottery $L = (x_1, p_1; \ldots; x_n, p_n)$:

$$U(L) = \sum_{i=1}^n p_i \, u(x_i)$$

The utility function $u$ is unique up to positive affine transformations: if $u$ represents $\succsim$, then so does $v(x) = au(x) + b$ for any $a > 0$ and $b \in \mathbb{R}$.
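
The representation and its invariance under positive affine transformations can be checked numerically; a small sketch with an illustrative concave utility:

```python
import math

def U(lottery, u):
    """Expected utility of a simple lottery [(x1, p1), ...]."""
    return sum(p * u(x) for x, p in lottery)

A = [(100, 0.5), (0, 0.5)]
B = [(50, 1.0)]

# Affine transforms v = a*u + b with a > 0 preserve the ranking.
u = math.log1p                  # u(x) = log(1 + x), concave
v = lambda x: 3 * u(x) + 7

assert (U(A, u) >= U(B, u)) == (U(A, v) >= U(B, v))
```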

Definition

Risk Aversion and Certainty Equivalent

An agent is risk-averse if for every non-degenerate lottery $L$ with expected value $\mathbb{E}[L]$:

$$u(\mathbb{E}[L]) > \mathbb{E}[u(L)]$$

By Jensen's inequality, this holds if and only if $u$ is strictly concave.

The certainty equivalent $CE(L)$ of lottery $L$ is the sure amount that the agent considers equally desirable:

$$u(CE(L)) = \mathbb{E}[u(L)]$$

For a risk-averse agent: $CE(L) < \mathbb{E}[L]$. The difference $\pi(L) = \mathbb{E}[L] - CE(L) \geq 0$ is the risk premium: the amount the agent would pay to avoid the risk.

The Arrow-Pratt coefficient of absolute risk aversion is:

$$r(x) = -\frac{u''(x)}{u'(x)}$$

Higher $r(x)$ means greater risk aversion at wealth $x$.
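
These definitions translate directly into code; a sketch using log utility and an illustrative 50/50 gamble:

```python
import math

def certainty_equivalent(lottery, u, u_inv):
    """CE solves u(CE) = E[u(L)]; u_inv is the inverse of u."""
    eu = sum(p * u(x) for x, p in lottery)
    return u_inv(eu)

L = [(150, 0.5), (50, 0.5)]                  # E[L] = 100
ce = certainty_equivalent(L, math.log, math.exp)
premium = 100 - ce                           # risk premium pi(L)

# Under log utility the CE is the geometric mean of the outcomes.
assert abs(ce - math.sqrt(150 * 50)) < 1e-9
assert premium > 0                           # risk-averse: CE < E[L]
```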

Main Theorems

Theorem

Von Neumann-Morgenstern Expected Utility Theorem

Statement

Let $\succsim$ be a preference relation on the set of lotteries over a finite set of outcomes $\mathcal{X}$. If $\succsim$ satisfies completeness, transitivity, continuity, and independence, then there exists a utility function $u: \mathcal{X} \to \mathbb{R}$ such that:

$$A \succsim B \iff \sum_{x} p_A(x) \, u(x) \geq \sum_{x} p_B(x) \, u(x)$$

Moreover, $u$ is unique up to positive affine transformation: if $v$ also represents $\succsim$, then $v(x) = au(x) + b$ for some $a > 0$ and $b \in \mathbb{R}$.

Intuition

The theorem says that any "reasonable" set of preferences (satisfying the four axioms) can be described as maximizing expected utility. You do not need to assume that people maximize expected utility. If they satisfy the axioms, they behave as if they do, whether or not they think in terms of utility functions.

The key axiom is independence. It says: if you prefer apples to oranges, then you also prefer "50% chance of apples, 50% chance of cake" to "50% chance of oranges, 50% chance of cake." Mixing in the same irrelevant alternative (cake) should not reverse your preference between apples and oranges. This axiom is what makes the representation linear in probabilities (expected utility rather than some nonlinear functional).

Proof Sketch

Construction of $u$: Let $A^*$ be the best outcome and $A_*$ the worst (existence guaranteed by completeness and transitivity on a finite set). For each outcome $x$, define $u(x)$ as the unique $\lambda \in [0, 1]$ satisfying:

$$x \sim \lambda A^* + (1 - \lambda) A_*$$

This $\lambda$ exists by continuity and is unique by monotonicity (which follows from the other axioms). Set $u(A^*) = 1$ and $u(A_*) = 0$.

Verification: For any lottery $L = (x_1, p_1; \ldots; x_n, p_n)$:

$$L \sim \sum_i p_i \, [\,u(x_i) A^* + (1 - u(x_i)) A_*\,]$$

by independence (substituting each $x_i$ with its equivalent lottery over $A^*$ and $A_*$). This reduces to:

$$L \sim \Big[\sum_i p_i u(x_i)\Big] A^* + \Big[1 - \sum_i p_i u(x_i)\Big] A_*$$

So $L$ is indifferent to a binary lottery on $A^*$ and $A_*$ with weight $U(L) = \sum_i p_i u(x_i)$. Since preferences over binary lotteries on $A^*$ and $A_*$ are determined by the weight on $A^*$ (by monotonicity), $A \succsim B$ iff $U(A) \geq U(B)$.

Uniqueness: If $v$ also represents $\succsim$, then $v(x) = v(A_*) + [v(A^*) - v(A_*)] \cdot u(x)$, which is a positive affine transformation.

Why It Matters

The VNM theorem is the theoretical foundation for rational decision-making under uncertainty. It justifies:

  • Loss function design: choosing a loss function for ML is equivalent to choosing a utility function over prediction errors. Cross-entropy loss corresponds to log utility of the predicted probability.
  • Risk-sensitive optimization: modifying the RL reward function with a concave utility captures risk aversion without ad hoc modifications.
  • Calibration: a well-calibrated predictor maximizes expected utility for every concave $u$, making calibration the "universal" desirable property.
  • Mechanism design: auction theory and incentive-compatible mechanism design build on expected utility preferences.

The uniqueness up to affine transformation means: the zero point and scale of utility are arbitrary. Only differences in expected utility matter, not absolute values.

Failure Mode

The independence axiom is the most controversial assumption, and real human behavior routinely violates it. The Allais paradox is the classic counterexample (described below). When independence fails, preferences cannot be represented by expected utility, and alternative theories (prospect theory, rank-dependent utility, regret theory) are needed.

Continuity can also fail for extreme outcomes. Some people have lexicographic preferences (e.g., "no probability of death is acceptable") that violate continuity.

Theorem

Risk Aversion Equivalence

Statement

For an expected utility maximizer with utility $u$, the following are equivalent:

  1. The agent is risk-averse: $CE(L) \leq \mathbb{E}[L]$ for all non-degenerate lotteries $L$
  2. $u$ is concave
  3. For any non-degenerate random variable $X$: $u(\mathbb{E}[X]) \geq \mathbb{E}[u(X)]$

The risk premium for a small gamble with variance $\sigma^2$ around wealth $w$ is approximately:

$$\pi \approx \frac{1}{2} r(w) \sigma^2$$

where $r(w) = -u''(w)/u'(w)$ is the Arrow-Pratt coefficient of absolute risk aversion.

Intuition

A concave utility function values gains less than it penalizes equivalent losses. If $u$ is concave, moving from $50 to $60 adds less utility than moving from $50 to $40 removes. So a symmetric gamble around $50 has a negative expected change in utility. The agent would rather keep the sure $50.

The Arrow-Pratt coefficient quantifies the curvature: more curvature means more risk aversion. The risk premium grows with both the curvature $r(w)$ and the variance $\sigma^2$ of the gamble.

Proof Sketch

$(1) \Leftrightarrow (3)$: By definition, the certainty equivalent satisfies $u(CE) = \mathbb{E}[u(X)]$ with $CE \leq \mathbb{E}[X]$. Since $u$ is increasing: $u(CE) \leq u(\mathbb{E}[X])$, giving $\mathbb{E}[u(X)] \leq u(\mathbb{E}[X])$.

$(2) \Leftrightarrow (3)$: This is Jensen's inequality. $u$ concave $\Leftrightarrow$ $u(\mathbb{E}[X]) \geq \mathbb{E}[u(X)]$ for all random variables $X$.

Risk premium approximation: Taylor expand $u$ around $w$ for a mean-zero gamble $\epsilon$ with variance $\sigma^2$: $u(w + \epsilon) \approx u(w) + u'(w)\epsilon + \frac{1}{2}u''(w)\epsilon^2$.

$$\mathbb{E}[u(w + \epsilon)] \approx u(w) + \frac{1}{2}u''(w)\sigma^2$$

$$u(w - \pi) \approx u(w) - u'(w)\pi$$

Setting these equal: $\pi \approx -\frac{u''(w)}{2u'(w)}\sigma^2 = \frac{1}{2}r(w)\sigma^2$.
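
The quality of the approximation can be checked numerically; a sketch with log utility (so $r(w) = 1/w$) and an illustrative small gamble:

```python
import math

# Compare the exact risk premium to the Arrow-Pratt approximation
# pi ~= 0.5 * r(w) * sigma^2 for a small symmetric gamble.
w, eps = 1000.0, 10.0                    # wealth, gamble size (small vs w)
u, u_inv = math.log, math.exp            # log utility: r(w) = 1/w
sigma2 = eps ** 2                        # Var of +/- eps w.p. 1/2 each

eu = 0.5 * u(w + eps) + 0.5 * u(w - eps)
pi_exact = w - u_inv(eu)                 # w - CE
pi_approx = 0.5 * (1.0 / w) * sigma2     # 0.05 here

assert abs(pi_exact - pi_approx) < 0.01  # close for a small gamble
```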

Why It Matters

This theorem connects the abstract concept of risk aversion to the concrete shape of the utility function. In ML:

  • Cross-entropy loss corresponds to $u(p) = \log p$ (log-utility of the predicted probability). This is concave, making the agent risk-averse over probability estimates: it penalizes overconfident wrong predictions more than it rewards confident correct ones.
  • Squared error loss corresponds to $u(e) = -e^2$ (negative of squared error). This is also concave in the prediction, making the agent risk-averse.
  • The choice of loss function in ML implicitly defines a risk attitude over prediction errors.

Failure Mode

The Arrow-Pratt approximation is valid only for small gambles. For large gambles, the full utility function matters, and the local curvature $r(w)$ does not summarize risk attitudes. Constant absolute risk aversion (CARA), $u(x) = -e^{-rx}$, and constant relative risk aversion (CRRA), $u(x) = x^{1-\gamma}/(1-\gamma)$, are the two families where the approximation extends globally.

The Independence Axiom and Its Failures

The independence axiom is the most mathematically powerful and the most empirically fragile of the four VNM axioms. It states: mixing two lotteries with a common third lottery preserves the preference order.

Formally: $A \succsim B$ implies $\lambda A + (1-\lambda)C \succsim \lambda B + (1-\lambda)C$ for all $C$ and $\lambda \in (0,1)$.

This axiom is what makes the utility representation linear in probabilities. Without it, preferences might be described by nonlinear functionals of the probability distribution, such as rank-dependent utility or prospect theory.

The Allais Paradox

Maurice Allais (1953) constructed a pair of choices that most people answer inconsistently with expected utility:

Choice 1: (A) $1 million for certain, or (B) 89% chance of $1 million, 10% chance of $5 million, 1% chance of nothing.

Choice 2: (C) 11% chance of $1 million, 89% chance of nothing, or (D) 10% chance of $5 million, 90% chance of nothing.

Most people choose A over B (preferring certainty) and D over C (preferring the higher expected value). But this is inconsistent with expected utility.

To see why, write each lottery as a mixture with a common 89% component. Let $L'$ be the sub-lottery giving \$5M with probability 10/11 and \$0 with probability 1/11. Then $A = 0.89\,[\$1\text{M}] + 0.11\,[\$1\text{M}]$ and $B = 0.89\,[\$1\text{M}] + 0.11\,L'$, while $C = 0.89\,[\$0] + 0.11\,[\$1\text{M}]$ and $D = 0.89\,[\$0] + 0.11\,L'$. By independence, $A \succ B$ should imply $C \succ D$ (replacing the common 89% component of \$1M with \$0 should not reverse the preference). But most people choose $A \succ B$ and $D \succ C$.

This violation is driven by the "certainty effect": people overweight the difference between certainty and near-certainty relative to changes in probability away from certainty.
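
The contradiction can also be verified by brute force: no increasing utility function over the three outcomes reproduces the modal choice pattern. A sketch (outcomes in millions of dollars):

```python
import random

def eu(lottery, u):
    """Expected utility of [(outcome, prob), ...] under the table u."""
    return sum(p * u[x] for x, p in lottery)

A = [(1, 1.0)]                         # $1M for certain
B = [(1, 0.89), (5, 0.10), (0, 0.01)]
C = [(1, 0.11), (0, 0.89)]
D = [(5, 0.10), (0, 0.90)]

random.seed(0)
for _ in range(10_000):
    # random increasing utility over the outcomes {0, 1, 5}
    vals = sorted(random.random() for _ in range(3))
    u = {0: vals[0], 1: vals[1], 5: vals[2]}
    # no expected-utility agent exhibits the modal Allais pattern
    assert not (eu(A, u) > eu(B, u) and eu(D, u) > eu(C, u))
```

Algebraically: $EU(A) > EU(B)$ reduces to $0.11\,u(1) > 0.10\,u(5) + 0.01\,u(0)$, while $EU(D) > EU(C)$ reduces to the reverse inequality, so the two choices cannot both hold.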

Prospect Theory: The Behavioral Alternative

Kahneman and Tversky (1979) proposed prospect theory as a descriptive alternative to expected utility. Key departures:

  • Reference dependence: utility is defined over gains and losses relative to a reference point, not over total wealth
  • Loss aversion: losses hurt roughly 2x more than equivalent gains feel good. The value function $v(x)$ is steeper for $x < 0$ than for $x > 0$
  • Probability weighting: people overweight small probabilities and underweight large ones. Instead of $\sum p_i u(x_i)$, the functional is $\sum w(p_i) v(x_i)$, where $w$ is a nonlinear probability weighting function
  • Diminishing sensitivity: the value function is concave for gains and convex for losses (risk-averse for gains, risk-seeking for losses)
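
These departures can be sketched in code. The functional forms and parameter values below follow Tversky and Kahneman's later (1992) cumulative prospect theory estimates, used here purely for illustration:

```python
def value(x, alpha=0.88, lam=2.25):
    """Reference-dependent value: concave for gains, convex and
    steeper (loss aversion) for losses."""
    return x ** alpha if x >= 0 else -lam * (-x) ** alpha

def weight(p, gamma=0.61):
    """Inverse-S probability weighting: overweights small p,
    underweights large p."""
    return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1 / gamma)

assert -value(-10) > value(10)      # losses loom larger than gains
assert weight(0.01) > 0.01          # small probabilities overweighted
assert weight(0.99) < 0.99          # large probabilities underweighted
```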

Prospect theory explains the Allais paradox, the equity premium puzzle, and many other systematic deviations from expected utility. It does not have a clean axiomatic foundation comparable to VNM, which limits its theoretical power.

Connections to ML

Cross-Entropy as Log-Utility

The cross-entropy loss for classification with predicted probability $\hat{p}$ and true label $y = 1$ is:

$$\mathcal{L} = -\log \hat{p}$$

This is equivalent to maximizing expected log-utility of the predicted probability: $u(\hat{p}) = \log \hat{p}$. The log function is concave, making the classifier risk-averse over probability estimates. It penalizes confident wrong predictions ($\hat{p} \approx 0$ when $y = 1$) much more harshly than it rewards confident correct ones ($\hat{p} \approx 1$ when $y = 1$).
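
The asymmetry is easy to see numerically; a minimal sketch with illustrative probabilities:

```python
import math

# Log-loss for a positive example (y = 1) as a function of p_hat.
loss = lambda p: -math.log(p)

# The extra penalty for an overconfident wrong prediction dwarfs the
# loss reduction from an equally confident correct one.
assert loss(0.01) - loss(0.5) > loss(0.5) - loss(0.99)
```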

Proper Scoring Rules

A scoring rule $S(p, y)$ is proper if the expected score is maximized when the predicted distribution $p$ equals the true distribution. The log-scoring rule (cross-entropy) is proper. Expected utility theory provides the framework: a scoring rule is proper if and only if its expected-score function over the space of predicted distributions is convex, with the rule itself as a subgradient (the Gneiting-Raftery characterization).
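
Propriety of the log score can be checked on a grid; a sketch for a Bernoulli outcome with an illustrative true probability of 0.7:

```python
import math

def expected_log_score(p_true, p_pred):
    """E over Y ~ Bernoulli(p_true) of the log score log p_pred(Y)."""
    return p_true * math.log(p_pred) + (1 - p_true) * math.log(1 - p_pred)

p_true = 0.7
best = max((p / 100 for p in range(1, 100)),
           key=lambda q: expected_log_score(p_true, q))
assert abs(best - p_true) < 1e-9   # maximized by honest reporting
```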

Risk-Sensitive RL

Standard RL maximizes $\mathbb{E}[\sum_t r_t]$. Risk-sensitive RL maximizes $\mathbb{E}[U(\sum_t r_t)]$ for a non-linear utility $U$. Common choices:

  • CVaR (conditional value at risk): focuses on the worst-case quantile
  • Exponential utility: $U(x) = -e^{-\alpha x}$ for risk parameter $\alpha > 0$
  • Mean-variance: $U = \mathbb{E}[R] - \lambda \operatorname{Var}(R)$

Each choice encodes a different risk attitude and leads to different optimal policies.
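
Each objective can be sketched on a toy sample of episode returns (the numbers are illustrative):

```python
import math

returns = [-10.0, 0.0, 5.0, 5.0, 10.0, 10.0, 10.0, 20.0]  # sampled returns
mean = sum(returns) / len(returns)

def cvar(xs, alpha=0.25):
    """Mean of the worst alpha-fraction of outcomes."""
    k = max(1, int(len(xs) * alpha))
    return sum(sorted(xs)[:k]) / k

def exp_utility_ce(xs, a=0.1):
    """Certainty equivalent under exponential utility U(x) = -exp(-a x)."""
    return -math.log(sum(math.exp(-a * x) for x in xs) / len(xs)) / a

assert cvar(returns) < mean              # CVaR focuses on the left tail
assert exp_utility_ce(returns) < mean    # risk aversion discounts the mean
```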

Canonical Examples

Example

Risk premium for a square-root utility agent

Let $u(x) = \sqrt{x}$ and consider a gamble: 50% chance of $100, 50% chance of $0.

Expected monetary value: $\mathbb{E}[X] = 50$.

Expected utility: $\mathbb{E}[u(X)] = 0.5\sqrt{100} + 0.5\sqrt{0} = 5$.

Certainty equivalent: $u(CE) = 5$, so $CE = 25$.

Risk premium: $\pi = \mathbb{E}[X] - CE = 50 - 25 = 25$.

The agent would accept $25 for certain rather than take the gamble worth $50 in expectation. The risk premium is 50% of the expected value, which is large because $\sqrt{x}$ is strongly concave.

Arrow-Pratt coefficient: $r(x) = -u''(x)/u'(x) = 1/(2x)$. At wealth $w = 50$: $r(50) = 0.01$. The approximation gives $\pi \approx 0.5 \times 0.01 \times 2500 = 12.5$, which underestimates the true premium of 25 because the gamble is large (the approximation is valid only for small gambles).
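
The worked numbers can be verified directly:

```python
import math

u, u_inv = math.sqrt, lambda y: y ** 2

eu = 0.5 * u(100) + 0.5 * u(0)      # expected utility = 5
ce = u_inv(eu)                       # certainty equivalent = 25
premium = 50 - ce                    # risk premium = 25

assert eu == 5.0 and ce == 25.0 and premium == 25.0

# Arrow-Pratt approximation at w = 50 (sigma^2 = 2500) underestimates
# the premium because the gamble is large relative to wealth.
r = 1 / (2 * 50)                     # r(x) = 1/(2x) for sqrt utility
assert 0.5 * r * 2500 == 12.5
```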

Example

Log utility and the Kelly criterion

With $u(x) = \log x$, a gamble that multiplies wealth $W$ by $(1 + fR)$ has expected utility:

$$\mathbb{E}[u(W(1 + fR))] = \log W + \mathbb{E}[\log(1 + fR)]$$

The first term is constant, so maximizing expected utility reduces to maximizing $g(f) = \mathbb{E}[\log(1 + fR)]$, which is exactly the Kelly criterion. Log utility is the unique utility function (up to affine transformation) that leads to myopic optimality: the optimal single-period bet is also the optimal bet for any multi-period horizon.
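
The Kelly maximization can be sketched numerically for an illustrative even-odds bet that wins with probability 0.6:

```python
import math

def g(f, p=0.6):
    """g(f) = E[log(1 + f R)] for R = +1 w.p. p, R = -1 w.p. 1 - p."""
    return p * math.log(1 + f) + (1 - p) * math.log(1 - f)

# Grid search over betting fractions f in [0, 0.999].
best_f = max((f / 1000 for f in range(0, 1000)), key=g)

# Kelly formula for even odds: f* = p - (1 - p) = 0.2
assert abs(best_f - 0.2) < 1e-3
```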

Common Confusions

Watch Out

Utility is not the same as happiness or money

Utility is a mathematical representation of preferences, not a measure of psychological satisfaction. Saying $u(x) = \sqrt{x}$ does not mean the agent "feels" $\sqrt{x}$ units of happiness from $x$ dollars. It means the agent's preferences over lotteries are consistent with expected-$\sqrt{x}$ maximization. The utility function is determined by choices, not by introspection.

Watch Out

Risk aversion is not the same as irrationality

A risk-averse agent who rejects a favorable gamble is not being irrational. Under expected utility theory, risk aversion is perfectly rational: the agent has a concave utility function and is maximizing expected utility. The preference for certainty over equivalent-EV gambles is a consequence of diminishing marginal utility, not of computational error.

Watch Out

Expected utility theory is normative, not descriptive

The VNM theorem says what a rational agent should do, not what people actually do. People systematically violate the independence axiom (Allais paradox), exhibit loss aversion (prospect theory), and overweight small probabilities. Expected utility is a prescriptive benchmark, not a descriptive model of human behavior.

Watch Out

The utility function is not unique

Utility is unique only up to positive affine transformations. $u(x) = \log x$ and $v(x) = 2\log x + 7$ represent exactly the same preferences. You cannot compare utility values across individuals, and statements like "person A has higher utility than person B" are meaningless within VNM theory.

Exercises

ExerciseCore

Problem

An agent has utility function $u(x) = \log x$ and current wealth $w = 1000$. A gamble offers a 60% chance of gaining $500 and a 40% chance of losing $300. Should the agent accept? Compute the certainty equivalent and risk premium.

ExerciseCore

Problem

Show that a risk-neutral agent ($u(x) = ax + b$ for $a > 0$) has $CE(L) = \mathbb{E}[L]$ for every lottery $L$, and that the risk premium is always zero.

ExerciseAdvanced

Problem

Prove that the Allais paradox (preferring A over B and D over C as described above) is inconsistent with any expected utility function. Show the explicit contradiction.

ExerciseAdvanced

Problem

An ML model outputs predicted probabilities $\hat{p}$ for binary classification. The log-loss (cross-entropy) is $\ell = -y\log\hat{p} - (1-y)\log(1-\hat{p})$. Show that minimizing expected log-loss yields $\hat{p}^* = P(Y=1 \mid X)$, and explain why this is equivalent to maximizing expected log-utility of the predicted probability.

References

Canonical:

  • von Neumann & Morgenstern, Theory of Games and Economic Behavior (1944), Chapter 3
  • Mas-Colell, Whinston, & Green, Microeconomic Theory (1995), Chapter 6
  • Kreps, Notes on the Theory of Choice (1988), Chapters 3-5

Current:

  • Gilboa, Theory of Decision under Uncertainty (2009), Chapters 5-8
  • Kahneman & Tversky, "Prospect Theory: An Analysis of Decision under Risk," Econometrica (1979)
  • Murphy, Probabilistic Machine Learning: An Introduction (2022), Chapter 5 (decision theory)

Next Topics

Building on expected utility:

  • Kelly criterion: the special case of expected log-utility for repeated bets
  • Game theory: expected utility as the foundation for strategic decision-making under uncertainty

Last reviewed: April 2026
