Decision Theory
Expected Utility Theory
The axiomatic foundation of rational choice under uncertainty. Von Neumann-Morgenstern utility, the independence axiom, risk aversion from concavity, and where the theory breaks (Allais paradox, prospect theory).
Why This Matters
Every ML system makes decisions under uncertainty. A classifier chooses a label. A recommender selects an item. A trading bot decides how much to buy. The question is: what objective should the system optimize?
Expected utility theory provides the axiomatic answer. If your preferences over uncertain outcomes satisfy four reasonable axioms (completeness, transitivity, continuity, independence), then there exists a utility function $u$ such that lottery $p$ is preferred to lottery $q$ if and only if the expected utility of $p$ is at least that of $q$:

$$p \succeq q \iff \mathbb{E}_p[u(x)] \ge \mathbb{E}_q[u(x)]$$
This is the Von Neumann-Morgenstern (VNM) theorem. It tells you that rational decision-making under uncertainty is equivalent to maximizing expected utility for some utility function $u$. The shape of $u$ encodes risk preferences: concave $u$ means risk aversion, convex $u$ means risk seeking, linear $u$ means risk neutrality.
Understanding expected utility theory is necessary because:
- Cross-entropy loss is equivalent to maximizing expected log-utility of the predicted probability
- Risk-sensitive reinforcement learning modifies the RL objective using non-linear utility functions
- The Kelly criterion is the special case of expected log-utility
- Calibration: a well-calibrated model maximizes expected utility for every concave utility function simultaneously
- Loss function design: the choice of loss function implicitly defines a utility function over prediction errors
Mental Model
You can receive either $50 for certain, or a 50/50 gamble between $0 and $100. Both options have the same expected monetary value ($50). Which do you prefer?
If you prefer the sure $50, you are risk-averse. Your utility function is concave: $u(50) > \tfrac{1}{2}u(0) + \tfrac{1}{2}u(100)$, which is Jensen's inequality for concave functions.
If you are indifferent, you are risk-neutral. Your utility function is linear: $u(50) = \tfrac{1}{2}u(0) + \tfrac{1}{2}u(100)$.
If you prefer the gamble, you are risk-seeking. Your utility function is convex.
Expected utility theory says: under the VNM axioms, there is always some utility function that represents your preferences, and you act as if you are maximizing its expectation. The theory does not tell you which utility function to use. It tells you that rational preferences can always be described by expected utility maximization.
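The three risk attitudes can be checked numerically. A minimal sketch, with the helper `expected_utility` and the three utility functions chosen for illustration:

```python
import math

def expected_utility(u, lottery):
    """Expected utility of a lottery given as [(probability, outcome), ...]."""
    return sum(p * u(x) for p, x in lottery)

gamble = [(0.5, 0.0), (0.5, 100.0)]  # 50/50 between $0 and $100
sure = [(1.0, 50.0)]                 # $50 for certain; same expected value

concave = math.sqrt            # risk-averse
linear = lambda x: x           # risk-neutral
convex = lambda x: x ** 2      # risk-seeking

# Risk-averse: the sure $50 beats the gamble (Jensen's inequality).
assert expected_utility(concave, sure) > expected_utility(concave, gamble)
# Risk-neutral: indifferent between the two.
assert expected_utility(linear, sure) == expected_utility(linear, gamble)
# Risk-seeking: the gamble beats the sure $50.
assert expected_utility(convex, sure) < expected_utility(convex, gamble)
```

All three agents face identical expected monetary value; only the curvature of the utility function changes the ranking.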
Core Definitions
Lottery
A lottery (or prospect) is a probability distribution over outcomes. A simple lottery over outcomes $x_1, \dots, x_n$ with probabilities $p_1, \dots, p_n$ is written:

$$L = (x_1, p_1;\; x_2, p_2;\; \dots;\; x_n, p_n), \qquad p_i \ge 0,\quad \sum_i p_i = 1$$
A compound lottery is a lottery over lotteries. The reduction axiom (implied by the VNM axioms) says that compound lotteries can be reduced to simple lotteries using the law of total probability.
A degenerate lottery gives outcome $x$ with probability 1.
Preference Relation
A preference relation $\succeq$ on the set of lotteries is a binary relation where $p \succeq q$ means "lottery $p$ is at least as good as lottery $q$."
Derived relations:
- Strict preference: $p \succ q$ iff $p \succeq q$ and not $q \succeq p$
- Indifference: $p \sim q$ iff $p \succeq q$ and $q \succeq p$
Von Neumann-Morgenstern Axioms
The VNM axioms on preferences over lotteries are:
Axiom 1 (Completeness): For all lotteries $p, q$: either $p \succeq q$ or $q \succeq p$ (or both).
Axiom 2 (Transitivity): If $p \succeq q$ and $q \succeq r$, then $p \succeq r$.
Axiom 3 (Continuity): If $p \succ q \succ r$, there exists $\alpha \in (0, 1)$ such that $\alpha p + (1 - \alpha) r \sim q$.
Axiom 4 (Independence): For all lotteries $p, q, r$ and all $\alpha \in (0, 1]$:

$$p \succeq q \iff \alpha p + (1 - \alpha) r \succeq \alpha q + (1 - \alpha) r$$
Mixing both options with the same third lottery does not change the preference ordering.
Expected Utility Representation
A preference relation $\succeq$ has an expected utility representation if there exists a function $u: X \to \mathbb{R}$ (the utility function) such that for all lotteries $p, q$:

$$p \succeq q \iff \mathbb{E}_p[u] \ge \mathbb{E}_q[u]$$

where $\mathbb{E}_p[u]$ denotes expectation under lottery $p$. For a simple lottery $p = (x_1, p_1; \dots; x_n, p_n)$:

$$\mathbb{E}_p[u] = \sum_{i=1}^{n} p_i\, u(x_i)$$

The utility function is unique up to positive affine transformations: if $u$ represents $\succeq$, then so does $v(x) = a\,u(x) + b$ for any $a > 0$ and $b \in \mathbb{R}$.
Risk Aversion and Certainty Equivalent
An agent is risk-averse if for every non-degenerate lottery $p$ with expected value $\mu = \mathbb{E}_p[x]$:

$$u(\mu) > \mathbb{E}_p[u(x)]$$

By Jensen's inequality, this holds if and only if $u$ is strictly concave.
The certainty equivalent $CE(p)$ of lottery $p$ is the sure amount that the agent considers equally desirable:

$$u(CE(p)) = \mathbb{E}_p[u(x)]$$

For a risk-averse agent: $CE(p) < \mathbb{E}_p[x]$. The difference $\pi(p) = \mathbb{E}_p[x] - CE(p)$ is the risk premium: the amount the agent would pay to avoid the risk.
The Arrow-Pratt coefficient of absolute risk aversion is:

$$A(w) = -\frac{u''(w)}{u'(w)}$$

Higher $A(w)$ means greater risk aversion at wealth $w$.
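The Arrow-Pratt coefficient can be computed for any twice-differentiable utility via finite differences. A small sketch (the helper name `abs_risk_aversion` is illustrative), checked against the closed form for square-root utility:

```python
import math

def abs_risk_aversion(u, w, h=1e-3):
    """Arrow-Pratt coefficient A(w) = -u''(w) / u'(w), via central differences."""
    u_prime = (u(w + h) - u(w - h)) / (2 * h)
    u_double_prime = (u(w + h) - 2 * u(w) + u(w - h)) / h ** 2
    return -u_double_prime / u_prime

# For u(w) = sqrt(w): u'(w) = w^(-1/2)/2 and u''(w) = -w^(-3/2)/4,
# so A(w) = 1 / (2w) analytically.
for w in (10.0, 50.0, 200.0):
    assert abs(abs_risk_aversion(math.sqrt, w) - 1 / (2 * w)) < 1e-5
```

Note that $A(w) = 1/(2w)$ is decreasing in $w$: a square-root agent becomes less risk-averse as wealth grows.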
Main Theorems
Von Neumann-Morgenstern Expected Utility Theorem
Statement
Let $\succeq$ be a preference relation on the set of lotteries over a finite set of outcomes $X$. If $\succeq$ satisfies completeness, transitivity, continuity, and independence, then there exists a utility function $u: X \to \mathbb{R}$ such that:

$$p \succeq q \iff \sum_i p_i\, u(x_i) \ge \sum_i q_i\, u(x_i)$$

Moreover, $u$ is unique up to positive affine transformation: if $v$ also represents $\succeq$, then $v = a\,u + b$ for some $a > 0$ and $b \in \mathbb{R}$.
Intuition
The theorem says that any "reasonable" set of preferences (satisfying the four axioms) can be described as maximizing expected utility. You do not need to assume that people maximize expected utility. If they satisfy the axioms, they behave as if they do, whether or not they think in terms of utility functions.
The key axiom is independence. It says: if you prefer apples to oranges, then you also prefer "50% chance of apples, 50% chance of cake" to "50% chance of oranges, 50% chance of cake." Mixing in the same irrelevant alternative (cake) should not reverse your preference between apples and oranges. This axiom is what makes the representation linear in probabilities (expected utility rather than some nonlinear functional).
Proof Sketch
Construction of $u$: Let $\bar{x}$ be the best outcome and $\underline{x}$ the worst (existence guaranteed by completeness and transitivity on a finite set). For each outcome $x$, define $u(x)$ as the unique $\alpha \in [0, 1]$ satisfying:

$$x \sim \alpha\, \bar{x} + (1 - \alpha)\, \underline{x}$$

This $\alpha$ exists by continuity and is unique by monotonicity (which follows from the other axioms). Set $u(\bar{x}) = 1$ and $u(\underline{x}) = 0$.
Verification: For any lottery $p = (x_1, p_1; \dots; x_n, p_n)$, substitute each $x_i$ with its equivalent binary lottery over $\bar{x}$ and $\underline{x}$; by independence, this preserves indifference. The compound lottery reduces to:

$$p \sim \left(\sum_i p_i\, u(x_i)\right) \bar{x} + \left(1 - \sum_i p_i\, u(x_i)\right) \underline{x}$$

So $p$ is indifferent to a binary lottery on $\bar{x}$ and $\underline{x}$ with weight $\mathbb{E}_p[u]$ on $\bar{x}$. Since preferences over binary lotteries on $\bar{x}$ and $\underline{x}$ are determined by the weight on $\bar{x}$ (by monotonicity), $p \succeq q$ iff $\mathbb{E}_p[u] \ge \mathbb{E}_q[u]$.
Uniqueness: If $v$ also represents $\succeq$, then $v(x) = v(\underline{x}) + \left(v(\bar{x}) - v(\underline{x})\right) u(x)$, which is a positive affine transformation of $u$.
Why It Matters
The VNM theorem is the theoretical foundation for rational decision-making under uncertainty. It justifies:
- Loss function design: choosing a loss function for ML is equivalent to choosing a utility function over prediction errors. Cross-entropy loss corresponds to log utility of the predicted probability.
- Risk-sensitive optimization: modifying the RL reward function with a concave utility captures risk aversion without ad hoc modifications.
- Calibration: a well-calibrated predictor maximizes expected utility for every concave utility function $u$, making calibration the "universal" desirable property.
- Mechanism design: auction theory and incentive-compatible mechanism design build on expected utility preferences.
The uniqueness up to affine transformation means: the zero point and scale of utility are arbitrary. Only differences in expected utility matter, not absolute values.
Failure Mode
The independence axiom is the most controversial assumption, and real human behavior routinely violates it. The Allais paradox is the classic counterexample (described below). When independence fails, preferences cannot be represented by expected utility, and alternative theories (prospect theory, rank-dependent utility, regret theory) are needed.
Continuity can also fail for extreme outcomes. Some people have lexicographic preferences (e.g., "no probability of death is acceptable") that violate continuity.
Risk Aversion Equivalence
Statement
For an expected utility maximizer with increasing utility $u$, the following are equivalent:
- The agent is risk-averse: $u(\mathbb{E}_p[x]) > \mathbb{E}_p[u(x)]$ for all non-degenerate lotteries $p$
- $u$ is strictly concave
- For any non-degenerate random variable $X$: $CE(X) < \mathbb{E}[X]$

The risk premium $\pi$ for a small gamble with variance $\sigma^2$ around wealth $w$ is approximately:

$$\pi \approx \tfrac{1}{2} A(w)\, \sigma^2$$

where $A(w) = -u''(w)/u'(w)$ is the Arrow-Pratt coefficient of absolute risk aversion.
Intuition
A concave utility function values gains less than it penalizes equivalent losses. If $u$ is concave, moving from $50 to $60 adds less utility than moving from $50 to $40 removes. So a symmetric gamble around $50 has a negative expected utility change. The agent would rather keep the sure $50.
The Arrow-Pratt coefficient quantifies the curvature: more curvature means more risk aversion. The risk premium grows with both the curvature and the variance of the gamble.
Proof Sketch
(1) $\Leftrightarrow$ (3): By definition of the certainty equivalent, $u(CE) = \mathbb{E}[u(X)]$, so risk aversion means $u(\mathbb{E}[X]) > \mathbb{E}[u(X)] = u(CE)$. Since $u$ is increasing, this gives $CE < \mathbb{E}[X]$.
(1) $\Leftrightarrow$ (2): This is Jensen's inequality: $u$ strictly concave $\iff$ $u(\mathbb{E}[X]) > \mathbb{E}[u(X)]$ for all non-degenerate random variables $X$.
Risk premium approximation: For a mean-zero gamble $X$ with variance $\sigma^2$, Taylor expand around $w$:

$$\mathbb{E}[u(w + X)] \approx u(w) + \tfrac{1}{2} u''(w)\, \sigma^2, \qquad u(w - \pi) \approx u(w) - u'(w)\, \pi$$

Setting the two equal: $\pi \approx -\tfrac{1}{2} \dfrac{u''(w)}{u'(w)}\, \sigma^2 = \tfrac{1}{2} A(w)\, \sigma^2$.
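The approximation can be checked numerically for a small symmetric gamble under square-root utility, for which the exact certainty equivalent has a closed-form inverse (helper names here are illustrative):

```python
import math

def certainty_equivalent(u, u_inverse, lottery):
    """CE solves u(CE) = E[u(x)] for a lottery given as [(prob, outcome), ...]."""
    expected_u = sum(p * u(x) for p, x in lottery)
    return u_inverse(expected_u)

w, eps = 100.0, 1.0                        # wealth and a small symmetric stake
gamble = [(0.5, w - eps), (0.5, w + eps)]  # mean w, variance eps^2

exact_premium = w - certainty_equivalent(math.sqrt, lambda y: y ** 2, gamble)
# Arrow-Pratt: pi ≈ (1/2) A(w) sigma^2, with A(w) = 1/(2w) for sqrt utility.
approx_premium = 0.5 * (1 / (2 * w)) * eps ** 2
assert abs(exact_premium - approx_premium) / approx_premium < 0.01
```

For this small gamble the approximation is accurate to well under 1%; the large-gamble example later in the notes shows how badly it can fail when the stake is comparable to wealth.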
Why It Matters
This theorem connects the abstract concept of risk aversion to the concrete shape of the utility function. In ML:
- Cross-entropy loss corresponds to $u(\hat{p}) = \log \hat{p}$ (log-utility of the probability assigned to the true label). This is concave, making the agent risk-averse over probability estimates: it penalizes overconfident wrong predictions more than it rewards confident correct ones.
- Squared error loss corresponds to $u(\hat{y}) = -(y - \hat{y})^2$ (negative squared error). This is also concave in the prediction, making the agent risk-averse.
- The choice of loss function in ML implicitly defines a risk attitude over prediction errors.
Failure Mode
The Arrow-Pratt approximation is valid only for small gambles. For large gambles, the full utility function matters, and the local curvature does not summarize risk attitudes. Constant absolute risk aversion (CARA) and constant relative risk aversion (CRRA) are the two families where the approximation extends globally.
The Independence Axiom and Its Failures
The independence axiom is the most mathematically powerful and the most empirically fragile of the four VNM axioms. It states: mixing two lotteries with a common third lottery preserves the preference order.
Formally: $p \succeq q$ implies $\alpha p + (1 - \alpha) r \succeq \alpha q + (1 - \alpha) r$ for all lotteries $r$ and all $\alpha \in (0, 1]$.
This axiom is what makes the utility representation linear in probabilities. Without it, preferences might be described by nonlinear functionals of the probability distribution, such as rank-dependent utility or prospect theory.
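Any expected utility maximizer automatically satisfies independence, because expectation is linear in probabilities. A small sketch checking this for a fixed utility against randomly drawn third lotteries (helper names and the specific lotteries are illustrative):

```python
import math
import random

def expected_utility(lottery, u):
    """Lottery as [(probability, outcome), ...]."""
    return sum(p * u(x) for p, x in lottery)

def mix(p, q, alpha):
    """The compound lottery alpha * p + (1 - alpha) * q, reduced to a simple lottery."""
    return [(alpha * pr, x) for pr, x in p] + [((1 - alpha) * pr, x) for pr, x in q]

u = math.sqrt                      # any fixed utility works; sqrt is illustrative
p_lot = [(0.5, 100.0), (0.5, 0.0)]  # expected utility 5
q_lot = [(1.0, 36.0)]               # expected utility 6, so q_lot is preferred

random.seed(1)
for _ in range(100):
    r_lot = [(0.5, random.uniform(0, 100)), (0.5, random.uniform(0, 100))]
    for alpha in (0.1, 0.5, 0.9):
        # Mixing both sides with the same third lottery never reverses the ranking.
        mixed = expected_utility(mix(p_lot, r_lot, alpha), u) >= expected_utility(mix(q_lot, r_lot, alpha), u)
        unmixed = expected_utility(p_lot, u) >= expected_utility(q_lot, u)
        assert mixed == unmixed
```

The check passes for every $r$ and $\alpha$ because the common term $(1-\alpha)\,\mathbb{E}_r[u]$ cancels from both sides of the comparison.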
The Allais Paradox
Maurice Allais (1953) constructed a pair of choices that most people answer inconsistently with expected utility:
Choice 1: (A) $1 million for certain, or (B) 89% chance of $1 million, 10% chance of $5 million, 1% chance of nothing.
Choice 2: (C) 11% chance of $1 million, 89% chance of nothing, or (D) 10% chance of $5 million, 90% chance of nothing.
Most people choose A over B (preferring certainty) and D over C (preferring the higher expected value). But this is inconsistent with expected utility.
To see why, decompose each lottery into a common 89% component plus an 11% component. Let $Z$ denote the sub-lottery giving $5M with probability $\tfrac{10}{11}$ and $0 with probability $\tfrac{1}{11}$. Then:

- $A = 0.89 \cdot [\$1\text{M}] + 0.11 \cdot [\$1\text{M}]$
- $B = 0.89 \cdot [\$1\text{M}] + 0.11 \cdot Z$
- $C = 0.89 \cdot [\$0] + 0.11 \cdot [\$1\text{M}]$
- $D = 0.89 \cdot [\$0] + 0.11 \cdot Z$

By the independence axiom, $A \succ B$ iff $C \succ D$: replacing the common 89% component ($1M versus $0) should not reverse the preference between the 11% components. But most people choose $A \succ B$ and $D \succ C$.
This violation is driven by the "certainty effect": people overweight the difference between certainty and near-certainty relative to changes in probability away from certainty.
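The inconsistency can be verified mechanically: for any utility values on the three outcomes, the expected utility differences $EU(A) - EU(B)$ and $EU(C) - EU(D)$ are literally the same expression, so no utility function can rank $A \succ B$ and $D \succ C$. A sketch (helper names illustrative):

```python
import random

def expected_utility(lottery, u):
    return sum(p * u(x) for p, x in lottery)

A = [(1.00, 1_000_000)]
B = [(0.89, 1_000_000), (0.10, 5_000_000), (0.01, 0)]
C = [(0.11, 1_000_000), (0.89, 0)]
D = [(0.10, 5_000_000), (0.90, 0)]

random.seed(0)
for _ in range(1000):
    # Any increasing utility on {0, $1M, $5M}.
    u0, u1, u5 = sorted(random.random() for _ in range(3))
    u = {0: u0, 1_000_000: u1, 5_000_000: u5}.__getitem__
    # Both differences equal 0.11 u(1M) - 0.10 u(5M) - 0.01 u(0),
    # so their signs always agree: A > B forces C > D.
    diff_AB = expected_utility(A, u) - expected_utility(B, u)
    diff_CD = expected_utility(C, u) - expected_utility(D, u)
    assert abs(diff_AB - diff_CD) < 1e-9
```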
Prospect Theory: The Behavioral Alternative
Kahneman and Tversky (1979) proposed prospect theory as a descriptive alternative to expected utility. Key departures:
- Reference dependence: utility is defined over gains and losses relative to a reference point, not over total wealth
- Loss aversion: losses hurt roughly 2x more than equivalent gains feel good. The value function is steeper for losses ($x < 0$) than for gains ($x > 0$)
- Probability weighting: people overweight small probabilities and underweight large ones. Instead of $\sum_i p_i\, u(x_i)$, the functional is $\sum_i w(p_i)\, v(x_i)$ where $w$ is a nonlinear probability weighting function
- Diminishing sensitivity: the value function is concave for gains and convex for losses (risk-averse for gains, risk-seeking for losses)
Prospect theory explains the Allais paradox, the equity premium puzzle, and many other systematic deviations from expected utility. It does not have a clean axiomatic foundation comparable to VNM, which limits its theoretical power.
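The departures above can be sketched in code. The functional forms and parameter values below follow the common estimates from Tversky and Kahneman's 1992 cumulative prospect theory paper ($\alpha = 0.88$, $\lambda = 2.25$, $\gamma = 0.61$); treat the exact numbers as illustrative:

```python
def value(x, alpha=0.88, lam=2.25):
    """Reference-dependent value: concave for gains, convex and steeper for losses."""
    return x ** alpha if x >= 0 else -lam * (-x) ** alpha

def weight(p, gamma=0.61):
    """Inverse-S probability weighting: overweights small p, underweights large p."""
    return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1 / gamma)

# Loss aversion: a $100 loss hurts more than a $100 gain helps.
assert -value(-100) > value(100)
# Overweighting of small probabilities, underweighting of large ones.
assert weight(0.01) > 0.01
assert weight(0.99) < 0.99
```

With these components, the prospect-theory score of a lottery is $\sum_i w(p_i)\, v(x_i)$ rather than $\sum_i p_i\, u(x_i)$, which is enough to reproduce the Allais pattern.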
Connections to ML
Cross-Entropy as Log-Utility
The cross-entropy loss for classification with predicted probability $\hat{p}$ and true label $y \in \{0, 1\}$ is:

$$\ell(\hat{p}, y) = -\left[y \log \hat{p} + (1 - y) \log(1 - \hat{p})\right]$$

Minimizing this is equivalent to maximizing expected log-utility of the probability assigned to the true label: $u(\hat{p}) = \log \hat{p}$. The log function is concave, making the classifier risk-averse over probability estimates. It penalizes confident wrong predictions ($\hat{p} \to 0$ when $y = 1$) much more harshly than it rewards confident correct ones ($\hat{p} \to 1$ when $y = 1$).
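The asymmetry is easy to see numerically (the helper name `log_loss` is illustrative):

```python
import math

def log_loss(p_hat, y):
    """Cross-entropy for a single binary example with predicted probability p_hat."""
    return -(y * math.log(p_hat) + (1 - y) * math.log(1 - p_hat))

confident_right = log_loss(0.99, 1)  # small loss
hedged = log_loss(0.50, 1)           # moderate loss
confident_wrong = log_loss(0.01, 1)  # huge loss

assert confident_right < hedged < confident_wrong
# The risk-averse signature of concave log utility: moving from 0.5 to 0.99
# saves far less loss than moving from 0.5 to 0.01 costs.
assert (hedged - confident_right) < (confident_wrong - hedged)
```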
Proper Scoring Rules
A scoring rule $S(q, y)$ assigns a score to predicted distribution $q$ when outcome $y$ occurs; it is proper if the expected score is maximized when the predicted distribution equals the true distribution. The log-scoring rule (cross-entropy) is proper. Expected utility theory provides the framework: a scoring rule is proper if and only if its expected score function $G(p) = \mathbb{E}_{y \sim p}[S(p, y)]$ is convex, with the rule recovered from subgradients of $G$ (the Savage representation).
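Propriety of the log score can be checked directly for a Bernoulli outcome: the truthful forecast strictly beats every other forecast in expectation. A small sketch (helper name illustrative):

```python
import math

def expected_log_score(q, p):
    """Expected log score of forecast q when the true Bernoulli parameter is p."""
    return p * math.log(q) + (1 - p) * math.log(1 - q)

p = 0.3
truthful = expected_log_score(p, p)
# Any forecast q != p does strictly worse in expectation (strict propriety).
for q in (0.1, 0.2, 0.29, 0.31, 0.5, 0.9):
    assert expected_log_score(q, p) < truthful
```

Maximizing the expected log score over $q$ gives $q = p$; the gap between the two sides is exactly the KL divergence $D_{\mathrm{KL}}(p \,\|\, q)$.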
Risk-Sensitive RL
Standard RL maximizes $\mathbb{E}\left[\sum_t \gamma^t r_t\right]$. Risk-sensitive RL maximizes $\mathbb{E}\left[u\!\left(\sum_t \gamma^t r_t\right)\right]$ for a non-linear utility $u$. Common choices:
- CVaR (conditional value at risk): focuses on the mean of the worst $\alpha$-fraction of returns
- Exponential utility: $u(R) = -e^{-\beta R}$ for risk parameter $\beta > 0$
- Mean-variance: $\mathbb{E}[R] - \lambda\, \mathrm{Var}(R)$
Each choice encodes a different risk attitude and leads to different optimal policies.
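The three objectives can be compared on a sample of episode returns. A sketch with made-up returns and illustrative parameter values; all three risk-sensitive scores penalize dispersion and therefore sit below the plain mean:

```python
import math
import statistics

returns = [-10.0, -2.0, 0.0, 3.0, 5.0, 8.0, 12.0, 20.0]  # hypothetical episode returns
mean_return = statistics.mean(returns)                    # risk-neutral objective

def cvar(xs, alpha=0.25):
    """Mean of the worst alpha-fraction of returns."""
    tail = sorted(xs)[: max(1, int(len(xs) * alpha))]
    return statistics.mean(tail)

def exp_utility_ce(xs, beta=0.1):
    """Certainty equivalent under u(R) = -exp(-beta R): -(1/beta) log E[exp(-beta R)]."""
    return -math.log(statistics.mean(math.exp(-beta * x) for x in xs)) / beta

def mean_variance(xs, lam=0.05):
    """E[R] - lambda * Var(R)."""
    return statistics.mean(xs) - lam * statistics.pvariance(xs)

assert cvar(returns) < mean_return
assert exp_utility_ce(returns) < mean_return
assert mean_variance(returns) < mean_return
```

Raising $\alpha$, $\beta$, or $\lambda$ toward their risk-neutral limits recovers objectives closer to the plain mean.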
Canonical Examples
Risk premium for a square-root utility agent
Let $u(x) = \sqrt{x}$ and consider a gamble: 50% chance of $100, 50% chance of $0.
Expected monetary value: $\mathbb{E}[X] = 0.5 \cdot 100 + 0.5 \cdot 0 = 50$.
Expected utility: $\mathbb{E}[u(X)] = 0.5\sqrt{100} + 0.5\sqrt{0} = 5$.
Certainty equivalent: $\sqrt{CE} = 5$, so $CE = 25$.
Risk premium: $\pi = 50 - 25 = 25$.
The agent would accept $25 for certain rather than take the gamble worth $50 in expectation. The risk premium is 50% of the expected value, which is large because $\sqrt{x}$ is strongly concave.
Arrow-Pratt coefficient: $A(w) = -u''(w)/u'(w) = \frac{1}{2w}$. At wealth $w = 50$: $A(50) = 0.01$. With $\sigma^2 = 2500$, the approximation gives $\pi \approx \tfrac{1}{2} \cdot 0.01 \cdot 2500 = 12.5$, which underestimates the true premium of 25 because the gamble is large (the approximation is valid only for small gambles).
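The arithmetic of this example can be checked directly:

```python
import math

# u(x) = sqrt(x), gamble = 50/50 over $0 and $100.
expected_value = 0.5 * 100 + 0.5 * 0
expected_utility = 0.5 * math.sqrt(100) + 0.5 * math.sqrt(0)
certainty_equivalent = expected_utility ** 2  # invert u: CE = (E[u])^2
risk_premium = expected_value - certainty_equivalent

assert expected_value == 50
assert expected_utility == 5
assert certainty_equivalent == 25
assert risk_premium == 25

# Arrow-Pratt approximation at w = 50: A(w) = 1/(2w), sigma^2 = 2500.
approx_premium = 0.5 * (1 / (2 * 50)) * 2500
assert approx_premium == 12.5  # underestimates the true premium of 25
```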
Log utility and the Kelly criterion
With $u(x) = \log x$, a gamble that multiplies wealth $w$ by a random growth factor $G$ has expected utility:

$$\mathbb{E}[\log(wG)] = \log w + \mathbb{E}[\log G]$$

The first term is constant, so maximizing expected utility reduces to maximizing $\mathbb{E}[\log G]$, the expected log growth rate, which is exactly the Kelly criterion. Log utility is the unique utility function (up to affine transformation) that leads to myopic optimality: the optimal single-period bet is also the optimal bet for any multi-period horizon.
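For an even-money bet with win probability $p$, betting a fraction $f$ of wealth gives $G = 1 + f$ on a win and $G = 1 - f$ on a loss, and maximizing $\mathbb{E}[\log G]$ yields the classic Kelly fraction $f^* = 2p - 1$. A quick grid-search check (helper name illustrative):

```python
import math

def expected_log_growth(f, p):
    """E[log G] when betting fraction f at even odds with win probability p."""
    return p * math.log(1 + f) + (1 - p) * math.log(1 - f)

p = 0.6
kelly = 2 * p - 1  # analytic optimum for even-money bets: f* = 2p - 1

# Grid search over fractions confirms the Kelly fraction maximizes E[log G].
grid = [i / 1000 for i in range(0, 999)]
best = max(grid, key=lambda f: expected_log_growth(f, p))
assert abs(best - kelly) < 1e-3
```

Betting more than $f^*$ lowers long-run growth even though it raises the single-bet expected monetary value, which is the Kelly criterion's central lesson.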
Common Confusions
Utility is not the same as happiness or money
Utility is a mathematical representation of preferences, not a measure of psychological satisfaction. Saying $u(x) = \sqrt{x}$ does not mean the agent "feels" $\sqrt{x}$ units of happiness from $x$ dollars. It means the agent's preferences over lotteries are consistent with maximizing the expectation of $\sqrt{x}$. The utility function is determined by choices, not by introspection.
Risk aversion is not the same as irrationality
A risk-averse agent who rejects a favorable gamble is not being irrational. Under expected utility theory, risk aversion is perfectly rational: the agent has a concave utility function and is maximizing expected utility. The preference for certainty over equivalent-EV gambles is a consequence of diminishing marginal utility, not of computational error.
Expected utility theory is normative, not descriptive
The VNM theorem says what a rational agent should do, not what people actually do. People systematically violate the independence axiom (Allais paradox), exhibit loss aversion (prospect theory), and overweight small probabilities. Expected utility is a prescriptive benchmark, not a descriptive model of human behavior.
The utility function is not unique
Utility is unique only up to positive affine transformations. $u(x)$ and $a\,u(x) + b$ (for any $a > 0$) represent exactly the same preferences. You cannot compare utility values across individuals, and statements like "person A has higher utility than person B" are meaningless within VNM theory.
Exercises
Problem
An agent has utility function $u(w) = \sqrt{w}$ and current wealth $w_0 = \$1000$. A gamble offers a 60% chance of gaining $500 and a 40% chance of losing $300. Should the agent accept? Compute the certainty equivalent and risk premium.
Problem
Show that a risk-neutral agent ($u(x) = a x + b$ for $a > 0$) has $CE(p) = \mathbb{E}_p[x]$ for every lottery $p$, and that the risk premium is always zero.
Problem
Prove that the Allais paradox (preferring A over B and D over C as described above) is inconsistent with any expected utility function. Show the explicit contradiction.
Problem
An ML model outputs predicted probability $q$ for binary classification when the true label probability is $p$. The expected log-loss (cross-entropy) is $L(q) = -\left[p \log q + (1 - p) \log(1 - q)\right]$. Show that minimizing expected log-loss yields $q = p$, and explain why this is equivalent to maximizing expected log-utility of the predicted probability.
References
Canonical:
- von Neumann & Morgenstern, Theory of Games and Economic Behavior (1944), Chapter 3
- Mas-Colell, Whinston, & Green, Microeconomic Theory (1995), Chapter 6
- Kreps, Notes on the Theory of Choice (1988), Chapters 3-5
Current:
- Gilboa, Theory of Decision under Uncertainty (2009), Chapters 5-8
- Kahneman & Tversky, "Prospect Theory: An Analysis of Decision under Risk," Econometrica (1979)
- Murphy, Probabilistic Machine Learning: An Introduction (2022), Chapter 5 (decision theory)
Next Topics
Building on expected utility:
- Kelly criterion: the special case of expected log-utility for repeated bets
- Game theory: expected utility as the foundation for strategic decision-making under uncertainty
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.
- Common Probability Distributions (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)
- Convex Optimization Basics (Layer 1)
- Differentiation in R^n (Layer 0A)
- Matrix Operations and Properties (Layer 0A)
Builds on This
- Prospect Theory (Layer 3)