Decision Theory
Bounded Rationality
Real agents optimize under limited information, limited compute, and limited foresight. Simon's satisficing, heuristics, and the implications for search, planning, and agent design in ML.
Why This Matters
Classical decision theory assumes agents can enumerate all actions, compute all consequences, and select the optimal one. This assumption is false for every real system. A chess player cannot evaluate all possible games. A language model cannot score all possible continuations. A robot cannot plan over an infinite horizon. The computational cost of finding the optimum exceeds any real agent's budget.
Herbert Simon introduced bounded rationality in the 1950s to replace the fiction of perfect optimization with a realistic account of what agents actually do: search until they find something good enough, then stop. This is not a failure mode. It is the only feasible strategy when optimization is computationally intractable.
Every ML system that uses beam search, early stopping, pruning, or sampling with a temperature parameter is implementing a form of bounded rationality. Understanding the theory tells you when these shortcuts are justified, when they fail, and how to design better ones.
The Critique of Perfect Rationality
Classical rational choice theory (Savage, von Neumann-Morgenstern) assumes:
- The agent knows all available alternatives.
- The agent can compute the consequences of each alternative.
- The agent has a complete, transitive preference ordering over consequences.
- The agent selects the alternative that maximizes expected utility.
Simon's critique targets the first two assumptions. In any nontrivial environment, the set of alternatives is too large to enumerate, and the consequences of each alternative are too expensive to compute. A chess player does not maximize over all strategies. A firm does not compute the profit-maximizing price to arbitrary precision. An LLM does not score all possible token sequences.
The question is not whether real agents are rational. The question is: what does rational behavior look like when computation itself is a scarce resource?
Satisficing
Satisficing
An agent satisfices if it searches through alternatives and selects the first one that meets or exceeds an aspiration level $a$. Formally, given alternatives $x_1, x_2, \ldots$ examined sequentially, a utility function $u$, and a threshold $a$, the agent selects:

$$x_\tau, \quad \text{where } \tau = \min\{t : u(x_t) \geq a\}$$
The agent does not compare to all other alternatives. It stops searching as soon as it finds one that is good enough.
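The stopping rule above fits in a few lines. This is an illustrative sketch, not Simon's own formulation; the uniform draw, the identity utility, and the evaluation cap are assumptions chosen for the demo:

```python
import random

def satisfice(draw, utility, aspiration, max_evals=10_000):
    """Evaluate alternatives sequentially; accept the first whose
    utility meets or exceeds the aspiration level."""
    for n in range(1, max_evals + 1):
        x = draw()
        if utility(x) >= aspiration:
            return x, n  # accepted alternative, number of evaluations used
    return None, max_evals  # search budget exhausted

# toy run: uniform alternatives, aspiration a = 0.9
random.seed(0)
x, n = satisfice(draw=random.random, utility=lambda v: v, aspiration=0.9)
```

With uniform draws and aspiration 0.9, the number of evaluations is geometric with mean 10; the agent never compares the accepted alternative against anything it has not yet drawn.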
Aspiration Level
The aspiration level $a$ is the threshold that separates satisfactory from unsatisfactory alternatives. Simon proposed that aspiration levels are not fixed. They adapt based on experience: if the agent easily finds alternatives above $a$, it raises $a$. If it repeatedly fails to find satisfactory alternatives, it lowers $a$.
Search Cost
Let $c > 0$ be the cost of evaluating one additional alternative. The agent faces a tradeoff: searching more increases the expected quality of the selected alternative but incurs additional cost. The optimal stopping problem balances the marginal benefit of continued search against the marginal cost $c$.
The Satisficing Threshold Theorem
Optimal Satisficing Threshold
Statement
Let alternatives be drawn i.i.d. from a distribution with CDF $F$ and PDF $f$ on $[0, \bar{u}]$. Each evaluation costs $c > 0$. The agent uses a satisficing strategy: evaluate alternatives sequentially and accept the first with utility $u \geq a$. The expected net payoff of this strategy is:

$$V(a) = \mathbb{E}[u \mid u \geq a] - \frac{c}{1 - F(a)}$$
The optimal threshold $a^*$ satisfies:

$$\int_{a^*}^{\bar{u}} (u - a^*) f(u)\, du = c$$

This is the reservation value equation: the threshold equals the expected net payoff of continuing to search, $a^* = V(a^*)$.
Intuition
The agent should set $a^*$ so that accepting any alternative at exactly the threshold gives the same expected payoff as rejecting it and continuing to search. If the threshold is too low, the agent accepts mediocre alternatives too quickly. If the threshold is too high, the agent wastes too much on search costs before finding anything acceptable. The optimal threshold balances these two errors.
Proof Sketch
The expected number of evaluations before finding $u \geq a$ is $1/(1 - F(a))$, since each draw exceeds $a$ with probability $1 - F(a)$ (geometric distribution). The expected total search cost is $c/(1 - F(a))$. The expected value of the accepted alternative is $\mathbb{E}[u \mid u \geq a]$. The net payoff is the difference. Setting $V'(a) = 0$ and rearranging gives the reservation value equation. The fixed-point characterization $a^* = V(a^*)$ follows from the envelope condition of the optimal stopping problem.
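For a concrete distribution, the reservation value equation is easy to solve numerically. A minimal sketch for Uniform(0, 1) alternatives with an illustrative cost of 0.02 per draw (both the distribution and the cost are assumptions for the demo):

```python
def reservation_threshold(c, lo=0.0, hi=1.0, iters=60):
    """Solve the reservation-value equation for Uniform(0, 1) alternatives
    by bisection: find the a with E[(u - a)^+] = c."""
    def marginal_benefit(a):
        return (1 - a) ** 2 / 2  # E[(u - a)^+] when u ~ Uniform(0, 1)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if marginal_benefit(mid) > c:
            lo = mid  # one more draw is still worth its cost: raise the threshold
        else:
            hi = mid
    return (lo + hi) / 2

a_star = reservation_threshold(c=0.02)
```

Raising the cost lowers the threshold, exactly as the theorem predicts: `reservation_threshold(0.08)` accepts noticeably worse alternatives than `reservation_threshold(0.005)`.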
Why It Matters
This theorem formalizes Simon's intuition that satisficing is not irrational but is instead the rational response to search costs. The optimal threshold depends on the distribution of alternatives and the cost of search. When search is cheap ($c \to 0$), the threshold rises toward the maximum of the support of $F$. When search is expensive, the threshold drops, and the agent accepts lower quality. This maps directly to ML: beam width in search, early stopping patience, and sampling temperature all control an implicit satisficing threshold.
Failure Mode
The theorem assumes the agent knows the distribution $F$. In practice, agents must estimate $F$ from experience, which introduces a learning problem on top of the search problem. The theorem also assumes i.i.d. draws. If alternatives arrive in a correlated or adversarial order (as in adversarial search or game trees), the analysis changes substantially. The optimal stopping literature (secretary problem, prophet inequalities) addresses some of these extensions.
Computational Complexity as a Foundation
Simon's argument gains formal teeth from computational complexity theory.
Intractability as Bounded Rationality Justification
A decision problem is NP-hard if every problem in NP reduces to it in polynomial time; under the standard conjecture that P ≠ NP, no polynomial-time algorithm solves all its instances. Many optimization problems faced by real agents are NP-hard: traveling salesman, satisfiability, protein folding, optimal neural architecture search. For these problems, perfect optimization is not merely difficult; it is provably infeasible for large instances.
When the optimization problem is NP-hard, satisficing is not a cognitive bias. It is the only available strategy. The question becomes: which heuristics achieve good approximation ratios, and under what conditions?
This connects bounded rationality to the theory of approximation algorithms. A heuristic that finds a solution within a factor $\alpha$ of optimal in polynomial time is a formal model of satisficing with a quantifiable performance guarantee.
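As a concrete instance, the classic maximal-matching heuristic for vertex cover runs in linear time with a provable factor-2 guarantee; the example path graph here is made up for illustration:

```python
def vertex_cover_2approx(edges):
    """Greedy matching heuristic for vertex cover: whenever an edge is
    uncovered, take both endpoints. Polynomial time, and the result is
    provably at most twice the size of the optimal cover."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover

# path graph 1-2-3-4: the optimum is {2, 3}; the heuristic may take all four
cover = vertex_cover_2approx([(1, 2), (2, 3), (3, 4)])
```

The heuristic is a satisficer with a certificate: it never inspects the exponential space of all covers, yet its output quality is bounded relative to the optimum.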
Heuristics as Rational Strategies
Fast-and-Frugal Heuristic
A fast-and-frugal heuristic (Gigerenzer) is a decision rule that uses limited information and computation to achieve good performance in a specific environment. Examples:
- Take-the-best: rank cues by validity, choose the first cue that discriminates between alternatives, and decide based on that cue alone.
- 1/N rule: allocate resources equally among $N$ options, ignoring all information about correlations and expected returns.
- Recognition heuristic: if you recognize one option but not the other, choose the recognized one.
These heuristics ignore most available information. They succeed because ignoring irrelevant information reduces overfitting to noise, especially in small-sample environments.
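A sketch of take-the-best over binary cues; the two-option comparison and the cue "validity ordering" are invented for the demo:

```python
def take_the_best(a, b, cues):
    """Take-the-best: scan cues in decreasing validity; decide on the first
    cue that discriminates, ignoring every cue after it."""
    for cue in cues:
        ca, cb = cue(a), cue(b)
        if ca != cb:
            return a if ca > cb else b
    return None  # no cue discriminates: fall back to guessing

# hypothetical binary cues for "which city is larger", most valid cue first
facts = {"A": {"capital": 1, "has_airport": 1},
         "B": {"capital": 0, "has_airport": 1}}
cues = [lambda c: facts[c]["capital"], lambda c: facts[c]["has_airport"]]
choice = take_the_best("A", "B", cues)  # decided by the first cue alone
```

The second cue is never consulted: the rule spends exactly as much computation as the first discriminating cue requires.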
The $1/N$ portfolio rule illustrates a key insight. DeMiguel, Garlappi, and Uppal (2009) showed that $1/N$ outperforms mean-variance optimization on real financial data when the estimation window is short. Mean-variance optimization is the theoretically optimal strategy, but it requires estimating mean and covariance parameters from limited data. The estimation error exceeds the optimization gain. The "naive" heuristic wins because it has zero estimation error.
Aspiration Adaptation
Aspiration Adaptation
Aspiration adaptation is the process by which an agent adjusts its satisficing threshold $a_t$ over time based on the outcomes of past searches. If the agent consistently finds alternatives above $a_t$, it raises the threshold. If it consistently fails, it lowers it. Formally:

$$a_{t+1} = (1 - \alpha)\, a_t + \alpha\, u_t^*$$

where $u_t^*$ is the utility of the best alternative found at time $t$ and $\alpha \in (0, 1]$ is a learning rate. This is an exponential moving average of past experience.
This adaptation rule resembles learning rate schedules in optimization. A high initial $a_0$ (ambitious aspiration) corresponds to aggressive early stopping. A gradually decreasing $\alpha$ corresponds to increasing patience. The analogy is precise: both aspiration adaptation and learning rate decay manage the tradeoff between exploration cost and solution quality.
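The update rule itself is one line; a minimal sketch with made-up outcome values:

```python
def adapt_aspiration(a0, outcomes, alpha=0.2):
    """EMA aspiration update: a_{t+1} = (1 - alpha) * a_t + alpha * u_t.
    Good outcomes pull the threshold up; failures pull it down."""
    a, history = a0, [a0]
    for u in outcomes:
        a = (1 - alpha) * a + alpha * u
        history.append(a)
    return history

# outcomes consistently above the current aspiration drag it upward
hist = adapt_aspiration(a0=0.5, outcomes=[0.8, 0.8, 0.8])
```

Feeding in outcomes below the aspiration produces the symmetric downward drift, so the threshold tracks what the environment actually delivers.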
Sciences of the Artificial
Simon's broader framework, developed in The Sciences of the Artificial (1969), considers any system that is designed to achieve goals within constraints imposed by an environment. The key distinction:
- Natural sciences study what is: the laws governing physical systems.
- Sciences of the artificial study what could be: the space of possible designs, given goals and constraints.
An ML model is an artifact in Simon's sense. It is designed (trained) to achieve a goal (minimize loss) within constraints (architecture, computation budget, data). Understanding the model requires understanding both the goal and the constraints, not just the final weights. Two models with identical weights but different training constraints may generalize differently because the constraints shape the optimization trajectory.
Connections to ML
Beam Search as Satisficing
Beam search in sequence generation keeps only the top $k$ candidates at each step, discarding the rest. This is satisficing: the agent searches a restricted set and accepts the best within that set, rather than searching the full exponential space of all possible sequences. The beam width $k$ controls the tradeoff between search thoroughness and computational cost.
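A minimal beam search over an abstract expansion function; the binary "vocabulary" and additive score are toy assumptions, standing in for token logits:

```python
import heapq

def beam_search(expand, score, start, width, depth):
    """Keep only the `width` best partial sequences at each step,
    discarding the rest of the exponential search tree."""
    beam = [start]
    for _ in range(depth):
        candidates = [seq + [tok] for seq in beam for tok in expand(seq)]
        beam = heapq.nlargest(width, candidates, key=score)
    return max(beam, key=score)

# toy: binary vocabulary, score = sum of chosen tokens
best = beam_search(expand=lambda seq: [0, 1], score=sum,
                   start=[], width=2, depth=3)
```

At each depth the candidate set would double without pruning; the beam caps it at `width`, so total work is linear in depth rather than exponential.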
Early Stopping as Bounded Optimization
Early stopping halts training when validation loss stops improving. The agent could continue training (potentially finding a lower training loss) but accepts the current solution as good enough. The patience parameter is the aspiration adaptation rate: how many epochs of non-improvement the agent tolerates before stopping.
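The patience rule as a standalone function, assuming we only observe the sequence of validation losses (the loss values below are invented):

```python
def early_stop_epoch(val_losses, patience=3):
    """Return the epoch of the best validation loss, stopping the scan once
    `patience` consecutive epochs fail to improve on it."""
    best, best_epoch, waited = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break  # aspiration unmet for `patience` epochs: accept best so far
    return best_epoch

stop = early_stop_epoch([1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.74])
```

Larger `patience` buys more search (more epochs) for a chance at a better solution, which is exactly the marginal-benefit-versus-marginal-cost tradeoff of the satisficing theorem.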
Pruning as Resource-Bounded Rationality
Alpha-beta pruning in game trees eliminates branches that cannot affect the final decision. This reduces the effective branching factor from $b$ to approximately $\sqrt{b}$ (with perfect move ordering). The pruned agent makes the same decision as the unpruned agent but uses far less computation. This is bounded rationality without any loss in decision quality.
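A compact alpha-beta sketch over an abstract tree; the four-leaf example tree is made up so that one leaf is actually pruned:

```python
def alphabeta(node, depth, alpha, beta, maximizing, children, value):
    """Minimax with alpha-beta cutoffs: siblings that cannot change the
    root decision are never evaluated."""
    kids = children(node)
    if depth == 0 or not kids:
        return value(node)
    if maximizing:
        best = float("-inf")
        for child in kids:
            best = max(best, alphabeta(child, depth - 1, alpha, beta,
                                       False, children, value))
            alpha = max(alpha, best)
            if beta <= alpha:
                break  # cutoff: remaining siblings are irrelevant
        return best
    best = float("inf")
    for child in kids:
        best = min(best, alphabeta(child, depth - 1, alpha, beta,
                                   True, children, value))
        beta = min(beta, best)
        if beta <= alpha:
            break
    return best

tree = {"root": ["A", "B"], "A": ["a1", "a2"], "B": ["b1", "b2"]}
leaves = {"a1": 3, "a2": 5, "b1": 2, "b2": 9}
root_value = alphabeta("root", 2, float("-inf"), float("inf"), True,
                       children=lambda n: tree.get(n, []),
                       value=lambda n: leaves[n])
# leaf b2 is never evaluated: after b1 = 2, branch B cannot beat A's value of 3
```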
MCTS as Heuristic Search
Monte Carlo Tree Search (MCTS) combines tree search with random rollouts. Instead of evaluating all nodes (infeasible in large game trees), it allocates evaluation budget to promising subtrees using the UCB1 criterion. This is a formal version of satisficing: the agent allocates its bounded computational budget where it expects the highest information gain.
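The budget-allocation core of MCTS is the UCB1 selection rule; here is a two-armed sketch with made-up Gaussian payoffs standing in for rollout returns:

```python
import math
import random

def ucb1_allocate(arms, budget, c=1.4):
    """Spend each evaluation on the arm with the highest upper confidence
    bound: empirical mean plus an exploration bonus that shrinks with visits."""
    counts = [0] * len(arms)
    totals = [0.0] * len(arms)
    for t in range(1, budget + 1):
        if t <= len(arms):
            i = t - 1  # play every arm once first
        else:
            i = max(range(len(arms)),
                    key=lambda j: totals[j] / counts[j]
                    + c * math.sqrt(math.log(t) / counts[j]))
        counts[i] += 1
        totals[i] += arms[i]()  # one stochastic rollout of arm i
    return counts

random.seed(1)
counts = ucb1_allocate([lambda: random.gauss(0.2, 0.1),
                        lambda: random.gauss(0.8, 0.1)], budget=200)
```

The bounded budget concentrates on the more promising arm while still paying a small exploration tax on the weaker one; in MCTS the same rule is applied recursively at every tree node.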
LLM Inference and Temperature
Sampling from an LLM with temperature $T$ controls the breadth of search. At $T = 0$ (greedy decoding), the model selects the single highest-probability token: maximally satisficing. At high $T$, the model explores a broader set of alternatives. The temperature parameter is an explicit satisficing threshold in the probability simplex.
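Temperature scaling in isolation, with toy logits and $T = 0$ handled as the greedy limit:

```python
import math
import random

def sample_with_temperature(logits, T, rng=random):
    """Sample an index from softmax(logits / T); T = 0 is greedy decoding."""
    if T == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / T for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    r = rng.random() * sum(weights)
    for i, w in enumerate(weights):
        r -= w
        if r <= 0:
            return i
    return len(logits) - 1  # guard against floating-point residue

greedy = sample_with_temperature([2.0, 1.0, 0.1], T=0)     # always the top logit
explore = sample_with_temperature([2.0, 1.0, 0.1], T=2.0)  # any index possible
```

Low $T$ sharpens the distribution toward the greedy choice; high $T$ flattens it, spending probability mass on a wider set of alternatives.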
Agent Tool Selection
When an LLM agent must choose which tool to call, it faces a search problem: evaluate available tools, estimate which one best serves the current subgoal, and select. With many tools, exhaustive evaluation is infeasible. Practical agents use heuristics: keyword matching, retrieval over tool descriptions, or learned routing functions. These are fast-and-frugal heuristics applied to tool selection.
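A sketch of keyword-overlap routing; the tool registry and scoring rule are invented for illustration and are far simpler than retrieval or a learned router:

```python
def route_tool(query, tools):
    """Fast-and-frugal routing: score each tool by word overlap between the
    query and its description, and take the best without deeper evaluation."""
    words = set(query.lower().split())
    def overlap(tool):
        return len(words & set(tool["description"].lower().split()))
    return max(tools, key=overlap)["name"]

# hypothetical tool registry
tools = [
    {"name": "calculator", "description": "evaluate arithmetic and math expressions"},
    {"name": "web_search", "description": "search the web for documents and pages"},
]
chosen = route_tool("solve this math expression", tools)
```

Like take-the-best, the router ignores almost everything about each tool's actual behavior; it satisfices on a single cheap cue rather than simulating every tool call.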
Common Confusions
Bounded rationality does not mean irrationality
Bounded rationality does not mean irrationality. It means rational behavior under real computational and informational constraints. Simon's agents are not stupid; they are smart about being resource-limited. A satisficing agent that finds a good solution in polynomial time outperforms a "perfectly rational" agent that cannot finish computing the optimum before the deadline. The bounded agent makes a better decision in practice because it makes a decision.
Satisficing is not laziness
Satisficing is not the claim that agents should accept the first option they see. The threshold $a$ matters. A well-calibrated satisficer with a high threshold and low search costs will find near-optimal solutions. Satisficing becomes suboptimal only when the threshold is poorly calibrated or when the search cost model is wrong. The theory prescribes how to set $a$ optimally given the search cost structure.
Heuristics do not always underperform optimal strategies
The bias-variance tradeoff applies to decision strategies, not just estimators. An optimal strategy estimated from limited data has high variance. A simple heuristic has high bias but zero variance. In small-sample regimes, the heuristic often wins. Gigerenzer's program shows this empirically across dozens of domains. The question is not "is the heuristic optimal?" but "does the heuristic outperform the estimated optimum in practice?"
Exercises
Problem
Alternatives are drawn uniformly from $[0, 1]$ (so $F(u) = u$ and $f(u) = 1$). The search cost is $c$ per evaluation. Using the satisficing threshold theorem, find the optimal threshold $a^*$ as a function of $c$.
Problem
A chess engine has 0.5 seconds to choose a move. It can evaluate $N$ positions per second. The game tree has branching factor $b$ and the engine uses minimax search. What is the maximum search depth $d$ the engine can explore exhaustively? If instead it uses alpha-beta pruning with perfect move ordering (effective branching factor $\sqrt{b}$), what depth can it reach? Relate this to bounded rationality.
Problem
The $1/N$ portfolio rule allocates wealth equally across $N$ assets. Mean-variance optimization selects weights $w^* \propto \Sigma^{-1}\mu$, where $\mu$ is the vector of expected returns and $\Sigma$ is the covariance matrix. With $T$ time periods of return data, the estimation error in $\hat{\mu}$ scales as $O(1/\sqrt{T})$. How large must $T$ be relative to $N$ for mean-variance optimization to outperform $1/N$? State the qualitative answer and explain why this illustrates bounded rationality.
Problem
Design a formal model of an LLM agent that must decide when to stop "thinking" (generating chain-of-thought tokens) and commit to an answer. The agent pays cost $c$ per token generated and receives terminal reward $Q(\tau)$, where $Q$ is the quality of the final answer (a function of the reasoning trace $\tau$), so its net payoff is $Q(\tau) - c\,|\tau|$. Formulate this as an optimal stopping problem. Under what conditions does the optimal policy have a threshold structure (stop when estimated answer quality exceeds some $q^*$)?
References
Canonical:
- Simon, Models of Bounded Rationality, Vols. 1-2 (1982), Chapters 1-2, 7-8
- Simon, The Sciences of the Artificial (3rd ed., 1996), Chapters 1-4
Heuristics and ecological rationality:
- Gigerenzer, Todd, and the ABC Research Group, Simple Heuristics That Make Us Smart (1999), Chapters 1-4
- Gigerenzer and Selten (eds.), Bounded Rationality: The Adaptive Toolbox (2001), Chapters 1-3
Formal connections:
- DeMiguel, Garlappi, and Uppal, "Optimal Versus Naive Diversification," Review of Financial Studies 22(5), 2009
- Papadimitriou and Yannakakis, "On the Approximability of Trade-offs and Optimal Access of Web Sources," FOCS 2000
Search and stopping theory:
- DeGroot, Optimal Statistical Decisions (1970), Chapters 12-13
- Ferguson, Optimal Stopping and Applications (2006), Chapters 1-2
Next Topics
Natural extensions from bounded rationality:
- Game theory: strategic interaction between agents, each of whom is boundedly rational
- Expected utility: the idealized framework that bounded rationality critiques and refines
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.
- Decision Theory Foundations (Layer 2)
- Common Probability Distributions (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)
- Bayesian Estimation (Layer 0B)
- Maximum Likelihood Estimation (Layer 0B)
- Differentiation in $\mathbb{R}^n$ (Layer 0A)
- Convex Optimization Basics (Layer 1)
- Matrix Operations and Properties (Layer 0A)