Kelly Criterion

Sneiderman, Robby

The mathematically optimal bet size. Maximize expected log wealth, the Kelly fraction, connections to information theory and Shannon, and why full Kelly is often too aggressive in practice.

Prerequisites

Common Probability Distributions, Information Theory Foundations, Convex Tinkering, Expected Utility

Why This Matters

Every decision under uncertainty involves choosing how much to risk. Bet too little and you leave returns on the table. Bet too much and you risk ruin. The Kelly criterion gives the mathematically precise answer: the fraction of wealth to bet that maximizes the long-run growth rate.

The result connects three seemingly distinct ideas:

Gambling theory: what is the optimal bet size in a repeated game?
Information theory: with fair odds, the optimal growth rate equals the mutual information between the bettor's side information and the outcome
Portfolio theory: Kelly betting is equivalent to maximizing expected log utility, and the resulting strategy asymptotically outgrows every other fixed strategy with probability 1

The Kelly criterion appears in quantitative finance, portfolio construction, reinforcement learning (reward shaping and bankroll management), and any setting where you make repeated decisions under uncertainty with compounding outcomes.

Mental Model

You have a biased coin that lands heads with probability $p = 0.6$ . You start with $100. On each flip, you can bet any fraction $f$ of your current wealth. If heads, you gain $f$ times your wealth. If tails, you lose $f$ times your wealth.

Bet everything ( $f = 1$ ): you double or go broke. After a few flips, you will almost certainly be ruined (probability of surviving $n$ rounds is $0.6^n$ , which goes to zero).

Bet nothing ( $f = 0$ ): your wealth stays at $100. Safe but suboptimal.

Bet the Kelly fraction ( $f^* = 2p - 1 = 0.2$ ): you bet 20% of your wealth each round. Your wealth grows exponentially at the maximum possible rate, and the probability of ruin is zero (since you never bet everything).

The Kelly criterion tells you: the optimal fraction is $f^* = 0.2$ . Betting $2f^* = 0.4$ or more produces a negative expected log growth rate, meaning your wealth shrinks to zero with probability 1 even though every individual bet has positive expected value.

Core Definitions

Definition

Expected Log Growth Rate

For a repeated betting game where you wager fraction $f$ of your wealth on each round, the expected log growth rate (or expected geometric growth rate) is:

$g(f) = \mathbb{E}[\log(1 + f \cdot R)]$

where $R$ is the random return on the bet ( $R > 0$ for a win, $-1 \leq R < 0$ for a loss). The wealth after $n$ rounds is:

$W_n = W_0 \prod_{i=1}^n (1 + f \cdot R_i)$

so $\log(W_n / W_0) = \sum_{i=1}^n \log(1 + f \cdot R_i)$ . If the returns $R_i$ are i.i.d. and $g(f)$ is finite, the strong law of large numbers gives:

$\frac{1}{n}\log(W_n / W_0) \xrightarrow{\text{a.s.}} g(f)$

Positive $g(f)$ means wealth grows exponentially at rate $g(f)$ ; negative $g(f)$ means wealth shrinks to zero.
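The convergence above can be checked with a small simulation. This is a minimal sketch that assumes a binary bet (win $b$ per unit staked with probability $p$ , lose the stake otherwise); the function names, seed, and round count are illustrative choices rather than anything prescribed by the source.

```python
import math
import random


def expected_log_growth(f, p, b=1.0):
    """g(f) = p*log(1 + b*f) + (1 - p)*log(1 - f) for a binary bet."""
    return p * math.log(1 + b * f) + (1 - p) * math.log(1 - f)


def realized_log_growth(f, p, b=1.0, n_rounds=100_000, seed=0):
    """Average of log(1 + f*R_i) along one simulated path of i.i.d. bets."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    win_log = math.log(1 + b * f)
    loss_log = math.log(1 - f)
    total = 0.0
    for _ in range(n_rounds):
        total += win_log if rng.random() < p else loss_log
    return total / n_rounds


# Compare the theoretical rate with one long simulated path (p = 0.6, even money).
for f in (0.1, 0.2, 0.4):
    print(f, expected_log_growth(f, 0.6), realized_log_growth(f, 0.6))
```

With enough rounds the two columns should agree to a few decimal places, which is the law-of-large-numbers statement in numerical form.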

Definition

Kelly Fraction

The Kelly fraction $f^*$ is the value of $f$ that maximizes the expected log growth rate:

$f^* = \arg\max_f \, \mathbb{E}[\log(1 + f \cdot R)]$

For a binary bet with win probability $p$ , payout $b$ to 1 (you gain $b$ for each unit wagered on a win, and lose 1 unit on a loss):

$f^* = \frac{bp - (1-p)}{b} = \frac{bp - q}{b}$

where $q = 1 - p$ . For even-money bets ( $b = 1$ ): $f^* = 2p - 1$ .

The Kelly fraction is positive only when the expected return is positive ( $bp > q$ , i.e., the bet has positive edge). When the edge is zero or negative, the Kelly fraction is zero or negative (do not bet).
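The binary formula translates directly into code. The sketch below assumes the convention that a non-positive edge means "do not bet", so it clips the result at zero; the function name `kelly_fraction` is illustrative.

```python
def kelly_fraction(p, b=1.0):
    """Kelly fraction f* = (b*p - q) / b for win probability p and payout b-to-1.

    Returns 0.0 when the edge b*p - q is zero or negative (do not bet).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if b <= 0:
        raise ValueError("payout b must be positive")
    q = 1.0 - p
    return max(0.0, (b * p - q) / b)


print(kelly_fraction(0.6))        # even-money coin with p = 0.6
print(kelly_fraction(0.3, b=3))   # 3-to-1 payout with p = 0.3
print(kelly_fraction(0.4))        # negative edge, so the result is clipped to zero
```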

Definition

Fractional Kelly

Fractional Kelly betting uses fraction $\lambda f^*$ instead of $f^*$ , where $\lambda \in (0, 1)$ is a scaling factor. Common choices are $\lambda = 0.5$ (half-Kelly) or $\lambda = 0.25$ (quarter-Kelly).

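Fractional Kelly sacrifices some expected growth rate in exchange for lower variance and lower probability of large drawdowns. Expanding the logarithm to second order in $\lambda$ , the growth rate under fractional Kelly is:

$g(\lambda f^*) = \lambda f^* \, \mathbb{E}[R] - \frac{\lambda^2 (f^*)^2}{2} \mathbb{E}[R^2] + O(\lambda^3)$

When the edge is small, $f^* \approx \mathbb{E}[R] / \mathbb{E}[R^2]$ and this reduces to $g(\lambda f^*) \approx \lambda(2 - \lambda) \, g(f^*)$ . For small $\lambda$ , the growth rate is approximately proportional to $\lambda$ while the variance of log wealth is proportional to $\lambda^2$ , giving a favorable risk-return tradeoff.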
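To see the tradeoff numerically, the sketch below compares the exact ratio $g(\lambda f^*) / g(f^*)$ for a binary bet with the quadratic rule of thumb $\lambda(2 - \lambda)$ . The rule of thumb is an approximation introduced here for comparison (it is exact only in the small-edge limit), and the function names are illustrative.

```python
import math


def growth(f, p, b=1.0):
    """Expected log growth rate of a binary bet at fraction f."""
    return p * math.log(1 + b * f) + (1 - p) * math.log(1 - f)


def fractional_kelly_ratio(lam, p, b=1.0):
    """Exact g(lam * f*) / g(f*) for a binary bet with positive edge."""
    f_star = (b * p - (1 - p)) / b
    return growth(lam * f_star, p, b) / growth(f_star, p, b)


# Even-money bet with p = 0.55: exact ratio next to the quadratic approximation.
for lam in (0.25, 0.5, 0.75, 1.0):
    print(lam, round(fractional_kelly_ratio(lam, 0.55), 4), lam * (2 - lam))
```

For small edges the two columns nearly coincide, which is why half-Kelly is usually quoted as keeping about three quarters of the growth rate.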

Main Theorems

Theorem

Kelly Criterion Optimality

Statement

Let $W_n(f)$ be the wealth after $n$ rounds when betting fraction $f$ on each round, and let $f^* = (bp - q)/b$ be the Kelly fraction. Then:

Growth rate maximization: $g(f^*) = \max_{f \in [0,1]} g(f)$ where $g(f) = p \log(1 + bf) + q \log(1 - f)$ .
Asymptotic dominance: for any other fixed fraction $f \neq f^*$ ,

$\frac{W_n(f)}{W_n(f^*)} \xrightarrow{\text{a.s.}} 0 \quad \text{as } n \to \infty$

The Kelly bettor's wealth eventually exceeds any other fixed-fraction bettor's wealth by an arbitrarily large factor.

Connection to information theory: the optimal growth rate equals the Kullback-Leibler divergence of the true outcome distribution $(p, q)$ from the break-even distribution $(r, 1 - r)$ implied by the odds:

$g(f^*) = D_{\text{KL}}(p \| r)$

where $r = 1/(1+b)$ is the break-even probability.

Intuition

Betting fraction $f$ creates a tradeoff: larger $f$ gives higher expected single-round return but higher variance. The log function penalizes large losses more than it rewards large gains (it is concave). The Kelly fraction is the exact point where the marginal benefit of betting more (higher expected return) equals the marginal cost (higher variance dragging down the geometric mean).

Because wealth compounds multiplicatively, the geometric mean (not the arithmetic mean) determines long-run growth. Maximizing the geometric mean is equivalent to maximizing $\mathbb{E}[\log W]$ , which is the Kelly criterion.

Proof Sketch

Growth rate maximization: The expected log growth rate for a binary bet is:

$g(f) = p \log(1 + bf) + q \log(1 - f)$

Differentiate: $g'(f) = \frac{pb}{1 + bf} - \frac{q}{1 - f}$

Set $g'(f) = 0$ : $pb(1 - f) = q(1 + bf)$ , giving $pb - pbf = q + qbf$ , so $f(pb + qb) = pb - q$ . Since $p + q = 1$ , the left side is $fb$ , yielding $f^* = (pb - q)/b$ .

Check $g''(f^*) < 0$ : $g''(f) = -\frac{pb^2}{(1+bf)^2} - \frac{q}{(1-f)^2} < 0$ for all $f \in [0, 1)$ , so $g$ is strictly concave and $f^*$ is the unique maximizer.

Asymptotic dominance: By the SLLN: $\frac{1}{n} \log W_n(f) \to g(f)$ a.s. Since $g(f) < g(f^*)$ for $f \neq f^*$ (strict concavity), we have $\frac{1}{n} \log \frac{W_n(f)}{W_n(f^*)} \to g(f) - g(f^*) < 0$ , so the ratio goes to zero exponentially fast.

Information theory connection: Evaluate $g(f^*)$ at $f^* = (bp - q)/b$ . After algebraic manipulation: $g(f^*) = p \log(p/r) + q \log(q/(1-r))$ where $r = 1/(1+b)$ . This is $D_{\text{KL}}((p, q) \| (r, 1-r))$ .
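Both the maximization and the KL identity in this proof sketch are easy to verify numerically. The following is a minimal check for one parameter choice ( $p = 0.3$ , $b = 3$ ); the grid search and helper names are illustrative and stand in for the calculus above rather than replace it.

```python
import math


def growth(f, p, b):
    """g(f) = p*log(1 + b*f) + q*log(1 - f)."""
    return p * math.log(1 + b * f) + (1 - p) * math.log(1 - f)


def kl_bernoulli(p, r):
    """KL divergence between Bernoulli(p) and Bernoulli(r), in nats."""
    return p * math.log(p / r) + (1 - p) * math.log((1 - p) / (1 - r))


def grid_argmax(p, b, steps=10_000):
    """Brute-force maximizer of g over the grid 0, 1/steps, ..., (steps-1)/steps."""
    grid = [i / steps for i in range(steps)]  # excludes f = 1, where g is -infinity
    return max(grid, key=lambda f: growth(f, p, b))


p, b = 0.3, 3.0
f_star = (b * p - (1 - p)) / b   # closed-form Kelly fraction
r = 1 / (1 + b)                  # break-even probability

print(f_star, grid_argmax(p, b))                   # closed form vs. grid search
print(growth(f_star, p, b), kl_bernoulli(p, r))    # g(f*) vs. KL divergence
```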

Why It Matters

The Kelly criterion is one of the few results in decision theory that gives a unique, principled answer to "how much should I bet?" without requiring an arbitrary utility function. The log-utility justification comes from the long-run dominance property: you do not need to assume log utility as an axiom. Any strategy that maximizes long-run wealth growth must use the Kelly fraction.

The connection to information theory is deep: the growth rate equals the information advantage you have over the market. If you have no information advantage (the bet is fair), the Kelly fraction is zero. This links gambling theory to channel capacity in communications.

Failure Mode

The Kelly criterion assumes:

Known probabilities: in practice, $p$ is estimated, not known. If your estimate of $p$ is wrong, the Kelly fraction can be catastrophically wrong. Overestimating your edge leads to overbetting.
i.i.d. bets: if outcomes are correlated or non-stationary, the optimal fraction changes over time.
No transaction costs or constraints: real betting has fees, minimum bet sizes, and maximum position limits.
Infinite time horizon: Kelly is optimal in the long run. For finite horizons, other criteria may be preferable.
No fat tails: if the return distribution has infinite variance or heavier tails than assumed, full Kelly can lead to severe drawdowns.

Proposition

Over-Kelly Ruin

Statement

For even-money bets ( $b = 1$ ) with win probability $p > 1/2$ , the Kelly fraction is $f^* = 2p - 1$ . If the bettor uses fraction $f > 2f^*$ , then the expected log growth rate is negative:

$g(f) < 0 \quad \text{for } f > 2f^*$

This means the wealth converges to zero almost surely: $W_n \xrightarrow{\text{a.s.}} 0$ .

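Betting more than twice the Kelly fraction is worse than not betting at all. The growth rate $g(f)$ is a concave, roughly parabolic function that is maximized at $f^*$ , positive on $(0, f_0)$ , and negative for $f > f_0$ , where the zero-growth point $f_0$ equals $2f^*$ under the quadratic approximation and is smaller than $2f^*$ for the exact logarithm.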

Intuition

Overbetting is worse than underbetting. If you bet half of Kelly, your growth rate is about 75% of optimal. If you bet twice Kelly, your growth rate is approximately zero (at best you break even in the long run). If you bet more than twice Kelly, you lose money despite having a positive edge. The asymmetry between overbetting and underbetting is extreme. This is why practitioners almost universally use fractional Kelly.

Proof Sketch

For even-money bets: $g(f) = p \log(1 + f) + (1 - p) \log(1 - f)$ .

Expand in powers of $f$ with edge $e = 2p - 1 = f^*$ : $g(f) = ef - \frac{f^2}{2} + \frac{ef^3}{3} - \frac{f^4}{4} + \cdots$ . Keeping only the first two terms gives a parabola with roots at $f = 0$ and $f = 2e = 2f^*$ . At $f = 2f^*$ those two terms cancel, the next pair contributes $\frac{8e^4}{3} - 4e^4 = -\frac{4}{3}e^4$ , and every later pair is negative as well, so $g(2f^*) < 0$ .

More precisely: $g(0) = 0$ and $g'(0) = bp - q > 0$ (positive edge), so $g$ starts out increasing. Since $g$ is strictly concave and $g(f) \to -\infty$ as $f \to 1$ , it has exactly one other root $f_0 \in (f^*, 1)$ . For even-money bets $f_0$ lies below $2f^*$ (only slightly so when the edge is small), so $g(f) < 0$ for every $f > 2f^*$ . For general payouts $f_0$ must be found numerically; it always satisfies $f_0 < 1$ because betting everything leads to bankruptcy on the first loss.
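The zero-growth root can be located numerically. The sketch below uses plain bisection on $(f^*, 1)$ , relying only on the strict concavity of $g$ ; the function name `zero_growth_fraction` and the tolerance are illustrative assumptions.

```python
import math


def growth(f, p, b=1.0):
    """Expected log growth rate of a binary bet at fraction f."""
    return p * math.log(1 + b * f) + (1 - p) * math.log(1 - f)


def zero_growth_fraction(p, b=1.0, tol=1e-12):
    """Find f0 in (f*, 1) with g(f0) = 0 by bisection (requires a positive edge)."""
    f_star = (b * p - (1 - p)) / b
    if f_star <= 0:
        raise ValueError("no positive edge, so there is no root above zero")
    lo, hi = f_star, 1.0 - 1e-12      # g(lo) > 0 by optimality of f*
    if growth(hi, p, b) >= 0:
        raise ValueError("root is too close to 1 to bracket in floating point")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if growth(mid, p, b) > 0:
            lo = mid                  # still growing: root is to the right
        else:
            hi = mid                  # already shrinking: root is to the left
    return 0.5 * (lo + hi)


for p in (0.55, 0.6, 0.7):
    f_star = 2 * p - 1
    print(p, 2 * f_star, zero_growth_fraction(p))
```

For $p = 0.6$ the root comes out at about 0.39, a little under $2f^* = 0.4$ , and the gap widens as the edge grows.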

Why It Matters

This result shows that the penalty for overbetting is severe and asymmetric. In practice, estimation error in $p$ means you are likely to deviate from true Kelly. Since overbetting is much more damaging than underbetting, this provides the mathematical justification for fractional Kelly: use $\lambda f^*$ with $\lambda \in [0.25, 0.5]$ to build in a safety margin against parameter estimation error.

Failure Mode

The exact location of the zero-growth boundary depends on the payout structure. For non-binary bets or continuous return distributions, the boundary is not simply $2f^*$ , and must be computed from the specific distribution of returns.

Connection to Shannon and Information Theory

Kelly's 1956 paper was titled "A New Interpretation of Information Rate." The connection is direct: the maximum growth rate of wealth equals the capacity of the "side information channel."

Consider a horse race with $n$ horses offered at fair odds (odds that match the unconditional win probabilities). You observe a noisy signal $Y$ about which horse will win. The mutual information $I(X; Y)$ between the winning horse $X$ and your signal $Y$ quantifies how much your signal tells you about the outcome. Kelly showed:

$g^* = I(X; Y)$

The optimal growth rate equals the mutual information. More generally, for arbitrary odds, $I(X; Y)$ is the increase in the optimal growth rate that the signal makes possible. If you have no information ( $Y$ is independent of $X$ ), $g^* = 0$ and the optimal strategy is to not bet. If you have perfect information ( $H(X \mid Y) = 0$ ), $g^* = H(X)$ and you can grow wealth at the entropy rate of the race.

This connection is not a coincidence. Both information theory and Kelly betting are about exploiting probabilistic asymmetries. Shannon's channel capacity tells you the maximum rate at which information can be transmitted reliably. Kelly's growth rate tells you the maximum rate at which wealth can grow reliably. The mathematics is identical.
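The identity between growth rate and mutual information can be checked on a toy race. The sketch below assumes fair odds $1/P(x)$ and proportional betting (stake the fraction $P(x \mid y)$ of wealth on horse $x$ after seeing signal $y$ ), which is the standard setup in Cover & Thomas, Chapter 6. The joint distribution is made up purely for illustration.

```python
import math

# Hypothetical joint distribution P(X = horse, Y = signal): 3 horses, 2 signals.
JOINT = [
    [0.30, 0.10],
    [0.10, 0.20],
    [0.05, 0.25],
]


def entropy(probs):
    """Shannon entropy in nats, skipping zero-probability entries."""
    return -sum(q * math.log(q) for q in probs if q > 0)


def mutual_information(joint):
    """I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    pxy = [v for row in joint for v in row]
    return entropy(px) + entropy(py) - entropy(pxy)


def side_info_growth_rate(joint):
    """Expected log growth with fair odds 1/P(x) and bets P(x | y)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    g = 0.0
    for x, row in enumerate(joint):
        for y, pxy in enumerate(row):
            if pxy > 0:
                bet = pxy / py[y]    # fraction of wealth on horse x given signal y
                odds = 1.0 / px[x]   # fair payout per unit staked on horse x
                g += pxy * math.log(bet * odds)
    return g


print(side_info_growth_rate(JOINT), mutual_information(JOINT))
```

The two printed numbers agree up to floating-point error. Replacing `JOINT` with a product distribution (an uninformative signal) drives both to zero.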

Practical Considerations

Why Full Kelly is Too Aggressive

Full Kelly maximizes long-run growth but produces enormous volatility along the way. For even-money bets with $p = 0.55$ (a 10% edge), the Kelly fraction is $f^* = 0.10$ . At full Kelly:

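The per-bet standard deviation of log wealth is about 0.10, roughly twenty times the per-bet growth rate of about 0.005
In the continuous-time approximation, wealth eventually falls to a fraction $x$ of its starting value with probability $x$ , so a 50% drawdown from the starting bankroll occurs with probability about 1/2
Under half-Kelly the same probability drops to $x^3$ , or about 1/8 for a 50% drawdown

Half-Kelly ( $f = 0.05$ ) gives about 75% of the growth rate with roughly half the volatility. Quarter-Kelly gives about 44% of the growth rate with a quarter of the volatility. Many practitioners use $\lambda$ between roughly 0.1 and 0.5.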
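The volatility side of this tradeoff can be computed exactly for a binary bet, since the per-bet log return takes only two values. The sketch below is illustrative (the helper name `log_return_stats` is an assumption) and uses the $p = 0.55$ even-money bet from this section.

```python
import math


def log_return_stats(f, p, b=1.0):
    """Mean and standard deviation of the per-bet log return log(1 + f*R)."""
    up = math.log(1 + b * f)      # log return on a win
    down = math.log(1 - f)        # log return on a loss
    mean = p * up + (1 - p) * down
    std = math.sqrt(p * (1 - p)) * (up - down)  # two-point distribution
    return mean, std


p = 0.55
f_star = 2 * p - 1
full_mean, full_std = log_return_stats(f_star, p)
for lam in (1.0, 0.5, 0.25):
    mean, std = log_return_stats(lam * f_star, p)
    # Report growth and volatility relative to full Kelly.
    print(lam, round(mean / full_mean, 3), round(std / full_std, 3))
```

Volatility scales almost linearly with $\lambda$ while growth falls off much more slowly, which is the arithmetic behind the preference for fractional Kelly.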

Parameter Uncertainty

The Kelly fraction depends on knowing $p$ exactly. If $p$ is estimated from data with standard error $\sigma_p$ , then because $f^* = p(1 + 1/b) - 1/b$ is linear in $p$ , the standard error of $f^*$ is $\sigma_{f^*} = \frac{1 + b}{b} \sigma_p$ (for even money, $2\sigma_p$ ). Since overbetting is more dangerous than underbetting, a common adjustment is to shrink toward zero:

$f_{\text{adjusted}} = \max\left(0, \, f^* - c \cdot \sigma_{f^*}\right)$

for some constant $c$ (typically $c = 1$ or $c = 2$ ). This is equivalent to using a conservative estimate of the edge.
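A hedged sketch of this shrinkage rule follows. It assumes the standard error of $f^*$ is obtained by propagating $\sigma_p$ through $f^* = p(1 + 1/b) - 1/b$ , which gives $\sigma_{f^*} = \sigma_p (1 + b)/b$ ; the function name and the default $c = 1$ are illustrative.

```python
def shrunk_kelly(p_hat, se_p, b=1.0, c=1.0):
    """Kelly fraction from an estimated p, shrunk by c standard errors of f*."""
    f_hat = (b * p_hat - (1 - p_hat)) / b   # plug-in Kelly fraction
    se_f = (1 + b) / b * se_p               # f* is linear in p with slope (1 + b)/b
    return max(0.0, f_hat - c * se_f)       # never bet a negative fraction


print(shrunk_kelly(0.60, 0.02))          # well-estimated edge: modest shrinkage
print(shrunk_kelly(0.55, 0.05))          # noisy estimate: shrinks to (about) zero
print(shrunk_kelly(0.55, 0.05, c=2.0))   # more conservative: clipped at zero
```

For even money a standard error of 0.05 in $p$ becomes 0.10 in $f^*$ , as large as the entire plug-in Kelly fraction at $\hat{p} = 0.55$ .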

Connection to Markowitz Portfolio Theory

For continuous returns with mean $\mu$ and variance $\sigma^2$ , a second-order Taylor expansion of $\mathbb{E}[\log(1 + fR)]$ gives:

$g(f) \approx f\mu - \frac{f^2 \sigma^2}{2}$

Maximizing: $f^* = \mu / \sigma^2$ . This is the Kelly fraction for continuous returns, and it equals the Sharpe ratio divided by the volatility: $f^* = (\mu / \sigma) / \sigma$ (with $\mu$ measured in excess of the risk-free rate). In this approximation the Kelly portfolio coincides with the Markowitz mean-variance portfolio for an investor with risk-aversion coefficient 1, which is what log utility implies locally.
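To get a feel for how good the $\mu / \sigma^2$ approximation is, the sketch below applies it to binary bets, where the exact Kelly fraction is known in closed form. The helper names are illustrative; the binary bets are used only as a test case for the approximation.

```python
def binary_moments(p, b=1.0):
    """Mean and variance of the per-unit return R for a binary bet."""
    mu = b * p - (1 - p)
    var = b * b * p + (1 - p) - mu * mu   # E[R^2] - (E[R])^2
    return mu, var


def approx_kelly(mu, var):
    """Second-order (mean-variance) approximation f* ~ mu / sigma^2."""
    return mu / var


def exact_binary_kelly(p, b=1.0):
    """Closed-form Kelly fraction for a binary bet."""
    return (b * p - (1 - p)) / b


for p, b in ((0.55, 1.0), (0.3, 3.0)):
    mu, var = binary_moments(p, b)
    print(p, b, round(approx_kelly(mu, var), 4), round(exact_binary_kelly(p, b), 4))
```

The approximation is close for the small-edge even-money bet and noticeably worse for the skewed 3-to-1 bet, where higher moments of the return matter.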

Canonical Examples

Example

Even-money coin flip

Coin lands heads with $p = 0.6$ , even-money payout ( $b = 1$ ).

Kelly fraction: $f^* = 2p - 1 = 0.2$ .

Growth rate: $g(0.2) = 0.6 \log(1.2) + 0.4 \log(0.8) = 0.6(0.1823) + 0.4(-0.2231) = 0.1094 - 0.0893 = 0.0201$ .

After $n = 100$ bets: expected $\log(W_{100}/W_0) = 100 \times 0.0201 = 2.01$ , so the typical (roughly median) wealth ratio is $e^{2.01} \approx 7.5$ . The expected wealth ratio is much larger, $(1 + 0.2 \times 0.2)^{100} = 1.04^{100} \approx 50$ , because the mean is pulled up by rare lucky paths.

At half-Kelly ( $f = 0.1$ ): $g(0.1) = 0.6 \log(1.1) + 0.4 \log(0.9) \approx 0.0150$ . Growth rate is about 75% of full Kelly.

At double-Kelly ( $f = 0.4$ ): $g(0.4) = 0.6 \log(1.4) + 0.4 \log(0.6) = 0.2019 - 0.2043 = -0.0024$ . Negative growth: wealth goes to zero.

Example

Kelly with 3-to-1 payout

Win probability $p = 0.3$ , payout $b = 3$ (you win $3 for each $1 wagered when you win).

Expected value per bet: $3(0.3) - 1(0.7) = 0.9 - 0.7 = 0.2 > 0$ (positive edge).

Kelly fraction: $f^* = (bp - q)/b = (0.9 - 0.7)/3 = 0.2/3 \approx 0.0667$ .

Despite the positive expected value, the optimal bet is only 6.7% of wealth, because the high loss probability (70%) makes larger bets too risky for long-run compounding.
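The claim that larger bets hurt compounding can be checked by evaluating the growth rate at a few fractions. This is a small illustrative sketch for the same $p = 0.3$ , $b = 3$ bet; the choice of test fractions is arbitrary.

```python
import math


def growth(f, p, b):
    """Expected log growth rate g(f) = p*log(1 + b*f) + q*log(1 - f)."""
    return p * math.log(1 + b * f) + (1 - p) * math.log(1 - f)


p, b = 0.3, 3.0
f_star = (b * p - (1 - p)) / b

# Under-betting, Kelly, mild over-betting, and heavy over-betting.
for f in (0.02, f_star, 0.10, 0.20):
    print(f"{f:.4f}  {growth(f, p, b):+.5f}")
```

Betting 20% of wealth, which matches the 20% expected profit per unit staked, already produces a negative growth rate for this bet.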

Common Confusions

Watch Out

Kelly does not maximize expected wealth

Kelly maximizes expected log wealth, not expected wealth. The strategy that maximizes $\mathbb{E}[W_n]$ is to bet everything ( $f = 1$ ) on each round (if the bet has positive expected value). But this strategy leads to ruin with probability approaching 1. The distinction between maximizing the arithmetic mean and the geometric mean is the entire point of the Kelly criterion.
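The gap between the arithmetic and geometric pictures shows up clearly in numbers. The sketch below, with illustrative function names, compares all-in betting to Kelly betting on the $p = 0.6$ even-money coin over 100 rounds.

```python
import math


def expected_wealth_ratio(f, p, n, b=1.0):
    """E[W_n / W_0] = (1 + f*(b*p - q))**n for i.i.d. bets."""
    return (1 + f * (b * p - (1 - p))) ** n


def typical_wealth_ratio(f, p, n, b=1.0):
    """exp(n * g(f)): the wealth ratio implied by the expected log growth rate."""
    if f >= 1:
        return 0.0  # a single loss wipes out the bankroll, so g(1) is -infinity
    g = p * math.log(1 + b * f) + (1 - p) * math.log(1 - f)
    return math.exp(n * g)


def all_in_survival_probability(p, n):
    """Probability of winning n bets in a row when staking everything."""
    return p ** n


p, n = 0.6, 100
for f in (1.0, 0.2):
    print(f, expected_wealth_ratio(f, p, n), typical_wealth_ratio(f, p, n))
print(all_in_survival_probability(p, n))
```

All-in betting has by far the larger expected wealth, but that expectation is carried entirely by a single astronomically unlikely path; the Kelly bettor's typical outcome is a solid multiple of the starting bankroll.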

Watch Out

Positive edge does not mean you should bet large

A bet with $p = 0.51$ and even-money payout has a positive edge of 2%. The Kelly fraction is $f^* = 0.02$ . You should bet only 2% of your wealth. Beginners often confuse "positive expected value" with "bet big." The size of the optimal bet depends on the ratio of the edge to the variance, not just on the sign of the edge.

Watch Out

Kelly assumes you know the true probabilities

In practice, you estimate $p$ from data. If your estimate is wrong, full Kelly can be catastrophic. If you overestimate your edge by 2x, you are betting at double-Kelly, which produces zero or negative growth. This sensitivity to parameter estimates is the strongest practical argument for fractional Kelly.

Watch Out

Kelly is not a short-run strategy

Kelly is optimal in the limit $n \to \infty$ . For small $n$ , Kelly can produce large drawdowns and may be dominated by more conservative strategies depending on your risk preferences. The dominance of Kelly over all other strategies is an asymptotic result.

Exercises

Exercise (Core)

Problem

A bet pays 2-to-1 (you gain $2 for each $1 wagered) with win probability $p = 0.4$ . Compute the Kelly fraction and the expected log growth rate per bet.

Exercise (Core)

Problem

Show that for even-money bets ( $b = 1$ ), the Kelly fraction satisfies $f^* = 2p - 1$ and that the expected log growth rate at full Kelly is $g(f^*) = 1 + p \log_2 p + (1-p) \log_2(1-p) = 1 - H(p)$ bits per bet, where $H(p)$ is the binary entropy.

Exercise (Advanced)

Problem

Derive the Kelly fraction for a continuous return distribution. Suppose the return per dollar invested is $R$ with $\mathbb{E}[R] = \mu > 0$ and $\text{Var}(R) = \sigma^2$ . Using a second-order Taylor expansion of $\mathbb{E}[\log(1 + fR)]$ around $f = 0$ , show that $f^* \approx \mu / \sigma^2$ .

Exercise (Advanced)

Problem

You estimate the win probability of a bet to be $\hat{p} = 0.55$ with standard error $\sigma_{\hat{p}} = 0.05$ . The true probability is $p$ (unknown). The payout is even money ( $b = 1$ ). Compare the expected growth rate under full Kelly using $\hat{p}$ vs. half-Kelly using $\hat{p}$ , accounting for the possibility that $p$ might be as low as $\hat{p} - 2\sigma = 0.45$ .

References

Canonical:

Kelly, "A New Interpretation of Information Rate," Bell System Technical Journal (1956)
Cover & Thomas, Elements of Information Theory (2nd ed., 2006), Chapter 6
Thorp, "The Kelly Criterion in Blackjack, Sports Betting, and the Stock Market," in The Kelly Capital Growth Investment Criterion (2011), Chapters 1-3

Current:

MacLean, Thorp, & Ziemba, The Kelly Capital Growth Investment Criterion (2011), Chapters 1, 6, 9
Taleb, Statistical Consequences of Fat Tails (2020), Chapter 13
Peters & Gell-Mann, "Evaluating Gambles Using Dynamics," Chaos (2016), Section 3

Next Topics

Building on the Kelly criterion:

Fat tails: why full Kelly is especially dangerous when return distributions have heavy tails
Expected utility theory: the Kelly criterion connects to expected utility through the log utility function

Last reviewed: April 15, 2026
