Decision Theory
Kelly Criterion
The mathematically optimal bet size. Maximize expected log wealth, the Kelly fraction, connections to information theory and Shannon, and why full Kelly is often too aggressive in practice.
Why This Matters
Every decision under uncertainty involves choosing how much to risk. Bet too little and you leave returns on the table. Bet too much and you risk ruin. The Kelly criterion gives the mathematically precise answer: the fraction of wealth to bet that maximizes the long-run growth rate.
The result connects three seemingly distinct ideas:
- Gambling theory: what is the optimal bet size in a repeated game?
- Information theory: the optimal growth rate equals the mutual information between your side information and the outcome
- Portfolio theory: Kelly betting is equivalent to maximizing expected log utility, which is the only criterion that dominates all other strategies in the long run
The Kelly criterion appears in quantitative finance, portfolio construction, reinforcement learning (reward shaping and bankroll management), and any setting where you make repeated decisions under uncertainty with compounding outcomes.
Mental Model
You have a biased coin that lands heads with probability \(p = 0.6\). You start with $100. On each flip, you can bet any fraction \(f\) of your current wealth. If heads, you gain \(f\) times your wealth. If tails, you lose \(f\) times your wealth.
Bet everything (\(f = 1\)): you double or go broke. After a few flips, you will almost certainly be ruined (probability of surviving \(n\) rounds is \(0.6^n\), which goes to zero).
Bet nothing (\(f = 0\)): your wealth stays at $100. Safe but suboptimal.
Bet the Kelly fraction (\(f^* = 0.2\)): you bet 20% of your wealth each round. Your wealth grows exponentially at the maximum possible rate, and the probability of ruin is zero (since you never bet everything).
The Kelly criterion tells you: the optimal fraction is \(f^* = 2p - 1 = 0.2\). Any fraction larger than \(2f^* = 0.4\) actually produces a negative expected growth rate, meaning your wealth shrinks to zero with probability 1.
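The three strategies above can be tried on a simulated coin. The following is a minimal sketch, not a reference implementation: the function name `simulate`, the 50-flip horizon, and the fixed seed are illustrative assumptions, and the coin is taken to land heads with probability 0.6 at even money.

```python
import random

def simulate(fraction, p=0.6, n_flips=50, start=100.0, seed=0):
    """Wealth after n_flips even-money bets of a fixed fraction of current wealth."""
    rng = random.Random(seed)      # fixed seed so every strategy sees the same flips
    wealth = start
    for _ in range(n_flips):
        stake = fraction * wealth
        if rng.random() < p:       # heads: win the stake
            wealth += stake
        else:                      # tails: lose the stake
            wealth -= stake
    return wealth

if __name__ == "__main__":
    for f in (0.0, 0.2, 1.0):
        print(f"f = {f:.1f}: final wealth = {simulate(f):.2f}")
```

A single run is only one path, so the Kelly result will vary with the seed. The all-in bettor is wiped out by the first tails, and the bettor who stakes nothing never moves from $100.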
Core Definitions
Expected Log Growth Rate
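For a repeated betting game where you wager fraction \(f\) of your wealth on each round, the expected log growth rate (or expected geometric growth rate) is:
\[ G(f) = \mathbb{E}[\log(1 + fX)] \]
where \(X\) is the random return per unit wagered (\(X = b\) for a win that pays \(b\) to 1, \(X = -1\) for a loss). The wealth after \(n\) rounds is:
\[ W_n = W_0 \prod_{i=1}^{n} (1 + fX_i) \]
so \(\frac{1}{n}\log\frac{W_n}{W_0} = \frac{1}{n}\sum_{i=1}^{n}\log(1 + fX_i)\). By the law of large numbers:
\[ \frac{1}{n}\log\frac{W_n}{W_0} \to G(f) \quad \text{almost surely.} \]
The wealth grows exponentially at rate \(G(f)\) if \(G(f) > 0\), and shrinks to zero if \(G(f) < 0\).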
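The law-of-large-numbers statement can be checked numerically. This sketch assumes a binary bet (win \(b\) per unit with probability \(p\), lose the stake otherwise). The names `growth_rate` and `realized_rate`, the path length, and the seed are illustrative choices.

```python
import math
import random

def growth_rate(f, p, b=1.0):
    """Expected log growth per bet: p*log(1 + b*f) + (1 - p)*log(1 - f)."""
    return p * math.log(1 + b * f) + (1 - p) * math.log(1 - f)

def realized_rate(f, p, b=1.0, n=200_000, seed=1):
    """(1/n) * log(W_n / W_0) along one simulated path of n bets."""
    rng = random.Random(seed)
    log_wealth = 0.0
    for _ in range(n):
        x = b if rng.random() < p else -1.0   # return per unit wagered
        log_wealth += math.log(1 + f * x)
    return log_wealth / n

if __name__ == "__main__":
    print("expected:", growth_rate(0.2, 0.6))
    print("realized:", realized_rate(0.2, 0.6))
```

With a long path the realized per-bet log growth should sit close to the expected value. Shorter paths will show visibly larger gaps.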
Kelly Fraction
The Kelly fraction \(f^*\) is the value of \(f\) that maximizes the expected log growth rate:
\[ f^* = \arg\max_{f} \, \mathbb{E}[\log(1 + fX)] \]
For a binary bet with win probability \(p\), payout \(b\) to 1 (you gain \(b\) for each unit wagered on a win, and lose 1 unit on a loss):
\[ f^* = \frac{bp - q}{b} = p - \frac{q}{b} \]
where \(q = 1 - p\). For even-money bets (\(b = 1\)): \(f^* = p - q = 2p - 1\).
The Kelly fraction is positive only when the expected return is positive (\(bp - q > 0\), i.e., the bet has positive edge). When the edge is zero or negative, the Kelly fraction is zero or negative (do not bet).
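The binary-bet formula translates directly into a small helper. This is a sketch under the stated assumptions (a single binary bet, known \(p\), payout \(b\) to 1). The name `kelly_fraction` and the choice to clamp a non-positive edge to zero are illustrative conventions.

```python
def kelly_fraction(p, b=1.0):
    """Kelly fraction (b*p - q) / b for win probability p and payout b-to-1.

    Returns 0.0 when the edge b*p - q is not positive (do not bet).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if b <= 0:
        raise ValueError("b must be positive")
    q = 1.0 - p
    return max(0.0, (b * p - q) / b)

if __name__ == "__main__":
    print(kelly_fraction(0.6))        # even money, 60% win probability
    print(kelly_fraction(0.3, b=3))   # 3-to-1 payout, 30% win probability
    print(kelly_fraction(0.4))        # negative edge
```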
Fractional Kelly
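Fractional Kelly betting uses fraction \(c f^*\) instead of \(f^*\), where \(c \in (0, 1)\) is a scaling factor. Common choices are \(c = 1/2\) (half-Kelly) or \(c = 1/4\) (quarter-Kelly).
Fractional Kelly sacrifices some expected growth rate in exchange for lower variance and lower probability of large drawdowns. The growth rate under fractional Kelly is:
\[ G(c f^*) \approx c\,(2 - c)\,G(f^*) \]
where \(G\) is the expected log growth rate and the approximation comes from its second-order expansion around \(f = 0\).
For small \(c\), the growth rate is approximately proportional to \(c\) while the variance is proportional to \(c^2\), giving a favorable risk-return tradeoff.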
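To see how good the quadratic rule of thumb is, the exact share of growth retained at scaling factor \(c\) can be compared with \(c(2 - c)\). This sketch assumes a binary bet with positive edge. `growth_share` is a hypothetical helper name, and \(p = 0.55\) is an illustrative value.

```python
import math

def growth_rate(f, p, b=1.0):
    """Expected log growth per bet for a binary bet paying b-to-1."""
    return p * math.log(1 + b * f) + (1 - p) * math.log(1 - f)

def growth_share(c, p, b=1.0):
    """Exact ratio G(c * f_star) / G(f_star); requires a positive edge."""
    f_star = (b * p - (1 - p)) / b
    return growth_rate(c * f_star, p, b) / growth_rate(f_star, p, b)

if __name__ == "__main__":
    for c in (0.25, 0.5, 1.0):
        print(f"c = {c}: exact share = {growth_share(c, 0.55):.4f}, "
              f"c(2 - c) = {c * (2 - c):.4f}")
```

For small edges the exact and approximate shares agree closely. The gap widens as the edge, and with it the Kelly fraction, grows.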
Main Theorems
Kelly Criterion Optimality
Statement
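Let \(W_n(f)\) be the wealth after \(n\) rounds when betting fraction \(f\) on each round, and let \(f^*\) be the Kelly fraction. Then:
- Growth rate maximization: \(f^* = \arg\max_f G(f)\), where \(G(f) = \mathbb{E}[\log(1 + fX)]\).
- Asymptotic dominance: for any other fixed fraction \(f \neq f^*\), \[ \lim_{n \to \infty} \frac{W_n(f)}{W_n(f^*)} = 0 \quad \text{almost surely.} \] The Kelly bettor's wealth eventually exceeds any other fixed-fraction bettor's wealth by an arbitrarily large factor.
- Connection to information theory: the optimal growth rate equals the Kullback-Leibler divergence from the "worst case" distribution to the true distribution: \[ G(f^*) = D_{\mathrm{KL}}(p \,\|\, p_0) = p \log\frac{p}{p_0} + q \log\frac{q}{1 - p_0} \] where \(p_0 = 1/(1 + b)\) is the break-even probability.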
Intuition
Betting fraction \(f\) creates a tradeoff: larger \(f\) gives higher expected single-round return but higher variance. The log function penalizes large losses more than it rewards large gains (it is concave). The Kelly fraction is the exact point where the marginal benefit of betting more (higher expected return) equals the marginal cost (higher variance dragging down the geometric mean).
Because wealth compounds multiplicatively, the geometric mean (not the arithmetic mean) determines long-run growth. Maximizing the geometric mean is equivalent to maximizing \(\mathbb{E}[\log(1 + fX)]\), which is the Kelly criterion.
Proof Sketch
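Growth rate maximization: The expected log growth rate for a binary bet is:
\[ G(f) = p \log(1 + bf) + q \log(1 - f) \]
Differentiate:
\[ G'(f) = \frac{pb}{1 + bf} - \frac{q}{1 - f} \]
Set \(G'(f) = 0\): \(pb(1 - f) = q(1 + bf)\), giving \(pb - pbf = q + qbf\), so \(pb - q = bf(p + q) = bf\), yielding \(f^* = (bp - q)/b\).
Check \(G''(f) < 0\): \(G''(f) = -\frac{pb^2}{(1 + bf)^2} - \frac{q}{(1 - f)^2} < 0\) for all \(f \in (0, 1)\), confirming the maximum.
Asymptotic dominance: By the SLLN: \(\frac{1}{n}\log\frac{W_n(f)}{W_n(f^*)} \to G(f) - G(f^*)\) a.s. Since \(G(f) < G(f^*)\) for \(f \neq f^*\) (strict concavity), we have \(G(f) - G(f^*) < 0\), so the ratio \(W_n(f)/W_n(f^*)\) goes to zero exponentially fast.
Information theory connection: Evaluate \(G\) at \(f^*\). Substituting \(1 + bf^* = p(1 + b)\) and \(1 - f^* = q(1 + b)/b\) gives \(G(f^*) = p\log\frac{p}{p_0} + q\log\frac{q}{1 - p_0}\) where \(p_0 = 1/(1 + b)\). This is \(D_{\mathrm{KL}}(p \,\|\, p_0)\).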
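The identity between the optimal growth rate and the KL divergence is easy to confirm numerically. This is a sketch for the binary bet only. `bernoulli_kl` and `check` are illustrative helper names, and the test values of \(p\) and \(b\) are arbitrary.

```python
import math

def growth_rate(f, p, b):
    """Expected log growth per bet: p*log(1 + b*f) + (1 - p)*log(1 - f)."""
    return p * math.log(1 + b * f) + (1 - p) * math.log(1 - f)

def bernoulli_kl(p, p0):
    """KL divergence D(Bernoulli(p) || Bernoulli(p0)) in nats."""
    return p * math.log(p / p0) + (1 - p) * math.log((1 - p) / (1 - p0))

def check(p, b):
    """Return (growth at the Kelly fraction, KL divergence to break-even)."""
    f_star = (b * p - (1 - p)) / b   # Kelly fraction
    p0 = 1 / (1 + b)                 # break-even win probability
    return growth_rate(f_star, p, b), bernoulli_kl(p, p0)

if __name__ == "__main__":
    for p, b in ((0.6, 1), (0.3, 3), (0.55, 2)):
        print(p, b, check(p, b))
```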
Why It Matters
The Kelly criterion is one of the few results in decision theory that gives a unique, principled answer to "how much should I bet?" without requiring an arbitrary utility function. The log-utility justification comes from the long-run dominance property: you do not need to assume log utility as an axiom. Any strategy that maximizes long-run wealth growth must use the Kelly fraction.
The connection to information theory is deep: the growth rate equals the information advantage you have over the market. If you have no information advantage (the bet is fair), the Kelly fraction is zero. This links gambling theory to channel capacity in communications.
Failure Mode
The Kelly criterion assumes:
- Known probabilities: in practice, \(p\) is estimated, not known. If your estimate of \(p\) is wrong, the Kelly fraction can be catastrophically wrong. Overestimating your edge leads to overbetting.
- i.i.d. bets: if outcomes are correlated or non-stationary, the optimal fraction changes over time.
- No transaction costs or constraints: real betting has fees, minimum bet sizes, and maximum position limits.
- Infinite time horizon: Kelly is optimal in the long run. For finite horizons, other criteria may be preferable.
- No fat tails: if the return distribution has infinite variance or heavier tails than assumed, full Kelly can lead to severe drawdowns.
Over-Kelly Ruin
Statement
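For even-money bets (\(b = 1\)) with win probability \(p > 1/2\), the Kelly fraction is \(f^* = 2p - 1\). If the bettor uses fraction \(f > 2f^*\) (with \(f < 1\)), then the expected log growth rate is negative:
\[ G(f) = p \log(1 + f) + q \log(1 - f) < 0 \]
This means the wealth converges to zero almost surely: \(W_n \to 0\) a.s.
Betting more than twice the Kelly fraction is worse than not betting at all. The growth rate is a concave parabola-like function that is positive on \((0, f_0)\), maximized at \(f^*\), and negative for \(f > f_0\), where the zero-growth point \(f_0\) lies slightly below \(2f^*\) when the edge is small.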
Intuition
Overbetting is worse than underbetting. If you bet half of Kelly, your growth rate is about 75% of optimal. If you bet twice Kelly, your growth rate is approximately zero (you roughly break even in the long run; for even-money bets it is in fact slightly negative). If you bet more than twice Kelly, you lose money despite having a positive edge. The asymmetry between overbetting and underbetting is extreme. This is why practitioners almost universally use fractional Kelly.
Proof Sketch
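For even-money bets: \(G(f) = p \log(1 + f) + q \log(1 - f)\).
At \(f = 2f^*\): under the second-order approximation \(G(f) \approx (p - q) f - f^2/2\), the growth rate is a parabola with roots at \(0\) and \(2f^*\), symmetric around the Kelly point. The exact \(G\) is not symmetric, so the second root is only approximately \(2f^*\).
More precisely: \(G(0) = 0\) and \(G'(0) = p - q > 0\) (positive edge), so \(G\) starts positive. Since \(G\) is strictly concave, it has exactly one other root at some \(f_0 > f^*\). Computing this root gives \(f_0 \approx 2f^*\) for even-money bets with a small edge; the exact root is slightly smaller, because \(G(2f^*) < 0\). For general payouts, the upper boundary is \(f = 1\) (you go bankrupt if you bet everything and lose).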
Why It Matters
This result shows that the penalty for overbetting is severe and asymmetric. In practice, estimation error in \(p\) means you are likely to deviate from true Kelly. Since overbetting is much more damaging than underbetting, this provides the mathematical justification for fractional Kelly: use \(c f^*\) with \(c < 1\) to build in a safety margin against parameter estimation error.
Failure Mode
The exact location of the zero-growth boundary depends on the payout structure. For non-binary bets or continuous return distributions, the boundary is not simply \(2f^*\), and must be computed from the specific distribution of returns.
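For a binary bet the zero-growth boundary can be located numerically. This is a minimal sketch using bisection, with the assumption that the bet has a positive edge. `zero_growth_fraction` is a hypothetical helper name and the example parameters are illustrative.

```python
import math

def growth_rate(f, p, b=1.0):
    """Expected log growth per bet: p*log(1 + b*f) + (1 - p)*log(1 - f)."""
    return p * math.log(1 + b * f) + (1 - p) * math.log(1 - f)

def zero_growth_fraction(p, b=1.0, tol=1e-12):
    """Fraction above the Kelly point at which expected log growth crosses zero.

    Bisection on (f_star, 1): growth is positive at f_star and tends to
    minus infinity as f approaches 1, so exactly one root lies in between.
    """
    f_star = (b * p - (1 - p)) / b
    if f_star <= 0:
        raise ValueError("no positive edge: growth is never positive")
    lo, hi = f_star, 1.0 - 1e-15
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if growth_rate(mid, p, b) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    for p, b in ((0.6, 1.0), (0.55, 1.0), (0.3, 3.0)):
        f_star = (b * p - (1 - p)) / b
        print(f"p={p}, b={b}: 2*f_star={2 * f_star:.4f}, "
              f"zero-growth f={zero_growth_fraction(p, b):.4f}")
```

In the even-money cases the root falls a little below twice the Kelly fraction, while for the 3-to-1 payout it falls a little above. This shows how the boundary depends on the payout structure.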
Connection to Shannon and Information Theory
Kelly's 1956 paper was titled "A New Interpretation of Information Rate." The connection is direct: the maximum growth rate of wealth equals the capacity of the "side information channel."
Consider a horse race with \(m\) horses. You observe a noisy signal \(Y\) about which horse \(X\) will win (in this section \(X\) denotes the winning horse, not a return). The mutual information \(I(X; Y)\) between the winning horse and your signal quantifies how much your signal tells you about the outcome. Kelly showed:
\[ G^* = I(X; Y) \]
The optimal growth rate equals the mutual information when the posted odds are fair with respect to the prior on \(X\). If you have no information (\(Y\) is independent of \(X\)), \(I(X; Y) = 0\) and the optimal strategy is to not bet. If you have perfect information (\(Y = X\)), \(I(X; Y) = H(X)\) and you can grow wealth at the entropy rate of the race.
This connection is not a coincidence. Both information theory and Kelly betting are about exploiting probabilistic asymmetries. Shannon's channel capacity tells you the maximum rate at which information can be transmitted reliably. Kelly's growth rate tells you the maximum rate at which wealth can grow reliably. The mathematics is identical.
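The horse-race identity can be illustrated on a toy joint distribution. The sketch below assumes fair odds of \(1/P(X = x)\) on each horse and a bettor who splits all wealth across horses in proportion to the posterior \(P(x \mid y)\). The function names and the two-horse example with an 80%-accurate signal are illustrative assumptions.

```python
import math

def mutual_information(joint):
    """I(X;Y) in nats, where joint[x][y] = P(X = x, Y = y)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(
        pxy * math.log(pxy / (px[x] * py[y]))
        for x, row in enumerate(joint)
        for y, pxy in enumerate(row)
        if pxy > 0
    )

def side_info_growth(joint):
    """Expected log growth when betting the posterior at fair prior odds."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    total = 0.0
    for x, row in enumerate(joint):
        for y, pxy in enumerate(row):
            if pxy > 0:
                posterior = pxy / py[y]   # share of wealth placed on horse x
                odds = 1.0 / px[x]        # fair payout per unit bet on horse x
                total += pxy * math.log(posterior * odds)
    return total

if __name__ == "__main__":
    signal_80 = [[0.4, 0.1], [0.1, 0.4]]   # two horses, signal right 80% of the time
    print(mutual_information(signal_80), side_info_growth(signal_80))
```

The two quantities coincide term by term, because the wealth multiplier \(P(x \mid y)/P(x)\) is exactly the ratio inside the mutual-information logarithm.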
Practical Considerations
Why Full Kelly is Too Aggressive
Full Kelly maximizes long-run growth but produces enormous volatility along the way. For even-money bets with \(p = 0.55\) (a 10% edge), the Kelly fraction is \(f^* = 0.10\). At full Kelly:
- The standard deviation of annual returns is very large
- Drawdowns of 50% or more are common
- The probability of being down 50% at some point before reaching a target wealth is substantial
Half-Kelly (\(c = 0.5\)) gives 75% of the growth rate but roughly half the volatility. Quarter-Kelly gives about 44% of the growth rate with a quarter of the volatility. Most practitioners and quantitative funds use \(c < 1\), commonly somewhere between quarter- and half-Kelly.
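The volatility claim can be explored with a short simulation. This is a sketch for even-money bets with an assumed win probability of 0.55. `max_drawdown` and `log_return_std` are hypothetical helper names, and the horizon and seed are arbitrary. Both strategies are run on the same sequence of outcomes so the comparison is like for like.

```python
import math
import random

def max_drawdown(f, p=0.55, n=10_000, seed=0):
    """Largest peak-to-trough loss (fraction of the peak) over n even-money bets."""
    rng = random.Random(seed)
    log_w = peak = 0.0
    worst = 0.0
    for _ in range(n):
        log_w += math.log(1 + f) if rng.random() < p else math.log(1 - f)
        peak = max(peak, log_w)
        worst = max(worst, peak - log_w)   # drawdown measured in log units
    return 1 - math.exp(-worst)

def log_return_std(f, p=0.55):
    """Standard deviation of the per-bet log return for an even-money bet."""
    return math.sqrt(p * (1 - p)) * (math.log(1 + f) - math.log(1 - f))

if __name__ == "__main__":
    for label, f in (("full Kelly", 0.10), ("half Kelly", 0.05)):
        print(f"{label}: max drawdown = {max_drawdown(f):.1%}, "
              f"per-bet log-return std = {log_return_std(f):.4f}")
```

The per-bet standard deviation roughly halves when the fraction is halved. The drawdown figure depends on the simulated path and will change with the seed.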
Parameter Uncertainty
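The Kelly fraction depends on knowing \(p\) exactly. If \(p\) is estimated from data as \(\hat{p}\) with standard error \(\sigma_{\hat{p}}\), the estimation error in \(f^*\) is of order \(\sigma_{\hat{p}}\) (for the binary bet, \(f^*\) is linear in \(p\) with slope \(1 + 1/b\)). Since overbetting is more dangerous than underbetting, the correct adjustment is to shrink toward zero:
\[ f_{\text{adj}} = f^*(\hat{p} - k\,\sigma_{\hat{p}}) \]
for some constant \(k > 0\) (typically a small number of standard errors), where \(f^*(\cdot)\) denotes the Kelly fraction computed at the given win probability. This is equivalent to using a conservative estimate of the edge.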
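One way to apply this shrinkage is to evaluate the Kelly formula at a pessimistic win probability. The sketch below assumes a binary bet. The name `conservative_kelly`, the default \(k = 1\), and the example estimate and standard error are illustrative assumptions rather than recommended settings.

```python
def kelly_fraction(p, b=1.0):
    """Kelly fraction (b*p - q) / b, clamped at zero when the edge is not positive."""
    q = 1.0 - p
    return max(0.0, (b * p - q) / b)

def conservative_kelly(p_hat, std_err, k=1.0, b=1.0):
    """Kelly fraction at the pessimistic estimate p_hat - k * std_err."""
    p_low = max(0.0, p_hat - k * std_err)   # shade the estimate toward zero edge
    return kelly_fraction(p_low, b)

if __name__ == "__main__":
    print(kelly_fraction(0.60))                   # point estimate taken at face value
    print(conservative_kelly(0.60, 0.02, k=1))    # one standard error of caution
    print(conservative_kelly(0.55, 0.05, k=2))    # edge vanishes: do not bet
```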
Connection to Markowitz Portfolio Theory
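For continuous returns \(X\) with mean \(\mu\) and variance \(\sigma^2\), a second-order Taylor expansion of \(\mathbb{E}[\log(1 + fX)]\) gives (dropping the \(\mu^2\) term, which is negligible when \(\mu^2 \ll \sigma^2\)):
\[ G(f) \approx f\mu - \frac{f^2 \sigma^2}{2} \]
Maximizing: \(f^* = \mu/\sigma^2\). This is the Kelly fraction for continuous returns, and it equals the Sharpe ratio divided by the volatility: \(f^* = (\mu/\sigma)/\sigma\). The Kelly portfolio in the continuous case is equivalent to the Markowitz mean-variance optimal portfolio with log utility, which corresponds to a risk-aversion coefficient of 1 in the mean-variance objective.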
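The quality of the \(\mu/\sigma^2\) approximation can be checked against an exact maximization for a simple return distribution. The sketch below assumes a two-point return that equals \(\mu + \sigma\) or \(\mu - \sigma\) with equal probability. The helper names, the ternary search, and the parameter values are illustrative choices.

```python
import math

def exact_growth(f, mu, sigma):
    """E[log(1 + f*X)] when X is mu + sigma or mu - sigma with equal probability."""
    return (0.5 * math.log(1 + f * (mu + sigma))
            + 0.5 * math.log(1 + f * (mu - sigma)))

def argmax_concave(fn, lo, hi, iters=200):
    """Ternary search for the maximizer of a concave function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if fn(m1) < fn(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

def compare(mu=0.02, sigma=0.2):
    """Return (mu / sigma^2, numerically exact maximizer) for the two-point model."""
    approx = mu / sigma ** 2
    upper = 0.99 / (sigma - mu)   # keeps 1 + f*(mu - sigma) positive
    exact = argmax_concave(lambda f: exact_growth(f, mu, sigma), 0.0, upper)
    return approx, exact

if __name__ == "__main__":
    print(compare())
```

For this two-point model the exact maximizer works out to \(\mu/(\sigma^2 - \mu^2)\), so the approximation is close whenever the mean is small relative to the volatility.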
Canonical Examples
Even-money coin flip
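Coin lands heads with \(p = 0.6\), even-money payout (\(b = 1\)).
Kelly fraction: \(f^* = 2(0.6) - 1 = 0.2\).
Growth rate: \(G(0.2) = 0.6 \ln(1.2) + 0.4 \ln(0.8) \approx 0.0201\) nats per bet.
After \(n\) bets: expected \(\log(W_n/W_0) = n\,G(f^*) \approx 0.0201\,n\), so the typical wealth ratio is about \(e^{0.0201 n}\) (roughly \(7.5\times\) after 100 bets).
At half-Kelly (\(f = 0.1\)): \(G(0.1) \approx 0.0150\). Growth rate is 75% of full Kelly.
At double-Kelly (\(f = 0.4\)): \(G(0.4) \approx -0.0024\). Negative growth: wealth goes to zero.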
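The arithmetic in this example can be reproduced and extended to other horizons with a few lines. This sketch assumes the coin lands heads with probability 0.6 at even money, as in the Mental Model. `typical_multiple` is a hypothetical helper name.

```python
import math

def growth_rate(f, p=0.6):
    """Expected log growth per even-money bet: p*log(1 + f) + (1 - p)*log(1 - f)."""
    return p * math.log(1 + f) + (1 - p) * math.log(1 - f)

def typical_multiple(f, n, p=0.6):
    """exp(n * G(f)): the wealth multiple a typical path approaches after n bets."""
    return math.exp(n * growth_rate(f, p))

if __name__ == "__main__":
    for n in (100, 1000):
        for label, f in (("half", 0.1), ("full", 0.2), ("double", 0.4)):
            print(f"n={n:5d} {label:>6} Kelly: G={growth_rate(f):+.5f}, "
                  f"typical multiple={typical_multiple(f, n):.3g}")
```

Note that \(e^{nG}\) describes the typical (geometric-mean) outcome, not the arithmetic expectation of wealth.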
Kelly with 3-to-1 payout
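Win probability \(p = 0.3\), payout \(b = 3\) (you win $3 for each $1 wagered when you win).
Expected value per bet: \(bp - q = 3(0.3) - 0.7 = 0.2\) (positive edge).
Kelly fraction: \(f^* = (3 \cdot 0.3 - 0.7)/3 = 0.2/3 \approx 0.067\).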
Despite the positive expected value, the optimal bet is only 6.7% of wealth, because the high loss probability (70%) makes larger bets too risky for long-run compounding.
Common Confusions
Kelly does not maximize expected wealth
Kelly maximizes expected log wealth, not expected wealth. The strategy that maximizes \(\mathbb{E}[W_n]\) is to bet everything (\(f = 1\)) on each round (if the bet has positive expected value). But this strategy leads to ruin with probability approaching 1. The distinction between maximizing the arithmetic mean and the geometric mean is the entire point of the Kelly criterion.
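The gap between expected wealth and typical wealth can be made concrete. The sketch below assumes independent binary bets with win probability 0.6 at even money over 20 rounds. These values and the helper names are illustrative.

```python
import math

def expected_multiple(f, p, n, b=1.0):
    """E[W_n / W_0] = (1 + f * (b*p - q))**n for independent bets."""
    return (1 + f * (b * p - (1 - p))) ** n

def typical_multiple(f, p, n, b=1.0):
    """exp(n * G(f)); zero for f = 1, where a single loss wipes out all wealth."""
    if f >= 1:
        return 0.0
    g = p * math.log(1 + b * f) + (1 - p) * math.log(1 - f)
    return math.exp(n * g)

def survival_probability(p, n):
    """Chance that an all-in bettor (f = 1) wins n times in a row."""
    return p ** n

if __name__ == "__main__":
    p, n = 0.6, 20
    for f in (0.2, 1.0):
        print(f"f={f}: expected multiple={expected_multiple(f, p, n):.2f}, "
              f"typical multiple={typical_multiple(f, p, n):.2f}")
    print("all-in survival probability:", survival_probability(p, n))
```

The all-in strategy has the larger expected multiple, but that expectation is carried entirely by the rare path that never loses.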
Positive edge does not mean you should bet large
A bet with \(p = 0.51\) and even-money payout has a positive edge of 2%. The Kelly fraction is \(f^* = 2(0.51) - 1 = 0.02\). You should bet only 2% of your wealth. Beginners often confuse "positive expected value" with "bet big." The size of the optimal bet depends on the ratio of the edge to the variance, not just on the sign of the edge.
Kelly assumes you know the true probabilities
In practice, you estimate \(p\) from data. If your estimate is wrong, full Kelly can be catastrophic. If you overestimate your edge by 2x, you are betting at double-Kelly, which produces zero or negative growth. This sensitivity to parameter estimates is the strongest practical argument for fractional Kelly.
Kelly is not a short-run strategy
Kelly is optimal in the limit \(n \to \infty\). For small \(n\), Kelly can produce large drawdowns and may be dominated by more conservative strategies depending on your risk preferences. The dominance of Kelly over all other strategies is an asymptotic result.
Exercises
Problem
A bet pays 2-to-1 (you gain $2 for each $1 wagered) with win probability \(p\). Compute the Kelly fraction and the expected log growth rate per bet in terms of \(p\).
Problem
Show that for even-money bets (\(b = 1\)), the Kelly fraction satisfies \(f^* = 2p - 1\) and that the expected log growth rate at full Kelly is \(G(f^*) = 1 - H(p)\) bits per bet, where \(H(p) = -p \log_2 p - (1 - p) \log_2 (1 - p)\) is the binary entropy.
Problem
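Derive the Kelly fraction for a continuous return distribution. Suppose the return per dollar invested is \(R\) with \(\mathbb{E}[R] = \mu\) and \(\operatorname{Var}(R) = \sigma^2\). Using a second-order Taylor expansion of \(\log(1 + fR)\) around \(f = 0\), show that \(f^* \approx \mu/\sigma^2\).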
Problem
You estimate the win probability of a bet to be \(\hat{p}\) with standard error \(\sigma_{\hat{p}}\). The true probability is \(p\) (unknown). The payout is even money (\(b = 1\)). Compare the expected growth rate under full Kelly using \(\hat{p}\) vs. half-Kelly using \(\hat{p}\), accounting for the possibility that \(p\) might be well below \(\hat{p}\) (for example, two standard errors lower).
References
Canonical:
- Kelly, "A New Interpretation of Information Rate," Bell System Technical Journal (1956)
- Cover & Thomas, Elements of Information Theory (2nd ed., 2006), Chapter 6
- Thorp, "The Kelly Criterion in Blackjack, Sports Betting, and the Stock Market," in The Kelly Capital Growth Investment Criterion (2011), Chapters 1-3
Current:
- MacLean, Thorp, & Ziemba, The Kelly Capital Growth Investment Criterion (2011), Chapters 1, 6, 9
- Taleb, Statistical Consequences of Fat Tails (2020), Chapter 13
- Peters & Gell-Mann, "Evaluating Gambles Using Dynamics," Chaos (2016), Section 3
Next Topics
Building on the Kelly criterion:
- Fat tails: why full Kelly is especially dangerous when return distributions have heavy tails
- The Kelly criterion connects to expected utility theory through the log utility function
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.
- Common Probability Distributions (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)
- Information Theory Foundations (Layer 0B)