Beta. Content is under active construction and has not been peer-reviewed. Report errors on GitHub.

Decision Theory

Kelly Criterion

The mathematically optimal bet size. Maximize expected log wealth, the Kelly fraction, connections to information theory and Shannon, and why full Kelly is often too aggressive in practice.


Why This Matters

Every decision under uncertainty involves choosing how much to risk. Bet too little and you leave returns on the table. Bet too much and you risk ruin. The Kelly criterion gives the mathematically precise answer: the fraction of wealth to bet that maximizes the long-run growth rate.

The result connects three seemingly distinct ideas:

  • Gambling theory: what is the optimal bet size in a repeated game?
  • Information theory: the maximum growth rate equals the mutual information between the bettor's side information and the outcome
  • Portfolio theory: Kelly betting is equivalent to maximizing expected log utility, and in the long run it almost surely outgrows every other fixed-fraction strategy

The Kelly criterion appears in quantitative finance, portfolio construction, reinforcement learning (reward shaping and bankroll management), and any setting where you make repeated decisions under uncertainty with compounding outcomes.

Mental Model

You have a biased coin that lands heads with probability \(p = 0.6\). You start with $100. On each flip, you can bet any fraction \(f\) of your current wealth. If heads, you gain \(f\) times your wealth. If tails, you lose \(f\) times your wealth.

Bet everything (\(f = 1\)): you double or go broke. After a few flips, you will almost certainly be ruined (the probability of surviving \(n\) rounds is \(0.6^n\), which goes to zero).

Bet nothing (\(f = 0\)): your wealth stays at $100. Safe but suboptimal.

Bet the Kelly fraction (\(f^* = 2p - 1 = 0.2\)): you bet 20% of your wealth each round. Your wealth grows exponentially at the maximum possible rate, and the probability of ruin is zero (since you never bet everything).

The Kelly criterion tells you: the optimal fraction is \(f^* = 0.2\). Any fraction larger than \(2f^* = 0.4\) actually produces a negative expected growth rate, meaning your wealth shrinks to zero with probability 1.
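The coin game above can be played out in a few lines. The following is a minimal simulation sketch, not a reference implementation: the function name `simulate_wealth`, the fixed seed, and the 1000-round horizon are illustrative assumptions.

```python
import random

def simulate_wealth(p, f, n_rounds, start=100.0, seed=0):
    """Final wealth after n_rounds even-money bets of a fixed fraction f."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    wealth = start
    for _ in range(n_rounds):
        stake = f * wealth
        if rng.random() < p:   # heads: win the stake
            wealth += stake
        else:                  # tails: lose the stake
            wealth -= stake
    return wealth

if __name__ == "__main__":
    # Compare betting nothing, Kelly, double Kelly, and everything.
    for f in (0.0, 0.2, 0.4, 1.0):
        print(f, simulate_wealth(p=0.6, f=f, n_rounds=1000))
```

A single path is noisy, so the comparison between fractions only becomes reliable over many rounds or many seeds.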

Core Definitions

Definition

Expected Log Growth Rate

For a repeated betting game where you wager fraction \(f\) of your wealth on each round, the expected log growth rate (or expected geometric growth rate) is:

\[ g(f) = \mathbb{E}[\log(1 + f \cdot R)] \]

where \(R\) is the random return on the bet (\(R > 0\) for a win, \(-1 \leq R < 0\) for a loss). The wealth after \(n\) rounds is:

\[ W_n = W_0 \prod_{i=1}^n (1 + f \cdot R_i) \]

so \(\log(W_n / W_0) = \sum_{i=1}^n \log(1 + f \cdot R_i)\). By the law of large numbers:

\[ \frac{1}{n}\log(W_n / W_0) \xrightarrow{\text{a.s.}} g(f) \]

The wealth grows exponentially at rate \(g(f)\) if \(g(f) > 0\), and shrinks to zero if \(g(f) < 0\).
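For a return that takes finitely many values, the expectation in the definition is a weighted sum. A small sketch, where the helper name `expected_log_growth` and the `(probability, return)` pair format are assumptions made for illustration:

```python
import math

def expected_log_growth(f, outcomes):
    """g(f) = E[log(1 + f*R)] for a discrete return distribution.

    outcomes: list of (probability, return) pairs with returns >= -1.
    """
    total = 0.0
    for prob, ret in outcomes:
        growth_factor = 1.0 + f * ret
        if growth_factor <= 0.0:
            # A possible total loss sends log wealth to minus infinity.
            return float("-inf")
        total += prob * math.log(growth_factor)
    return total

# Even-money bet with a 60% win probability.
coin = [(0.6, 1.0), (0.4, -1.0)]
```

With this distribution, `expected_log_growth(0.2, coin)` is about 0.0201 per bet, and `expected_log_growth(1.0, coin)` is minus infinity.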

Definition

Kelly Fraction

The Kelly fraction \(f^*\) is the value of \(f\) that maximizes the expected log growth rate:

\[ f^* = \arg\max_f \, \mathbb{E}[\log(1 + f \cdot R)] \]

For a binary bet with win probability \(p\) and payout \(b\) to 1 (you gain \(b\) for each unit wagered on a win, and lose 1 unit on a loss):

\[ f^* = \frac{bp - (1-p)}{b} = \frac{bp - q}{b} \]

where \(q = 1 - p\). For even-money bets (\(b = 1\)): \(f^* = 2p - 1\).

The Kelly fraction is positive only when the expected return is positive (\(bp > q\), i.e., the bet has positive edge). When the edge is zero or negative, the Kelly fraction is zero or negative (do not bet).
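The binary formula translates directly into code. In this sketch the names `kelly_fraction` and `bet_size` are illustrative; the clamp to zero encodes the "do not bet" rule for non-positive edge.

```python
def kelly_fraction(p, b):
    """Kelly fraction f* = (b*p - q) / b for a binary bet paying b to 1."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if b <= 0.0:
        raise ValueError("payout b must be positive")
    q = 1.0 - p
    return (b * p - q) / b

def bet_size(p, b):
    """Fraction to actually wager: zero when the edge is not positive."""
    return max(0.0, kelly_fraction(p, b))
```

For example, `kelly_fraction(0.6, 1)` is 0.2, while `bet_size(0.4, 1)` is 0 because the raw fraction is negative.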

Definition

Fractional Kelly

Fractional Kelly betting uses fraction \(\lambda f^*\) instead of \(f^*\), where \(\lambda \in (0, 1)\) is a scaling factor. Common choices are \(\lambda = 0.5\) (half-Kelly) or \(\lambda = 0.25\) (quarter-Kelly).

Fractional Kelly sacrifices some expected growth rate in exchange for lower variance and lower probability of large drawdowns. Expanding \(\log(1 + x) = x - x^2/2 + O(x^3)\), the growth rate under fractional Kelly is:

\[ g(\lambda f^*) = \lambda f^* \, \mathbb{E}[R] - \frac{\lambda^2 (f^*)^2}{2} \, \mathbb{E}[R^2] + O(\lambda^3) \]

When the edge is small, \(f^* \approx \mathbb{E}[R] / \mathbb{E}[R^2]\) and this simplifies to \(g(\lambda f^*) \approx \lambda(2 - \lambda)\, g(f^*)\). For small \(\lambda\), the growth rate is approximately proportional to \(\lambda\) while the variance of log wealth is proportional to \(\lambda^2\), giving a favorable risk-return tradeoff.
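To see how much growth a given scaling factor keeps, one can compare the exact ratio \(g(\lambda f^*) / g(f^*)\) with the small-edge estimate \(\lambda(2 - \lambda)\), which follows from the quadratic approximation \(g(f) \approx f\mu - f^2\sigma^2/2\). The sketch below assumes an even-money bet; the function names are illustrative.

```python
import math

def growth(p, f):
    """Exact log growth rate for an even-money bet."""
    return p * math.log(1 + f) + (1 - p) * math.log(1 - f)

def fraction_of_full_growth(p, lam):
    """Exact ratio g(lam * f*) / g(f*) for an even-money bet."""
    f_star = 2 * p - 1
    return growth(p, lam * f_star) / growth(p, f_star)

def quadratic_estimate(lam):
    """Small-edge approximation lam * (2 - lam) of the same ratio."""
    return lam * (2 - lam)
```

For \(p = 0.6\), the exact ratios at half-Kelly and quarter-Kelly are close to the estimates of 75% and about 44%.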

Main Theorems

Theorem

Kelly Criterion Optimality

Statement

Let \(W_n(f)\) be the wealth after \(n\) rounds when betting fraction \(f\) on each round, and let \(f^* = (bp - q)/b\) be the Kelly fraction. Then:

  1. Growth rate maximization: \(g(f^*) = \max_{f \in [0,1]} g(f)\) where \(g(f) = p \log(1 + bf) + q \log(1 - f)\).

  2. Asymptotic dominance: for any other fixed fraction \(f \neq f^*\),

\[ \frac{W_n(f)}{W_n(f^*)} \xrightarrow{\text{a.s.}} 0 \quad \text{as } n \to \infty \]

The Kelly bettor's wealth eventually exceeds any other fixed-fraction bettor's wealth by an arbitrarily large factor.

  3. Connection to information theory: the optimal growth rate equals the Kullback-Leibler divergence from the "worst case" distribution to the true distribution:

\[ g(f^*) = D_{\text{KL}}(p \,\|\, r) \]

where \(r = 1/(1+b)\) is the break-even probability.
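The information-theoretic identity can be checked numerically. This sketch evaluates both sides for a binary bet; the function names are illustrative, and natural logarithms are used on both sides.

```python
import math

def kelly_growth(p, b):
    """Growth rate g(f*) at the Kelly fraction for a bet paying b to 1."""
    q = 1 - p
    f_star = (b * p - q) / b
    return p * math.log(1 + b * f_star) + q * math.log(1 - f_star)

def kl_to_break_even(p, b):
    """KL divergence between (p, q) and the break-even odds (r, 1 - r)."""
    q = 1 - p
    r = 1 / (1 + b)
    return p * math.log(p / r) + q * math.log(q / (1 - r))
```

The two functions agree to floating-point precision for any bet with \(0 < p < 1\) and \(b > 0\), including bets with negative edge, where \(f^*\) is negative.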

Intuition

Betting fraction \(f\) creates a tradeoff: larger \(f\) gives higher expected single-round return but higher variance. The log function penalizes large losses more than it rewards large gains (it is concave). The Kelly fraction is the exact point where the marginal benefit of betting more (higher expected return) equals the marginal cost (higher variance dragging down the geometric mean).

Because wealth compounds multiplicatively, the geometric mean (not the arithmetic mean) determines long-run growth. Maximizing the geometric mean is equivalent to maximizing \(\mathbb{E}[\log W]\), which is the Kelly criterion.

Proof Sketch

Growth rate maximization: The expected log growth rate for a binary bet is:

\[ g(f) = p \log(1 + bf) + q \log(1 - f) \]

Differentiate: \(g'(f) = \frac{pb}{1 + bf} - \frac{q}{1 - f}\)

Set \(g'(f) = 0\): \(pb(1 - f) = q(1 + bf)\), giving \(pb - pbf = q + qbf\), so \(f(pb + qb) = pb - q\). Since \(p + q = 1\), this yields \(f^* = (pb - q)/b\).

Check \(g''(f^*) < 0\): \(g''(f) = -\frac{pb^2}{(1+bf)^2} - \frac{q}{(1-f)^2} < 0\) for all \(f \in [0, 1)\), confirming the maximum.

Asymptotic dominance: By the SLLN, \(\frac{1}{n} \log W_n(f) \to g(f)\) a.s. Since \(g(f) < g(f^*)\) for \(f \neq f^*\) (strict concavity), we have \(\frac{1}{n} \log \frac{W_n(f)}{W_n(f^*)} \to g(f) - g(f^*) < 0\), so the ratio goes to zero exponentially fast.

Information theory connection: Evaluate \(g(f^*)\) at \(f^* = (bp - q)/b\). After algebraic manipulation: \(g(f^*) = p \log(p/r) + q \log(q/(1-r))\) where \(r = 1/(1+b)\). This is \(D_{\text{KL}}((p, q) \,\|\, (r, 1-r))\).
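The algebraic manipulation in the last step can be spelled out as follows, using only \(q = 1 - p\), \(r = 1/(1+b)\), and hence \(1 - r = b/(1+b)\):

```latex
\begin{align*}
1 + b f^* &= 1 + bp - q = p + bp = p(1 + b) = \frac{p}{r}, \\
1 - f^*   &= \frac{b - bp + q}{b} = \frac{q(1 + b)}{b} = \frac{q}{1 - r}, \\
g(f^*)    &= p \log(1 + b f^*) + q \log(1 - f^*)
           = p \log\frac{p}{r} + q \log\frac{q}{1 - r}.
\end{align*}
```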

Why It Matters

The Kelly criterion is one of the few results in decision theory that gives a unique, principled answer to "how much should I bet?" without requiring an arbitrary utility function. The log-utility justification comes from the long-run dominance property: you do not need to assume log utility as an axiom. Any strategy that maximizes long-run wealth growth must use the Kelly fraction.

The connection to information theory is deep: the growth rate equals the information advantage you have over the market. If you have no information advantage (the bet is fair), the Kelly fraction is zero. This links gambling theory to channel capacity in communications.

Failure Mode

The Kelly criterion assumes:

  • Known probabilities: in practice, \(p\) is estimated, not known. If your estimate of \(p\) is wrong, the Kelly fraction can be catastrophically wrong. Overestimating your edge leads to overbetting.
  • i.i.d. bets: if outcomes are correlated or non-stationary, the optimal fraction changes over time.
  • No transaction costs or constraints: real betting has fees, minimum bet sizes, and maximum position limits.
  • Infinite time horizon: Kelly is optimal in the long run. For finite horizons, other criteria may be preferable.
  • No fat tails: if the return distribution has infinite variance or heavier tails than assumed, full Kelly can lead to severe drawdowns.

Proposition

Over-Kelly Ruin

Statement

For even-money bets (\(b = 1\)) with win probability \(p > 1/2\), the Kelly fraction is \(f^* = 2p - 1\). If the bettor uses fraction \(f > 2f^*\), then the expected log growth rate is negative:

\[ g(f) < 0 \quad \text{for } f > 2f^* \]

This means the wealth converges to zero almost surely: \(W_n \xrightarrow{\text{a.s.}} 0\).

Betting more than twice the Kelly fraction is worse than not betting at all. The growth rate \(g(f)\) is a concave, parabola-like function that is positive on \((0, f_0)\), maximized at \(f^*\), and negative for \(f > f_0\), where the zero-growth point \(f_0\) lies just below \(2f^*\) and approaches \(2f^*\) as the edge shrinks.

Intuition

Overbetting is worse than underbetting. If you bet half of Kelly, your growth rate is about 75% of optimal. If you bet twice Kelly, your growth rate is approximately zero (in fact slightly negative), so at best you break even in the long run. If you bet more than twice Kelly, you lose money despite having a positive edge. The asymmetry between overbetting and underbetting is extreme. This is why practitioners almost universally use fractional Kelly.

Proof Sketch

For even-money bets: \(g(f) = p \log(1 + f) + (1 - p) \log(1 - f)\).

Near \(f = 0\), the quadratic approximation \(g(f) \approx (2p - 1) f - f^2/2\) has roots at \(f = 0\) and \(f = 2(2p - 1) = 2f^*\), which is where the factor of two comes from. The exact growth rate is not symmetric about the Kelly point: the omitted higher-order terms make \(g(2f^*)\) slightly negative.

More precisely: \(g(0) = 0\) and \(g'(0) = 2p - 1 > 0\) (positive edge), so \(g\) starts out positive. Since \(g\) is strictly concave and \(g(f) \to -\infty\) as \(f \to 1\), it has exactly one other root \(f_0 \in (f^*, 1)\). Because \(g(2f^*) < 0\), this root satisfies \(f_0 < 2f^*\), so \(g(f) < 0\) for every \(f > 2f^*\). For general payouts \(b \neq 1\), the root \(f_0\) still lies in \((f^*, 1)\), but it need not equal \(2f^*\) and must be computed numerically.
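The zero-growth point can be located numerically. The sketch below uses plain bisection on the even-money growth rate between the Kelly fraction (where growth is positive) and a point just below 1 (where it is negative); the function names and the tolerance are illustrative choices.

```python
import math

def growth(p, f):
    """Exact log growth rate for an even-money bet."""
    return p * math.log(1 + f) + (1 - p) * math.log(1 - f)

def zero_growth_fraction(p, tol=1e-12):
    """Bisect for the positive root of g on (f*, 1)."""
    lo, hi = 2 * p - 1, 1.0 - 1e-12
    if not 0.5 < p < 1.0 or growth(p, hi) >= 0:
        # For p very close to 1 the sign change sits too near f = 1
        # to bracket at this precision.
        raise ValueError("no sign change bracketed for this p")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if growth(p, mid) > 0:
            lo = mid   # still growing: the root lies above mid
        else:
            hi = mid   # shrinking: the root lies at or below mid
    return 0.5 * (lo + hi)
```

For \(p = 0.6\) the root comes out near 0.389, slightly below \(2f^* = 0.4\).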

Why It Matters

This result shows that the penalty for overbetting is severe and asymmetric. In practice, estimation error in \(p\) means you are likely to deviate from true Kelly. Since overbetting is much more damaging than underbetting, this provides the mathematical justification for fractional Kelly: use \(\lambda f^*\) with \(\lambda \in [0.25, 0.5]\) to build in a safety margin against parameter estimation error.

Failure Mode

The exact location of the zero-growth boundary depends on the payout structure. For non-binary bets or continuous return distributions, the boundary is not simply \(2f^*\), and must be computed from the specific distribution of returns.

Connection to Shannon and Information Theory

Kelly's 1956 paper was titled "A New Interpretation of Information Rate." The connection is direct: the maximum growth rate of wealth equals the capacity of the "side information channel."

Consider a horse race with \(n\) horses. You observe a noisy signal \(Y\) about which horse will win. The mutual information \(I(X; Y)\) between the winning horse \(X\) and your signal \(Y\) quantifies how much your signal tells you about the outcome. Kelly showed that, when the posted odds are fair with respect to the prior distribution of \(X\):

\[ g^* = I(X; Y) \]

The optimal growth rate equals the mutual information. If you have no information (\(Y\) is independent of \(X\)), \(g^* = 0\) and the optimal strategy is to not bet. If you have perfect information (\(H(X \mid Y) = 0\)), \(g^* = H(X)\) and you can grow wealth at the entropy rate of the race.

This connection is not a coincidence. Both information theory and Kelly betting are about exploiting probabilistic asymmetries. Shannon's channel capacity tells you the maximum rate at which information can be transmitted reliably. Kelly's growth rate tells you the maximum rate at which wealth can grow reliably. The mathematics is identical.
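A two-horse race makes the identity concrete. The sketch below assumes two equally likely horses, fair 2-for-1 odds, and a signal that names the winner but is wrong with probability `eps`; the bettor splits wealth across the horses in proportion to the posterior. All names are illustrative.

```python
import math

def h2(x):
    """Binary entropy in bits."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def side_info_growth_bits(eps):
    """Growth rate in bits per race when betting the posterior.

    Betting fraction P(x | y) on each horse at 2-for-1 odds multiplies
    wealth by 2 * P(winner | signal).
    """
    if eps in (0.0, 1.0):
        return 1.0  # the signal determines the winner: wealth doubles
    return (1 - eps) * math.log2(2 * (1 - eps)) + eps * math.log2(2 * eps)

def mutual_information_bits(eps):
    """I(X; Y) = H(X) - H(X | Y) = 1 - h2(eps) for this channel."""
    return 1.0 - h2(eps)
```

The two quantities coincide for every error rate, and both vanish at `eps = 0.5`, where the signal carries no information.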

Practical Considerations

Why Full Kelly is Too Aggressive

Full Kelly maximizes long-run growth but produces enormous volatility along the way. For even-money bets with \(p = 0.55\) (a 10% edge), the Kelly fraction is \(f^* = 0.10\). At full Kelly:

  • The per-bet standard deviation of log wealth is about 0.10, roughly twenty times the per-bet growth rate of about 0.005
  • Drawdowns of 50% or more are common
  • The probability of being down 50% at some point before reaching a target wealth is substantial

Half-Kelly (\(f = 0.05\)) gives about 75% of the growth rate with roughly half the volatility. Quarter-Kelly gives about 44% of the growth rate with a quarter of the volatility. Practitioners and quantitative funds commonly use \(\lambda \in [0.1, 0.5]\).
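Drawdown risk can be quantified under an idealization. If log wealth is approximated by a Brownian motion with drift (a continuous-time stand-in for many small bets), a bettor using \(\lambda\) times the Kelly fraction has drift-to-variance ratio \((2 - \lambda)/(2\lambda)\) in log wealth, and the standard hitting-probability formula gives a chance of \(x^{2/\lambda - 1}\) of ever falling to a fraction \(x\) of current wealth. The sketch below only evaluates that formula; it is an approximation for discrete bets, and the function name is illustrative.

```python
def prob_ever_falls_to(x, lam):
    """Chance that wealth ever drops to fraction x of its current level.

    Continuous-time approximation for a bettor using lam times the
    Kelly fraction, valid for 0 < x < 1 and 0 < lam < 2.
    """
    if not (0.0 < x < 1.0 and 0.0 < lam < 2.0):
        raise ValueError("need 0 < x < 1 and 0 < lam < 2")
    return x ** (2.0 / lam - 1.0)
```

Under this approximation, a full-Kelly bettor has a 50% chance of eventually seeing wealth halved, while a half-Kelly bettor has a 12.5% chance.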

Parameter Uncertainty

The Kelly fraction depends on knowing \(p\) exactly. If \(p\) is estimated from data with standard error \(\sigma_p\), then because \(f^* = p(1 + 1/b) - 1/b\) is linear in \(p\), the standard error of \(f^*\) is \(\sigma_{f^*} = (1 + 1/b)\,\sigma_p\) (so \(2\sigma_p\) for even-money bets). Since overbetting is more dangerous than underbetting, a common adjustment is to shrink toward zero:

\[ f_{\text{adjusted}} = \max\left(0, \, f^* - c \cdot \sigma_{f^*}\right) \]

for some constant \(c\) (typically \(c = 1\) or \(c = 2\)). This is equivalent to using a conservative estimate of the edge.
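The shrinkage rule is short enough to write out. This sketch assumes a binary bet, for which \(f^* = p(1 + 1/b) - 1/b\) is linear in \(p\), so a standard error \(\sigma_p\) on \(p\) becomes \((1 + 1/b)\,\sigma_p\) on \(f^*\). The function name and default arguments are illustrative.

```python
def shrunk_kelly(p_hat, se_p, b=1.0, c=1.0):
    """Kelly fraction from an estimated p, shrunk by c standard errors."""
    f_hat = (b * p_hat - (1.0 - p_hat)) / b
    se_f = (1.0 + 1.0 / b) * se_p   # f* is linear in p with slope 1 + 1/b
    return max(0.0, f_hat - c * se_f)
```

For instance, an estimate of 0.60 with standard error 0.02 on an even-money bet shrinks the fraction from 0.20 to 0.16 at \(c = 1\). A noisier estimate, such as 0.55 with standard error 0.05, is shrunk all the way to zero at \(c = 2\).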

Connection to Markowitz Portfolio Theory

For continuous returns with mean \(\mu\) and variance \(\sigma^2\), a second-order Taylor expansion of \(\mathbb{E}[\log(1 + fR)]\) (dropping the \(\mu^2\) term in \(\mathbb{E}[R^2] = \sigma^2 + \mu^2\), which is negligible when \(\mu \ll \sigma\)) gives:

\[ g(f) \approx f\mu - \frac{f^2 \sigma^2}{2} \]

Maximizing: \(f^* = \mu / \sigma^2\). This is the Kelly fraction for continuous returns, and it equals the Sharpe ratio divided by the volatility: \(f^* = (\mu / \sigma) / \sigma\). In this approximation the Kelly portfolio coincides with the Markowitz mean-variance optimal portfolio for an investor with risk-aversion coefficient 1, that is, one who maximizes \(f\mu - \frac{1}{2} f^2 \sigma^2\).
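One way to gauge the quality of the \(\mu / \sigma^2\) approximation is to apply it to a binary bet, where the exact Kelly fraction is known. The sketch below does this for a return of \(+b\) with probability \(p\) and \(-1\) otherwise; the function names are illustrative.

```python
def exact_kelly(p, b):
    """Exact Kelly fraction (b*p - q) / b for a binary bet."""
    return (b * p - (1 - p)) / b

def mean_variance_kelly(p, b):
    """mu / sigma^2 for the binary return R in {+b, -1}."""
    q = 1 - p
    mu = b * p - q                   # mean return per unit staked
    second_moment = b * b * p + q    # E[R^2]
    sigma2 = second_moment - mu * mu
    return mu / sigma2
```

With a small edge (\(p = 0.51\), even money) the two agree to about four decimal places; at \(p = 0.6\) the approximation gives about 0.208 against the exact 0.2, so it overstates the bet slightly as the edge grows.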

Canonical Examples

Example

Even-money coin flip

Coin lands heads with \(p = 0.6\), even-money payout (\(b = 1\)).

Kelly fraction: \(f^* = 2p - 1 = 0.2\).

Growth rate: \(g(0.2) = 0.6 \log(1.2) + 0.4 \log(0.8) = 0.6(0.1823) + 0.4(-0.2231) = 0.1094 - 0.0893 = 0.0201\).

After \(n = 100\) bets: expected \(\log(W_{100}/W_0) = 100 \times 0.0201 = 2.01\), so the typical (median) wealth ratio is about \(e^{2.01} \approx 7.5\).

At half-Kelly (\(f = 0.1\)): \(g(0.1) = 0.6 \log(1.1) + 0.4 \log(0.9) \approx 0.0572 - 0.0421 \approx 0.0150\). Growth rate is about 75% of full Kelly.

At double-Kelly (\(f = 0.4\)): \(g(0.4) = 0.6 \log(1.4) + 0.4 \log(0.6) = 0.2019 - 0.2043 = -0.0024\). Negative growth: wealth goes to zero.

Example

Kelly with 3-to-1 payout

Win probability \(p = 0.3\), payout \(b = 3\) (you win $3 for each $1 wagered when you win).

Expected value per bet: \(3(0.3) - 1(0.7) = 0.9 - 0.7 = 0.2 > 0\) (positive edge).

Kelly fraction: \(f^* = (bp - q)/b = (0.9 - 0.7)/3 = 0.2/3 \approx 0.0667\).

Despite the positive expected value, the optimal bet is only 6.7% of wealth, because the high loss probability (70%) makes larger bets too risky for long-run compounding.

Common Confusions

Watch Out

Kelly does not maximize expected wealth

Kelly maximizes expected log wealth, not expected wealth. The strategy that maximizes \(\mathbb{E}[W_n]\) is to bet everything (\(f = 1\)) on each round (if the bet has positive expected value). But this strategy leads to ruin with probability approaching 1. The distinction between maximizing the arithmetic mean and the geometric mean is the entire point of the Kelly criterion.
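The gap between the arithmetic and geometric views shows up clearly in numbers. For even-money bets, mean wealth compounds at \(1 + f(2p - 1)\) per round, while log wealth concentrates around \(n \cdot g(f)\). The sketch below computes both; the function names are illustrative.

```python
import math

def mean_wealth_ratio(p, f, n):
    """E[W_n / W_0]: each round multiplies the mean by 1 + f(2p - 1)."""
    return (1 + f * (2 * p - 1)) ** n

def typical_wealth_ratio(p, f, n):
    """exp(n * g(f)): the level that log wealth concentrates around."""
    if f >= 1:
        return 0.0  # a single loss wipes out the bettor
    g = p * math.log(1 + f) + (1 - p) * math.log(1 - f)
    return math.exp(n * g)
```

With \(p = 0.6\) and 100 bets, betting everything has by far the largest mean wealth, yet its typical outcome is zero. Kelly at \(f = 0.2\) has a mean ratio near 50 and a typical ratio near 7.5.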

Watch Out

Positive edge does not mean you should bet large

A bet with \(p = 0.51\) and even-money payout has a positive edge of 2%. The Kelly fraction is \(f^* = 0.02\). You should bet only 2% of your wealth. Beginners often confuse "positive expected value" with "bet big." The size of the optimal bet depends on the ratio of the edge to the variance, not just on the sign of the edge.

Watch Out

Kelly assumes you know the true probabilities

In practice, you estimate \(p\) from data. If your estimate is wrong, full Kelly can be catastrophic. If you overestimate your edge by 2x, you are betting at double-Kelly, which produces roughly zero (in fact slightly negative) growth. This sensitivity to parameter estimates is the strongest practical argument for fractional Kelly.
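The effect of a misestimated probability is easy to compute for an even-money bet: size the bet from the assumed probability, then evaluate growth under the true one. In this sketch the function name and the example probabilities are illustrative.

```python
import math

def realized_growth(p_true, p_assumed, lam=1.0):
    """Log growth rate when betting lam * Kelly computed from p_assumed
    on an even-money bet whose true win probability is p_true."""
    f = lam * max(0.0, 2 * p_assumed - 1)
    return p_true * math.log(1 + f) + (1 - p_true) * math.log(1 - f)
```

If the true probability is 0.55 but you assume 0.60, full Kelly stakes 20% instead of 10% and the realized growth rate is slightly negative. Half-Kelly on the same wrong estimate happens to stake the correct 10% and achieves the true optimum.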

Watch Out

Kelly is not a short-run strategy

Kelly is optimal in the limit \(n \to \infty\). For small \(n\), Kelly can produce large drawdowns and may be dominated by more conservative strategies depending on your risk preferences. The dominance of Kelly over all other strategies is an asymptotic result.

Exercises

ExerciseCore

Problem

A bet pays 2-to-1 (you gain $2 for each $1 wagered) with win probability \(p = 0.4\). Compute the Kelly fraction and the expected log growth rate per bet.

ExerciseCore

Problem

Show that for even-money bets (\(b = 1\)), the Kelly fraction satisfies \(f^* = 2p - 1\) and that the expected log growth rate at full Kelly is \(g(f^*) = 1 + p \log_2 p + (1-p) \log_2(1-p) = 1 - H(p)\) bits per bet, where \(H(p)\) is the binary entropy.

ExerciseAdvanced

Problem

Derive the Kelly fraction for a continuous return distribution. Suppose the return per dollar invested is \(R\) with \(\mathbb{E}[R] = \mu > 0\) and \(\text{Var}(R) = \sigma^2\). Using a second-order Taylor expansion of \(\mathbb{E}[\log(1 + fR)]\) around \(f = 0\), show that \(f^* \approx \mu / \sigma^2\).

ExerciseAdvanced

Problem

You estimate the win probability of a bet to be \(\hat{p} = 0.55\) with standard error \(\sigma_{\hat{p}} = 0.05\). The true probability is \(p\) (unknown). The payout is even money (\(b = 1\)). Compare the expected growth rate under full Kelly using \(\hat{p}\) vs. half-Kelly using \(\hat{p}\), accounting for the possibility that \(p\) might be as low as \(\hat{p} - 2\sigma_{\hat{p}} = 0.45\).

References

Canonical:

  • Kelly, "A New Interpretation of Information Rate," Bell System Technical Journal (1956)
  • Cover & Thomas, Elements of Information Theory (2nd ed., 2006), Chapter 6
  • Thorp, "The Kelly Criterion in Blackjack, Sports Betting, and the Stock Market," in The Kelly Capital Growth Investment Criterion (2011), Chapters 1-3

Current:

  • MacLean, Thorp, & Ziemba, The Kelly Capital Growth Investment Criterion (2011), Chapters 1, 6, 9
  • Taleb, Statistical Consequences of Fat Tails (2020), Chapter 13
  • Peters & Gell-Mann, "Evaluating Gambles Using Dynamics," Chaos (2016), Section 3

Next Topics

Building on the Kelly criterion:

  • Fat tails: why full Kelly is especially dangerous when return distributions have heavy tails
  • The Kelly criterion connects to expected utility theory through the log utility function

Last reviewed: April 2026
