Benford's Law

Sneiderman, Robby

The leading digit of naturally occurring numerical data is not uniformly distributed: digit 1 appears about 30% of the time, digit 9 about 5%. This arises from scale invariance and logarithmic density, and has real applications in fraud detection, election auditing, and data integrity checks.

Prerequisites

Common Probability Distributions

Why This Matters

Pick any table of naturally occurring numbers: city populations, river lengths, stock prices, physical constants, tax returns. Count how often each leading digit (1 through 9) appears. You might expect each digit to appear about 11% of the time. Instead, digit 1 appears about 30% of the time, digit 2 about 17%, and digit 9 only about 5%.

This is Benford's law, and it is not a curiosity. It is used in forensic accounting to detect fabricated financial data, in election auditing to flag anomalous vote counts, and in scientific peer review to catch manipulated datasets. Understanding why it holds (and when it fails) requires thinking carefully about what "naturally occurring" means.

The Law

Proposition

Benford's Law

Statement

The probability that the leading digit of a naturally occurring number equals $d$, for $d \in \{1, 2, \ldots, 9\}$, is:

$P(\text{leading digit} = d) = \log_{10}\!\left(1 + \frac{1}{d}\right)$

This gives:

Digit	Probability
1	30.1%
2	17.6%
3	12.5%
4	9.7%
5	7.9%
6	6.7%
7	5.8%
8	5.1%
9	4.6%

More generally, write the first two digits as the two-digit integer $n = 10 d_1 + d_2$, which ranges from 10 to 99. Then:

$P(\text{first two digits} = n) = \log_{10}\!\left(1 + \frac{1}{n}\right)$
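
The two formulas above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names are illustrative choices, not part of any established API. It reprints the single-digit table and confirms that summing the two-digit probabilities over the second digit recovers the first-digit probability.

```python
import math

def first_digit_prob(d):
    """Benford probability that the leading digit equals d (1..9)."""
    return math.log10(1 + 1 / d)

def first_two_digits_prob(n):
    """Benford probability that the first two digits form the integer n (10..99)."""
    return math.log10(1 + 1 / n)

# Reproduce the single-digit table.
for d in range(1, 10):
    print(f"{d}\t{100 * first_digit_prob(d):.1f}%")

# Summing the two-digit probabilities over the second digit
# should give back the first-digit probability.
for d in range(1, 10):
    marginal = sum(first_two_digits_prob(10 * d + k) for k in range(10))
    assert math.isclose(marginal, first_digit_prob(d))
```

The consistency check works because the product of the ten ratios $(10d + k + 1)/(10d + k)$ for $k = 0, \ldots, 9$ collapses to $(d + 1)/d$.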

Intuition

On a logarithmic scale, the interval from 1 to 2 is the same width as the interval from 2 to 4, or from 5 to 10. The interval $[1, 2)$ covers $\log_{10}(2) - \log_{10}(1) = 0.301$ of the unit interval on the log scale. Numbers whose leading digit is 1 correspond to this interval (and its translations by integer powers of 10). Likewise, digit $d$ corresponds to $[d, d+1)$, whose width on the log scale shrinks as $d$ grows. Digit 1 therefore gets the widest slice, 0.301, and appears most often, and each successive digit gets a smaller slice.
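
Written out, the slice of the log scale belonging to digit $d$, and the total of the nine slices, are:

$\log_{10}(d+1) - \log_{10}(d) = \log_{10}\!\left(1 + \frac{1}{d}\right)$

$\sum_{d=1}^{9} \log_{10}\!\left(1 + \frac{1}{d}\right) = \log_{10}(10) - \log_{10}(1) = 1$

The sum telescopes, so the nine slices tile the unit interval exactly and the probabilities in the table add up to 1.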

Why It Matters

Benford's law is one of the cleanest examples of a distributional regularity that arises from a structural property of the data (scale invariance) rather than from any specific generative mechanism. It connects to information theory (the uniform distribution of the fractional part of $\log_{10}$ of the data, which underlies the law, is the maximum-entropy distribution on $[0, 1)$), to hypothesis testing (you can test whether data conforms to Benford's law using chi-squared or KS tests), and to practical fraud detection.

Failure Mode

Benford's law fails when the data does not span multiple orders of magnitude. Adult human heights in centimeters (roughly 150 to 200) will not follow the law because the leading digit is almost always 1. Lottery numbers, telephone numbers, and data sampled from a uniform distribution on a narrow range will not follow it either. The law requires the data to be "scale-rich": drawn from a process where the log of the values is spread broadly.
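
A small simulation can make the contrast concrete. This is a sketch under stated assumptions: the "heights" are synthetic draws from a uniform distribution on 150 to 200 (not real measurements), and the scale-rich sample is generated by drawing the base-10 logarithm uniformly over six orders of magnitude. All function and variable names are illustrative.

```python
import math
import random

# Benford probabilities for leading digits 1..9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

def leading_digit(x):
    """First significant digit of a positive float, read off its scientific notation."""
    return int(f"{x:.16e}"[0])

def digit_frequencies(values):
    """Observed frequency of each leading digit 1..9."""
    counts = [0] * 9
    for v in values:
        counts[leading_digit(v) - 1] += 1
    return [c / len(values) for c in counts]

def max_deviation(freqs):
    """Largest absolute gap between observed and Benford frequencies."""
    return max(abs(f - b) for f, b in zip(freqs, BENFORD))

rng = random.Random(0)  # fixed seed so the run is repeatable

# Narrow range: synthetic "heights" in cm (an illustrative assumption, not real data).
heights = [rng.uniform(150, 200) for _ in range(50_000)]
# Scale-rich: values whose logarithm is spread evenly over six orders of magnitude.
wide = [10 ** rng.uniform(0, 6) for _ in range(50_000)]

for name, data in (("heights", heights), ("wide", wide)):
    deviation = max_deviation(digit_frequencies(data))
    print(f"{name}: max deviation from Benford = {deviation:.3f}")
```

The narrow sample should show a large deviation (nearly every value starts with 1), while the scale-rich sample should sit close to the Benford frequencies, up to sampling noise.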

Why Does It Work?

Theorem

Scale Invariance Characterization

Statement

If a random variable $X > 0$ has the property that the leading-digit distribution of $cX$ is the same as that of $X$ for all $c > 0$, then the leading digits of $X$ follow Benford's law.

Equivalently, $\log_{10}(X) \bmod 1$ is uniformly distributed on $[0, 1)$.

Intuition

If the leading-digit distribution is unchanged by multiplying all values by any constant (changing the currency, the units, the scale), then the only possible distribution is the Benford distribution. This is because multiplication by $c$ shifts $\log_{10}(X)$ by $\log_{10}(c)$, and the only distribution on $[0, 1)$ that is invariant under arbitrary shifts (modulo 1) is the uniform distribution.
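
The shift argument can be illustrated with a deterministic example. The sketch below uses the powers of 2 as a stand-in dataset (an assumption made for illustration: their base-10 logarithms, taken modulo 1, spread out evenly because $\log_{10} 2$ is irrational) and multiplies every value by a few arbitrary constants. Exact integer arithmetic avoids floating-point issues; the names are illustrative.

```python
import math

# Benford probabilities for leading digits 1..9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

def leading_digit_freqs(values):
    """Leading-digit frequencies (digits 1..9) of a list of positive integers."""
    counts = [0] * 9
    for v in values:
        counts[int(str(v)[0]) - 1] += 1
    return [c / len(values) for c in counts]

# Stand-in dataset: the first 2000 powers of 2, as exact integers.
powers = [2 ** n for n in range(2000)]

# Rescale by arbitrary constants (a "change of units") and compare to Benford.
for c in (1, 3, 7, 12345):
    freqs = leading_digit_freqs([c * p for p in powers])
    worst = max(abs(f - b) for f, b in zip(freqs, BENFORD))
    print(f"c = {c:>5}: freq of digit 1 = {freqs[0]:.3f}, max deviation = {worst:.4f}")
```

Each rescaling shifts every logarithm by the same amount $\log_{10}(c)$, so the leading-digit frequencies should stay close to the Benford values for every choice of $c$.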

Why It Matters

This explains why Benford's law appears in so many unrelated datasets. Any data-generating process that is independent of the unit of measurement (populations don't "know" whether they're measured in ones, thousands, or millions) will tend to produce Benford-distributed leading digits. The law is a consequence of scale invariance, which is a property of the measurement process, not of the specific phenomenon being measured.

Failure Mode

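Strict scale invariance is an idealization. No proper probability distribution on $(0, \infty)$ is scale invariant in full, since that would require a density proportional to $1/x$, which cannot be normalized. Only the leading-digit distribution can be exactly invariant, and real data satisfies even that only approximately. Data that spans 3 or more orders of magnitude typically shows good agreement with Benford's law. Data spanning less than 2 orders of magnitude typically does not.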

Alternative derivation. Mixtures of uniform distributions converge to Benford's law. If you take many independent uniform distributions on intervals $[0, N_i]$ for varying $N_i$ and mix them, the leading-digit distribution of the mixture approaches the Benford distribution as the number of component distributions grows. This is a form of the central limit theorem for significant digits (Hill, 1995).
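
The mixture effect can be simulated. This is a sketch under one particular assumption that the text does not specify: the upper limits $N_i$ are drawn log-uniformly between $1$ and $10^6$. Other ways of choosing the $N_i$ may converge more slowly or not at all. The helper names are illustrative.

```python
import math
import random

# Benford probabilities for leading digits 1..9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

def leading_digit(x):
    """First significant digit of a positive float."""
    return int(f"{x:.16e}"[0])

def mixture_leading_digits(rng, n_components, total_samples):
    """Leading-digit frequencies for a mixture of Uniform[0, N_i] distributions.

    Assumption: each N_i is drawn log-uniformly from [1, 10**6].
    """
    per_component = total_samples // n_components
    counts = [0] * 9
    for _ in range(n_components):
        upper = 10 ** rng.uniform(0, 6)
        for _ in range(per_component):
            x = rng.uniform(0, upper)
            if x > 0:  # skip the (very unlikely) exact zero, which has no leading digit
                counts[leading_digit(x) - 1] += 1
    total = sum(counts)
    return [c / total for c in counts]

rng = random.Random(0)  # fixed seed so the run is repeatable
for n_components in (1, 10, 2000):
    freqs = mixture_leading_digits(rng, n_components, 100_000)
    worst = max(abs(f - b) for f, b in zip(freqs, BENFORD))
    print(f"{n_components:>4} components: max deviation from Benford = {worst:.3f}")
```

With a single component the leading digits reflect one arbitrary upper limit. As more components are mixed in, the deviation from the Benford frequencies should shrink.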

Applications

Fraud detection in accounting. If someone fabricates financial numbers, they typically choose digits roughly uniformly (each leading digit appears about 11% of the time). Real financial data tends to follow Benford's law. A goodness-of-fit test comparing the observed digit distribution to the Benford distribution can flag suspicious datasets. The IRS and forensic accountants use this technique routinely.
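
A goodness-of-fit check of this kind can be sketched with Pearson's chi-squared statistic. The example below is a minimal illustration, not a forensic procedure. Both datasets are hypothetical stand-ins: powers of 2 play the role of conforming data, and the integers 100 to 999 play the role of numbers with uniformly distributed leading digits. The threshold 15.507 is the standard 5% critical value of the chi-squared distribution with 8 degrees of freedom.

```python
import math

# Benford probabilities for leading digits 1..9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

# 5% critical value of the chi-squared distribution with 8 degrees of freedom.
CHI2_CRITICAL_DF8 = 15.507

def leading_digit_counts(values):
    """Count leading digits 1..9 in a list of positive integers."""
    counts = [0] * 9
    for v in values:
        counts[int(str(v)[0]) - 1] += 1
    return counts

def benford_chi2(counts):
    """Pearson chi-squared statistic of observed counts against Benford expectations."""
    n = sum(counts)
    return sum((obs - n * p) ** 2 / (n * p) for obs, p in zip(counts, BENFORD))

# Hypothetical stand-ins for real ledgers (assumptions for illustration only).
conforming = [2 ** k for k in range(1000)]  # leading digits close to Benford
uniform_digits = list(range(100, 1000))     # every leading digit equally often

for name, data in (("conforming", conforming), ("uniform digits", uniform_digits)):
    stat = benford_chi2(leading_digit_counts(data))
    verdict = "flag for review" if stat > CHI2_CRITICAL_DF8 else "no flag"
    print(f"{name}: chi2 = {stat:.1f} -> {verdict}")
```

A flag here only means the digit distribution departs from Benford's law at the chosen significance level. It says nothing about why.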

Election auditing. Precinct-level vote counts in legitimate elections tend to follow Benford's law for the second digit (the first digit depends on precinct size distribution). Deviations can indicate data irregularities, though not necessarily fraud (the method produces false positives and must be used alongside other checks).

Scientific data integrity. Fabricated experimental data often violates Benford's law because researchers choose "plausible" numbers without realizing the leading-digit distribution should be non-uniform. Several retracted papers have been flagged by Benford's law analysis.

Common Confusions

Watch Out

Benford's law is not about randomness

Benford's law describes a regularity in deterministic datasets, not just random ones. The populations of all countries, the areas of all lakes, the Fibonacci numbers: all follow Benford's law, the first two approximately and the Fibonacci numbers in the limit as more terms are included. The "probability" in the law statement refers to the frequency across a dataset, not to a stochastic generating process.

Watch Out

Violating Benford's law does not prove fraud

Conformity to Benford's law is what one expects from scale-rich "natural" data, but a deviation is not sufficient evidence of fraud. Many legitimate datasets violate the law (bounded data, narrow ranges, assigned numbers), and some fraudulent data accidentally conforms to it. Benford analysis is a screening tool, not a proof of wrongdoing.

Watch Out

The law applies to leading digits, not all digits

The non-uniformity is strongest for the first digit and progressively weaker for later digits. By the fourth or fifth digit, the distribution is nearly uniform. This is because the logarithmic spacing effect diminishes as you move to finer granularity.
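
The flattening can be computed directly. Under Benford's law, the probability that the $k$-th significant digit equals $j$ (for $k \geq 2$) is obtained by summing $\log_{10}(1 + 1/(10m + j))$ over every possible block $m$ of the preceding $k - 1$ digits. The sketch below evaluates this for a few values of $k$; the function name is illustrative.

```python
import math

def kth_digit_probs(k):
    """Benford probabilities that the k-th significant digit equals 0..9 (k >= 2).

    m ranges over every possible block of the first k-1 digits.
    """
    lo, hi = 10 ** (k - 2), 10 ** (k - 1)
    return [
        sum(math.log10(1 + 1 / (10 * m + j)) for m in range(lo, hi))
        for j in range(10)
    ]

for k in (2, 3, 4, 5):
    probs = kth_digit_probs(k)
    spread = max(probs) - min(probs)
    print(f"digit position {k}: P(0) = {probs[0]:.5f}, P(9) = {probs[9]:.5f}, "
          f"spread = {spread:.5f}")
```

The spread between the most and least likely digit should shrink at each position, approaching the uniform value of 0.1 for every digit.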

Exercises

Exercise (Core)

Problem

The Fibonacci sequence begins 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, ... Count the leading digits of the first 20 Fibonacci numbers. How does the distribution compare to Benford's law?

Exercise (Advanced)

Problem

Prove that if $\log_{10}(X) \bmod 1 \sim \text{Uniform}(0, 1)$, then the leading digit of $X$ follows Benford's law.

References

Canonical:

Benford, "The Law of Anomalous Numbers" (Proceedings of the American Philosophical Society, 1938)
Hill, "A Statistical Derivation of the Significant-Digit Law" (Statistical Science, 1995). The modern mathematical foundation.

Applications:

Nigrini, Benford's Law: Applications for Forensic Accounting, Auditing, and Fraud Detection (2012). The practical reference.
Miller (ed.), Benford's Law: Theory and Applications (2015). Comprehensive mathematical and applied treatment.

Next Topics

Hypothesis testing: the framework for testing whether data conforms to a theoretical distribution
Goodness-of-fit tests: KS and chi-squared tests to detect departures from Benford's law

Last reviewed: April 14, 2026
