Disclaimer: Beta. Content is under active construction and has not been peer-reviewed. Report errors on GitHub.


Number Theory and Machine Learning

The emerging two-way street between number theory and machine learning: how number-theoretic tools improve ML systems, and how ML is discovering new mathematical structure in classical problems.

Advanced · Tier 3 · Frontier · ~60 min

Why This Matters

Number theory and machine learning are interacting in ways nobody predicted a decade ago. The interaction runs in both directions: number-theoretic tools (quasi-random sequences, lattice methods, algebraic structure) are improving ML systems, while ML is being used as a discovery tool in pure mathematics, finding patterns in L-functions, elliptic curves, and prime distributions that humans had missed.

This page maps both directions honestly, including the significant limitations of current approaches.

Number Theory Aiding ML

Cryptography and Privacy

Modern ML increasingly requires privacy. Homomorphic encryption, secure multi-party computation, and differential privacy all rest on number-theoretic hardness assumptions.

Definition

Lattice-Based Cryptography

Cryptographic schemes whose security reduces to the hardness of lattice problems such as the Shortest Vector Problem (SVP) or Learning With Errors (LWE). These are the foundation of post-quantum cryptography and enable homomorphic encryption for private ML inference.

The Learning With Errors (LWE) problem, introduced by Regev, is a noisy linear algebra problem over finite fields. Its hardness enables encrypted computation on ML models without revealing the input data.
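The mechanics can be illustrated with a toy Regev-style scheme: the public key is a batch of noisy linear equations, and a ciphertext is a random subset-sum of them with the message bit shifted into the high half of the modulus. This is a minimal sketch with illustrative parameters (chosen here for readability, far too small for any real security), not a production implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Regev-style LWE parameters (illustrative only; far too small for security).
n, m, q = 8, 32, 257          # secret dimension, number of samples, modulus

# Key generation: secret s; public key (A, b = A s + e mod q) with small errors e.
s = rng.integers(0, q, n)
A = rng.integers(0, q, (m, n))
e = rng.integers(-1, 2, m)    # errors in {-1, 0, 1}
b = (A @ s + e) % q

def encrypt(bit):
    """Encrypt one bit: sum a random subset of LWE samples, shift by bit * q/2."""
    r = rng.integers(0, 2, m)                 # random subset selector
    c1 = (r @ A) % q
    c2 = (r @ b + bit * (q // 2)) % q
    return c1, c2

def decrypt(c1, c2):
    """Recover the bit: c2 - c1.s = r.e + bit*q/2, near 0 for 0, near q/2 for 1."""
    d = (c2 - c1 @ s) % q
    return int(min(d, q - d) > q // 4)

assert all(decrypt(*encrypt(bit)) == bit for bit in [0, 1, 1, 0])
```

Because the accumulated error stays below $q/4$, rounding recovers the bit exactly. The same additive structure (ciphertexts of sums are sums of ciphertexts, up to noise growth) is what homomorphic encryption schemes build on.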

Quasi-Random Sequences

Standard pseudo-random numbers can cluster, leaving gaps in high-dimensional spaces. Quasi-random (low-discrepancy) sequences fill space more uniformly, improving Monte Carlo integration, hyperparameter search, and initialization. This connects to importance sampling and variance reduction in estimation.

Definition

Discrepancy

The discrepancy of a point set $\{x_1, \ldots, x_N\} \subset [0,1]^d$ measures the worst-case deviation between the empirical distribution and the uniform distribution:

$$D_N = \sup_{B \in \mathcal{B}} \left| \frac{|\{x_i \in B\}|}{N} - \mathrm{vol}(B) \right|$$

Lower discrepancy means more uniform coverage.
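In one dimension the star discrepancy of sorted points $x_{(1)} \leq \cdots \leq x_{(N)}$ has the closed form $D_N^* = \frac{1}{2N} + \max_i |x_{(i)} - \frac{2i-1}{2N}|$, which makes the definition easy to check numerically. A minimal sketch comparing the base-2 van der Corput low-discrepancy sequence with pseudorandom points (function names are our own):

```python
import random

def van_der_corput(n, base=2):
    """First n points of the base-b van der Corput low-discrepancy sequence."""
    pts = []
    for k in range(1, n + 1):
        x, f, i = 0.0, 1.0 / base, k
        while i > 0:
            i, digit = divmod(i, base)   # peel off base-b digits of the index
            x += digit * f               # ...and mirror them about the radix point
            f /= base
        pts.append(x)
    return pts

def star_discrepancy_1d(points):
    """Exact 1D star discrepancy: D* = 1/(2N) + max_i |x_(i) - (2i-1)/(2N)|."""
    xs = sorted(points)
    n = len(xs)
    return 1 / (2 * n) + max(abs(x - (2 * i + 1) / (2 * n)) for i, x in enumerate(xs))

random.seed(0)
n = 512
d_vdc = star_discrepancy_1d(van_der_corput(n))
d_rand = star_discrepancy_1d([random.random() for _ in range(n)])
print(f"van der Corput: {d_vdc:.4f}, random: {d_rand:.4f}")
```

The low-discrepancy sequence lands near the $O(\log N / N)$ rate, while the random points sit around $O(1/\sqrt{N})$, an order of magnitude worse at this $N$.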

Theorem

Koksma-Hlawka Inequality

Statement

For a function $f$ of bounded variation $V(f)$ on $[0,1]^d$ and a point set with discrepancy $D_N$:

$$\left| \frac{1}{N} \sum_{i=1}^{N} f(x_i) - \int_{[0,1]^d} f(x) \, dx \right| \leq V(f) \cdot D_N$$

Intuition

The integration error is controlled by two factors: how rough the function is ($V(f)$) and how uniformly the points cover the domain ($D_N$). Halton, Sobol, and other quasi-random sequences achieve $D_N = O((\log N)^d / N)$, which beats the $O(1/\sqrt{N})$ rate of random Monte Carlo.

Proof Sketch

Decompose the integration error using a multidimensional integration by parts (the Koksma-Hlawka identity). Each term in the decomposition is bounded by the variation of ff times the discrepancy of the projection of the point set onto the corresponding coordinate subspace.

Why It Matters

This is why quasi-random hyperparameter search (e.g., Sobol sequences) can outperform grid search and random search, especially in moderate dimensions. The theoretical rate $O((\log N)^d / N)$ is much better than random sampling's $O(1/\sqrt{N})$ when $d$ is not too large.
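The rate difference is easy to see on a smooth integrand. A minimal sketch (our own helper names, assuming nothing beyond the stdlib) using a 2D Halton sequence with coprime bases 2 and 3 to estimate $\int_{[0,1]^2} xy \, dx \, dy = 1/4$:

```python
import random

def radical_inverse(i, base):
    """Mirror the base-b digits of i about the radix point (van der Corput)."""
    x, f = 0.0, 1.0 / base
    while i > 0:
        i, digit = divmod(i, base)
        x += digit * f
        f /= base
    return x

def halton_2d(n):
    """First n points of the 2D Halton sequence (coprime bases 2 and 3)."""
    return [(radical_inverse(i, 2), radical_inverse(i, 3)) for i in range(1, n + 1)]

f = lambda x, y: x * y        # smooth test integrand; exact integral = 1/4
n = 4096

qmc_est = sum(f(x, y) for x, y in halton_2d(n)) / n

random.seed(1)
mc_est = sum(f(random.random(), random.random()) for _ in range(n)) / n

print(f"QMC error: {abs(qmc_est - 0.25):.2e}, MC error: {abs(mc_est - 0.25):.2e}")
```

On this toy problem the quasi-random estimate is typically correct to several more digits than the plain Monte Carlo one at the same sample budget, exactly as Koksma-Hlawka predicts.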

Failure Mode

In very high dimensions, the $(\log N)^d$ factor can dominate, and the advantage of quasi-random sequences over pseudorandom ones diminishes. Above roughly $d = 20$, the practical benefit is marginal.

Input Representation

Positional encodings in transformers use ideas from number theory. The original sinusoidal encoding in the transformer architecture can be viewed through the lens of characters of cyclic groups. More recent work explores representations based on the Chinese Remainder Theorem for encoding positions and numerical values.
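The Chinese Remainder Theorem idea can be sketched in a few lines: residues modulo pairwise-coprime moduli identify a position uniquely up to the product of the moduli, so a small tuple of residues is a compact, exact position code. This is a hedged illustration of the principle only (the moduli and helper names are our own), not any specific published encoding:

```python
from math import prod

MODULI = (3, 5, 7)          # pairwise coprime; positions unique up to 3*5*7 = 105

def crt_encode(pos):
    """Encode a position as its residues modulo each modulus."""
    return tuple(pos % m for m in MODULI)

def crt_decode(residues):
    """Recover the position by brute-force search (CRT guarantees uniqueness)."""
    for pos in range(prod(MODULI)):
        if crt_encode(pos) == tuple(residues):
            return pos

# Every position below 105 gets a distinct code and decodes correctly.
codes = {crt_encode(p) for p in range(105)}
assert len(codes) == 105
assert crt_decode(crt_encode(42)) == 42
```

The sinusoidal encoding is the continuous analogue: each frequency plays the role of a modulus, and a position is identified by its "phase residues" across frequencies.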

ML Aiding Number Theory

The LMFDB and Data-Driven Mathematics

The L-functions and Modular Forms Database (LMFDB) contains billions of mathematical objects: elliptic curves, modular forms, L-functions, number fields. This is a natural target for ML pattern recognition.

Example

ML and the BSD Conjecture

The Birch and Swinnerton-Dyer (BSD) conjecture relates the rank of an elliptic curve to the behavior of its L-function at $s = 1$. ML models trained on LMFDB data have been used to predict the rank of elliptic curves from their conductors and other invariants. While these models do not prove BSD, they help identify which curves deserve closer study and reveal correlations between invariants that suggest new conjectures.

Murmurations: A Genuine Discovery

Example

Murmurations in Elliptic Curves

In 2022-2023, He, Lee, Oliver, and Pozdnyakov used ML to discover murmurations: unexpected oscillatory patterns in the average values of $a_p$ coefficients of elliptic curves when sorted by conductor. The patterns, resembling starling murmurations, were not predicted by any existing theory.

Crucially, after ML flagged the patterns, mathematicians proved they were real using analytic number theory (trace formulas and explicit formulas for L-functions). This is the ideal ML-for-math workflow: ML discovers, humans prove.

Prime Gap Patterns

ML models have been trained to predict the distribution of prime gaps, and reinforcement learning has been applied to optimize sieve methods. However, the results here are more modest: models tend to rediscover known heuristics (the Cramér model, the Hardy-Littlewood conjectures) rather than finding genuinely new structure.
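The Cramér heuristic mentioned above predicts that the average gap between primes near $N$ is roughly $\ln N$, which is exactly the kind of known baseline such models tend to rediscover. A quick empirical check (sieve bound and window chosen here for illustration):

```python
from math import log

def primes_up_to(n):
    """Simple sieve of Eratosthenes."""
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            for k in range(p * p, n + 1, p):
                is_prime[k] = False
    return [i for i, flag in enumerate(is_prime) if flag]

# Cramer's heuristic: the average gap between primes near N is roughly ln N.
primes = primes_up_to(200_000)
window = [p for p in primes if 100_000 <= p <= 200_000]
gaps = [b - a for a, b in zip(window, window[1:])]
mean_gap = sum(gaps) / len(gaps)
print(f"mean gap: {mean_gap:.2f}, ln(150000): {log(150_000):.2f}")
```

A model whose gap predictions match this $\ln N$ baseline (plus residue-class effects modulo small primes) has not learned anything beyond a century-old heuristic.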

Honest Limitations

Watch Out

ML often finds trivial number-theoretic patterns

When you train a neural network to predict whether a number is prime, it typically learns to check divisibility by small primes (2, 3, 5, 7, ...). This is not deep mathematics; it is pattern matching on the least significant bits. Similarly, models trained to predict prime gaps often learn the residue class structure modulo small primes rather than anything about the deep distribution of primes.

The lesson: always check whether the ML model has learned something genuinely new or merely rediscovered elementary divisibility rules.
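Two trivial baselines make the point concrete for 6-digit numbers: always guessing "composite" is already right over 90% of the time (primes are rare), and a rule that only checks divisibility by 2, 3, 5, and 7 scores in the mid-80s. A minimal sketch (our own helper names):

```python
from math import isqrt

def is_prime(n):
    """Deterministic trial division; fast enough for 6-digit numbers."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    return all(n % d for d in range(3, isqrt(n) + 1, 2))

numbers = range(100_000, 200_000)
labels = [is_prime(n) for n in numbers]

# Baseline 1: always guess "composite".
acc_always = sum(not y for y in labels) / len(labels)

# Baseline 2: "composite" iff divisible by 2, 3, 5, or 7; otherwise guess "prime".
preds = [all(n % p for p in (2, 3, 5, 7)) for n in numbers]
acc_divis = sum(p == y for p, y in zip(preds, labels)) / len(labels)

print(f"always-composite: {acc_always:.1%}, small-prime rule: {acc_divis:.1%}")
```

Any reported accuracy for a primality classifier should be compared against numbers like these before being taken as evidence that the model learned something about primes.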

Watch Out

ML predictions are not proofs

An ML model predicting the rank of an elliptic curve with 95% accuracy is useful for directing research but proves nothing. In number theory, a conjecture supported by a billion numerical examples but lacking a proof is still a conjecture. The evidentiary standards of the two fields are fundamentally different, and conflating them helps neither.

Summary

  • Number theory provides tools for private ML (lattice crypto, LWE, homomorphic encryption)
  • Quasi-random sequences beat random sampling for integration and search in moderate dimensions
  • ML has genuinely discovered new mathematical structure (murmurations) that was subsequently proven
  • But ML also frequently learns trivial patterns (small-prime divisibility) and mistakes noise for signal
  • The ideal workflow: ML discovers, humans verify and prove

Exercises

ExerciseCore

Problem

Explain why a Sobol sequence with $N$ points in $[0,1]^2$ gives a better estimate of $\int_{[0,1]^2} f(x,y) \, dx \, dy$ than $N$ uniformly random points, for a smooth function $f$. What is the convergence rate for each?

ExerciseAdvanced

Problem

A neural network trained to classify numbers as prime or composite achieves 92% accuracy on 6-digit numbers. Before being impressed, what simple baseline should you compare against, and why might the network not have learned anything deep?

ExerciseAdvanced

Problem

Why is the Learning With Errors (LWE) problem relevant to private ML inference? Sketch how LWE enables computing on encrypted data.

References

Murmurations:

  • He, Lee, Oliver, Pozdnyakov, "Murmurations of elliptic curves" (2023)
  • Sutherland, Zywina, "Murmurations of elliptic curves: analytic proof" (2024)

Quasi-Random Methods:

  • Niederreiter, Random Number Generation and Quasi-Monte Carlo Methods (1992)

ML for Number Theory:

  • Davies, Veličković, Buesing, et al., "Advancing mathematics by guiding human intuition with AI," Nature (2021)

LWE and Crypto:

  • Regev, "On lattices, learning with errors, random linear codes, and cryptography" (2005)

Next Topics

This is a frontier survey topic. Explore individual directions as they develop in the research literature.

Last reviewed: April 2026