
Algorithms Foundations

Unsolved Problems in Computer Science

P vs NP, the matrix multiplication exponent, one-way functions, the unique games conjecture, BPP vs P, natural proofs, and circuit lower bounds. The problems that define the limits of computation.

Advanced · Tier 2 · Stable · ~55 min


Why This Matters

These problems define the fundamental limits of what algorithms can do efficiently. Many ML problems (optimal feature selection, optimal neural architecture, clustering) are NP-hard. Understanding the landscape of unsolved problems tells you which theoretical improvements are possible and which are likely blocked by deep computational barriers.

If P = NP, then every problem whose solution can be verified quickly can also be solved quickly. Cryptography breaks. Optimal learning becomes tractable. The entire landscape of computation changes. If P != NP (as almost everyone believes), then hardness is real and we must live with approximations, heuristics, and good-enough solutions.

Problem 1: P vs NP

Status: Open since 1971 (Cook). Millennium Prize Problem ($1,000,000 reward).

P is the class of problems solvable in polynomial time. NP is the class of problems where a solution can be verified in polynomial time. The question: is P = NP?

What we know: Thousands of NP-complete problems are known (SAT, traveling salesman, graph coloring, subset sum). No polynomial-time algorithm has been found for any of them. No proof of P != NP has been found either.

Connection to ML: Optimal feature selection (choose the k features that minimize prediction error) is NP-hard. Optimal clustering (minimize within-cluster sum of squares) is NP-hard. Finding the smallest neural network consistent with training data is NP-hard. If P = NP, all of these become tractable. Since we believe P != NP, we accept approximate solutions.
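Since exact selection is intractable, practice falls back on greedy heuristics. A minimal sketch of forward feature selection; the additive `signal` scoring function here is a toy stand-in for what would, in real use, be cross-validated model performance:

```python
def forward_selection(features, score, k):
    """Greedy forward selection: repeatedly add the single feature that
    most improves the score. A heuristic with no general optimality
    guarantee, since exact k-feature selection is NP-hard."""
    selected = []
    remaining = list(features)
    for _ in range(k):
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy additive score (hypothetical "signal strength" per feature); real use
# would plug in cross-validated prediction accuracy.
signal = {"a": 3.0, "b": 2.0, "c": 0.1, "d": 0.0}
def score(subset):
    return sum(signal[f] for f in subset)

print(forward_selection(signal, score, 2))  # ['a', 'b']
```

For monotone submodular objectives this greedy strategy is within a (1 − 1/e) factor of optimal; in general it can be arbitrarily bad, which is exactly the gap NP-hardness predicts.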

Problem 2: Matrix Multiplication Exponent

Status: Open since 1969. See the dedicated page on matrix multiplication algorithms.

Is ω = 2? The best known upper bound is ω ≤ 2.371552 (Vassilevska Williams, Xu, Xu, and Zhou, 2024). The trivial lower bound is ω ≥ 2, since all n² output entries must be written.

Connection to ML: Every matrix multiplication in training and inference. A lower exponent means faster everything. See the matrix multiplication algorithms topic for details.
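To see where sub-cubic exponents come from: Strassen's identity multiplies 2×2 blocks using 7 recursive multiplications instead of 8, giving O(n^(log₂ 7)) ≈ O(n^2.807). A sketch assuming NumPy and square matrices whose size is a power of two:

```python
import numpy as np

def strassen(A, B):
    """Strassen's recursive matrix multiplication: 7 block multiplications
    per split instead of 8. Assumes square power-of-two dimensions;
    a real implementation would switch to the naive method below a
    cutoff size for speed."""
    n = A.shape[0]
    if n == 1:
        return A * B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    M1 = strassen(A11 + A22, B11 + B22)
    M2 = strassen(A21 + A22, B11)
    M3 = strassen(A11, B12 - B22)
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C

A = np.random.randn(8, 8)
B = np.random.randn(8, 8)
assert np.allclose(strassen(A, B), A @ B)
```

The faster algorithms behind today's ω ≤ 2.371552 bound follow the same template with far more intricate base identities, but are galactic: their constants make them impractical at real sizes.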

Problem 3: One-Way Functions

Status: Open. The existence of one-way functions implies P != NP; whether the converse holds is also open.

Definition

One-Way Function

A function f is one-way if it is computable in polynomial time but, for every probabilistic polynomial-time algorithm A,

Pr[f(A(f(x))) = f(x)] ≤ negl(n)

for random x of length n. Informally: easy to compute, hard to invert.

What we know: Candidate one-way functions exist (integer factoring, discrete logarithm). No proof that any specific function is one-way. The existence of one-way functions is equivalent to the existence of secure pseudorandom generators; both imply P != NP, and may be strictly stronger assumptions.
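Modular exponentiation, whose inverse is the discrete logarithm mentioned above, is the classic candidate. A sketch of the asymmetry; the specific prime and base are arbitrary illustrative choices:

```python
# Modular exponentiation x -> g^x mod p is a candidate one-way function:
# the forward direction is fast, while inverting it is the discrete
# logarithm problem, with no known polynomial-time algorithm.
p = 2_147_483_647           # the Mersenne prime 2^31 - 1
g = 7
x = 123_456_789             # the secret input
y = pow(g, x, p)            # forward direction: O(log x) modular squarings

def brute_force_dlog(y, g, p, limit=10**6):
    """Invert by exhaustive search -- exponential in the bit length of
    the exponent. Gives up after `limit` attempts."""
    acc = 1
    for e in range(limit):
        if acc == y:
            return e
        acc = acc * g % p
    return None

print(brute_force_dlog(pow(g, 1000, p), g, p))  # small exponents: finds 1000
print(brute_force_dlog(y, g, p))                # large secret: almost surely None
```

The catch, and the open problem: nothing rules out a clever inversion algorithm. "No known attack" is the strongest statement available for any candidate.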

Connection to ML: Cryptographic security underlies secure computation on private data (federated learning, differential privacy implementations). Pseudorandom generators are used in randomized algorithms throughout ML. If one-way functions do not exist, all of cryptography collapses.

Problem 4: The Unique Games Conjecture

Status: Open since 2002 (Khot).

Definition

Unique Games Conjecture

The unique games conjecture (UGC) states that for every ε > 0, it is NP-hard to distinguish whether a unique game instance has value at least 1 − ε or at most ε.

A unique game is a constraint satisfaction problem where each constraint involves two variables and is a bijection: given the value of one variable, the other is uniquely determined.
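A unique game's value can always be computed exactly by brute force over all labelings; UGC asserts that even roughly approximating it is NP-hard. A toy sketch over a binary alphabet, with permutations given as tuples (an illustrative encoding, not a standard API):

```python
from itertools import product

def game_value(num_vars, k, constraints):
    """Exact value of a unique game by brute force over all k^n labelings.
    A constraint (u, v, perm) is satisfied when labels[v] == perm[labels[u]],
    where perm is a bijection on {0, ..., k-1}. Exponential time -- UGC says
    even distinguishing value >= 1 - eps from value <= eps is NP-hard."""
    best = 0.0
    for labels in product(range(k), repeat=num_vars):
        sat = sum(labels[v] == perm[labels[u]] for u, v, perm in constraints)
        best = max(best, sat / len(constraints))
    return best

# Three binary variables; "equal" is the identity permutation, "differ"
# the swap. This odd cycle of constraints cannot be fully satisfied.
eq, swap = (0, 1), (1, 0)
constraints = [(0, 1, eq), (1, 2, eq), (2, 0, swap)]
print(game_value(3, 2, constraints))  # best labelings satisfy 2 of 3 (value 2/3)
```

Note the "unique" structure: fixing one endpoint of a constraint determines the other, which is what makes the conjectured hardness of approximation so delicate.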

What it implies: If UGC is true, then many approximation algorithms are optimal. Specifically:

  • The Goemans-Williamson 0.878-approximation for MAX-CUT is optimal
  • Many semidefinite programming relaxations are the best possible polynomial-time algorithms
  • Certain inapproximability results for k-means clustering and other optimization problems follow

Connection to ML: The UGC would imply that certain ML optimization problems (clustering, graph partitioning, constraint satisfaction) cannot be approximated better than current algorithms in polynomial time. This would formalize the belief that heuristics are necessary.

Problem 5: BPP = P?

Status: Open, but widely believed to be true.

Definition

BPP

BPP (Bounded-error Probabilistic Polynomial time) is the class of problems solvable in polynomial time by a randomized algorithm that errs with probability at most 1/3 on every input.

The question: does randomness help for efficient computation? Is BPP = P?

What we know: Under standard derandomization conjectures (hardness of certain problems for small circuits), BPP = P. In practice, many randomized algorithms have been derandomized (e.g., primality testing: AKS is deterministic and polynomial, matching the randomized Miller-Rabin).
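Miller-Rabin, mentioned above, is the archetypal BPP-style algorithm: fast, simple, and wrong only with tiny, controllable probability. A compact sketch:

```python
import random

def miller_rabin(n, rounds=20):
    """Randomized primality test: never rejects a prime, and accepts a
    composite with probability at most 4^(-rounds). A deterministic
    polynomial-time test exists (AKS) but is far slower in practice."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):   # screen small factors first
        if n % p == 0:
            return n == p
    d, r = n - 1, 0
    while d % 2 == 0:                # write n - 1 = d * 2^r with d odd
        d //= 2
        r += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False             # witness found: definitely composite
    return True                      # no witness found: probably prime

print(miller_rabin(2**61 - 1))  # True  (a Mersenne prime)
print(miller_rabin(2**61 + 1))  # False (divisible by 3)
```

The error probability 4^(-20) is far below the chance of a hardware fault, which is why "probably prime" is accepted everywhere in deployed cryptography.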

Connection to ML: Many ML algorithms are inherently randomized (SGD, random initialization, dropout, random features). If BPP = P, then in principle, deterministic versions of these algorithms exist with comparable performance. In practice, randomness is used not because it is necessary for efficiency but because it simplifies algorithm design and implementation.

Problem 6: The Natural Proofs Barrier

Theorem

Razborov-Rudich Natural Proofs Barrier

Statement

If one-way functions exist, then no "natural" proof technique can prove super-polynomial circuit lower bounds for explicit Boolean functions. A proof is "natural" if it works by identifying a combinatorial property that (a) is useful (separates easy from hard functions) and (b) is large (satisfied by a random function with high probability).

Intuition

Most known lower bound proofs in circuit complexity are natural in this sense. The barrier says: if cryptography is secure, then the standard toolbox for proving lower bounds is insufficient to resolve P vs NP or related questions. The properties that distinguish easy functions from hard ones are themselves hard to detect if one-way functions exist.

Proof Sketch

If a property P is large (satisfied by most functions) and useful (not satisfied by any function in the target complexity class), then P distinguishes random functions from pseudorandom functions: a random function has P with high probability, while a pseudorandom function, being easy to compute, does not. A natural proof additionally makes P efficiently decidable, so it would yield an efficient distinguisher. But pseudorandom functions, which exist if one-way functions exist, cannot be efficiently distinguished from random. Contradiction.

Why It Matters

The natural proofs barrier, along with relativization (Baker-Gill-Solovay, 1975) and algebrization (Aaronson-Wigderson, 2009), explains why P vs NP has resisted proof for 50+ years. Any resolution must use non-natural, non-relativizing, non-algebrizing techniques.

Failure Mode

The barrier only applies to "natural" proofs. Non-natural approaches (e.g., Ryan Williams' connection between algorithms and circuit lower bounds, or geometric complexity theory) are not blocked. The barrier is a statement about a class of proof strategies, not an impossibility result for all strategies.

Problem 7: Circuit Lower Bounds

Status: Very limited progress despite decades of effort.

The goal: prove that specific problems (like SAT) require circuits of super-polynomial size. This would imply P != NP (since polynomial-time algorithms correspond to polynomial-size circuits in a uniform model).

What we know: Super-polynomial lower bounds exist for restricted circuit classes:

  • AC⁰ lower bounds: PARITY requires exponential-size constant-depth circuits (Furst-Saxe-Sipser, Håstad)
  • Monotone circuit lower bounds: the CLIQUE function requires exponential-size monotone circuits (Razborov)
  • For general circuits: no super-linear lower bound is known for any explicit function in NP
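One way to feel why lower bounds are so hard: even computing the minimum formula size of a tiny function means searching over all 2^(2^n) truth tables. A brute-force dynamic program for AND/OR/NOT formulas (the tree-shaped restriction of circuits), feasible only for toy n:

```python
def min_formula_gates(n, target):
    """Minimum number of AND/OR/NOT gates in a formula (a tree, no shared
    subcircuits) computing `target`, a truth table encoded as a 2^n-bit
    mask. The DP ranges over all 2^(2^n) truth tables -- a hint at why
    explicit lower bounds for general circuits remain out of reach."""
    rows = 1 << n
    full = (1 << rows) - 1
    INF = float("inf")
    cost = [INF] * (full + 1)
    for i in range(n):
        # Truth table of input variable i: bit r is the i-th bit of row r.
        cost[sum(((r >> i) & 1) << r for r in range(rows))] = 0
    changed = True
    while changed:  # relax until no formula gets cheaper
        changed = False
        live = [t for t in range(full + 1) if cost[t] < INF]
        for a in live:
            if cost[a] + 1 < cost[(~a) & full]:      # NOT gate
                cost[(~a) & full] = cost[a] + 1
                changed = True
            for b in live:
                for t in (a & b, a | b):             # AND / OR gates
                    if cost[a] + cost[b] + 1 < cost[t]:
                        cost[t] = cost[a] + cost[b] + 1
                        changed = True
    return cost[target]

# XOR on two inputs, rows (00, 01, 10, 11) -> (0, 1, 1, 0): mask 0b0110.
print(min_formula_gates(2, 0b0110))  # 4, e.g. (x OR y) AND NOT (x AND y)
```

At n = 5 this table already has 2^32 entries; proving that an explicit n-bit function needs super-linear general circuits would require reasoning about this landscape without enumerating it.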

Connection to ML: Neural networks are circuits. Understanding circuit lower bounds tells us about the inherent limitations of finite-depth, finite-width networks. The fact that no strong circuit lower bounds exist for general circuits parallels the fact that we lack a complete theory of when neural networks can and cannot learn efficiently.

Impagliazzo's Five Worlds

Proposition

Impagliazzo's Five Worlds

Statement

Impagliazzo (1995) described five possible worlds based on which complexity assumptions hold:

  1. Algorithmica: P = NP. All NP problems are easy.
  2. Heuristica: P != NP, but NP problems are easy on average (hard instances are rare).
  3. Pessiland: Average-case hard problems exist, but one-way functions do not.
  4. Minicrypt: One-way functions exist, but public-key cryptography does not.
  5. Cryptomania: Public-key cryptography exists (the world we believe we live in).

Intuition

Each world has different implications for what algorithms can achieve. We believe we live in Cryptomania (public-key crypto works), which implies P != NP, one-way functions exist, and hard-on-average problems exist.

Proof Sketch

This is a classification, not a theorem. Each world is defined by which complexity-theoretic assumptions hold. The implications between assumptions (e.g., public-key crypto implies one-way functions implies P != NP) define the ordering.

Why It Matters

The framework clarifies what is at stake in complexity theory. For ML: if we live in Algorithmica, then optimal learning is always tractable. In Cryptomania, computational hardness is real and fundamental, and we must design algorithms that work well despite not solving NP-hard problems optimally.

Failure Mode

The five worlds framework is a simplification. Reality could involve combinations or intermediate scenarios not cleanly captured by these categories. Also, the framework describes worst-case or average-case complexity; practical ML operates in regimes where the "typical" case may be much easier than the worst case.

Common Confusions

Watch Out

NP-hard does not mean impossible in practice

Many NP-hard problems are solved routinely for practical instances. SAT solvers handle millions of variables. Integer programming solvers solve industrial scheduling problems. NP-hardness is a worst-case statement: there exist instances that are hard. But the instances arising in practice often have structure that makes them tractable.
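The core of a SAT solver fits on a page: DPLL-style unit propagation plus backtracking. Modern CDCL solvers add clause learning and restart heuristics, but this sketch shows why structured instances can collapse quickly (literals are signed integers, DIMACS-style; a hypothetical minimal encoding, not a production solver):

```python
def dpll(clauses, assignment=None):
    """Minimal DPLL SAT solver. A literal is a nonzero int (3 means x3,
    -3 means NOT x3); a clause is a list of literals. Worst-case
    exponential -- SAT is NP-complete -- but unit propagation exploits
    the structure that practical instances usually have."""
    if assignment is None:
        assignment = {}
    # Unit propagation: a one-literal clause forces that literal's value.
    while True:
        units = [c[0] for c in clauses if len(c) == 1]
        if not units:
            break
        lit = units[0]
        assignment[abs(lit)] = lit > 0
        new = []
        for c in clauses:
            if lit in c:
                continue                  # clause already satisfied
            reduced = [l for l in c if l != -lit]
            if not reduced:
                return None               # empty clause: conflict
            new.append(reduced)
        clauses = new
    if not clauses:
        return assignment                 # everything satisfied
    var = abs(clauses[0][0])              # branch on some remaining variable
    for lit in (var, -var):
        result = dpll(clauses + [[lit]], dict(assignment))
        if result is not None:
            return result
    return None

# (x1 or x2) and (not x1 or x3) and (not x2 or not x3) and x1
print(dpll([[1, 2], [-1, 3], [-2, -3], [1]]))  # {1: True, 3: True, 2: False}
```

On this instance the unit clause x1 triggers a propagation cascade that solves everything with zero branching, which is the typical behavior on structured industrial formulas and the reason NP-hardness rarely bites in practice.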

Watch Out

P vs NP is about worst-case complexity

If P != NP, it means there exist hard instances of NP-complete problems. It does not mean every instance is hard. Most ML-relevant optimization problems are NP-hard in theory but solvable in practice via SGD, greedy methods, or relaxations. The gap between worst-case theory and average-case practice is enormous.

Watch Out

BPP = P does not mean randomness is useless

Even if BPP = P, randomized algorithms may still be simpler to design, implement, and analyze. The derandomized versions of algorithms are often more complex with larger constants. Randomness is a tool for simplicity even when it is not needed for computational power.

Key Takeaways

  • P vs NP: the central open question in computer science. If P = NP, optimal learning is tractable.
  • One-way functions: their existence underpins cryptography and is closely tied to P != NP
  • Unique games conjecture: would prove optimality of many approximation algorithms used in ML
  • BPP = P: widely believed; randomness probably does not add computational power
  • Natural proofs barrier: explains why P vs NP resists current proof techniques
  • Circuit lower bounds: almost no progress for general circuits despite decades of effort
  • We believe we live in Cryptomania: hardness is real, and ML must work within its constraints

Exercises

ExerciseCore

Problem

Explain why the following ML problems are NP-hard, and what practical algorithms are used instead: (a) optimal k-feature selection, (b) minimum-size neural network consistent with training data.

ExerciseAdvanced

Problem

The natural proofs barrier says that natural proof techniques cannot prove circuit lower bounds if one-way functions exist. Does this mean P vs NP can never be resolved?

References

Canonical:

  • Arora & Barak, Computational Complexity: A Modern Approach (2009), Chapters 2-6, 9, 14, 23
  • Razborov & Rudich, "Natural Proofs" (1997), JCSS

Current:

  • Impagliazzo, "A Personal View of Average-Case Complexity" (1995)
  • Khot, "On the Power of Unique 2-Prover 1-Round Games" (2002)
  • Aaronson, "P =? NP" survey (2017), for an accessible overview


Last reviewed: April 2026
