Algorithms Foundations
Unsolved Problems in Computer Science
P vs NP, the matrix multiplication exponent, one-way functions, the unique games conjecture, BPP vs P, natural proofs, and circuit lower bounds. The problems that define the limits of computation.
Why This Matters
These problems define the fundamental limits of what algorithms can do efficiently. Many ML problems (optimal feature selection, optimal neural architecture, clustering) are NP-hard. Understanding the landscape of unsolved problems tells you which theoretical improvements are possible and which are likely blocked by deep computational barriers.
If P = NP, then every problem whose solution can be verified quickly can also be solved quickly. Cryptography breaks. Optimal learning becomes tractable. The entire landscape of computation changes. If P != NP (as almost everyone believes), then hardness is real and we must live with approximations, heuristics, and good-enough solutions.
Problem 1: P vs NP
Status: Open since 1971 (Cook). Millennium Prize Problem ($1,000,000 reward).
P is the class of problems solvable in polynomial time. NP is the class of problems where a solution can be verified in polynomial time. The question: is P = NP?
What we know: Thousands of NP-complete problems are known (SAT, traveling salesman, graph coloring, subset sum). No polynomial-time algorithm has been found for any of them. No proof of P != NP has been found either.
Connection to ML: Optimal feature selection (choose the features that minimize prediction error) is NP-hard. Optimal clustering (minimize within-cluster sum of squares) is NP-hard. Finding the smallest neural network consistent with training data is NP-hard. If P = NP, all of these become tractable. Since we believe P != NP, we accept approximate solutions.
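The verify-versus-solve asymmetry at the heart of P vs NP can be made concrete. The sketch below uses a small hypothetical CNF instance: checking a candidate assignment takes time linear in the formula, while the only known worst-case approach to finding one tries all 2^n assignments.

```python
from itertools import product

# A CNF formula as a list of clauses; each literal is +v or -v for variable v.
# Hypothetical 3-variable instance: (x1 or not x2), (x2 or x3), (not x1 or not x3).
formula = [[1, -2], [2, 3], [-1, -3]]

def verify(formula, assignment):
    """Check a candidate assignment in time linear in the formula -- the 'NP' part."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in formula
    )

def solve_brute_force(formula, n_vars):
    """Try all 2^n assignments -- exponential, and no fundamentally better
    worst-case algorithm is known for SAT."""
    for bits in product([False, True], repeat=n_vars):
        assignment = {i + 1: b for i, b in enumerate(bits)}
        if verify(formula, assignment):
            return assignment
    return None

sat = solve_brute_force(formula, 3)
print(sat is not None)  # True: this instance is satisfiable
```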
Problem 2: Matrix Multiplication Exponent
Status: Open since 1969. See the dedicated page on matrix multiplication algorithms.
Is ω = 2? The best upper bound is ω < 2.373 (Alman-Williams). The only known lower bound is the trivial ω ≥ 2.
Connection to ML: Every matrix multiplication in training and inference. A lower exponent means faster everything. See the matrix multiplication algorithms topic for details.
Problem 3: One-Way Functions
Status: Open. Their existence implies P != NP; whether the converse holds is itself open.
One-Way Function
A function f : {0,1}* → {0,1}* is one-way if it is computable in polynomial time but, for every probabilistic polynomial-time algorithm A:
Pr[A(f(x)) ∈ f⁻¹(f(x))] ≤ negl(n) for random x of length n. Informally: easy to compute, hard to invert.
What we know: Candidate one-way functions exist (integer factoring, discrete logarithm). No proof that any specific function is one-way. The existence of one-way functions is equivalent to the existence of secure pseudorandom generators, and it implies P != NP; P != NP is not known to imply it, so the assumption may be strictly stronger.
Connection to ML: Cryptographic security underlies secure computation on private data (federated learning, differential privacy implementations). Pseudorandom generators are used in randomized algorithms throughout ML. If one-way functions do not exist, all of cryptography collapses.
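The candidate one-way functions above can be illustrated directly. The sketch below uses modular exponentiation x ↦ g^x mod p: the forward direction is polynomial in the bit length, while the best known generic attack on the inverse problem (discrete logarithm) is exponential. The parameters are tiny and purely illustrative, which is the only reason brute-force inversion terminates.

```python
# Candidate one-way function: modular exponentiation x -> g^x mod p.
# Illustrative toy parameters; real cryptographic moduli have thousands of bits.
p, g = 1_000_003, 2

def forward(x):
    return pow(g, x, p)  # fast: square-and-multiply, polynomial in bit length

def invert_brute_force(y):
    """Exhaustive search for a discrete log of y -- time ~p, i.e. exponential
    in the bit length of p. No polynomial-time inverse is known."""
    acc = 1
    for x in range(p - 1):
        if acc == y:
            return x
        acc = acc * g % p
    return None

y = forward(123456)
x = invert_brute_force(y)
print(forward(x) == y)  # True: inversion succeeds only because p is tiny
```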
Problem 4: The Unique Games Conjecture
Status: Open since 2002 (Khot).
Unique Games Conjecture
The unique games conjecture (UGC) states that for every ε > 0, it is NP-hard to distinguish whether a unique game instance has value at least 1 − ε or at most ε.
A unique game is a constraint satisfaction problem where each constraint involves two variables and is a bijection: given the value of one variable, the other is uniquely determined.
What it implies: If UGC is true, then many approximation algorithms are optimal. Specifically:
- The Goemans-Williamson 0.878-approximation for MAX-CUT is optimal
- Many semidefinite programming relaxations are the best possible polynomial-time algorithms
- Certain inapproximability results for k-means clustering and other optimization problems follow
Connection to ML: The UGC would imply that certain ML optimization problems (clustering, graph partitioning, constraint satisfaction) cannot be approximated better than current algorithms in polynomial time. This would formalize the belief that heuristics are necessary.
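The approximation landscape above can be made concrete for MAX-CUT. A uniformly random assignment cuts each edge with probability 1/2, giving a 0.5-approximation in expectation; Goemans-Williamson's SDP rounding raises this to 0.878, and under the UGC no polynomial-time algorithm does better. A minimal sketch of the random baseline on a small hypothetical graph:

```python
import random

# A small illustrative graph as an edge list (hypothetical instance).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
n = 4

def cut_value(side, edges):
    """Number of edges crossing the cut defined by side[v] in {0, 1}."""
    return sum(1 for u, v in edges if side[u] != side[v])

def random_cut(n, edges, trials=200, seed=0):
    """Best of many uniformly random assignments -- each cuts half the edges
    in expectation, so this is a 0.5-approximation."""
    rng = random.Random(seed)
    return max(
        ([rng.randrange(2) for _ in range(n)] for _ in range(trials)),
        key=lambda s: cut_value(s, edges),
    )

side = random_cut(n, edges)
value = cut_value(side, edges)
print(value)  # the true optimum for this graph is 4 (the chord (0, 2) cannot cross)
```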
Problem 5: BPP = P?
Status: Open, but widely believed to be true.
BPP
BPP (Bounded-error Probabilistic Polynomial time) is the class of problems solvable in polynomial time by a randomized algorithm that errs with probability at most 1/3 on every input.
The question: does randomness help for efficient computation? Is BPP = P?
What we know: Under standard derandomization conjectures (hardness of certain problems for small circuits), BPP = P. In practice, many randomized algorithms have been derandomized (e.g., primality testing: AKS is deterministic and polynomial, matching the randomized Miller-Rabin).
Connection to ML: Many ML algorithms are inherently randomized (SGD, random initialization, dropout, random features). If BPP = P, then in principle, deterministic versions of these algorithms exist with comparable performance. In practice, randomness is used not because it is necessary for efficiency but because it simplifies algorithm design and implementation.
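The primality example above is worth seeing in code. Below is a minimal sketch of the randomized Miller-Rabin test: it never errs on primes and errs on composites with probability at most 4^(-rounds), making it a textbook BPP-style algorithm (AKS later showed the problem is in P outright).

```python
import random

def miller_rabin(n, rounds=20, seed=0):
    """Randomized primality test: always accepts primes; rejects composites
    except with probability <= 4**(-rounds). Seeded here for reproducibility."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7):          # dispatch tiny factors deterministically
        if n % p == 0:
            return n == p
    # Write n - 1 = 2^r * d with d odd.
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    rng = random.Random(seed)
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False            # a is a witness: n is definitely composite
    return True                     # probably prime

print(miller_rabin(2**61 - 1))  # True: a Mersenne prime
print(miller_rabin(2**61 + 1))  # False: divisible by 3
```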
Problem 6: The Natural Proofs Barrier
Razborov-Rudich Natural Proofs Barrier
Statement
If one-way functions exist, then no "natural" proof technique can prove super-polynomial circuit lower bounds for explicit Boolean functions. A proof is "natural" if it works by identifying a combinatorial property of Boolean functions that is (a) constructive (decidable in time polynomial in the truth-table size), (b) large (satisfied by a non-negligible fraction of all functions), and (c) useful (separates easy functions from hard ones).
Intuition
Most known lower bound proofs in circuit complexity are natural in this sense. The barrier says: if cryptography is secure, then the standard toolbox for proving lower bounds is insufficient to resolve P vs NP or related questions. The properties that distinguish easy functions from hard ones are themselves hard to detect if one-way functions exist.
Proof Sketch
If a property Φ is large (satisfied by many functions) and useful (satisfied by no function in the target complexity class), then Φ distinguishes truly random functions from pseudorandom functions: a random function satisfies Φ reasonably often, while a pseudorandom function never does. But pseudorandom function families, which exist if one-way functions exist, cannot be efficiently distinguished from random. Since constructivity makes Φ efficiently testable, the lower bound proof itself would supply exactly such a distinguisher. Contradiction.
Why It Matters
The natural proofs barrier, along with relativization (Baker-Gill-Solovay, 1975) and algebrization (Aaronson-Wigderson, 2009), explains why P vs NP has resisted proof for 50+ years. Any resolution must use non-natural, non-relativizing, non-algebrizing techniques.
Failure Mode
The barrier only applies to "natural" proofs. Non-natural approaches (e.g., Ryan Williams' connection between algorithms and circuit lower bounds, or geometric complexity theory) are not blocked. The barrier is a statement about a class of proof strategies, not an impossibility result for all strategies.
Problem 7: Circuit Lower Bounds
Status: Very limited progress despite decades of effort.
The goal: prove that specific problems (like SAT) require circuits of super-polynomial size. This would imply P != NP (since polynomial-time algorithms correspond to polynomial-size circuits in a uniform model).
What we know: Super-polynomial lower bounds exist for restricted circuit classes:
- AC⁰ lower bounds: PARITY requires exponential-size constant-depth circuits (Furst-Saxe-Sipser, Håstad)
- Monotone circuit lower bounds: the CLIQUE function requires exponential-size monotone circuits (Razborov)
- For general circuits: no super-linear lower bound is known for any explicit function in NP
Connection to ML: Neural networks are circuits. Understanding circuit lower bounds tells us about the inherent limitations of finite-depth, finite-width networks. The fact that no strong circuit lower bounds exist for general circuits parallels the fact that we lack a complete theory of when neural networks can and cannot learn efficiently.
Impagliazzo's Five Worlds
Impagliazzo's Five Worlds
Statement
Impagliazzo (1995) described five possible worlds based on which complexity assumptions hold:
- Algorithmica: P = NP. All NP problems are easy.
- Heuristica: P != NP, but NP problems are easy on average (hard instances are rare).
- Pessiland: Average-case hard problems exist, but one-way functions do not.
- Minicrypt: One-way functions exist, but public-key cryptography does not.
- Cryptomania: Public-key cryptography exists (the world we believe we live in).
Intuition
Each world has different implications for what algorithms can achieve. We believe we live in Cryptomania (public-key crypto works), which implies P != NP, one-way functions exist, and hard-on-average problems exist.
Proof Sketch
This is a classification, not a theorem. Each world is defined by which complexity-theoretic assumptions hold. The implications between assumptions (e.g., public-key crypto implies one-way functions implies P != NP) define the ordering.
Why It Matters
The framework clarifies what is at stake in complexity theory. For ML: if we live in Algorithmica, then optimal learning is always tractable. In Cryptomania, computational hardness is real and fundamental, and we must design algorithms that work well despite not solving NP-hard problems optimally.
Failure Mode
The five worlds framework is a simplification. Reality could involve combinations or intermediate scenarios not cleanly captured by these categories. Also, the framework describes worst-case or average-case complexity; practical ML operates in regimes where the "typical" case may be much easier than the worst case.
Common Confusions
NP-hard does not mean impossible in practice
Many NP-hard problems are solved routinely for practical instances. SAT solvers handle millions of variables. Integer programming solvers solve industrial scheduling problems. NP-hardness is a worst-case statement: there exist instances that are hard. But the instances arising in practice often have structure that makes them tractable.
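The gap between worst-case hardness and practical solvability is visible even in a toy solver. Below is a minimal DPLL sketch (simplification, unit propagation, branching); production solvers such as MiniSat or CaDiCaL add conflict-driven clause learning and heuristics on top of this skeleton, which is what lets them exploit the structure of industrial instances.

```python
def dpll(clauses, assignment=None):
    """Minimal DPLL SAT solver: returns a satisfying assignment or None."""
    if assignment is None:
        assignment = {}
    # Simplify the formula under the current partial assignment.
    simplified = []
    for clause in clauses:
        if any(assignment.get(abs(l)) == (l > 0) for l in clause):
            continue            # clause already satisfied
        rest = [l for l in clause if abs(l) not in assignment]
        if not rest:
            return None         # clause falsified: dead end
        simplified.append(rest)
    if not simplified:
        return assignment       # every clause satisfied
    # Unit propagation: a one-literal clause forces its variable's value.
    for clause in simplified:
        if len(clause) == 1:
            l = clause[0]
            return dpll(clauses, {**assignment, abs(l): l > 0})
    # Branch on the first unassigned variable.
    v = abs(simplified[0][0])
    for value in (True, False):
        result = dpll(clauses, {**assignment, v: value})
        if result is not None:
            return result
    return None

# Literals are signed ints: 1 means x1, -1 means "not x1".
print(dpll([[1, 2], [-1, 3], [-2, -3]]) is not None)  # True: satisfiable
print(dpll([[1, 2], [-1, 2], [-2, 3], [-3]]))         # None: unsatisfiable
```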
P vs NP is about worst-case complexity
If P != NP, it means there exist hard instances of NP-complete problems. It does not mean every instance is hard. Most ML-relevant optimization problems are NP-hard in theory but solvable in practice via SGD, greedy methods, or relaxations. The gap between worst-case theory and average-case practice is enormous.
BPP = P does not mean randomness is useless
Even if BPP = P, randomized algorithms may still be simpler to design, implement, and analyze. The derandomized versions of algorithms are often more complex with larger constants. Randomness is a tool for simplicity even when it is not needed for computational power.
Key Takeaways
- P vs NP: the central open question in computer science. If P = NP, optimal learning is tractable.
- One-way functions: their existence underpins cryptography and is closely tied to P != NP
- Unique games conjecture: would prove optimality of many approximation algorithms used in ML
- BPP = P: widely believed; randomness probably does not add computational power
- Natural proofs barrier: explains why P vs NP resists current proof techniques
- Circuit lower bounds: almost no progress for general circuits despite decades of effort
- We believe we live in Cryptomania: hardness is real, and ML must work within its constraints
Exercises
Problem
Explain why the following ML problems are NP-hard, and what practical algorithms are used instead: (a) optimal k-feature selection, (b) finding the minimum-size neural network consistent with training data.
Problem
The natural proofs barrier says that natural proof techniques cannot prove circuit lower bounds if one-way functions exist. Does this mean P vs NP can never be resolved?
References
Canonical:
- Arora & Barak, Computational Complexity: A Modern Approach (2009), Chapters 2-6, 9, 14, 23
- Razborov & Rudich, "Natural Proofs" (1997), JCSS
Current:
- Impagliazzo, "A Personal View of Average-Case Complexity" (1995)
- Khot, "On the Power of Unique 2-Prover 1-Round Games" (2002)
- Aaronson, "P =? NP" survey (2017), for an accessible overview
Next Topics
- Open problems in ML theory: the unsolved theoretical questions specific to machine learning
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.
- P vs NP
- Sorting Algorithms
- Basic Logic and Proof Techniques