
Computability · 50 min

Computability Theory

What can be computed? Turing machines, decidability, the Church-Turing thesis, recursive and recursively enumerable sets, reductions, Rice's theorem, and connections to learning theory.

Why This Matters

Before asking whether a problem is efficient, you must ask whether it is solvable at all. Computability theory draws the boundary between problems that any algorithm can solve and problems that no algorithm can solve, regardless of time or space.

This is not abstract philosophy. In ML, certain concept classes are provably not PAC-learnable for computability reasons, not sample complexity reasons. If the hypothesis class involves solving an undecidable problem during evaluation, no learner can succeed. The halting problem is the canonical example of an undecidable problem, and its proof technique (diagonalization) recurs throughout set theory, complexity theory, and learning theory.

Core Definitions

Definition

Turing Machine

A Turing machine is a 7-tuple $M = (Q, \Sigma, \Gamma, \delta, q_0, q_{\text{accept}}, q_{\text{reject}})$ where:

  • $Q$ is a finite set of states
  • $\Sigma$ is a finite input alphabet (not containing the blank symbol $\sqcup$)
  • $\Gamma \supseteq \Sigma \cup \{\sqcup\}$ is the tape alphabet
  • $\delta: Q \times \Gamma \to Q \times \Gamma \times \{L, R\}$ is the transition function
  • $q_0 \in Q$ is the start state
  • $q_{\text{accept}}, q_{\text{reject}} \in Q$ are the accepting and rejecting states

The machine reads from an infinite tape, one cell at a time, writes a symbol, moves left or right, and transitions to a new state. It halts when it enters $q_{\text{accept}}$ or $q_{\text{reject}}$.
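The read-write-move loop above is easy to make concrete. The following is a minimal simulator sketch (not from the text; the machine, state names, and step bound are illustrative choices), running a toy machine that decides whether a binary string ends in 1:

```python
def run_tm(delta, input_str, q0="q0", accept="qacc", reject="qrej",
           blank="_", max_steps=10_000):
    """Simulate a single-tape TM. delta maps (state, symbol) -> (state, symbol, 'L' or 'R')."""
    tape = dict(enumerate(input_str))   # sparse tape: unset cells read as blank
    head, state = 0, q0
    for _ in range(max_steps):          # step bound so the sketch always returns
        if state in (accept, reject):
            return state == accept
        sym = tape.get(head, blank)
        state, write, move = delta[(state, sym)]
        tape[head] = write
        head += 1 if move == "R" else -1
    raise TimeoutError("machine did not halt within max_steps")

# Toy transition function: scan right past the input, then inspect the last symbol.
delta = {
    ("q0", "0"): ("q0", "0", "R"),
    ("q0", "1"): ("q0", "1", "R"),
    ("q0", "_"): ("q1", "_", "L"),    # hit the blank past the input: step back
    ("q1", "1"): ("qacc", "1", "R"),
    ("q1", "0"): ("qrej", "0", "R"),
    ("q1", "_"): ("qrej", "_", "R"),  # empty input rejects
}

print(run_tm(delta, "0101"))  # True
print(run_tm(delta, "0110"))  # False
```

The `max_steps` cutoff is a concession to practicality that a real Turing machine does not have; removing it is exactly what makes the halting problem bite.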

Definition

Decidable Language (Recursive Set)

A language $L \subseteq \Sigma^*$ is decidable (recursive) if and only if there exists a Turing machine $M$ that halts on every input and accepts exactly the strings in $L$. That is, for every $w \in \Sigma^*$, $M$ halts and outputs "accept" if $w \in L$ or "reject" if $w \notin L$.

Definition

Semi-Decidable Language (Recursively Enumerable Set)

A language $L \subseteq \Sigma^*$ is semi-decidable (recursively enumerable, or r.e.) if and only if there exists a Turing machine $M$ that accepts exactly the strings in $L$. If $w \in L$, then $M$ halts and accepts. If $w \notin L$, then $M$ may reject or may run forever.
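The characteristic shape of a semi-decider is an unbounded search that halts when a witness is found and runs forever otherwise. A hedged sketch (this particular set, sums of two squares, happens to be decidable since the search could be bounded by $\sqrt{n}$, but the unbounded-search pattern is the general form):

```python
from itertools import count

def semi_decide_sum_of_two_squares(n):
    """Accept n iff a**2 + b**2 == n has a solution in nonnegative integers.
    Halts and returns True on yes-instances; loops forever on no-instances."""
    for k in count():                   # enumerate all pairs (a, b) with a + b = k
        for a in range(k + 1):
            b = k - a
            if a * a + b * b == n:
                return True             # witness found: halt and accept
        # note: no 'return False' anywhere — a no-instance never returns

print(semi_decide_sum_of_two_squares(25))   # True (3^2 + 4^2); halts
# semi_decide_sum_of_two_squares(3) would never return: 3 is not a sum of two squares
```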

Definition

Many-One Reduction

Language $A$ is many-one reducible to language $B$, written $A \leq_m B$, if and only if there exists a computable function $f: \Sigma^* \to \Sigma^*$ such that for all $w$:

$w \in A \iff f(w) \in B$

If $A \leq_m B$ and $B$ is decidable, then $A$ is decidable. Equivalently, if $A$ is undecidable and $A \leq_m B$, then $B$ is undecidable.

The Church-Turing Thesis

The Church-Turing thesis is not a theorem. It is a claim about the physical world: every function that is "effectively computable" by any mechanical procedure is computable by a Turing machine. Evidence for this thesis includes the fact that every alternative model of computation (lambda calculus, recursive functions, register machines, cellular automata) computes exactly the same class of functions.

The thesis has practical force: if you cannot solve a problem with a Turing machine, you cannot solve it with any physical device. Quantum computers do not change this. They may be faster, but they compute the same set of functions.

Universal Turing Machine

A universal Turing machine $U$ is a single Turing machine that can simulate any other Turing machine given its description. On input $\langle M, w \rangle$ (an encoding of a TM $M$ together with an input string $w$), $U$ produces the same output that $M$ would produce on $w$.

Universality is what makes Turing machines a model of general-purpose computation rather than fixed-function machines. The same idea underlies every modern computer: the program is data, and a single hardware machine executes any program. Turing's 1936 construction is the conceptual ancestor of the stored-program computer.
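Program-as-data can be shown in miniature. The sketch below (an illustrative analogy, not Turing's construction) treats a program as an ordinary string until a "universal" evaluator runs it; the function name `main` is an assumed convention:

```python
def universal(program_source: str, x: int) -> int:
    """Run the function named 'main' defined in program_source on input x."""
    namespace = {}
    exec(program_source, namespace)   # the program is plain data until this line
    return namespace["main"](x)

# The "machine description" is just a string we could store, transmit, or edit.
square = "def main(x):\n    return x * x"
print(universal(square, 7))   # 49
```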

Recursion Theorem and Self-Reference

Theorem

Kleene's Recursion Theorem

Statement

For every total computable function $f$ there exists an index $e$ such that $\phi_e = \phi_{f(e)}$. Equivalently: every computable transformation of program descriptions has a fixed point — a program whose source code is mapped (semantically) to itself.

Intuition

A program can have access to its own source code. Quine programs (programs that print their own source) and self-replicating viruses are concrete instances. The recursion theorem is the meta-level analog of the Y combinator from lambda calculus: both express that self-reference is available "for free" in any sufficiently powerful computational model.
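A concrete instance of the quine idea mentioned above: a two-line Python program whose output is exactly its own source (the comment is not part of the quine proper):

```python
# A classic Python quine: the two lines below print themselves verbatim.
s = 's = %r\nprint(s %% s)'
print(s % s)
```

The trick is self-application: `%r` splices in the `repr` of the template string, and `%%` becomes a literal `%`, so the output reproduces both lines exactly.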

Why It Matters

Many undecidability proofs use the recursion theorem as a one-line shortcut. It also formalizes why programs that introspect their own code are not strictly more powerful than ordinary programs, since self-reference can already be simulated.

The Halting Problem

Theorem

Undecidability of the Halting Problem

Statement

The halting problem $\text{HALT} = \{\langle M, w \rangle : M \text{ is a TM that halts on input } w\}$ is undecidable. There is no Turing machine that, given an arbitrary Turing machine $M$ and input $w$, always correctly determines whether $M$ halts on $w$.

Intuition

If a halting decider existed, you could use it to construct a machine that does the opposite of what it is predicted to do. This self-referential contradiction is the same diagonal argument that Cantor used to prove the reals are uncountable.

Proof Sketch

Assume for contradiction that $H$ decides HALT: $H(\langle M, w \rangle)$ accepts if $M$ halts on $w$ and rejects otherwise. Construct a new machine $D$: on input $\langle M \rangle$, run $H(\langle M, \langle M \rangle \rangle)$. If $H$ accepts (meaning $M$ halts on its own description), then $D$ loops forever. If $H$ rejects (meaning $M$ does not halt on its own description), then $D$ halts.

Now run $D$ on $\langle D \rangle$. If $D$ halts on $\langle D \rangle$, then $H$ accepts, so $D$ loops. If $D$ does not halt on $\langle D \rangle$, then $H$ rejects, so $D$ halts. Both cases yield a contradiction. Therefore $H$ cannot exist.
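The construction of $D$ translates almost word for word into code. Below is a hedged sketch: `halts` is the hypothetical decider (which, by the theorem, cannot actually be implemented, so here it just raises):

```python
def halts(prog, inp):
    """Hypothetical perfect halting decider — cannot exist, by the theorem above."""
    raise NotImplementedError("no such decider exists")

def D(prog):
    if halts(prog, prog):   # predicted to halt on its own source?
        while True:         # ...then do the opposite: loop forever
            pass
    return "halted"         # predicted to loop? then halt immediately

# Feeding D its own source, D(<D>) would have to halt iff it does not halt —
# the contradiction that rules out halts() in the first place.
```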

Why It Matters

The halting problem is the canonical undecidable problem. Most undecidability results in computer science are proved by reducing from HALT. It also shows that general program verification is impossible: no tool can check all programs for all properties. This is why static analysis, type checking, and formal verification must all restrict the class of programs or properties they consider.

Failure Mode

The proof requires the ability of Turing machines to simulate other Turing machines (universality). In restricted computational models (finite automata, pushdown automata), the halting problem is decidable because these models always halt or can be checked for loops in bounded time.

Decidability Hierarchy

The three-level classification:

| Class | Definition | Closure Properties | Example |
| --- | --- | --- | --- |
| Decidable | TM halts on all inputs, accepts $L$ | Union, intersection, complement | $\{0^n 1^n : n \geq 0\}$ |
| Semi-decidable (r.e.) | TM accepts $L$, may loop on $w \notin L$ | Union, intersection | HALT |
| Co-semi-decidable (co-r.e.) | Complement is semi-decidable | Union, intersection | $\overline{\text{HALT}}$ |

A language is decidable if and only if it is both semi-decidable and co-semi-decidable. This gives a useful proof technique: to show a language is decidable, show that both it and its complement are r.e.
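The proof of that equivalence is an interleaving argument: run a semi-decider for $L$ and one for $\overline{L}$ in alternation; exactly one must accept, so the combined procedure always halts. A toy sketch (the "semi-deciders" are stand-in generators that yield `None` while thinking and `True` on acceptance; the even/odd example is illustrative):

```python
def decide(semi_L, semi_coL, w):
    """Dovetail a semi-decider for L with one for its complement: one step each per round."""
    a, b = semi_L(w), semi_coL(w)
    while True:
        if next(a) is True:
            return True     # w is in L
        if next(b) is True:
            return False    # w is in the complement

# Toy instance: L = even numbers. Each generator accepts its own side
# and loops forever on the other — exactly the r.e. behavior.
def semi_even(n):
    while n % 2 != 0:
        yield None          # odd input: "think" forever
    yield True

def semi_odd(n):
    while n % 2 == 0:
        yield None
    yield True

print(decide(semi_even, semi_odd, 4))   # True
print(decide(semi_even, semi_odd, 7))   # False
```

The per-round alternation matters: running either semi-decider to completion on its own could hang, but interleaved steps guarantee termination.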

Rice's Theorem

Theorem

Rice's Theorem

Statement

Let $P$ be any nontrivial property of r.e. languages (i.e., some r.e. languages have the property and some do not). Then the set

$\{\langle M \rangle : L(M) \text{ has property } P\}$

is undecidable.

Intuition

You cannot decide anything nontrivial about what a program computes by inspecting it. "Does this program ever output 1?" is undecidable. "Does this program compute a total function?" is undecidable. "Does this program compute the same function as that program?" is undecidable. The only decidable properties of programs are trivial ones (satisfied by all programs or no programs) or syntactic ones (about the code text, not its behavior).

Proof Sketch

Reduce from HALT. Assume without loss of generality that the empty language does not have property $P$ (otherwise run the argument on the complement of $P$). Let $M_P$ be a machine whose language has property $P$ (one exists by nontriviality). Given input $\langle M, w \rangle$ to the halting problem, construct a machine $M'$ that on input $x$ first simulates $M$ on $w$. If $M$ halts, then $M'$ simulates $M_P$ on $x$. If $M$ does not halt, $M'$ loops, so $L(M') = \emptyset$. Then $L(M')$ has property $P$ if and only if $M$ halts on $w$. A decider for $P$ would therefore decide HALT, contradicting the halting theorem.

Why It Matters

Rice's theorem is a sweeping impossibility result. It explains why fully automatic program analysis is impossible in general: any question about program behavior (not syntax) that has both yes-instances and no-instances is undecidable. Practical tools (linters, type checkers, model checkers) work by restricting the class of programs or by being sound but incomplete (they may reject valid programs or accept invalid ones).

Failure Mode

Rice's theorem applies only to semantic properties (properties of the language L(M)L(M)). Syntactic properties of the machine description, such as "does the machine have exactly 7 states?" or "does the source code contain the string 'hello'?", are decidable because they do not require running the machine.

Reductions and the Undecidability Zoo

Many-one reductions establish a hierarchy of undecidable problems. If $A \leq_m B$, then $B$ is at least as hard as $A$. The standard technique for proving a new problem $B$ undecidable is:

  1. Choose a known undecidable problem $A$ (often HALT).
  2. Construct a computable function $f$ such that $w \in A \iff f(w) \in B$.
  3. Conclude that $B$ is undecidable.
Example

Undecidability of the emptiness problem

The language $E_{\text{TM}} = \{\langle M \rangle : L(M) = \emptyset\}$ is undecidable. This follows immediately from Rice's theorem (emptiness is a nontrivial semantic property). Alternatively, reduce from HALT directly: given $\langle M, w \rangle$, construct $M'$ that ignores its input, simulates $M$ on $w$, and accepts if $M$ halts. Then $L(M') = \emptyset$ iff $M$ does not halt on $w$.
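The construction of $M'$ in that example can be sketched with Python closures standing in for machine descriptions (a hedged illustration: a real reduction transforms machine *encodings*, not live functions, and a non-halting $M$ obviously cannot be exercised here):

```python
def reduce_halt_to_emptiness(M, w):
    """Map <M, w> to <M'> such that L(M') is empty iff M does not halt on w."""
    def M_prime(x):
        M(w)            # ignore x; first simulate M on w (may never return)
        return True     # reached only if M halted: accept every input x
    return M_prime

# Toy check with a machine that certainly halts on w:
halting_M = lambda inp: None
M_prime = reduce_halt_to_emptiness(halting_M, "w")
print(M_prime("anything"))   # True: L(M') is nonempty, because M halts on w
```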

Connection to Learning Theory

Computability imposes hard limits on learnability. PAC-learnability does not require an algorithm that outputs a hypothesis consistent with the target. In realizable PAC, the learner must produce a hypothesis with generalization error at most $\varepsilon$ with probability at least $1-\delta$ (consistent ERM is one common sufficient strategy, but not the only one); in agnostic PAC, the data need not be realizable at all and the goal is to compete with the best hypothesis in the class. The computability obstruction is sharper: if evaluating membership in concepts from $\mathcal{C}$ requires solving an undecidable problem, no learner can even output a usable hypothesis from $\mathcal{C}$, regardless of sample complexity.

More precisely: if the concept class consists of all r.e. sets, then even with infinite data, a learner cannot distinguish between a concept that accepts a given string after a very long computation and one that loops forever on that string. Sample complexity bounds become irrelevant when the hypothesis class itself is not computable.

Ben-David et al. 2019: Learnability Independent of ZFC

Ben-David, Hrubeš, Moran, Shpilka & Yehudayoff (Nature Machine Intelligence 2019, "Learnability can be undecidable") gave a far stronger result. They studied EMX (estimating the maximum), a learning task: given i.i.d. samples from an unknown distribution $P$ over a domain $X$, output a finite set $S \subseteq X$ that, with high probability, satisfies $P(S) \geq \max_{|S'| \leq k} P(S') - \epsilon$ for some bound $k$.

Their main result: there exist parameter choices for which the question "is EMX learnable?" is independent of ZFC — neither provable nor disprovable in standard set theory. The proof connects EMX learnability to the existence of a monotone compression scheme and reduces this to a combinatorial principle equivalent to a statement about cardinality below the continuum, which Cohen-style forcing shows to be independent of ZFC.

Why this matters: PAC learnability had previously been characterized by VC dimension (Blumer-Ehrenfeucht-Haussler-Warmuth 1989). Ben-David et al. show that for some learning models the analogue characterization cannot exist as a ZFC theorem at all: learnability transcends what classical mathematics can decide. This is the modern recursion-theoretic / set-theoretic limit on the foundations of learning theory, in the spirit of Gödel and Cohen but applied to the central object of statistical learning.

Common Confusions

Watch Out

Computable does not mean efficient

A function is computable exactly when some Turing machine computes it, with no bound on running time. A function is efficiently computable exactly when it can be computed in polynomial time. The class of decidable problems is vastly larger than P. Integer factoring is decidable (try all factors). Whether it is in P is unknown. The P vs NP question lives entirely within the decidable world. Computability theory asks what is solvable at all; complexity theory asks what is solvable quickly.
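Trial division makes the factoring claim above concrete: the procedure below always halts, so the problem is decidable, yet its running time is exponential in the bit-length of $n$:

```python
def has_nontrivial_factor(n: int) -> bool:
    """Decide whether n has a factor d with 1 < d < n. Always halts."""
    d = 2
    while d * d <= n:        # ~sqrt(n) trial divisions = exp(bits/2) time
        if n % d == 0:
            return True
        d += 1
    return False             # termination on every input = decidability

print(has_nontrivial_factor(91))   # True (7 * 13)
print(has_nontrivial_factor(97))   # False (97 is prime)
```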

Watch Out

Semi-decidable is not the same as half-decidable

A semi-decidable language has a TM that says "yes" when the answer is yes but may never respond when the answer is no. It does not mean the TM gets the right answer half the time. Semi-decidability is about completeness (all yes-instances are found) without soundness of rejection (no-instances may never be classified).

Watch Out

The Church-Turing thesis is not a theorem

The Church-Turing thesis cannot be proved because "effectively computable" is an informal notion. It is a claim about the physical world, supported by the equivalence of every proposed model of computation. If someone built a physical device that computed a non-Turing-computable function, the thesis would be falsified. No such device is known or believed to exist.

Exercises

ExerciseCore

Problem

Show that the language $A = \{w : w \text{ is a palindrome over } \{0,1\}\}$ is decidable by describing a Turing machine that decides it.

ExerciseCore

Problem

Prove that if $L$ is decidable, then $\overline{L}$ (the complement of $L$) is also decidable.

ExerciseAdvanced

Problem

Prove that the language $\text{EQ}_{\text{TM}} = \{\langle M_1, M_2 \rangle : L(M_1) = L(M_2)\}$ is undecidable using Rice's theorem or by reduction from HALT.

ExerciseAdvanced

Problem

Give an example of a language that is semi-decidable but not decidable, and prove both claims.

ExerciseResearch

Problem

The concept class $\mathcal{C}$ over $\{0,1\}^*$ consists of all decidable languages. Is $\mathcal{C}$ PAC-learnable? Explain why computability, not sample complexity, is the bottleneck.

References

Canonical:

  • Turing, "On Computable Numbers, with an Application to the Entscheidungsproblem" (1936): the original construction
  • Sipser, Introduction to the Theory of Computation (2013), Chapters 3-5
  • Rogers, Theory of Recursive Functions and Effective Computability (1987), Chapters 1-7

Additional:

  • Hopcroft, Motwani, Ullman, Introduction to Automata Theory, Languages, and Computation (2006), Chapter 9
  • Cutland, Computability: An Introduction to Recursive Function Theory (1980), Chapters 1-4
  • Soare, Turing Computability: Theory and Applications (2016): modern reference
  • Odifreddi, Classical Recursion Theory (1989): comprehensive
  • Cooper, Computability Theory (2004): modern textbook

Classical undecidability landmarks:

  • Matiyasevich, "Enumerable sets are Diophantine" (1970): Hilbert's 10th problem
  • Novikov & Boone (1955-1958): undecidability of the word problem for groups

Connection to ML:

  • Shalev-Shwartz and Ben-David, Understanding Machine Learning (2014), Chapter 8 (computational complexity of learning)
  • Ben-David, Hrubeš, Moran, Shpilka & Yehudayoff, "Learnability can be undecidable" (Nature Machine Intelligence 2019)

Next Topics

  • P vs NP: once you know a problem is decidable, the next question is whether it is efficiently decidable
  • Kolmogorov complexity: an alternative lens on computability that measures the information content of individual strings