
Algorithms Foundations

Sorting Algorithms

Comparison-based sorting lower bound, quicksort, mergesort, heapsort, and non-comparison sorts. The foundational algorithms behind efficient data processing and search.


Why This Matters

Sorting is the most fundamental algorithmic primitive. If you can sort, you can search (binary search), find duplicates, compute order statistics, and build efficient data structures. In machine learning, sorting appears in data preprocessing, nearest neighbor search, building decision trees (sorting features by split quality), and computing rank-based metrics like AUC.

The comparison-based sorting lower bound is the cleanest example of a computational lower bound: a proof that no algorithm (no matter how clever) can do better than \Omega(n \log n) comparisons. This style of argument, the information-theoretic lower bound, appears throughout theoretical CS and ML.

Mental Model

You have n items in an unknown order. You can compare pairs of items ("is a < b?") and rearrange them. How many comparisons are needed?

There are n! possible orderings. Each comparison eliminates at most half the remaining possibilities (it answers a yes/no question). So you need at least \log_2(n!) comparisons to identify the correct ordering.
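A quick numeric check of this counting argument (the helper name below is illustrative, not from the text):

```python
import math

def comparison_lower_bound(n: int) -> int:
    """Minimum worst-case comparisons for any comparison sort: ceil(log2(n!))."""
    return math.ceil(math.log2(math.factorial(n)))

for n in [5, 10, 100]:
    print(n, comparison_lower_bound(n))  # n=5 gives 7: log2(120) ~ 6.91
```

Note that math.log2 accepts arbitrarily large Python integers, so this works even when n! overflows a float.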

The Lower Bound

Theorem

Comparison-Based Sorting Lower Bound

Statement

Any comparison-based sorting algorithm requires at least \lceil \log_2(n!) \rceil comparisons in the worst case. By Stirling's approximation:

\log_2(n!) = n \log_2 n - n \log_2 e + O(\log n) = \Theta(n \log n)

Therefore, no comparison-based sorting algorithm can run in o(n \log n) worst-case time.

Intuition

Model the algorithm as a binary decision tree. Each internal node is a comparison, each leaf is a permutation (output ordering). The tree must have at least n! leaves (one for each possible input ordering). A binary tree with n! leaves has height at least \log_2(n!). The height is the worst-case number of comparisons.

Proof Sketch

Any deterministic comparison-based algorithm on n elements defines a binary decision tree: at each node, compare two elements and branch left or right. Each leaf corresponds to a specific output permutation. For the algorithm to be correct, every input permutation must reach a leaf with the correct sorted output, so there must be at least n! reachable leaves. A binary tree of height h has at most 2^h leaves, so 2^h \geq n! and h \geq \log_2(n!).

Why It Matters

This is the cleanest example of a computational lower bound via an information-theoretic argument. The technique, counting distinguishable inputs and bounding the information gained per operation, applies far beyond sorting. It shows that quicksort, mergesort, and heapsort are all optimal (up to constant factors) among comparison-based sorts.

Failure Mode

The bound only applies to comparison-based algorithms. If you know extra structure about the input (e.g., all elements are integers in a known range), you can beat \Omega(n \log n) using counting sort or radix sort.

Comparison-Based Sorts

Quicksort

Algorithm: Choose a pivot element. Partition the array into elements less than the pivot and elements greater than the pivot. Recurse on both halves.
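A minimal Python sketch of this recursion, written out-of-place for clarity (production quicksort partitions in place):

```python
import random

def quicksort(arr):
    """Quicksort with a uniformly random pivot (out-of-place, for clarity)."""
    if len(arr) <= 1:
        return list(arr)
    pivot = random.choice(arr)
    less    = [x for x in arr if x < pivot]
    equal   = [x for x in arr if x == pivot]
    greater = [x for x in arr if x > pivot]
    # Recurse on both sides; elements equal to the pivot need no further work.
    return quicksort(less) + equal + quicksort(greater)
```

The random pivot is what makes the expected-time guarantee below hold for every input, not just random ones.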

Theorem

Quicksort Expected Runtime

Statement

Quicksort with uniformly random pivot selection runs in O(n \log n) expected time and O(n^2) worst-case time. The expected number of comparisons is 2n \ln n + O(n).

Intuition

A random pivot splits the array into two parts. On average, the split is reasonably balanced (not necessarily exactly half, but good enough). Balanced splits give O(\log n) levels of recursion, each doing O(n) work. The worst case, O(n^2), occurs when every pivot is the minimum or maximum, giving n levels of recursion. But this is exponentially unlikely with random pivots.

Proof Sketch

Let X_{ij} be the indicator that elements i and j (in sorted order) are compared. Elements i and j are compared if and only if one of them is chosen as a pivot before any element between them. This happens with probability 2/(j - i + 1). The expected number of comparisons is:

\mathbb{E}\left[\sum_{i < j} X_{ij}\right] = \sum_{i=1}^{n} \sum_{j=i+1}^{n} \frac{2}{j - i + 1} = 2n H_n - O(n) = 2n \ln n + O(n)

where H_n is the n-th harmonic number.

Why It Matters

Quicksort is the most practically efficient comparison sort due to excellent cache performance and small constant factors. Its in-place nature (no extra memory) makes it the default in most standard library sort implementations (with modifications like introsort to avoid worst-case behavior).

Failure Mode

Deterministic quicksort with a bad pivot rule (e.g., always the first element) degrades to O(n^2) on sorted or nearly-sorted input, exactly the case that occurs most often in practice. Always use random or median-of-three pivots.

Properties: In-place (O(\log n) stack space), not stable, O(n \log n) expected, O(n^2) worst case.

Mergesort

Algorithm: Split the array in half. Recursively sort each half. Merge the two sorted halves in O(n) time.
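The split-and-merge recursion as a short Python sketch; taking from the left half on ties (the `<=` below) is exactly what makes the sort stable:

```python
def mergesort(arr):
    """Stable mergesort: O(n log n) guaranteed, O(n) extra space."""
    if len(arr) <= 1:
        return list(arr)
    mid = len(arr) // 2
    left, right = mergesort(arr[:mid]), mergesort(arr[mid:])
    # Merge the two sorted halves; prefer the left on ties for stability.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]
```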

Properties: Not in-place (O(n) extra space), stable, O(n \log n) guaranteed.

Why it matters: Mergesort is the canonical divide-and-conquer algorithm. Its guaranteed O(n \log n) runtime (no bad worst case) makes it preferable when predictability matters. The stability property (equal elements maintain their relative order) is important for multi-key sorting.

Heapsort

Algorithm: Build a max-heap from the array in O(n) time. Repeatedly extract the maximum and place it at the end.
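A sketch of in-place heapsort, assuming a zero-indexed array where the children of node i are 2i + 1 and 2i + 2:

```python
def heapsort(arr):
    """In-place heapsort: build a max-heap in O(n), then extract max n times."""
    def sift_down(a, root, end):
        # Push a[root] down until the max-heap property holds up to index end.
        while 2 * root + 1 <= end:
            child = 2 * root + 1
            if child + 1 <= end and a[child] < a[child + 1]:
                child += 1                      # pick the larger child
            if a[root] < a[child]:
                a[root], a[child] = a[child], a[root]
                root = child
            else:
                return

    n = len(arr)
    for start in range(n // 2 - 1, -1, -1):     # bottom-up heapify is O(n)
        sift_down(arr, start, n - 1)
    for end in range(n - 1, 0, -1):             # move max to the end, shrink heap
        arr[0], arr[end] = arr[end], arr[0]
        sift_down(arr, 0, end - 1)
    return arr
```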

Properties: In-place, not stable, O(n \log n) guaranteed.

Why it matters: Heapsort gives the best of both worlds: in-place like quicksort and guaranteed O(n \log n) like mergesort. However, it has poor cache locality in practice, making it slower than quicksort despite the better worst case.

Summary of Comparison Sorts

| Algorithm | Best Case | Average Case | Worst Case | Space | Stable? |
|-----------|-----------|--------------|------------|-------|---------|
| Quicksort | O(n \log n) | O(n \log n) | O(n^2) | O(\log n) | No |
| Mergesort | O(n \log n) | O(n \log n) | O(n \log n) | O(n) | Yes |
| Heapsort | O(n \log n) | O(n \log n) | O(n \log n) | O(1) | No |

Non-Comparison Sorts

These bypass the \Omega(n \log n) lower bound by exploiting structure in the input (integers in a known range).

Counting Sort

Input: n integers in range [0, k). Time: O(n + k). Space: O(k). Stable: Yes.

Algorithm: Count occurrences of each value, then compute prefix sums to determine positions. This is linear when k = O(n).
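The count-then-prefix-sum steps as a Python sketch; the left-to-right placement pass is what makes it stable:

```python
def counting_sort(arr, k):
    """Stable counting sort for integers in [0, k): O(n + k) time, O(k) space."""
    counts = [0] * k
    for x in arr:
        counts[x] += 1
    # Exclusive prefix sums: counts[v] becomes the index of the first slot for v.
    total = 0
    for v in range(k):
        counts[v], total = total, total + counts[v]
    out = [0] * len(arr)
    for x in arr:            # left-to-right pass preserves tie order (stable)
        out[counts[x]] = x
        counts[x] += 1
    return out
```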

Radix Sort

Input: n integers with d digits in base b. Time: O(d(n + b)). Space: O(n + b). Stable: Yes.

Algorithm: Sort by least significant digit first, then by next digit, etc., using a stable sort (counting sort) at each level. For n integers in range [0, n^c), choose b = n and get d = c passes, giving O(cn) time, linear for constant c.
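An LSD radix sort sketch for non-negative integers; each pass here uses bucketing, which is equivalent to a stable counting sort on one digit:

```python
def radix_sort(arr, base=10):
    """LSD radix sort for non-negative integers: O(d(n + b)) for d base-b digits."""
    if not arr:
        return []
    out = list(arr)
    digit_value = 1
    while digit_value <= max(arr):
        buckets = [[] for _ in range(base)]
        for x in out:        # appending in order keeps each pass stable
            buckets[(x // digit_value) % base].append(x)
        out = [x for bucket in buckets for x in bucket]
        digit_value *= base
    return out
```

Raising `base` trades fewer passes for more buckets per pass, which is the d vs. b trade-off in the O(d(n + b)) bound.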

Why ML People Should Care

Data preprocessing: Sorting is the first step in many data pipelines. Efficient sorting of millions of feature values is essential for decision tree training (finding optimal splits) and quantile computation. It is also fundamental to model evaluation workflows.

Nearest neighbor search: Many approximate nearest neighbor algorithms rely on sorting distances. KD-trees partition data along sorted dimensions.

Rank-based metrics: AUC (area under ROC curve) requires sorting predictions by score. Rank correlation (Kendall's tau, Spearman's rho) is defined in terms of sorted orders.
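As one concrete example, AUC can be computed from a single sort via the rank-sum (Mann-Whitney U) formula; this is a sketch with average ranks for tied scores, and the function name is illustrative:

```python
def auc_by_sorting(scores, labels):
    """AUC from the rank-sum formula; tied scores receive average ranks."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # the one O(n log n) sort
    ranks = [0.0] * n
    i = 0
    while i < n:                       # assign average 1-based ranks to ties
        j = i
        while j < n and scores[order[j]] == scores[order[i]]:
            j += 1
        avg = (i + 1 + j) / 2          # average of ranks i+1, ..., j
        for k in range(i, j):
            ranks[order[k]] = avg
        i = j
    n_pos = sum(labels)
    n_neg = n - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```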

Efficient implementations: Understanding sorting helps you write faster code. Knowing that your data is nearly sorted (use insertion sort or Timsort), that your keys are integers (use radix sort), or that you need stability (use mergesort) can make orders-of-magnitude performance differences.

Common Confusions

Watch Out

O(n log n) is a lower bound, not just an upper bound

The \Omega(n \log n) bound is a lower bound on comparison-based sorting. Quicksort, mergesort, and heapsort all achieve O(n \log n), matching the lower bound. This means they are optimal: you cannot do better with comparisons. Non-comparison sorts beat this bound by using more information than just pairwise comparisons.

Watch Out

In-place does not mean zero extra space

Quicksort is called in-place because it uses O(\log n) stack space (for recursion), not O(n). Heapsort uses O(1) extra space. "In-place" is relative to the input size, not literally zero.

Watch Out

Stability matters more than you think

A stable sort preserves the relative order of equal elements. This matters when you sort by multiple keys: sort by secondary key first, then by primary key (using a stable sort). If the sort is not stable, the secondary ordering is destroyed.
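A minimal illustration of the multi-key pattern using Python's built-in sort, which is a stable Timsort (the records are made up for the example):

```python
records = [("alice", 2), ("bob", 1), ("carol", 2), ("dave", 1)]

records.sort(key=lambda r: r[0])  # 1) sort by secondary key (name)
records.sort(key=lambda r: r[1])  # 2) stable sort by primary key (group)

# Within each group, names remain in alphabetical order because
# the second sort never reorders records that tie on the group key.
print(records)
```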

Summary

  • Comparison-based lower bound: \Omega(n \log n) via decision tree argument
  • Quicksort: O(n \log n) expected, O(n^2) worst, in-place, not stable
  • Mergesort: O(n \log n) guaranteed, O(n) space, stable
  • Heapsort: O(n \log n) guaranteed, in-place, not stable
  • Non-comparison sorts (counting, radix): O(n) when the key range is manageable
  • The lower bound argument (counting leaves in a decision tree) is a fundamental technique in theoretical CS

Exercises

ExerciseCore

Problem

Using the decision tree argument, what is the minimum number of comparisons needed to sort 5 elements? Compare this to the number used by mergesort on 5 elements.

ExerciseAdvanced

Problem

Prove that quicksort's worst case is \Omega(n^2) by constructing an input where deterministic quicksort (with the first element as pivot) makes \Theta(n^2) comparisons.

ExerciseAdvanced

Problem

You have n = 10^6 integers in the range [0, 10^6). Which sorting algorithm should you use and what is its runtime? What if the range is [0, 10^{18})?

References

Canonical:

  • Cormen, Leiserson, Rivest, Stein, Introduction to Algorithms (CLRS), Chapters 7-9
  • Knuth, The Art of Computer Programming, Vol. 3: Sorting and Searching

Current:

  • Sedgewick & Wayne, Algorithms (4th ed.), Chapters 2.1-2.5
  • Hastie, Tibshirani, Friedman, The Elements of Statistical Learning (2009)

Next Topics

The natural next steps from sorting:

  • Binary search: the key application of sorted data
  • Order statistics: finding the k-th smallest element in O(n) time
  • Hash tables: when you need faster-than-sorting lookups

Last reviewed: April 2026
