Algorithms Foundations
Sorting Algorithms
Comparison-based sorting lower bound, quicksort, mergesort, heapsort, and non-comparison sorts. The foundational algorithms behind efficient data processing and search.
Why This Matters
Sorting is the most fundamental algorithmic primitive. If you can sort, you can search (binary search), find duplicates, compute order statistics, and build efficient data structures. In machine learning, sorting appears in data preprocessing, nearest neighbor search, building decision trees (sorting features by split quality), and computing rank-based metrics like AUC.
The comparison-based sorting lower bound is the cleanest example of a computational lower bound: a proof that no algorithm (no matter how clever) can sort with fewer than Ω(n log n) comparisons. This style of argument, the information-theoretic lower bound, appears throughout theoretical CS and ML.
Mental Model
You have n items in an unknown order. You can compare pairs of items ("is a_i < a_j?") and rearrange them. How many comparisons are needed?
There are n! possible orderings. Each comparison eliminates at most half the remaining possibilities (it answers a yes/no question). So you need at least log₂(n!) = Θ(n log n) comparisons to identify the correct ordering.
The Lower Bound
Comparison-Based Sorting Lower Bound
Statement
Any comparison-based sorting algorithm requires at least ⌈log₂(n!)⌉ comparisons in the worst case. By Stirling's approximation:

log₂(n!) = n log₂ n - n log₂ e + O(log n) = Θ(n log n)

Therefore, no comparison-based sorting algorithm can run in o(n log n) worst-case time.
Intuition
Model the algorithm as a binary decision tree. Each internal node is a comparison, each leaf is a permutation (output ordering). The tree must have at least n! leaves (one for each possible input ordering). A binary tree with L leaves has height at least ⌈log₂ L⌉, so the height is at least log₂(n!). The height is the worst-case number of comparisons.
Proof Sketch
Any deterministic comparison-based algorithm on n elements defines a binary decision tree: at each node, compare two elements and branch left or right. Each leaf corresponds to a specific output permutation. For the algorithm to be correct, every input permutation must reach a leaf with the correct sorted output, so there must be at least n! reachable leaves. A binary tree of height h has at most 2^h leaves, so 2^h ≥ n! and h ≥ log₂(n!) = Ω(n log n).
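The counting argument can be checked numerically. A small Python sketch comparing the exact bound ⌈log₂(n!)⌉ against the asymptotic estimate n log₂ n:

```python
import math

def comparison_lower_bound(n: int) -> int:
    """Minimum worst-case comparisons for any comparison sort: ceil(log2(n!))."""
    return math.ceil(math.log2(math.factorial(n)))

# Exact bound vs. the n log2 n asymptotic estimate.
for n in (5, 10, 100):
    print(n, comparison_lower_bound(n), round(n * math.log2(n), 1))
```

For n = 5 this gives ⌈log₂ 120⌉ = 7 comparisons, already slightly below 5 log₂ 5 ≈ 11.6; the gap shrinks relative to n as n grows.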
Why It Matters
This is the cleanest example of a computational lower bound via an information-theoretic argument. The technique (counting distinguishable inputs and bounding the information gained per operation) applies far beyond sorting. It shows that quicksort, mergesort, and heapsort are all optimal (up to constant factors) among comparison-based sorts.
Failure Mode
The bound only applies to comparison-based algorithms. If you know extra structure about the input (e.g., all elements are integers in a known range), you can beat Ω(n log n) using counting sort or radix sort.
Comparison-Based Sorts
Quicksort
Algorithm: Choose a pivot element. Partition the array into elements less than the pivot and elements greater than the pivot. Recurse on both halves.
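The partition-and-recurse scheme above can be sketched in a few lines. This is a minimal out-of-place version for clarity; production quicksorts partition in place and recurse on index ranges:

```python
import random

def quicksort(a: list) -> list:
    """Quicksort with a uniformly random pivot; returns a new sorted list."""
    if len(a) <= 1:
        return list(a)
    pivot = random.choice(a)                    # random pivot avoids adversarial inputs
    less = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    return quicksort(less) + equal + quicksort(greater)

print(quicksort([3, 1, 4, 1, 5, 9, 2, 6]))
```

Grouping the elements equal to the pivot handles duplicates without recursing on them.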
Quicksort Expected Runtime
Statement
Quicksort with uniformly random pivot selection runs in O(n log n) expected time and O(n²) worst-case time. The expected number of comparisons is 2n ln n + O(n) ≈ 1.39 n log₂ n.
Intuition
A random pivot splits the array into two parts. On average, the split is reasonably balanced (not necessarily exactly half, but good enough). Balanced splits give O(log n) levels of recursion, each doing O(n) work. The worst case (O(n²)) occurs when every pivot is the minimum or maximum, giving n levels of recursion. But this is exponentially unlikely with random pivots.
Proof Sketch
Let X_ij be the indicator that the i-th and j-th smallest elements are compared. Elements i and j (in sorted order) are compared if and only if one of them is chosen as a pivot before any element between them. This happens with probability 2/(j - i + 1). The expected number of comparisons is:

E[Σ_{i<j} X_ij] = Σ_{i<j} 2/(j - i + 1) ≤ 2n H_n = O(n log n)

where H_n = Σ_{k=1}^{n} 1/k ≈ ln n is the n-th harmonic number.
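The roughly 2n ln n expected comparison count can be checked empirically. The sketch below uses a hypothetical `quicksort_count` helper (assuming distinct elements) that tallies the comparisons made during partitioning:

```python
import math
import random

def quicksort_count(a: list) -> int:
    """Return the number of element comparisons made by random-pivot quicksort."""
    if len(a) <= 1:
        return 0
    pivot = random.choice(a)
    less = [x for x in a if x < pivot]
    greater = [x for x in a if x > pivot]
    # Partitioning compares every non-pivot element to the pivot once.
    return (len(a) - 1) + quicksort_count(less) + quicksort_count(greater)

random.seed(0)
n = 1000
avg = sum(quicksort_count(list(range(n))) for _ in range(20)) / 20
print(avg, 2 * n * math.log(n))  # empirical average vs. the 2n ln n leading term
```

The empirical average lands a few thousand comparisons below 2n ln n, consistent with the negative O(n) correction term.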
Why It Matters
Quicksort is the most practically efficient comparison sort due to excellent cache performance and small constant factors. Its in-place nature (no auxiliary arrays) makes it the default in most standard library sort implementations (with modifications like introsort to avoid worst-case behavior).
Failure Mode
Deterministic quicksort with a bad pivot rule (e.g., always the first element) degrades to O(n²) on sorted or nearly-sorted input, exactly the case that occurs most often in practice. Always use random or median-of-three pivots.
Properties: In-place (O(log n) stack space), not stable, O(n log n) expected, O(n²) worst case.
Mergesort
Algorithm: Split the array in half. Recursively sort each half. Merge the two sorted halves in O(n) time.
Properties: Not in-place (O(n) extra space), stable, O(n log n) guaranteed.
Why it matters: Mergesort is the canonical divide-and-conquer algorithm. Its guaranteed O(n log n) runtime (no O(n²) worst case) makes it preferable when predictability matters. The stability property (equal elements maintain their relative order) is important for multi-key sorting.
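A minimal Python sketch; the `<=` in the merge step is what makes the sort stable:

```python
def mergesort(a: list) -> list:
    """Stable mergesort: O(n log n) time, O(n) extra space."""
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    left, right = mergesort(a[:mid]), mergesort(a[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:        # <= prefers the left half on ties: stability
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]  # append whichever half has leftovers

print(mergesort([5, 3, 8, 1, 9, 2]))
```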
Heapsort
Algorithm: Build a max-heap from the array in O(n) time. Repeatedly extract the maximum in O(log n) and place it at the end.
Properties: In-place (O(1) extra space), not stable, O(n log n) guaranteed.
Why it matters: Heapsort gives the best of both worlds: in-place like quicksort and guaranteed O(n log n) like mergesort. However, it has poor cache locality in practice, making it slower than quicksort despite the better worst case.
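A minimal in-place Python sketch, with the O(n) bottom-up heap construction followed by n - 1 extract-max steps:

```python
def heapsort(a: list) -> None:
    """In-place heapsort: build a max-heap, then repeatedly move the max to the end."""
    def sift_down(root: int, end: int) -> None:
        # Restore the max-heap property for the subtree rooted at `root`,
        # considering only indices <= end.
        while 2 * root + 1 <= end:
            child = 2 * root + 1
            if child + 1 <= end and a[child] < a[child + 1]:
                child += 1                       # pick the larger child
            if a[root] < a[child]:
                a[root], a[child] = a[child], a[root]
                root = child
            else:
                return

    n = len(a)
    for start in range(n // 2 - 1, -1, -1):      # O(n) bottom-up heap construction
        sift_down(start, n - 1)
    for end in range(n - 1, 0, -1):              # n - 1 extract-max operations
        a[0], a[end] = a[end], a[0]              # move current max to its final slot
        sift_down(0, end - 1)

xs = [5, 1, 4, 2, 3]
heapsort(xs)
print(xs)
```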
Summary of Comparison Sorts
| Algorithm | Best Case | Average Case | Worst Case | Space | Stable? |
|---|---|---|---|---|---|
| Quicksort | O(n log n) | O(n log n) | O(n²) | O(log n) | No |
| Mergesort | O(n log n) | O(n log n) | O(n log n) | O(n) | Yes |
| Heapsort | O(n log n) | O(n log n) | O(n log n) | O(1) | No |
Non-Comparison Sorts
These bypass the Ω(n log n) lower bound by exploiting structure in the input (integers in a known range).
Counting Sort
Input: n integers in range [0, k). Time: O(n + k). Space: O(n + k). Stable: Yes.
Algorithm: Count occurrences of each value, then compute prefix sums to determine positions. This is linear when k = O(n).
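A minimal stable Python sketch for integers in [0, k):

```python
def counting_sort(a: list, k: int) -> list:
    """Stable counting sort for integers in [0, k): O(n + k) time and space."""
    count = [0] * k
    for x in a:
        count[x] += 1
    # Prefix sums: pos[v] is the output index of the next occurrence of value v.
    pos, total = [0] * k, 0
    for v in range(k):
        pos[v], total = total, total + count[v]
    out = [None] * len(a)
    for x in a:              # left-to-right pass preserves stability
        out[pos[x]] = x
        pos[x] += 1
    return out

print(counting_sort([3, 1, 0, 3, 2], 4))
```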
Radix Sort
Input: n integers with d digits in base b. Time: O(d(n + b)). Space: O(n + b). Stable: Yes.
Algorithm: Sort by least significant digit first, then by the next digit, and so on, using a stable sort (counting sort) at each level. For integers in range [0, n^c), choose b = n and get d = c passes, giving O(cn) time, linear for constant c.
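A minimal LSD Python sketch for non-negative integers, using a stable counting-sort pass per digit:

```python
def radix_sort(a: list, base: int = 10) -> list:
    """LSD radix sort for non-negative integers: O(d(n + b)) for d base-b digits."""
    if not a:
        return []
    out = list(a)
    exp = 1
    while exp <= max(a):                         # one pass per digit position
        # Stable counting sort keyed on the current digit.
        count = [0] * base
        for x in out:
            count[(x // exp) % base] += 1
        pos, total = [0] * base, 0
        for d in range(base):
            pos[d], total = total, total + count[d]
        nxt = [None] * len(out)
        for x in out:                            # stable left-to-right placement
            d = (x // exp) % base
            nxt[pos[d]] = x
            pos[d] += 1
        out = nxt
        exp *= base
    return out

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
```

Stability of each pass is essential: it preserves the ordering established by all lower-significance digits.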
Why ML People Should Care
Data preprocessing: Sorting is the first step in many data pipelines. Efficient sorting of millions of feature values is essential for decision tree training (finding optimal splits) and quantile computation. It is also fundamental to model evaluation workflows.
Nearest neighbor search: Many approximate nearest neighbor algorithms rely on sorting distances. KD-trees partition data along sorted dimensions.
Rank-based metrics: AUC (area under ROC curve) requires sorting predictions by score. Rank correlation (Kendall's tau, Spearman's rho) is defined in terms of sorted orders.
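As an illustration of the sorting connection, AUC can be computed directly from sorted ranks. A Python sketch assuming binary labels and no tied scores (the rank-sum identity below is standard; tie handling is omitted for brevity):

```python
def auc_by_ranks(scores: list, labels: list) -> float:
    """ROC AUC via the rank-sum identity (assumes no tied scores):
    AUC = (sum of positive ranks - P(P+1)/2) / (P * N)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])  # ascending by score
    rank = {i: r + 1 for r, i in enumerate(order)}               # 1-based ranks
    P = sum(labels)
    N = len(labels) - P
    pos_rank_sum = sum(rank[i] for i, y in enumerate(labels) if y == 1)
    return (pos_rank_sum - P * (P + 1) / 2) / (P * N)

print(auc_by_ranks([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

The O(n log n) sort dominates the runtime, versus O(n²) for the naive all-pairs comparison of positives against negatives.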
Efficient implementations: Understanding sorting helps you write faster code. Knowing that your data is nearly sorted (use insertion sort or Timsort), that your keys are integers (use radix sort), or that you need stability (use mergesort) can make orders-of-magnitude performance differences.
Common Confusions
O(n log n) is a lower bound, not just an upper bound
The Ω(n log n) bound is a lower bound on comparison-based sorting. Quicksort, mergesort, and heapsort all achieve O(n log n), matching the lower bound. This means they are optimal: you cannot do better with comparisons. Non-comparison sorts beat this bound by using more information than just pairwise comparisons.
In-place does not mean zero extra space
Quicksort is called in-place because it uses O(log n) stack space (for recursion), not O(n). Heapsort uses O(1) extra space. "In-place" is relative to the input size, not literally zero.
Stability matters more than you think
A stable sort preserves the relative order of equal elements. This matters when you sort by multiple keys: sort by secondary key first, then by primary key (using a stable sort). If the sort is not stable, the secondary ordering is destroyed.
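A Python illustration using `sorted()` (Timsort, which is stable); the record fields here are made up for the example:

```python
records = [("alice", "math", 90), ("bob", "cs", 85), ("carol", "math", 85)]

# Multi-key sort: sort by the secondary key (score) first, then stably by the
# primary key (subject). Because Timsort is stable, the score ordering from the
# first pass survives within each subject group in the second pass.
by_score = sorted(records, key=lambda r: r[2])
by_subject_then_score = sorted(by_score, key=lambda r: r[1])
print(by_subject_then_score)
```

With an unstable second sort, carol (math, 85) could land after alice (math, 90), destroying the within-subject score order.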
Summary
- Comparison-based lower bound: Ω(n log n) via decision tree argument
- Quicksort: O(n log n) expected, O(n²) worst, in-place, not stable
- Mergesort: O(n log n) guaranteed, O(n) space, stable
- Heapsort: O(n log n) guaranteed, in-place, not stable
- Non-comparison sorts (counting, radix): O(n) when key range is manageable
- The lower bound argument (counting leaves in a decision tree) is a fundamental technique in theoretical CS
Exercises
Problem
Using the decision tree argument, what is the minimum number of comparisons needed to sort 5 elements? Compare this to the number used by mergesort on 5 elements.
Problem
Prove that quicksort's worst case is Θ(n²) by constructing an input on which deterministic quicksort (with the first element as pivot) makes Ω(n²) comparisons.
Problem
You have n integers in the range [0, n²). Which sorting algorithm should you use, and what is its runtime? What if the range is [0, 2^n)?
References
Canonical:
- Cormen, Leiserson, Rivest, Stein, Introduction to Algorithms (CLRS), Chapters 6-9
- Knuth, The Art of Computer Programming, Vol. 3: Sorting and Searching
Current:
- Sedgewick & Wayne, Algorithms (4th ed.), Sections 2.1-2.5
- Hastie, Tibshirani, Friedman, The Elements of Statistical Learning (2009)
Next Topics
The natural next steps from sorting:
- Binary search: the key application of sorted data
- Order statistics: finding the k-th smallest element in O(n) time
- Hash tables: when you need faster-than-sorting lookups
Last reviewed: April 2026
Builds on This
- P vs NP (Layer 0A)