Numerical Stability
Conditioning and Condition Number
The condition number measures how sensitive a problem is to perturbations in its input. Ill-conditioned matrices turn small errors into catastrophic ones, and understanding conditioning is essential for any computation involving linear algebra.
Why This Matters
Every numerical computation in machine learning is affected by the conditioning of the problem: solving linear systems, computing eigendecompositions, inverting matrices, running gradient descent. A matrix can be perfectly invertible in exact arithmetic and completely useless in floating point. The condition number tells you when to trust your computation and when to panic.
When a matrix inversion produces garbage, or a linear system solver returns wildly wrong answers, or gradient descent oscillates instead of converging, the condition number is almost always the explanation. Understanding numerical stability requires understanding conditioning first.
Mental Model
Think of the condition number $\kappa$ as an amplification factor for errors. If you perturb the input to a problem by a relative amount $\epsilon$, the output changes by at most $\kappa \epsilon$ (relative). A condition number of $10^{10}$ means that 16 digits of floating point precision lose 10 digits of accuracy, leaving you with 6 meaningful digits at best, and often fewer.
Core Definitions
Condition Number of a Matrix
For a nonsingular matrix $A$, the condition number with respect to the 2-norm is:

$$\kappa_2(A) = \|A\|_2 \, \|A^{-1}\|_2 = \frac{\sigma_{\max}(A)}{\sigma_{\min}(A)}$$

where $\sigma_{\max}(A)$ and $\sigma_{\min}(A)$ are the largest and smallest singular values of $A$, respectively. By convention, $\kappa(A) \geq 1$ for any nonsingular matrix, with $\kappa(A) = 1$ only for scalar multiples of orthogonal matrices.
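As a quick numerical check (a sketch with NumPy; the example matrix is an arbitrary illustrative choice), the condition number from the singular values agrees with `np.linalg.cond`:

```python
import numpy as np

# A moderately ill-conditioned 2x2 example (values chosen for illustration).
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])

# Condition number from the singular values: sigma_max / sigma_min.
sigma = np.linalg.svd(A, compute_uv=False)
kappa_svd = sigma[0] / sigma[-1]

# np.linalg.cond computes the same quantity in the 2-norm (its default).
kappa = np.linalg.cond(A)

assert np.isclose(kappa_svd, kappa)
print(f"kappa_2(A) = {kappa:.3e}")   # on the order of 4e4 for this matrix
```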
Ill-Conditioned Matrix
A matrix $A$ is called ill-conditioned when $\kappa(A)$ is large. There is no universal threshold, but as a rule of thumb: if $\kappa(A) \approx 10^k$, you lose roughly $k$ digits of accuracy when solving $Ax = b$ in floating point. In double precision (about 16 digits), $\kappa(A) \gtrsim 10^{16}$ means you may get zero correct digits.
Well-Conditioned Matrix
A matrix is well-conditioned when $\kappa(A)$ is small: close to 1, or at least moderate (say, up to around $10^3$). Computations involving well-conditioned matrices are numerically reliable.
The Fundamental Perturbation Bound
Relative Perturbation Bound for Ax = b
Statement
Let $Ax = b$ with $A$ nonsingular, and let $A(x + \delta x) = b + \delta b$. Then:

$$\frac{\|\delta x\|}{\|x\|} \leq \kappa(A) \, \frac{\|\delta b\|}{\|b\|}$$

More generally, if both $A$ and $b$ are perturbed, then to first order:

$$\frac{\|\delta x\|}{\|x\|} \lesssim \kappa(A) \left( \frac{\|\delta A\|}{\|A\|} + \frac{\|\delta b\|}{\|b\|} \right)$$
Intuition
The condition number is the worst-case amplification factor. A 1% relative error in $b$ can become a $\kappa(A)$% relative error in $x$. If $\kappa(A) = 10^8$ and $b$ carries 8 digits of relative accuracy (error $\approx 10^{-8}$), the amplified error in $x$ can reach order 1, leaving no accuracy at all. The bound is tight: there exist perturbations that achieve equality.
Proof Sketch
From $A(x + \delta x) = b + \delta b$, we get $A \, \delta x = \delta b$, so $\|\delta x\| = \|A^{-1} \delta b\| \leq \|A^{-1}\| \, \|\delta b\|$. Also, $b = Ax$, so $\|b\| \leq \|A\| \, \|x\|$, i.e., $\|x\| \geq \|b\| / \|A\|$. Combining:

$$\frac{\|\delta x\|}{\|x\|} \leq \frac{\|A^{-1}\| \, \|\delta b\|}{\|b\| / \|A\|} = \|A\| \, \|A^{-1}\| \, \frac{\|\delta b\|}{\|b\|} = \kappa(A) \, \frac{\|\delta b\|}{\|b\|}$$
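The bound $\|\delta x\|/\|x\| \leq \kappa(A)\,\|\delta b\|/\|b\|$ can be checked numerically (a sketch with NumPy; the matrix and perturbation size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative ill-conditioned system with a known exact solution.
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
x_true = np.array([1.0, 1.0])
b = A @ x_true

kappa = np.linalg.cond(A)

# Perturb b by a tiny random amount and solve again.
db = 1e-10 * rng.standard_normal(2)
x_pert = np.linalg.solve(A, b + db)

rel_err_x = np.linalg.norm(x_pert - x_true) / np.linalg.norm(x_true)
rel_err_b = np.linalg.norm(db) / np.linalg.norm(b)

# The perturbation bound must hold (small slack for solver roundoff).
assert rel_err_x <= kappa * rel_err_b * 1.01
print(f"amplification = {rel_err_x / rel_err_b:.1e}, kappa = {kappa:.1e}")
```

For a random perturbation direction the amplification is typically below $\kappa(A)$; only perturbations aligned with the smallest singular vector achieve the worst case.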
Why It Matters
This bound is why the condition number is the central quantity in numerical linear algebra. It tells you the intrinsic difficulty of the problem $Ax = b$, independent of the algorithm used to solve it. No algorithm can do better than this bound: an ill-conditioned problem is inherently sensitive to errors, no matter how clever your solver.
Failure Mode
The bound gives the worst case. For a specific perturbation $\delta b$, the actual error amplification may be much less than $\kappa(A)$. The condition number is pessimistic but never optimistic: it guarantees things cannot be worse, but the typical case may be better. Also, the condition number depends on the norm used; the 2-norm condition number $\kappa_2$ is most common, but $\kappa_1$ and $\kappa_\infty$ are also used.
The Singular Value Connection
The condition number has a clean geometric interpretation. The matrix $A$ maps the unit sphere to an ellipsoid whose semi-axes have lengths $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_n$.
The condition number is the aspect ratio of this ellipsoid: the ratio of the longest axis to the shortest. When $\kappa(A)$ is large, the ellipsoid is extremely elongated: $A$ amplifies some directions enormously while compressing others. Inverting $A$ then amplifies the compressed directions, blowing up any errors in those components.
For symmetric positive definite matrices, the singular values equal the eigenvalues, so:

$$\kappa_2(A) = \frac{\lambda_{\max}(A)}{\lambda_{\min}(A)}$$
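This identity is easy to verify numerically (a sketch with NumPy; the SPD matrix is generated randomly, with a shift to guarantee positive definiteness):

```python
import numpy as np

rng = np.random.default_rng(1)

# Build a random symmetric positive definite matrix (illustrative).
B = rng.standard_normal((5, 5))
A = B @ B.T + 5 * np.eye(5)   # the shift guarantees positive definiteness

eigvals = np.linalg.eigvalsh(A)        # real eigenvalues, sorted ascending
kappa_eig = eigvals[-1] / eigvals[0]   # lambda_max / lambda_min

# For SPD matrices this equals the 2-norm condition number.
assert np.isclose(kappa_eig, np.linalg.cond(A))
```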
Connection to Optimization
Condition Number of an Optimization Problem
For a function $f$ that is $\mu$-strongly convex and $L$-smooth (i.e., $\mu I \preceq \nabla^2 f(x) \preceq L I$ everywhere), the condition number of the optimization problem is:

$$\kappa = \frac{L}{\mu}$$

Gradient descent on such a function (with step size $1/L$) converges at rate $1 - 1/\kappa$ per iteration. Large $\kappa$ means slow convergence: the function has a narrow, elongated valley that gradient descent zigzags across.
For minimizing $f(x) = \frac{1}{2}\|Ax - b\|^2$ (the quadratic case), the optimization condition number is exactly $\kappa_2(A)^2$. The Hessian is $A^\top A$, with $L = \sigma_{\max}(A)^2$ and $\mu = \sigma_{\min}(A)^2$.
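The dependence of convergence speed on $\kappa$ can be seen directly on a quadratic (a minimal sketch: a diagonal Hessian with condition number $\kappa$ and step size $1/L$; the tolerance and starting point are arbitrary choices):

```python
import numpy as np

def gd_iterations(kappas, tol=1e-6):
    """Count gradient descent steps to reach tol on f(x) = 0.5 x^T H x."""
    counts = []
    for kappa in kappas:
        H = np.diag([1.0, kappa])   # Hessian with mu = 1, L = kappa
        x = np.array([1.0, 1.0])
        step = 1.0 / kappa          # step size 1/L
        n = 0
        while np.linalg.norm(x) > tol and n < 10**6:
            x = x - step * (H @ x)  # gradient of f is H x
            n += 1
        counts.append(n)
    return counts

counts = gd_iterations([10, 100, 1000])
print(counts)   # iteration count grows roughly linearly with kappa
assert counts[0] < counts[1] < counts[2]
```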
Connection to Davis-Kahan
The condition number connects to spectral perturbation theory. The Davis-Kahan theorem says that the perturbation of eigenvectors depends on the eigenvalue gap. If two eigenvalues are close, the corresponding eigenvectors are sensitive to perturbation. The condition number for the eigenvalue problem is related to the reciprocal of the gap between eigenvalues.
Large $\kappa(A)$ means $\sigma_{\min}$ is close to zero relative to $\sigma_{\max}$, which means $A$ is close to singular. The singular vectors corresponding to small singular values are highly sensitive to perturbation.
Preconditioning
If $\kappa(A)$ is large, you can often improve it by preconditioning: instead of solving $Ax = b$, solve $M^{-1}Ax = M^{-1}b$, where $M$ is chosen so that $\kappa(M^{-1}A) \ll \kappa(A)$.
The ideal preconditioner is $M = A$ (giving $\kappa(M^{-1}A) = 1$), but applying it requires inverting $A$, which is the original problem. Practical preconditioners approximate $A$ cheaply: diagonal (Jacobi) scaling, incomplete Cholesky factorizations, or multigrid methods.
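A minimal sketch of Jacobi (diagonal) preconditioning, assuming a matrix that is ill-conditioned mostly because of row scaling (the entries are an arbitrary illustrative choice):

```python
import numpy as np

# Ill-conditioned mainly because the two rows have wildly different scales.
A = np.array([[1e6, 2e6],
              [3.0, 4.0]])
kappa_before = np.linalg.cond(A)

# Jacobi preconditioner: M = diag(A); solve M^{-1} A x = M^{-1} b instead.
M_inv = np.diag(1.0 / np.diag(A))
kappa_after = np.linalg.cond(M_inv @ A)

print(f"before: {kappa_before:.1e}, after: {kappa_after:.1e}")
assert kappa_after < 1e-3 * kappa_before
```

Diagonal scaling fixes badly scaled problems almost for free, but it cannot repair ill-conditioning that comes from near-linear-dependence of rows or columns.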
Canonical Examples
The Hilbert matrix: notoriously ill-conditioned
The Hilbert matrix $H_n$ has entries $(H_n)_{ij} = \frac{1}{i + j - 1}$. Its condition number grows exponentially with $n$:

| $n$ | $\kappa_2(H_n)$ |
|-----|-----------------|
| 5 | $\approx 4.8 \times 10^5$ |
| 10 | $\approx 1.6 \times 10^{13}$ |
| 15 | $\sim 10^{20}$ |
For $n \gtrsim 12$, the condition number exceeds $10^{16}$, larger than $1/\epsilon_{\text{mach}}$ for double precision. Solving $H_n x = b$ in double precision then gives nearly random answers, even though $H_n$ is mathematically invertible.
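The accuracy loss is easy to observe directly (a sketch with NumPy; the right-hand side is constructed so the exact solution is the all-ones vector):

```python
import numpy as np

def hilbert(n):
    """Hilbert matrix with entries 1 / (i + j - 1), 1-based indices."""
    i = np.arange(1, n + 1)
    return 1.0 / (i[:, None] + i[None, :] - 1)

errs = {}
for n in (5, 10):
    H = hilbert(n)
    x_true = np.ones(n)
    b = H @ x_true                 # right-hand side with known exact solution
    x = np.linalg.solve(H, b)
    errs[n] = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
    print(f"n={n:2d}  cond={np.linalg.cond(H):.1e}  rel_err={errs[n]:.1e}")
```

The solver is backward stable; the error growth from $n=5$ to $n=10$ comes entirely from the conditioning of $H_n$.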
Orthogonal matrices are perfectly conditioned
If $Q$ is an orthogonal matrix ($Q^\top Q = I$), then all singular values of $Q$ are 1, so $\kappa_2(Q) = 1$. Orthogonal transformations preserve distances and do not amplify errors. This is why algorithms based on QR decomposition and orthogonal transformations are numerically stable: they avoid introducing ill-conditioning.
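A quick check (a sketch using NumPy; the orthogonal matrix comes from the QR factorization of a random matrix):

```python
import numpy as np

rng = np.random.default_rng(2)

# Random orthogonal matrix via the QR factorization of a random matrix.
Q, _ = np.linalg.qr(rng.standard_normal((50, 50)))

# All singular values of Q are 1, so its condition number is exactly 1.
assert np.isclose(np.linalg.cond(Q), 1.0)

# Multiplying by Q preserves lengths, so it does not amplify errors.
x = rng.standard_normal(50)
assert np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x))
```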
Diagonal matrices make conditioning transparent
For a diagonal matrix $D = \mathrm{diag}(d_1, \ldots, d_n)$ with all $d_i \neq 0$:

$$\kappa_2(D) = \frac{\max_i |d_i|}{\min_i |d_i|}$$

If $D = \mathrm{diag}(1, 10^{-8})$, then $\kappa_2(D) = 10^8$. The second component of the solution to $Dx = b$ has errors amplified by $10^8$ relative to the first.
Common Confusions
Invertible does not mean well-conditioned
A matrix can be mathematically invertible ($\det A \neq 0$) but so ill-conditioned that numerical inversion is useless. The determinant is a terrible indicator of conditioning: the $100 \times 100$ identity matrix has $\det = 1$ and $\kappa = 1$, but the matrix $0.1 \, I_{100}$ has $\det = 10^{-100}$ (tiny!) and $\kappa = 1$ (perfect conditioning).
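A scaled identity makes the contrast concrete (a minimal sketch with NumPy):

```python
import numpy as np

n = 100
A = 0.1 * np.eye(n)            # scaled identity matrix

det = np.linalg.det(A)         # 10^{-100}: "nearly singular" by determinant
kappa = np.linalg.cond(A)      # exactly 1: perfectly conditioned

print(f"det = {det:.1e}, cond = {kappa:.1f}")
assert det < 1e-99
assert np.isclose(kappa, 1.0)
```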
Condition number is a property of the problem, not the algorithm
The condition number measures the intrinsic sensitivity of the mathematical problem $Ax = b$. It is not a property of your solver. A better algorithm can avoid adding unnecessary error (this is backward stability), but it cannot avoid the error amplification that is inherent to the problem. Gaussian elimination with partial pivoting is backward stable: it solves a nearby problem exactly. But the sensitivity to perturbations in that nearby problem is still governed by $\kappa(A)$.
Large condition number does not mean the matrix is nearly singular
Well, it does. But the converse needs care. A matrix with $\kappa(A) = 10^{16}$ has $\sigma_{\min}/\sigma_{\max} = 10^{-16}$, so it is "close to singular" in a relative sense. But $\sigma_{\min}$ could still be $1$ if $\sigma_{\max} = 10^{16}$. The matrix is far from the zero matrix but close to the set of singular matrices relative to its own scale.
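A diagonal example makes the relative-versus-absolute distinction concrete (a minimal sketch with NumPy):

```python
import numpy as np

# kappa = 1e16, yet sigma_min = 1: the matrix is "close to singular" only
# relative to its own (enormous) scale.
A = np.diag([1e16, 1.0])

sigma = np.linalg.svd(A, compute_uv=False)
assert np.isclose(np.linalg.cond(A), 1e16)
assert np.isclose(sigma[-1], 1.0)   # smallest singular value is not tiny
```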
Summary
- $\kappa_2(A) = \sigma_{\max}/\sigma_{\min}$: ratio of largest to smallest singular value
- The condition number bounds error amplification: a relative perturbation of size $\epsilon$ in the input becomes at most $\kappa \epsilon$ in the output
- $\kappa(A) \approx 10^k$ means you lose roughly $k$ digits of accuracy
- Invertibility is binary; conditioning is a spectrum
- In optimization, $\kappa = L/\mu$ controls the convergence rate of gradient descent
- Preconditioning reduces $\kappa$ to make problems tractable
- The Hilbert matrix is the canonical example of catastrophic ill-conditioning
Exercises
Problem
Compute the condition number (2-norm) of the matrix:
What does this tell you about solving $Ax = b$ in floating point?
Problem
Prove that for any nonsingular matrix $A$:

$$\kappa(A) = \lim_{\epsilon \to 0} \; \sup_{\|\delta A\| \leq \epsilon \|A\|} \frac{\|\delta x\| / \|x\|}{\|\delta A\| / \|A\|}$$

where $Ax = b$ and $(A + \delta A)(x + \delta x) = b$ (i.e., the right-hand side is fixed and the matrix is perturbed). This shows the condition number is the exact worst-case amplification.
References
Canonical:
- Trefethen & Bau, Numerical Linear Algebra (1997), Lectures 12-15
- Golub & Van Loan, Matrix Computations (2013), Chapter 2
- Demmel, Applied Numerical Linear Algebra (1997), Chapters 1-2 (conditioning and perturbation theory)
Current:
- Higham, Accuracy and Stability of Numerical Algorithms (2002), Chapters 1-2 and 7
- Boyd & Vandenberghe, Convex Optimization (2004), Section 9.3.1 (condition number and gradient descent convergence)
- Strang, Linear Algebra and Its Applications (2006), Section 7.3 (condition numbers and numerical stability)
Next Topics
The natural next step from conditioning:
- Numerical linear algebra basics: algorithms (LU, QR, SVD) and their stability properties, building on the conditioning framework established here
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.
- Eigenvalues and Eigenvectors (Layer 0A)
- Matrix Operations and Properties (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)