
Numerical Stability

Conditioning and Condition Number

The condition number measures how sensitive a problem is to perturbations in its input. Ill-conditioned matrices turn small errors into catastrophic ones, and understanding conditioning is essential for any computation involving linear algebra.


Why This Matters

Every numerical computation in machine learning is affected by the conditioning of the problem: solving linear systems, computing eigendecompositions, inverting matrices, and running gradient descent. A matrix can be perfectly invertible in exact arithmetic and completely useless in floating point. The condition number tells you when to trust your computation and when to panic.

When a matrix inversion produces garbage, or a linear system solver returns wildly wrong answers, or gradient descent oscillates instead of converging, the condition number is almost always the explanation. Understanding numerical stability requires understanding conditioning first.

Mental Model

Think of the condition number as an amplification factor for errors. If you perturb the input to a problem by a relative amount $\epsilon$, the output changes by at most $\kappa \cdot \epsilon$ (relative). A condition number of $10^{10}$ means that 16 digits of floating point precision lose 10 digits of accuracy, leaving you with 6 meaningful digits at best, and often fewer.

Core Definitions

Definition

Condition Number of a Matrix

For a nonsingular matrix $A \in \mathbb{R}^{n \times n}$, the condition number with respect to the 2-norm is:

$$\kappa(A) = \|A\|_2 \cdot \|A^{-1}\|_2 = \frac{\sigma_{\max}(A)}{\sigma_{\min}(A)}$$

where $\sigma_{\max}$ and $\sigma_{\min}$ are the largest and smallest singular values of $A$, respectively. By convention, $\kappa(A) \geq 1$ for any nonsingular matrix, with $\kappa(A) = 1$ only for scalar multiples of orthogonal matrices.
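As a quick sanity check, both formulas give the same number. A minimal sketch using NumPy (the 2×2 matrix is an arbitrary example):

```python
import numpy as np

# An arbitrary small matrix to inspect.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])

# Condition number from the singular values: sigma_max / sigma_min.
sigma = np.linalg.svd(A, compute_uv=False)
kappa_svd = sigma.max() / sigma.min()

# NumPy's built-in 2-norm condition number agrees.
kappa_builtin = np.linalg.cond(A, 2)

print(kappa_svd, kappa_builtin)
```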

Definition

Ill-Conditioned Matrix

A matrix is called ill-conditioned when $\kappa(A)$ is large (typically $\kappa(A) \gg 1$). There is no universal threshold, but as a rule of thumb: if $\kappa(A) \approx 10^k$, you lose roughly $k$ digits of accuracy when solving $Ax = b$ in floating point. In double precision (about 16 digits), $\kappa(A) = 10^{16}$ means you may get zero correct digits.

Definition

Well-Conditioned Matrix

A matrix is well-conditioned when $\kappa(A)$ is small: close to 1, or at least moderate (say, $\kappa(A) \leq 10^4$). Computations involving well-conditioned matrices are numerically reliable.

The Fundamental Perturbation Bound

Theorem

Relative Perturbation Bound for Ax = b

Statement

Let $Ax = b$ with $A$ nonsingular, and let $A(x + \delta x) = b + \delta b$. Then:

$$\frac{\|\delta x\|}{\|x\|} \leq \kappa(A) \cdot \frac{\|\delta b\|}{\|b\|}$$

More generally, if both $A$ and $b$ are perturbed:

$$\frac{\|\delta x\|}{\|x + \delta x\|} \leq \kappa(A) \left( \frac{\|\delta A\|}{\|A\|} + \frac{\|\delta b\|}{\|b\|} \right)$$

Intuition

The condition number is the worst-case amplification factor. A 1% error in $b$ can become a $\kappa(A)\%$ error in $x$. If $\kappa(A) = 10^8$, a relative error of $10^{-8}$ in $b$ can be amplified to a relative error of order 1 in $x$, leaving no accuracy at all. The bound is tight: there exist perturbations that achieve equality.

Proof Sketch

From $A \delta x = \delta b$, we get $\delta x = A^{-1} \delta b$, so $\|\delta x\| \leq \|A^{-1}\| \cdot \|\delta b\|$. Also, $\|b\| = \|Ax\| \leq \|A\| \cdot \|x\|$, so $1/\|x\| \leq \|A\|/\|b\|$. Combining:

$$\frac{\|\delta x\|}{\|x\|} \leq \|A^{-1}\| \cdot \|\delta b\| \cdot \frac{\|A\|}{\|b\|} = \kappa(A) \cdot \frac{\|\delta b\|}{\|b\|}$$

Why It Matters

This bound is why the condition number is the central quantity in numerical linear algebra. It tells you the intrinsic difficulty of the problem $Ax = b$, independent of the algorithm used to solve it. No algorithm can do better than this bound: an ill-conditioned problem is inherently sensitive to errors, no matter how clever your solver.

Failure Mode

The bound gives the worst case. For a specific perturbation $\delta b$, the actual error amplification may be much less than $\kappa(A)$. The condition number is pessimistic but never optimistic: it guarantees things cannot be worse, but the typical case may be better. Also, the condition number depends on the norm used; the 2-norm condition number is most common, but $\kappa_1$ and $\kappa_\infty$ are also used.
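The perturbation bound is easy to check numerically. A sketch using NumPy (the diagonal matrix and the perturbation size are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# A deliberately ill-conditioned system: kappa(A) = 1e8.
A = np.array([[1.0, 0.0],
              [0.0, 1e-8]])
x_true = np.array([1.0, 1.0])
b = A @ x_true

# Perturb b by a tiny relative amount (about 1e-10).
delta_b = 1e-10 * np.linalg.norm(b) * rng.standard_normal(2)
x_pert = np.linalg.solve(A, b + delta_b)

rel_err_x = np.linalg.norm(x_pert - x_true) / np.linalg.norm(x_true)
rel_err_b = np.linalg.norm(delta_b) / np.linalg.norm(b)
kappa = np.linalg.cond(A, 2)

# The bound: rel_err_x <= kappa * rel_err_b.
print(rel_err_x, kappa * rel_err_b)
```

The observed amplification is many orders of magnitude larger than the input perturbation, yet still within the bound.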

The Singular Value Connection

The condition number $\kappa(A) = \sigma_{\max}/\sigma_{\min}$ has a clean geometric interpretation. The matrix $A$ maps the unit sphere to an ellipsoid whose semi-axes have lengths $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_n > 0$.

The condition number is the aspect ratio of this ellipsoid: the ratio of the longest axis to the shortest. When $\kappa$ is large, the ellipsoid is extremely elongated: $A$ amplifies some directions enormously while compressing others. Inverting $A$ then amplifies the compressed directions, blowing up any errors in those components.

For symmetric positive definite matrices, the singular values equal the eigenvalues, so:

$$\kappa(A) = \frac{\lambda_{\max}(A)}{\lambda_{\min}(A)}$$
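This identity is straightforward to verify numerically. A sketch with NumPy, using a randomly generated symmetric positive definite matrix:

```python
import numpy as np

rng = np.random.default_rng(1)

# Build a random SPD matrix: M M^T is positive semidefinite,
# and adding 4I pushes every eigenvalue above 4.
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)

eigvals = np.linalg.eigvalsh(A)             # ascending, all positive
sigma = np.linalg.svd(A, compute_uv=False)  # descending

# For SPD matrices, singular values and eigenvalues coincide,
# so kappa(A) = lambda_max / lambda_min.
print(np.allclose(np.sort(eigvals), np.sort(sigma)))
print(eigvals[-1] / eigvals[0], np.linalg.cond(A, 2))
```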

Connection to Optimization

Definition

Condition Number of an Optimization Problem

For a function $f$ that is $\mu$-strongly convex and $L$-smooth (i.e., $\mu I \preceq \nabla^2 f(x) \preceq L I$ everywhere), the condition number of the optimization problem is:

$$\kappa = \frac{L}{\mu}$$

Gradient descent on such a function converges at rate $(1 - 1/\kappa)^t$ after $t$ iterations. Large $\kappa$ means slow convergence: the function has a narrow, elongated valley that gradient descent zigzags across.

For minimizing $f(x) = \frac{1}{2} x^T A x - b^T x$ (the quadratic case, with $A$ symmetric positive definite), the optimization condition number is exactly $\kappa(A)$. The Hessian is $A$, with $L = \lambda_{\max}(A)$ and $\mu = \lambda_{\min}(A)$.
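The effect on convergence is easy to see in a small experiment. A sketch assuming NumPy, running gradient descent with step size $1/L$ on two quadratics (the diagonal matrices are arbitrary examples with $\kappa = 2$ and $\kappa = 100$):

```python
import numpy as np

def gd_quadratic(A, b, steps):
    """Gradient descent on 0.5 x^T A x - b^T x with step size 1/L."""
    L = np.linalg.eigvalsh(A)[-1]        # largest eigenvalue = smoothness L
    x = np.zeros_like(b)
    for _ in range(steps):
        x = x - (1.0 / L) * (A @ x - b)  # gradient is Ax - b
    return x

b = np.array([1.0, 1.0])
A_good = np.diag([1.0, 2.0])    # kappa = 2
A_bad = np.diag([0.02, 2.0])    # kappa = 100

for A in (A_good, A_bad):
    x_star = np.linalg.solve(A, b)
    x = gd_quadratic(A, b, steps=50)
    print(np.linalg.norm(x - x_star))
```

After 50 steps the well-conditioned problem is solved essentially to machine precision, while the ill-conditioned one is still far from the optimum.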

Connection to Davis-Kahan

The condition number connects to spectral perturbation theory. The Davis-Kahan theorem says that the perturbation of eigenvectors depends on the eigenvalue gap. If two eigenvalues are close, the corresponding eigenvectors are sensitive to perturbation. The condition number for the eigenvalue problem $Ax = \lambda x$ is related to the reciprocal of the gap between eigenvalues.

Large $\kappa(A)$ means $\sigma_{\min}$ is close to zero relative to $\sigma_{\max}$, which means $A$ is close to singular. The eigenvectors corresponding to small singular values are highly sensitive to perturbation.

Preconditioning

If $\kappa(A)$ is large, you can often improve it by preconditioning: instead of solving $Ax = b$, solve $M^{-1}Ax = M^{-1}b$ where $M$ is chosen so that $\kappa(M^{-1}A) \ll \kappa(A)$.

The ideal preconditioner is $M = A$ (giving $\kappa = 1$), but then applying $M^{-1}$ requires inverting $A$, which is the original problem. Practical preconditioners approximate $A$ cheaply: diagonal (Jacobi), incomplete Cholesky, or multigrid methods.
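As an illustration of the simplest choice, here is a sketch of a Jacobi (diagonal) preconditioner applied to a badly scaled matrix; the specific matrix is an arbitrary example:

```python
import numpy as np

# A badly scaled matrix: diagonal entries spanning six orders of magnitude.
A = np.diag([1.0, 1e3, 1e6]) + 0.1 * np.ones((3, 3))

# Jacobi preconditioner: M = diag(A), so M^{-1} is trivial to apply.
M_inv = np.diag(1.0 / np.diag(A))

print(np.linalg.cond(A))          # large: roughly 1e6
print(np.linalg.cond(M_inv @ A))  # close to 1
```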

Canonical Examples

Example

The Hilbert matrix: notoriously ill-conditioned

The $n \times n$ Hilbert matrix $H_n$ has entries $H_{ij} = 1/(i+j-1)$. Its condition number grows exponentially with $n$:

| $n$ | $\kappa(H_n)$ |
| --- | --- |
| 5 | $\approx 4.8 \times 10^5$ |
| 10 | $\approx 1.6 \times 10^{13}$ |
| 15 | $\approx 3.7 \times 10^{17}$ |

For $n = 15$, the condition number exceeds $10^{17}$, larger than $1/\epsilon_{\text{machine}}$ for double precision. Solving $H_{15} x = b$ in double precision gives nearly random answers, even though $H_{15}$ is mathematically invertible.
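A sketch that reproduces this behavior (NumPy assumed; the matrix is built directly from the definition $H_{ij} = 1/(i+j-1)$, and the right-hand side is chosen so the exact solution is all ones):

```python
import numpy as np

def hilbert(n):
    """Build the n x n Hilbert matrix H_ij = 1/(i + j - 1), 1-based."""
    i = np.arange(1, n + 1)
    return 1.0 / (i[:, None] + i[None, :] - 1)

for n in (5, 10):
    H = hilbert(n)
    x_true = np.ones(n)
    b = H @ x_true            # right-hand side with known exact solution
    x = np.linalg.solve(H, b)
    print(n, np.linalg.cond(H), np.linalg.norm(x - x_true))
```

The recovered solution degrades sharply as $n$ grows, just as the rule of thumb predicts.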

Example

Orthogonal matrices are perfectly conditioned

If $Q$ is an orthogonal matrix ($Q^T Q = I$), then all singular values are 1, so $\kappa(Q) = 1$. Orthogonal transformations preserve distances and do not amplify errors. This is why algorithms based on QR decomposition and orthogonal transformations are numerically stable: they avoid introducing ill-conditioning.

Example

Diagonal matrices make conditioning transparent

For a diagonal matrix $D = \text{diag}(d_1, \ldots, d_n)$ with all $d_i > 0$:

$$\kappa(D) = \frac{\max_i d_i}{\min_i d_i}$$

If $D = \text{diag}(1, 10^{-8})$, then $\kappa(D) = 10^8$. In solving $Dx = b$, an absolute error in the second component of $b$ is amplified by $10^8$ in the second component of $x$, while errors in the first component are not amplified at all.
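This amplification can be verified directly; a minimal NumPy sketch with an arbitrary right-hand side:

```python
import numpy as np

D = np.diag([1.0, 1e-8])
b = np.array([1.0, 1e-8])   # exact solution is x = (1, 1)

x = np.linalg.solve(D, b)
x_pert = np.linalg.solve(D, b + np.array([0.0, 1e-12]))

# An absolute error of 1e-12 in b_2 becomes an absolute error
# of 1e-4 in x_2: amplification by 1/d_2 = 1e8.
print(np.linalg.cond(D), x_pert - x)
```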

Common Confusions

Watch Out

Invertible does not mean well-conditioned

A matrix can be mathematically invertible ($\det(A) \neq 0$) but so ill-conditioned that numerical inversion is useless. The determinant is a terrible indicator of conditioning: a $100 \times 100$ identity matrix has $\det = 1$ and $\kappa = 1$, but the matrix $0.1 \cdot I_{100}$ has $\det = 10^{-100}$ (tiny!) and $\kappa = 1$ (perfect conditioning).
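A two-line NumPy check makes the point concrete:

```python
import numpy as np

A = 0.1 * np.eye(100)

# The determinant is astronomically small (about 1e-100)...
print(np.linalg.det(A))
# ...yet the conditioning is perfect: all singular values equal 0.1.
print(np.linalg.cond(A))
```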

Watch Out

Condition number is a property of the problem, not the algorithm

The condition number measures the intrinsic sensitivity of the mathematical problem $Ax = b$. It is not a property of your solver. A better algorithm can avoid adding unnecessary error (this is backward stability), but it cannot avoid the error amplification that is inherent to the problem. Gaussian elimination with partial pivoting is backward stable: it solves a nearby problem exactly. But the sensitivity to perturbations in that nearby problem is still governed by $\kappa(A)$.

Watch Out

Large condition number does not mean the matrix is nearly singular

Well, it does, in a relative sense. A matrix with $\kappa = 10^{10}$ has $\sigma_{\min}/\sigma_{\max} = 10^{-10}$, so it is "close to singular" relative to its own scale. But $\sigma_{\min}$ could still be $10^5$ if $\sigma_{\max} = 10^{15}$. The matrix is far from the zero matrix but close to the set of singular matrices relative to its own scale.
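A tiny NumPy sketch of exactly this scenario:

```python
import numpy as np

# kappa = 1e10, yet the smallest singular value is a huge 1e5:
# "near-singular" only relative to the matrix's own scale.
A = np.diag([1e15, 1e5])

sigma = np.linalg.svd(A, compute_uv=False)
print(np.linalg.cond(A), sigma.min())
```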

Summary

  • $\kappa(A) = \sigma_{\max}/\sigma_{\min}$: ratio of largest to smallest singular value
  • The condition number bounds error amplification: a relative perturbation $\epsilon$ in the input becomes at most $\kappa \cdot \epsilon$ in the output
  • $\kappa \approx 10^k$ means you lose roughly $k$ digits of accuracy
  • Invertibility is binary; conditioning is a spectrum
  • In optimization, $\kappa = L/\mu$ controls the convergence rate of gradient descent
  • Preconditioning reduces $\kappa$ to make problems tractable
  • The Hilbert matrix is the canonical example of catastrophic ill-conditioning

Exercises

Exercise (Core)

Problem

Compute the condition number (2-norm) of the matrix:

$$A = \begin{pmatrix} 1 & 1 \\ 0 & 10^{-6} \end{pmatrix}$$

What does this tell you about solving $Ax = b$ in floating point?

Exercise (Advanced)

Problem

Prove that for any nonsingular matrix $A$:

$$\kappa(A) = \max_{\delta A} \frac{\|\delta x\| / \|x\|}{\|\delta A\| / \|A\|}$$

where $Ax = b$ and $(A + \delta A)(x + \delta x) = b$ (i.e., the right-hand side is fixed and the matrix is perturbed). This shows the condition number is the exact worst-case amplification.

References

Canonical:

  • Trefethen & Bau, Numerical Linear Algebra (1997), Lectures 12-15
  • Golub & Van Loan, Matrix Computations (2013), Chapter 2
  • Demmel, Applied Numerical Linear Algebra (1997), Chapters 1-2 (conditioning and perturbation theory)

Current:

  • Higham, Accuracy and Stability of Numerical Algorithms (2002), Chapters 1-2 and 7
  • Boyd & Vandenberghe, Convex Optimization (2004), Section 9.3.1 (condition number and gradient descent convergence)
  • Strang, Linear Algebra and Its Applications (2006), Section 7.3 (condition numbers and numerical stability)

Next Topics

The natural next step from conditioning:

  • Numerical linear algebra basics: algorithms (LU, QR, SVD) and their stability properties, building on the conditioning framework established here

Last reviewed: April 2026
