

Linear Independence

A set of vectors is linearly independent if no vector is a redundant copy of the others. The concept underwrites basis, dimension, rank, the column-space test for $Ax = b$, and every overdetermined-versus-underdetermined diagnosis in linear regression and gradient-based optimization.

Core · Tier 1 · Stable · Core spine · ~35 min

Why This Matters

Half of the ML diagnostics of the form "is this system solvable / is this fit unique / are these features collinear / why is the optimiser stuck on a flat direction" reduce to a question about linear independence. "The matrix has full column rank" means its columns are linearly independent. "The features are collinear" means a column is a linear combination of the others. "The Hessian is singular" means its column space is missing a direction.

The concept is small but load-bearing. Skip it and the rest of linear algebra reads as notation rather than structure.

Mental Model

A set of vectors is linearly independent when none of them is a redundant copy of the others. Concretely: you cannot write any one vector as a weighted sum of the rest.

Geometrically in $\mathbb{R}^3$:

  • One nonzero vector: independent (a line through the origin).
  • Two vectors: independent if and only if neither is a scalar multiple of the other (they span a plane, not a line).
  • Three vectors: independent if and only if they do not all lie in a common plane through the origin (they span all of $\mathbb{R}^3$).

Algebraically, the test is the same in any dimension: the only linear combination that gives the zero vector is the all-zero combination.

Core Definitions

Definition

Linear Independence

A finite set $\{v_1, \ldots, v_k\} \subseteq \mathbb{R}^n$ is linearly independent if and only if the only solution to $\sum_{i=1}^{k} c_i v_i = 0$ is $c_1 = c_2 = \cdots = c_k = 0$. If any nonzero solution exists, the set is linearly dependent.

Definition

Linear Combination

A linear combination of $\{v_1, \ldots, v_k\}$ with coefficients $c_1, \ldots, c_k \in \mathbb{R}$ is the vector $\sum_i c_i v_i = c_1 v_1 + c_2 v_2 + \cdots + c_k v_k$. The all-zero combination $c_i = 0$ is the trivial combination and always yields $0$; the question is whether any non-trivial combination also yields $0$.

Definition

Span

The span of $\{v_1, \ldots, v_k\}$ is the set of all linear combinations: $\operatorname{span}\{v_1, \ldots, v_k\} = \{\sum_i c_i v_i : c_i \in \mathbb{R}\}$. Linear independence is the question of whether each $v_i$ contributes a new direction to this set or duplicates an existing one.
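
To see the definitional test in code, here is a minimal sketch, not part of the original page, that literally asks for every coefficient vector $c$ with $\sum_i c_i v_i = 0$: stack the vectors as columns and compute the null space exactly with sympy. The example columns are made up; the third equals twice the first plus twice the second, so the dependent branch fires.

```python
from sympy import Matrix

# Columns are the vectors under test (hypothetical example values).
# Here the third column is 2*(first) + 2*(second), so a dependency exists.
V = Matrix([[1, 0, 2],
            [0, 1, 2],
            [1, 1, 4]])

# nullspace() returns a basis for {c : V*c = 0}, i.e. the nontrivial
# coefficient vectors in the definition (up to linear combinations).
null_basis = V.nullspace()

if not null_basis:
    print("Only the trivial combination gives 0: columns are linearly independent.")
else:
    print("Dependent; one nontrivial combination:", list(null_basis[0]))
```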

Equivalent Characterisations

The most useful theorem on this page is that several plausible-looking definitions of independence are all equivalent.

Theorem

Equivalent Tests for Linear Independence

Statement

The following statements about a finite set $\{v_1, \ldots, v_k\} \subseteq \mathbb{R}^n$ are equivalent:

  1. The only solution to $\sum_i c_i v_i = 0$ is $c_i = 0$ for all $i$.
  2. No $v_j$ is a linear combination of the other $v_i$ ($i \neq j$).
  3. The $n \times k$ matrix $V = [v_1 \mid v_2 \mid \cdots \mid v_k]$ has $\operatorname{null}(V) = \{0\}$.
  4. The $n \times k$ matrix $V$ has rank $k$.
  5. The Gram matrix $V^\top V \in \mathbb{R}^{k \times k}$ is invertible.

Intuition

Each formulation looks at the same fact from a different angle. Statement 1 is the algebraic test. Statement 2 is the redundancy test. Statement 3 sees the vectors as columns of a matrix and asks about its null space. Statement 4 is a rank reformulation. Statement 5 is the symmetric form used in least-squares and Gram-matrix arguments.

Proof Sketch

$(1) \Leftrightarrow (2)$. If some $v_j = \sum_{i \neq j} a_i v_i$, then $\sum_i c_i v_i = 0$ with $c_j = -1$ and $c_i = a_i$ for $i \neq j$ is a nontrivial solution. Conversely, a nontrivial solution with some $c_j \neq 0$ lets you isolate $v_j$ as a linear combination of the rest.

$(1) \Leftrightarrow (3)$. Writing $\sum_i c_i v_i$ as the matrix product $Vc$ with $c = (c_1, \ldots, c_k)^\top$, statement 1 says $Vc = 0$ forces $c = 0$, which is exactly $\operatorname{null}(V) = \{0\}$.

$(3) \Leftrightarrow (4)$. By the rank-nullity theorem, $\operatorname{rank}(V) + \dim(\operatorname{null}(V)) = k$, so $\dim(\operatorname{null}(V)) = 0$ iff $\operatorname{rank}(V) = k$.

$(4) \Leftrightarrow (5)$. The matrix $V^\top V$ is $k \times k$ and positive semidefinite. Its rank equals $\operatorname{rank}(V)$. So $V^\top V$ is invertible (full rank) iff $\operatorname{rank}(V) = k$.

Why It Matters

In practice, you reach for whichever form is cheapest to check. For a small concrete example, statement 1 is fastest. For the columns of a larger matrix, statement 4 (rank) is what numerical libraries report. For the normal equations $(V^\top V)\hat{\beta} = V^\top y$ of least squares, statement 5 is the existence-of-a-unique-solution condition.
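
As a rough numerical illustration, not from the source, here are statements 3, 4, and 5 checked on one small made-up matrix with numpy (tolerances are numpy's defaults):

```python
import numpy as np

# Two made-up vectors in R^3, stacked as columns of V.
V = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [2.0, 3.0]])
k = V.shape[1]

rank = np.linalg.matrix_rank(V)          # statement 4: need rank == k
null_dim = k - rank                      # statement 3 via rank-nullity: need 0
gram_eigs = np.linalg.eigvalsh(V.T @ V)  # statement 5: invertible iff all eigenvalues > 0

print(f"rank(V) = {rank} (need {k})")
print(f"dim null(V) = {null_dim} (need 0)")
print(f"smallest eigenvalue of V^T V = {gram_eigs.min():.3g} (need > 0)")
```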

Failure Mode

Linear independence does not require orthogonality. Orthogonal nonzero vectors are independent, but independent vectors are typically not orthogonal. Confusing the two causes trouble the moment you build a basis for a subspace from independent but non-orthogonal columns and forget that you need Gram-Schmidt or QR to orthogonalise before relying on $V^\top V = I$.

Worked Example: Three Vectors in R^3

Example

Independent or dependent?

Let $v_1 = (1, 0, 0)$, $v_2 = (1, 1, 0)$, $v_3 = (1, 1, 1)$.

Set up $c_1 v_1 + c_2 v_2 + c_3 v_3 = 0$ component by component:

$$c_1 + c_2 + c_3 = 0, \quad c_2 + c_3 = 0, \quad c_3 = 0.$$

Back-substitution gives $c_3 = 0$, then $c_2 = 0$, then $c_1 = 0$. The only solution is the trivial one, so the three vectors are linearly independent. The matrix $V = [v_1 \mid v_2 \mid v_3]$ is upper triangular with nonzero diagonal, hence invertible, hence rank 3, consistent with statement 4 of the equivalences theorem.

Example

A dependent triple

Let $w_1 = (1, 2, 3)$, $w_2 = (2, 4, 6)$, $w_3 = (1, 0, 1)$.

Notice $w_2 = 2 w_1$. Therefore $2 w_1 - w_2 + 0 \cdot w_3 = 0$ is a nontrivial combination giving $0$, so the set is linearly dependent. The redundancy is concentrated in the relationship between $w_1$ and $w_2$; $w_3$ is independent of $w_1$ separately, but independence is a property of the whole set, not of pairs.
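
A quick numerical cross-check of this triple, added here rather than taken from the page: the rank drops to 2, and a null-space direction recovers the dependency $2 w_1 - w_2 = 0$ up to sign and scale.

```python
import numpy as np

# Columns are w1, w2, w3 from the example above.
W = np.array([[1.0, 2.0, 1.0],
              [2.0, 4.0, 0.0],
              [3.0, 6.0, 1.0]])

print("rank(W):", np.linalg.matrix_rank(W))  # 2, not 3 -> dependent

# The right singular vector for the zero singular value spans null(W).
_, s, Vt = np.linalg.svd(W)
c = Vt[-1]
print("null-space direction:", c / np.abs(c).max())  # proportional to (2, -1, 0)
print("W @ c:", W @ c)                               # numerically zero
```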

Tightest Upper Bound on Independent Vectors

Theorem

At Most n Independent Vectors in R^n

Statement

Any set of more than $n$ vectors in $\mathbb{R}^n$ is linearly dependent. Equivalently, the maximum size of a linearly independent set in $\mathbb{R}^n$ is $n$.

Intuition

$\mathbb{R}^n$ has dimension $n$. A linearly independent set extends to a basis, and every basis has exactly $n$ elements. You cannot fit more than $n$ truly independent directions into an $n$-dimensional space.

Proof Sketch

Stack any $k > n$ vectors as columns of an $n \times k$ matrix $V$. The matrix has only $n$ rows, so its rank is at most $n < k$. By the rank-nullity theorem, $\dim(\operatorname{null}(V)) = k - \operatorname{rank}(V) \geq k - n > 0$, so the null space is non-trivial: there is a nonzero $c$ with $Vc = 0$, i.e. a non-trivial linear combination of the columns giving the zero vector.
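
The counting argument can be watched directly. A short sketch, with randomly generated columns rather than anything from the page: any $k > n$ columns in $\mathbb{R}^n$ come with a nonzero $c$ satisfying $Vc = 0$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 3, 5                       # k > n columns in R^n
V = rng.standard_normal((n, k))   # generic random vectors, no special structure

print("rank(V):", np.linalg.matrix_rank(V))  # at most n = 3 < k = 5

# Any right singular vector beyond the rank lies in null(V).
_, _, Vt = np.linalg.svd(V)
c = Vt[-1]
print("||V @ c|| =", np.linalg.norm(V @ c))  # numerically zero: a nontrivial combination
```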

Why It Matters

This is the headline feasibility check behind many ML diagnostics. With $n$ training samples and $k > n$ features, the design matrix has linearly dependent columns by sheer counting; ridge regression and $\ell_1$/elastic-net regularisation exist partly to tame this regime. With $n$ parameters and $k < n$ training points, there are infinitely many parameter vectors that fit the data exactly: the classical underdetermined regime that motivates implicit-bias analyses of deep nets.

Failure Mode

The theorem relies on $\mathbb{R}^n$ being an $n$-dimensional vector space; in infinite-dimensional spaces you can have arbitrarily large linearly independent sets (think Hermite or Fourier bases on a function space). For ML, this matters in kernel methods and RKHS-style arguments where the feature space is implicitly infinite.

Connection to Ax = b

Linear independence of the columns of $A$ is the uniqueness condition for $Ax = b$. Writing $A = [a_1 \mid a_2 \mid \cdots \mid a_n]$, a solution $x = (x_1, \ldots, x_n)^\top$ to $Ax = b$ is exactly a recipe for assembling $b$ as a linear combination of columns: $Ax = \sum_i x_i a_i = b$.

  • If the columns of $A$ are linearly independent, any solution is unique. Two distinct solutions $x, x'$ would give $A(x - x') = 0$ with $x - x' \neq 0$, contradicting independence.
  • If the columns are linearly dependent, any solution comes with a whole affine subspace of solutions; you can add any null-space element of $A$ and still solve $Ax = b$.

Solvability (does any solution exist?) is the column-space question and is logically separate. Uniqueness is the column-independence question. Both must hold for the system to have exactly one solution.
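
Both halves in one small sketch, with matrices invented for illustration: independent columns give one solution, while dependent columns let any solution slide along the null space.

```python
import numpy as np

# Independent columns (upper triangular, nonzero diagonal): exactly one solution.
A = np.array([[1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
b = np.array([3.0, 2.0, 1.0])
print("unique solution:", np.linalg.solve(A, b))   # (1, 1, 1)

# Dependent columns: third column = first + second, so solutions form a line.
B = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])
y = np.array([1.0, 1.0, 2.0])                      # lies in the column space
x0, *_ = np.linalg.lstsq(B, y, rcond=None)         # one particular solution
null_dir = np.array([1.0, 1.0, -1.0])              # B @ null_dir = 0
for t in (0.0, 1.0, -2.0):
    print("residual:", np.linalg.norm(B @ (x0 + t * null_dir) - y))  # all ~0
```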

Common Confusions

Watch Out

Independence is a property of a set, not a pair

Three vectors can be pairwise independent yet linearly dependent as a set. Example: $(1, 0)$, $(0, 1)$, $(1, 1)$ in $\mathbb{R}^2$. Each pair is independent, but the third is a linear combination of the first two, so the whole set is dependent.
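
This is easy to check mechanically; a small added sketch, assuming numpy: every pair has rank 2, while the full set of three also has rank 2, one short of its size.

```python
import numpy as np

u, v, w = np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])

for name, pair in [("{u, v}", (u, v)), ("{u, w}", (u, w)), ("{v, w}", (v, w))]:
    print(name, "rank:", np.linalg.matrix_rank(np.column_stack(pair)))  # 2 for each pair

print("{u, v, w} rank:", np.linalg.matrix_rank(np.column_stack((u, v, w))))  # 2 < 3 -> dependent set
```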

Watch Out

Independence does not imply orthogonality

$(1, 0)$ and $(1, 1)$ are independent but not orthogonal. To build an orthogonal (or orthonormal) basis from independent vectors you need Gram-Schmidt, QR, or a similar procedure. Independence alone does not give you the identity $V^\top V = I$ that orthonormal columns would.
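
If orthonormal vectors are what you actually need, QR is the standard route. A minimal sketch under that assumption, using the pair from this box as columns:

```python
import numpy as np

# Independent but not orthogonal columns: (1, 0) and (1, 1).
V = np.array([[1.0, 1.0],
              [0.0, 1.0]])
print("V^T V:\n", V.T @ V)      # not the identity

# QR replaces the columns with an orthonormal basis for the same span.
Q, R = np.linalg.qr(V)
print("Q^T Q:\n", Q.T @ Q)      # identity, up to floating-point error
```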

Watch Out

Adding a vector to an independent set can keep it independent

A common worry: "if I keep adding vectors, eventually one will be a combination of the others." That happens once you exceed the ambient dimension $n$, not before. Up to that point, you can add truly new directions and stay independent.

ML Implications

  • Feature collinearity. Linearly dependent feature columns make the normal equations $X^\top X \hat{\beta} = X^\top y$ singular; closed-form least squares fails. Ridge regularisation rescues the problem by replacing $X^\top X$ with $X^\top X + \lambda I$ (always invertible for $\lambda > 0$); see the sketch after this list.
  • Overdetermined vs underdetermined fits. With $m$ samples and $n$ features, $m > n$ with full column rank gives a unique least-squares fit; $m < n$ leaves a whole subspace of perfect fits, and you need a regulariser, a prior, or implicit bias to pick one.
  • Hessian singularity. A loss function whose Hessian has linearly dependent columns at the optimum has a flat valley in parameter space, so the optimum is not unique up to second order. This is the geometric content of "the model is overparameterised."
  • Embedding rank. In representation learning, the rank of the matrix of learned embeddings is the number of linearly independent features the network has actually learned; rank collapse is a diagnosed failure mode in deep transformers.
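
A compact illustration of the first bullet, with a synthetic design matrix and an arbitrary $\lambda$: a duplicated feature makes $X^\top X$ singular, while $X^\top X + \lambda I$ stays invertible.

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.standard_normal(20)
# Column 2 is an exact multiple of column 1: collinear features.
X = np.column_stack([x1, 2 * x1, rng.standard_normal(20)])
y = rng.standard_normal(20)

gram = X.T @ X
print("rank of X^T X:", np.linalg.matrix_rank(gram))       # 2 < 3: singular, no closed form

lam = 1e-2                                                  # arbitrary ridge strength
beta_ridge = np.linalg.solve(gram + lam * np.eye(3), X.T @ y)
print("ridge coefficients:", beta_ridge)                    # solvable for any lam > 0
```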

Exercises

ExerciseCore

Problem

Are $(1, 2)$, $(3, 6)$, and $(0, 1)$ linearly independent in $\mathbb{R}^2$?

ExerciseCore

Problem

Let $A = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}$. Are the columns of $A$ linearly independent?

ExerciseAdvanced

Problem

Suppose the columns of $X \in \mathbb{R}^{n \times k}$ are linearly independent. Show that $X^\top X$ is invertible.

References

  • Strang, Introduction to Linear Algebra, 5e (2016), Section 3.4. Standard undergraduate treatment with the column-space picture.
  • Axler, Linear Algebra Done Right, 4e (2024), Chapter 2. Sections on span, linear independence, basis, and dimension developed without determinants.
  • Hoffman & Kunze, Linear Algebra, 2e (1971), Chapter 2. Classic abstract treatment.
  • Trefethen & Bau, Numerical Linear Algebra (1997), Lecture 1. Vectors, matrices, and the rank-1 sum picture, useful for connecting independence to the SVD geometry.
  • Boyd & Vandenberghe, Introduction to Applied Linear Algebra (2018), Chapter 5 (Linear independence). Free PDF at stanford.edu/~boyd/vmls; emphasis on numerical examples and ML-relevant applications.
  • Linear independence (Wikipedia): concise reference with the equivalent characterizations.


Practice Drills

The LA foundations gold set includes two questions that test this page directly:

  • la-drill-linear-independence-011: definition test
  • la-drill-Ax-b-uniqueness-014: the 0/1/infinite-solutions theorem via column independence

Last reviewed: May 1, 2026

Canonical graph

Required before and derived from this topic. These links come from prerequisite edges in the curriculum graph; editorial suggestions are shown only when the target page also cites this page as a prerequisite.

Required prerequisites: 1. Derived topics: 0; no published topic currently declares this as a prerequisite.