

Linear Independence

A set of vectors is linearly independent if no vector is a redundant copy of the others. The concept underwrites basis, dimension, rank, the column-space test for $Ax = b$, and every overdetermined-versus-underdetermined diagnosis in linear regression and gradient-based optimization.

Core · Tier 1 · Stable · Core spine · ~35 min

Why This Matters

Half of the ML diagnostics of the form "is this system solvable / is this fit unique / are these features collinear / why is the optimiser stuck on a flat direction" reduce to a question about linear independence. "The matrix has full column rank" means its columns are linearly independent. "The features are collinear" means a column is a linear combination of the others. "The Hessian is singular" means its column space is missing a direction.

The concept is small but load-bearing. Skip it and the rest of linear algebra reads as notation rather than structure.

Mental Model

A set of vectors is linearly independent when none of them is a redundant copy of the others. Concretely: you cannot write any one vector as a weighted sum of the rest.

Geometrically in $\mathbb{R}^3$:

  • One nonzero vector: independent (a line through the origin).
  • Two vectors: independent if and only if neither is a scalar multiple of the other (they span a plane, not a line).
  • Three vectors: independent if and only if they do not all lie in a common plane through the origin (they span all of $\mathbb{R}^3$).

Algebraically, the test is the same in any dimension: the only linear combination that gives the zero vector is the all-zero combination.

Core Definitions

Definition

Linear Independence

A finite set $\{v_1, \ldots, v_k\} \subseteq \mathbb{R}^n$ is linearly independent if and only if the only solution to $\sum_{i=1}^{k} c_i v_i = 0$ is $c_1 = c_2 = \cdots = c_k = 0$. If any nonzero solution exists, the set is linearly dependent.

Definition

Linear Combination

A linear combination of $\{v_1, \ldots, v_k\}$ with coefficients $c_1, \ldots, c_k \in \mathbb{R}$ is the vector $\sum_i c_i v_i = c_1 v_1 + c_2 v_2 + \cdots + c_k v_k$. The all-zero combination $c_i = 0$ is the trivial combination and always yields $0$; the question is whether any non-trivial combination also yields $0$.

Definition

Span

The span of $\{v_1, \ldots, v_k\}$ is the set of all linear combinations: $\operatorname{span}\{v_1, \ldots, v_k\} = \{\sum_i c_i v_i : c_i \in \mathbb{R}\}$. Linear independence is the question of whether each $v_i$ contributes a new direction to this set or duplicates an existing one.
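
To see the definitional test in code, here is a minimal sketch, not part of the original page, that literally asks for every coefficient vector $c$ with $\sum_i c_i v_i = 0$: stack the vectors as columns and compute the null space exactly with sympy. The example columns are made up; the third equals twice the first plus twice the second, so the dependent branch fires.

```python
from sympy import Matrix

# Columns are the vectors under test (hypothetical example values).
# Here the third column is 2*(first) + 2*(second), so a dependency exists.
V = Matrix([[1, 0, 2],
            [0, 1, 2],
            [1, 1, 4]])

# nullspace() returns a basis for {c : V*c = 0}, i.e. the nontrivial
# coefficient vectors in the definition (up to linear combinations).
null_basis = V.nullspace()

if not null_basis:
    print("Only the trivial combination gives 0: columns are linearly independent.")
else:
    print("Dependent; one nontrivial combination:", list(null_basis[0]))
```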

Equivalent Characterisations

The most useful theorem on this page is that several plausible-looking definitions of independence are all equivalent.

Theorem

Equivalent Tests for Linear Independence

Statement

The following statements about a finite set $\{v_1, \ldots, v_k\} \subseteq \mathbb{R}^n$ are equivalent:

  1. The only solution to $\sum_i c_i v_i = 0$ is $c_i = 0$ for all $i$.
  2. No $v_j$ is a linear combination of the other $v_i$ ($i \neq j$).
  3. The $n \times k$ matrix $V = [v_1 \mid v_2 \mid \cdots \mid v_k]$ has $\operatorname{null}(V) = \{0\}$.
  4. The $n \times k$ matrix $V$ has rank $k$.
  5. The Gram matrix $V^\top V \in \mathbb{R}^{k \times k}$ is invertible.

Intuition

Each formulation looks at the same fact from a different angle. Statement 1 is the algebraic test. Statement 2 is the redundancy test. Statement 3 sees the vectors as columns of a matrix and asks about its null space. Statement 4 is a rank reformulation. Statement 5 is the symmetric form used in least-squares and Gram-matrix arguments.

Proof Sketch

$(1) \Leftrightarrow (2)$. If some $v_j = \sum_{i \neq j} a_i v_i$, then $\sum_i c_i v_i = 0$ with $c_j = -1$ and $c_i = a_i$ for $i \neq j$ is a nontrivial solution. Conversely, a nontrivial solution with some $c_j \neq 0$ lets you isolate $v_j$ as a linear combination of the rest.

$(1) \Leftrightarrow (3)$. Writing $\sum_i c_i v_i$ as the matrix product $Vc$ with $c = (c_1, \ldots, c_k)^\top$, statement 1 says $Vc = 0$ forces $c = 0$, which is exactly $\operatorname{null}(V) = \{0\}$.

$(3) \Leftrightarrow (4)$. By the rank-nullity theorem, $\operatorname{rank}(V) + \dim(\operatorname{null}(V)) = k$, so $\dim(\operatorname{null}(V)) = 0$ iff $\operatorname{rank}(V) = k$.

$(4) \Leftrightarrow (5)$. The matrix $V^\top V$ is $k \times k$ and positive semidefinite. Its rank equals $\operatorname{rank}(V)$. So $V^\top V$ is invertible (full rank) iff $\operatorname{rank}(V) = k$.

Why It Matters

In practice, you reach for whichever form is cheapest to check. For a small concrete example, statement 1 is fastest. For the columns of a larger matrix, statement 4 (rank) is what numerical libraries report. For the normal equations $(V^\top V)\hat{\beta} = V^\top y$ of least squares, statement 5 is the existence-of-a-unique-solution condition.
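
As a rough numerical illustration, not from the source, here are statements 3, 4, and 5 checked on one small made-up matrix with numpy (tolerances are numpy's defaults):

```python
import numpy as np

# Two made-up vectors in R^3, stacked as columns of V.
V = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [2.0, 3.0]])
k = V.shape[1]

rank = np.linalg.matrix_rank(V)          # statement 4: need rank == k
null_dim = k - rank                      # statement 3 via rank-nullity: need 0
gram_eigs = np.linalg.eigvalsh(V.T @ V)  # statement 5: invertible iff all eigenvalues > 0

print(f"rank(V) = {rank} (need {k})")
print(f"dim null(V) = {null_dim} (need 0)")
print(f"smallest eigenvalue of V^T V = {gram_eigs.min():.3g} (need > 0)")
```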

Failure Mode

Linear independence does not require orthogonality. Orthogonal nonzero vectors are independent, but independent vectors are typically not orthogonal. Confusing the two causes trouble the moment you build a basis for a subspace from independent but non-orthogonal columns and forget that you need Gram-Schmidt or QR to orthogonalise before relying on $V^\top V = I$.

Worked Example: Three Vectors in R^3

Example

Independent or dependent?

Let $v_1 = (1, 0, 0)$, $v_2 = (1, 1, 0)$, $v_3 = (1, 1, 1)$.

Set up $c_1 v_1 + c_2 v_2 + c_3 v_3 = 0$ component by component:

$$c_1 + c_2 + c_3 = 0, \quad c_2 + c_3 = 0, \quad c_3 = 0.$$

Back-substitution gives $c_3 = 0$, then $c_2 = 0$, then $c_1 = 0$. The only solution is the trivial one, so the three vectors are linearly independent. The matrix $V = [v_1 \mid v_2 \mid v_3]$ is upper triangular with nonzero diagonal, hence invertible, hence rank 3, consistent with statement 4 of the equivalences theorem.

Example

A dependent triple

Let $w_1 = (1, 2, 3)$, $w_2 = (2, 4, 6)$, $w_3 = (1, 0, 1)$.

Notice $w_2 = 2 w_1$. Therefore $2 w_1 - w_2 + 0 \cdot w_3 = 0$ is a nontrivial combination giving $0$, so the set is linearly dependent. The redundancy is concentrated in the relationship between $w_1$ and $w_2$; $w_3$ is independent of $w_1$ separately, but independence is a property of the whole set, not of pairs.
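
A quick numerical cross-check of this triple, added here rather than taken from the page: the rank drops to 2, and a null-space direction recovers the dependency $2 w_1 - w_2 = 0$ up to sign and scale.

```python
import numpy as np

# Columns are w1, w2, w3 from the example above.
W = np.array([[1.0, 2.0, 1.0],
              [2.0, 4.0, 0.0],
              [3.0, 6.0, 1.0]])

print("rank(W):", np.linalg.matrix_rank(W))  # 2, not 3 -> dependent

# The right singular vector for the zero singular value spans null(W).
_, s, Vt = np.linalg.svd(W)
c = Vt[-1]
print("null-space direction:", c / np.abs(c).max())  # proportional to (2, -1, 0)
print("W @ c:", W @ c)                               # numerically zero
```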

Tightest Upper Bound on Independent Vectors

Theorem

At Most n Independent Vectors in R^n

Statement

Any set of more than $n$ vectors in $\mathbb{R}^n$ is linearly dependent. Equivalently, the maximum size of a linearly independent set in $\mathbb{R}^n$ is $n$.

Intuition

$\mathbb{R}^n$ has dimension $n$. A linearly independent set extends to a basis, and every basis has exactly $n$ elements. You cannot fit more than $n$ truly independent directions into an $n$-dimensional space.

Proof Sketch

Stack any $k > n$ vectors as columns of an $n \times k$ matrix $V$. The matrix has only $n$ rows, so its rank is at most $n < k$. By the rank-nullity theorem, $\dim(\operatorname{null}(V)) = k - \operatorname{rank}(V) \geq k - n > 0$, so the null space is non-trivial: there is a nonzero $c$ with $Vc = 0$, i.e. a non-trivial linear combination of the columns giving the zero vector.
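
The counting argument can be watched directly. A short sketch, with randomly generated columns rather than anything from the page: any $k > n$ columns in $\mathbb{R}^n$ come with a nonzero $c$ satisfying $Vc = 0$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 3, 5                       # k > n columns in R^n
V = rng.standard_normal((n, k))   # generic random vectors, no special structure

print("rank(V):", np.linalg.matrix_rank(V))  # at most n = 3 < k = 5

# Any right singular vector beyond the rank lies in null(V).
_, _, Vt = np.linalg.svd(V)
c = Vt[-1]
print("||V @ c|| =", np.linalg.norm(V @ c))  # numerically zero: a nontrivial combination
```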

Why It Matters

This is the headline feasibility check behind many ML diagnostics. With $n$ training samples and $k > n$ features, the design matrix has linearly dependent columns by sheer counting; ridge regression and $\ell_1$/elastic-net regularisation exist partly to tame this regime. With $n$ parameters and $k < n$ training points, there are infinitely many parameter vectors that fit the data exactly: the classical underdetermined regime that motivates implicit-bias analyses of deep nets.

Failure Mode

The theorem relies on $\mathbb{R}^n$ being an $n$-dimensional vector space; in infinite-dimensional spaces you can have arbitrarily large linearly independent sets (think Hermite or Fourier bases on a function space). For ML, this matters in kernel methods and RKHS-style arguments where the feature space is implicitly infinite.

Connection to Ax = b

Linear independence of the columns of $A$ is the uniqueness condition for $Ax = b$. Writing $A = [a_1 \mid a_2 \mid \cdots \mid a_n]$, a solution $x = (x_1, \ldots, x_n)^\top$ to $Ax = b$ is exactly a recipe for assembling $b$ as a linear combination of columns: $Ax = \sum_i x_i a_i = b$.

  • If the columns of $A$ are linearly independent, any solution is unique. Two distinct solutions $x, x'$ would give $A(x - x') = 0$ with $x - x' \neq 0$, contradicting independence.
  • If the columns are linearly dependent, any solution comes with a whole affine subspace of solutions; you can add any null-space element of $A$ and still solve $Ax = b$.

Solvability (does any solution exist?) is the column-space question and is logically separate. Uniqueness is the column-independence question. Both must hold for the system to have exactly one solution.
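
Both halves in one small sketch, with matrices invented for illustration: independent columns give one solution, while dependent columns let any solution slide along the null space.

```python
import numpy as np

# Independent columns (upper triangular, nonzero diagonal): exactly one solution.
A = np.array([[1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
b = np.array([3.0, 2.0, 1.0])
print("unique solution:", np.linalg.solve(A, b))   # (1, 1, 1)

# Dependent columns: third column = first + second, so solutions form a line.
B = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])
y = np.array([1.0, 1.0, 2.0])                      # lies in the column space
x0, *_ = np.linalg.lstsq(B, y, rcond=None)         # one particular solution
null_dir = np.array([1.0, 1.0, -1.0])              # B @ null_dir = 0
for t in (0.0, 1.0, -2.0):
    print("residual:", np.linalg.norm(B @ (x0 + t * null_dir) - y))  # all ~0
```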

Common Confusions

Watch Out

Independence is a property of a set, not a pair

Three vectors can be pairwise independent yet linearly dependent as a set. Example: $(1, 0)$, $(0, 1)$, $(1, 1)$ in $\mathbb{R}^2$. Each pair is independent, but the third is a linear combination of the first two, so the whole set is dependent.
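
This is easy to check mechanically; a small added sketch, assuming numpy: every pair has rank 2, while the full set of three also has rank 2, one short of its size.

```python
import numpy as np

u, v, w = np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])

for name, pair in [("{u, v}", (u, v)), ("{u, w}", (u, w)), ("{v, w}", (v, w))]:
    print(name, "rank:", np.linalg.matrix_rank(np.column_stack(pair)))  # 2 for each pair

print("{u, v, w} rank:", np.linalg.matrix_rank(np.column_stack((u, v, w))))  # 2 < 3 -> dependent set
```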

Watch Out

Independence does not imply orthogonality

$(1, 0)$ and $(1, 1)$ are independent but not orthogonal. To build an orthogonal (or orthonormal) basis from independent vectors you need Gram-Schmidt, QR, or a similar procedure. Independence alone does not give you the identity $V^\top V = I$ that orthonormal columns would.
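
If orthonormal vectors are what you actually need, QR is the standard route. A minimal sketch under that assumption, using the pair from this box as columns:

```python
import numpy as np

# Independent but not orthogonal columns: (1, 0) and (1, 1).
V = np.array([[1.0, 1.0],
              [0.0, 1.0]])
print("V^T V:\n", V.T @ V)      # not the identity

# QR replaces the columns with an orthonormal basis for the same span.
Q, R = np.linalg.qr(V)
print("Q^T Q:\n", Q.T @ Q)      # identity, up to floating-point error
```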

Watch Out

Adding a vector to an independent set can keep it independent

A common worry: "if I keep adding vectors, eventually one will be a combination of the others." That happens once you exceed the ambient dimension $n$, not before. Up to that point, you can add truly new directions and stay independent.

ML Implications

  • Feature collinearity. Linearly dependent feature columns make the normal equations $X^\top X \hat{\beta} = X^\top y$ singular; closed-form least squares fails. Ridge regularisation rescues the problem by replacing $X^\top X$ with $X^\top X + \lambda I$ (always invertible for $\lambda > 0$); see the sketch after this list.
  • Overdetermined vs underdetermined fits. With $m$ samples and $n$ features, $m > n$ with full column rank gives a unique least-squares fit; $m < n$ leaves a whole subspace of perfect fits, and you need a regulariser, a prior, or implicit bias to pick one.
  • Hessian singularity. A loss function whose Hessian has linearly dependent columns at the optimum has a flat valley in parameter space, so the optimum is not unique up to second order. This is the geometric content of "the model is overparameterised."
  • Embedding rank. In representation learning, the rank of the matrix of learned embeddings is the number of linearly independent features the network has actually learned; rank collapse is a diagnosed failure mode in deep transformers.
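
A compact illustration of the first bullet, with a synthetic design matrix and an arbitrary $\lambda$: a duplicated feature makes $X^\top X$ singular, while $X^\top X + \lambda I$ stays invertible.

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.standard_normal(20)
# Column 2 is an exact multiple of column 1: collinear features.
X = np.column_stack([x1, 2 * x1, rng.standard_normal(20)])
y = rng.standard_normal(20)

gram = X.T @ X
print("rank of X^T X:", np.linalg.matrix_rank(gram))       # 2 < 3: singular, no closed form

lam = 1e-2                                                  # arbitrary ridge strength
beta_ridge = np.linalg.solve(gram + lam * np.eye(3), X.T @ y)
print("ridge coefficients:", beta_ridge)                    # solvable for any lam > 0
```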

Exercises

ExerciseCore

Problem

Are $(1, 2)$, $(3, 6)$, and $(0, 1)$ linearly independent in $\mathbb{R}^2$?

ExerciseCore

Problem

Let $A = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}$. Are the columns of $A$ linearly independent?

ExerciseAdvanced

Problem

Suppose the columns of $X \in \mathbb{R}^{n \times k}$ are linearly independent. Show that $X^\top X$ is invertible.

References

  • Strang, Introduction to Linear Algebra, 5e (2016), Section 3.4. Standard undergraduate treatment with the column-space picture.
  • Axler, Linear Algebra Done Right, 4e (2024), Chapter 2. Sections on span, linear independence, basis, and dimension developed without determinants.
  • Hoffman & Kunze, Linear Algebra, 2e (1971), Chapter 2. Classic abstract treatment.
  • Trefethen & Bau, Numerical Linear Algebra (1997), Lecture 1. Vectors, matrices, and the rank-1 sum picture, useful for connecting independence to the SVD geometry.
  • Boyd & Vandenberghe, Introduction to Applied Linear Algebra (2018), Chapter 5 (Linear independence). Free PDF at stanford.edu/~boyd/vmls; emphasis on numerical examples and ML-relevant applications.
  • Linear independence (Wikipedia): concise reference with the equivalent characterizations.


Practice Drills

The LA foundations gold set includes two questions that test this page directly:

  • la-drill-linear-independence-011: definition test
  • la-drill-Ax-b-uniqueness-014: the 0/1/infinite-solutions theorem via column independence

Last reviewed: May 1, 2026

Canonical graph

Required before and derived from this topic. These links come from prerequisite edges in the curriculum graph; editorial suggestions are shown only when the target page also cites this page as a prerequisite.

Required prerequisites: 1. Derived topics: 0; no published topic currently declares this as a prerequisite.