Beta. Content is under active construction and has not been peer-reviewed. Report errors on GitHub.

Foundations

Inner Product Spaces and Orthogonality

Inner product axioms, Cauchy-Schwarz inequality, orthogonality, Gram-Schmidt, projections, and the bridge to Hilbert spaces.

Core · Tier 1 · Stable · ~40 min

Why This Matters

Inner products define geometry in vector spaces: lengths, angles, and orthogonality. Kernel methods replace the standard dot product with arbitrary inner products to measure similarity in high-dimensional feature spaces. Projections onto subspaces underlie least squares linear regression, PCA, and conditional expectation.

Core Definitions

Definition

Inner Product

An inner product on a real vector space $V$ is a function $\langle \cdot, \cdot \rangle: V \times V \to \mathbb{R}$ satisfying:

  1. Symmetry: $\langle u, v \rangle = \langle v, u \rangle$
  2. Linearity in the first argument: $\langle \alpha u + \beta v, w \rangle = \alpha \langle u, w \rangle + \beta \langle v, w \rangle$
  3. Positive definiteness: $\langle v, v \rangle \geq 0$, with equality iff $v = 0$

The induced norm is $\|v\| = \sqrt{\langle v, v \rangle}$. The standard inner product on $\mathbb{R}^n$ is $\langle x, y \rangle = x^T y$.
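As a quick numerical check, the standard inner product and its induced norm can be computed directly in NumPy (vectors chosen purely for illustration):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, -1.0, 2.0])

# Standard inner product on R^n: <x, y> = x^T y
ip = x @ y

# Induced norm: ||x|| = sqrt(<x, x>)
norm_x = np.sqrt(x @ x)

print(ip)                                      # 8.0
print(np.isclose(norm_x, np.linalg.norm(x)))   # True
```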

Definition

Orthogonality

Two vectors $u, v$ are orthogonal if $\langle u, v \rangle = 0$, written $u \perp v$. A set $\{e_1, \ldots, e_k\}$ is orthonormal if $\langle e_i, e_j \rangle = \delta_{ij}$ (Kronecker delta). The orthogonal complement of a subspace $W$ is $W^\perp = \{v \in V : \langle v, w \rangle = 0 \text{ for all } w \in W\}$.

Definition

Orthogonal Projection

The orthogonal projection of $v$ onto a subspace $W$ is the unique $\hat{v} \in W$ minimizing $\|v - w\|$ over $w \in W$. If $\{e_1, \ldots, e_k\}$ is an orthonormal basis for $W$:

$$\text{proj}_W(v) = \sum_{i=1}^{k} \langle v, e_i \rangle \, e_i$$

The residual $v - \text{proj}_W(v)$ lies in $W^\perp$.
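A minimal NumPy sketch of this formula, projecting onto the $xy$-plane of $\mathbb{R}^3$ (an assumed example where the orthonormal basis is obvious). With orthonormal columns $E$, the sum $\sum_i \langle v, e_i \rangle e_i$ is just $E E^T v$:

```python
import numpy as np

# Columns of E are an orthonormal basis e1, e2 for W = the xy-plane
E = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])

v = np.array([3.0, 4.0, 5.0])

# proj_W(v) = sum_i <v, e_i> e_i = E E^T v for orthonormal columns
proj = E @ (E.T @ v)

# The residual lies in W-perp: orthogonal to every basis vector of W
residual = v - proj
print(proj)              # [3. 4. 0.]
print(E.T @ residual)    # [0. 0.]
```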

Definition

Hilbert Space (Preview)

A Hilbert space is a complete inner product space: every Cauchy sequence converges. Finite-dimensional inner product spaces are automatically Hilbert spaces. Infinite-dimensional examples include $L^2$ (the space of square-integrable functions) and reproducing kernel Hilbert spaces (RKHS) used in kernel methods.

Main Theorems

Theorem

Cauchy-Schwarz Inequality

Statement

For all $u, v \in V$:

$$|\langle u, v \rangle| \leq \|u\| \, \|v\|$$

Equality holds if and only if uu and vv are linearly dependent.

Intuition

The cosine of the angle between two vectors has magnitude at most 1. In any inner product space, we can define $\cos \theta = \langle u, v \rangle / (\|u\| \|v\|)$; Cauchy-Schwarz guarantees this ratio lies in $[-1, 1]$, so the angle $\theta$ is well-defined.

Proof Sketch

For $v \neq 0$, define $w = u - \frac{\langle u, v \rangle}{\langle v, v \rangle} v$. Then $\langle w, v \rangle = 0$ and $0 \leq \|w\|^2 = \|u\|^2 - \frac{\langle u, v \rangle^2}{\|v\|^2}$. Rearranging gives $\langle u, v \rangle^2 \leq \|u\|^2 \|v\|^2$.
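The inequality and its equality condition can be sanity-checked numerically (random vectors chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.standard_normal(5)
v = rng.standard_normal(5)

# |<u, v>| <= ||u|| ||v||
lhs = abs(u @ v)
rhs = np.linalg.norm(u) * np.linalg.norm(v)
print(lhs <= rhs)   # True

# Equality iff linearly dependent: compare u against 2u
w = 2.0 * u
print(np.isclose(abs(u @ w), np.linalg.norm(u) * np.linalg.norm(w)))  # True
```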

Why It Matters

Cauchy-Schwarz is the single most used inequality in analysis. It proves the triangle inequality for the induced norm. It bounds inner products (and hence correlations, cosine similarities, kernel evaluations) in terms of norms.

Failure Mode

The inequality is tight only when vectors are linearly dependent. For nearly orthogonal vectors, the bound is very loose. In high dimensions, random vectors tend to be nearly orthogonal, so the bound often has significant slack.
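This slack is easy to observe empirically. For standard Gaussian vectors in high dimension, the cosine concentrates near $0$ at rate roughly $1/\sqrt{d}$, so the Cauchy-Schwarz bound (which only asserts the ratio is at most $1$) is far from tight (a sketch with an arbitrary seed and dimension):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 10_000
u = rng.standard_normal(d)
v = rng.standard_normal(d)

# |cos theta| = |<u, v>| / (||u|| ||v||); near 0 for random high-dim vectors
ratio = abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
print(ratio < 0.1)   # True: the Cauchy-Schwarz bound of 1 has large slack
```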

Gram-Schmidt Orthogonalization

Given linearly independent vectors $\{v_1, \ldots, v_k\}$, the Gram-Schmidt process produces an orthonormal set $\{e_1, \ldots, e_k\}$ spanning the same subspace:

$$u_j = v_j - \sum_{i=1}^{j-1} \langle v_j, e_i \rangle \, e_i, \qquad e_j = \frac{u_j}{\|u_j\|}$$

This is the constructive proof that every finite-dimensional inner product space has an orthonormal basis. In matrix form, Gram-Schmidt applied to the columns of $A$ produces the QR decomposition $A = QR$, which is central to numerical matrix operations.
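The recurrence above translates directly into code. A minimal sketch (assuming linearly independent columns), checked against NumPy's QR factorization:

```python
import numpy as np

def gram_schmidt(V):
    """Orthonormalize the columns of V (assumed linearly independent)."""
    Q = np.zeros_like(V, dtype=float)
    for j in range(V.shape[1]):
        u = V[:, j].copy()
        for i in range(j):
            # Subtract the projection of v_j onto each earlier e_i
            u -= (V[:, j] @ Q[:, i]) * Q[:, i]
        Q[:, j] = u / np.linalg.norm(u)
    return Q

A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
Q = gram_schmidt(A)

# Columns are orthonormal: Q^T Q = I
print(np.allclose(Q.T @ Q, np.eye(2)))              # True
# Agrees with numpy's QR factor up to column signs
Qr, _ = np.linalg.qr(A)
print(np.allclose(np.abs(Q.T @ Qr), np.eye(2)))     # True
```

Note that this "classical" Gram-Schmidt is numerically less stable than the modified variant or Householder reflections; library QR routines use the latter.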

Common Confusions

Watch Out

Orthogonality depends on the inner product

Vectors that are orthogonal under the standard dot product may not be orthogonal under a different inner product. In ML, when using a Mahalanobis-type inner product $\langle x, y \rangle_M = x^T M y$ (with $M$ symmetric positive definite), the notion of orthogonality changes with $M$.
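A two-line demonstration, with an arbitrary positive definite $M$ chosen for illustration:

```python
import numpy as np

x = np.array([1.0, 1.0])
y = np.array([1.0, -1.0])

# Orthogonal under the standard dot product
print(x @ y)        # 0.0

# Under <x, y>_M = x^T M y with M = diag(2, 1), they no longer are
M = np.diag([2.0, 1.0])
print(x @ M @ y)    # 1.0
```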

Watch Out

Projection minimizes distance, not angle

The orthogonal projection $\text{proj}_W(v)$ is the closest point in $W$ to $v$ in the norm sense: it minimizes $\|v - w\|$ over $w \in W$. It is not an angle- or length-preserving map; in particular $\|\text{proj}_W(v)\| \leq \|v\|$, with equality only when $v \in W$.

Canonical Examples

Example

Least squares as projection

Given $Ax = b$ with no exact solution, the least squares solution minimizes $\|Ax - b\|^2$. This is equivalent to projecting $b$ onto the column space of $A$. The projection satisfies the normal equations $A^T A x = A^T b$, which express the condition that the residual $b - Ax$ is orthogonal to the column space.
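A sketch with made-up data: solve the normal equations directly, then confirm both the orthogonality of the residual and agreement with NumPy's least-squares solver:

```python
import numpy as np

# Overdetermined system: 3 equations, 2 unknowns (illustrative data)
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])

# Normal equations: A^T A x = A^T b
x = np.linalg.solve(A.T @ A, A.T @ b)

# The residual b - Ax is orthogonal to the column space of A
residual = b - A @ x
print(np.allclose(A.T @ residual, 0.0))   # True

# Same answer as the dedicated least-squares solver
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x, x_ls))               # True
```

In practice `lstsq` (or a QR-based solve) is preferred: forming $A^T A$ explicitly squares the condition number of the problem.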

Exercises

ExerciseCore

Problem

Let $u = (1, 2, 3)^T$ and $v = (1, -1, 0)^T$ in $\mathbb{R}^3$ with the standard inner product. Compute $\text{proj}_v(u)$ and verify that $u - \text{proj}_v(u)$ is orthogonal to $v$.

ExerciseAdvanced

Problem

Prove that in any inner product space, the induced norm satisfies the parallelogram law: $\|u + v\|^2 + \|u - v\|^2 = 2\|u\|^2 + 2\|v\|^2$. Show conversely that any norm satisfying the parallelogram law comes from an inner product.

References

Canonical:

  • Axler, Linear Algebra Done Right (2024), Chapters 6-7
  • Halmos, Finite-Dimensional Vector Spaces (1958), Chapters 1-2
  • Strang, Linear Algebra and Its Applications (2006), Section 4.4 (orthogonality and projections)

For ML context:

  • Deisenroth, Faisal, Ong, Mathematics for Machine Learning (2020), Chapter 3
  • Horn & Johnson, Matrix Analysis (2013), Chapter 5 (norms and inner products)
  • Trefethen & Bau, Numerical Linear Algebra (1997), Lectures 7-8 (Gram-Schmidt and QR decomposition)

Last reviewed: April 2026
