B-Splines

Sneiderman, Robby

A numerically stable basis for piecewise polynomials, defined by de Boor's recurrence. Local support, partition-of-unity, banded design matrices, and why every numerical spline implementation uses B-splines rather than the truncated-power basis.

Advanced · Tier 1 · Stable · Supporting · ~50 min

For: ML, Stats, Research

Prerequisites

Linear Regression, Smoothing Splines, Functional Analysis Core

Why This Matters

A piecewise polynomial of degree $k$ with $K$ interior knots and $C^{k-1}$ continuity at each knot has $K + k + 1$ degrees of freedom. The natural way to parametrize it is the truncated-power basis $\{1, x, x^2, \ldots, x^k, (x - \tau_1)_+^k, \ldots, (x - \tau_K)_+^k\}$ , where $(t)_+ = \max(t, 0)$ . This basis is mathematically clean and numerically terrible: the columns of the design matrix are highly correlated (the powers $1, x, x^2, x^3$ are all nearly collinear on any interval, and the truncated powers add little independent information), the condition number grows exponentially in $k$ , and by around $k = 10$ the ordinary least squares system is typically too ill-conditioned to solve reliably in double precision.

B-splines solve these problems with a different basis for the same function space. Each B-spline basis function $B_{j,k}(x)$ is supported on only $k + 1$ adjacent knot intervals, so at most $k + 1$ basis functions are nonzero at any $x$ and the design matrix is banded with bandwidth $k + 1$ . The condition number is bounded by a constant depending only on the degree and the knot-spacing ratios, not on the knot count. Essentially every numerical spline implementation (R's splines::bs, scipy's BSpline, the COBS package, CAD/CAM systems) uses B-splines.

ESL 2nd ed. Appendix to Ch 5 (pp. 186-189) develops the B-spline construction. The canonical book-length treatment is de Boor's A Practical Guide to Splines (2001 revised edition).

Quick Version

Property	Value
Basis size at degree $k$ with $K$ interior knots	$K + k + 1$
Support of $B_{j, k}$	$[\tau_j, \tau_{j + k + 1}]$ ( $k + 1$ intervals)
Partition of unity	$\sum_j B_{j, k}(x) = 1$ for $x$ in interior of knot range
Nonnegativity	$B_{j, k}(x) \geq 0$
Continuity	$C^{k-1}$ at simple knots; $C^{k - r}$ at knots of multiplicity $r$
de Boor recurrence	$B_{j, k}(x) = \frac{x - \tau_j}{\tau_{j+k} - \tau_j} B_{j, k-1}(x) + \frac{\tau_{j+k+1} - x}{\tau_{j+k+1} - \tau_{j+1}} B_{j+1, k-1}(x)$
Design matrix bandwidth	$k + 1$
Condition number	$O(1)$ in knot count

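The recurrence is the source of every numerical algorithm in the family: evaluation, differentiation, knot insertion, knot removal. It is numerically stable in floating-point arithmetic: on the support of each basis function it combines nonnegative quantities with nonnegative weights, so no subtractive cancellation occurs.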

Formal Setup

Definition

B-Spline of Degree 0 $B_{j, 0}$

Given a nondecreasing knot sequence $\tau_0 \leq \tau_1 \leq \cdots \leq \tau_n$ , the degree-0 B-spline at index $j$ is the indicator of the interval $[\tau_j, \tau_{j+1})$ : $B_{j, 0}(x) = \mathbf{1}_{[\tau_j, \tau_{j+1})}(x).$ If $\tau_j = \tau_{j+1}$ the interval is empty and $B_{j, 0} \equiv 0$ . The set $\{B_{j, 0}\}_{j=0}^{n-1}$ partitions the knot range $[\tau_0, \tau_n)$ into disjoint indicators.

Definition

B-Spline of Degree k by de Boor Recurrence $B_{j, k}$

For $k \geq 1$ , define $B_{j, k}(x) = \frac{x - \tau_j}{\tau_{j+k} - \tau_j} B_{j, k-1}(x) + \frac{\tau_{j+k+1} - x}{\tau_{j+k+1} - \tau_{j+1}} B_{j+1, k-1}(x).$ Any term whose denominator is zero (which happens only at repeated knots) is taken to be zero. When the knot sequence has $K$ interior knots and each boundary knot is repeated $k + 1$ times (the typical convention), the basis $\{B_{j, k}\}_{j=0}^{K+k}$ spans the space of piecewise polynomials of degree $k$ that are $C^{k-1}$ at every interior knot.

The recurrence is the operational definition. The closed-form B-spline involves divided differences of the truncated-power functions; the recurrence implements the divided-difference computation without catastrophic cancellation.
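The recurrence translates almost line for line into code. The following is a minimal sketch rather than a production routine: the function name `bspline_basis`, the example knots, and the rule that terms with a zero denominator are dropped are illustrative assumptions, and the plain recursion recomputes lower-degree values many times.

```python
import numpy as np

def bspline_basis(j, k, t, x):
    """Evaluate B_{j,k}(x) by the recurrence; zero-denominator terms are dropped."""
    x = np.asarray(x, dtype=float)
    if k == 0:
        # Indicator of the half-open interval [t_j, t_{j+1}); empty if the knots coincide.
        return ((t[j] <= x) & (x < t[j + 1])).astype(float)
    out = np.zeros_like(x)
    left_den = t[j + k] - t[j]
    right_den = t[j + k + 1] - t[j + 1]
    if left_den > 0:
        out += (x - t[j]) / left_den * bspline_basis(j, k - 1, t, x)
    if right_den > 0:
        out += (t[j + k + 1] - x) / right_den * bspline_basis(j + 1, k - 1, t, x)
    return out

k = 3
interior = np.array([0.25, 0.5, 0.75])               # K = 3 interior knots (hypothetical)
t = np.concatenate([np.zeros(k + 1), interior, np.ones(k + 1)])
n_basis = len(t) - k - 1                             # K + k + 1 basis functions
x = np.linspace(0.0, 1.0, 201)[:-1]                  # right endpoint excluded (half-open intervals)
B = np.column_stack([bspline_basis(j, k, t, x) for j in range(n_basis)])
print(B.shape, B.sum(axis=1).min(), B.sum(axis=1).max())
```

The column count matches $K + k + 1$ , and the row sums illustrate the partition-of-unity property on the padded knot range.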

Properties

Proposition

Partition of Unity for B-Splines

Statement

For $x$ in the support of the B-spline basis (the interior of the knot range after the boundary repetitions), $\sum_{j} B_{j, k}(x) = 1.$ Each $B_{j, k}$ is nonnegative.

Intuition

At degree 0 the basis functions are indicator functions of disjoint intervals, so their sum equals 1 inside the knot range. The recurrence preserves the partition-of-unity property: when $\sum_j B_{j, k}$ is expanded, each lower-degree function $B_{j, k-1}$ appears twice, once in $B_{j, k}$ with weight $\frac{x - \tau_j}{\tau_{j+k} - \tau_j}$ and once in $B_{j-1, k}$ with weight $\frac{\tau_{j+k} - x}{\tau_{j+k} - \tau_j}$ . The two weights add up to 1, so $\sum_j B_{j, k}(x) = \sum_j B_{j, k-1}(x)$ wherever all the needed neighbors exist (a routine check).

Partition of unity has a practical consequence: the constant function $1$ is exactly representable as $\sum_j c_j B_{j, k}$ with $c_j = 1$ . More generally, B-splines reproduce every polynomial of degree at most $k$ exactly. For $k \geq 1$ the linear function $x$ equals $\sum_j \tau^*_j B_{j, k}(x)$ , with coefficients given by the Greville abscissae $\tau^*_j = (\tau_{j+1} + \cdots + \tau_{j+k})/k$ .
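A quick numerical check of both reproduction properties, assuming scipy is available. The knot values are arbitrary, and `scipy.interpolate.BSpline(t, c, k)` is used only as a convenient evaluator of $\sum_j c_j B_{j, k}$ .

```python
import numpy as np
from scipy.interpolate import BSpline

k = 3
interior = np.array([0.1, 0.4, 0.45, 0.8])            # deliberately non-uniform
t = np.concatenate([np.zeros(k + 1), interior, np.ones(k + 1)])
n_basis = len(t) - k - 1

# Greville abscissae: average of the k knots tau_{j+1}, ..., tau_{j+k}.
greville = np.array([t[j + 1 : j + k + 1].mean() for j in range(n_basis)])

x = np.linspace(0.0, 1.0, 101)
const = BSpline(t, np.ones(n_basis), k)(x)            # coefficients c_j = 1
linear = BSpline(t, greville, k)(x)                   # coefficients c_j = tau*_j

print(np.max(np.abs(const - 1.0)), np.max(np.abs(linear - x)))
```

Both printed deviations should sit at rounding-error level, even though the knots are unevenly spaced.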

Why It Matters

The partition-of-unity property, together with nonnegativity and local support, is what makes B-spline fitting numerically benign. The coefficients behave like function values: $c_j$ in $\sum c_j B_{j, k}$ is approximately equal to $f(\tau^*_j)$ for smooth $f$ , so the system matrix is banded and well conditioned rather than a near-singular Vandermonde matrix.

Failure Mode

Partition of unity holds only where $k + 1$ basis functions overlap. If the boundary knots are not repeated to multiplicity $k + 1$ , the basis at the boundary is incomplete and $\sum_j B_{j, k}(x) < 1$ near the endpoints. The fix is the boundary-padding convention: replicate each boundary knot $k$ extra times so the basis is complete throughout the knot range.
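The failure mode is easy to reproduce. The sketch below builds all degree-$k$ basis functions with the recurrence (the helper `basis_matrix` and the knot choices are illustrative assumptions) and compares the row sums with and without boundary padding.

```python
import numpy as np

def basis_matrix(t, k, x):
    """All degree-k B-splines on knots t, evaluated at x (zero-denominator terms dropped)."""
    t = np.asarray(t, dtype=float)
    # Degree 0: indicators of [t_j, t_{j+1}), one row per basis function.
    B = ((t[:-1, None] <= x) & (x < t[1:, None])).astype(float)
    for d in range(1, k + 1):
        new = np.zeros((len(t) - d - 1, len(x)))
        for j in range(len(t) - d - 1):
            if t[j + d] > t[j]:
                new[j] += (x - t[j]) / (t[j + d] - t[j]) * B[j]
            if t[j + d + 1] > t[j + 1]:
                new[j] += (t[j + d + 1] - x) / (t[j + d + 1] - t[j + 1]) * B[j + 1]
        B = new
    return B.T                                        # shape (len(x), number of basis functions)

k = 3
x = np.linspace(0.0, 1.0, 200, endpoint=False)
plain = np.linspace(0.0, 1.0, 11)                     # simple boundary knots
padded = np.concatenate([np.zeros(k), plain, np.ones(k)])  # boundary multiplicity k + 1

sum_plain = basis_matrix(plain, k, x).sum(axis=1)
sum_padded = basis_matrix(padded, k, x).sum(axis=1)
print(sum_plain[:3], sum_padded[:3])
```

Without padding the sum equals 1 only on the middle stretch where $k + 1$ basis functions overlap; with padding it equals 1 on the whole range.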

Optional Proof: de Boor's evaluation algorithm

ESL 2nd ed. pp. 186-187 and de Boor (2001) Ch IX give the algorithm. To evaluate $s(x) = \sum_j c_j B_{j, k}(x)$ at a query point $x$ in the interval $[\tau_\ell, \tau_{\ell + 1})$ :

Initialize $d_j^{[0]} = c_j$ for $j = \ell - k, \ldots, \ell$ .
For $r = 1, 2, \ldots, k$ , compute for $j = \ell - k + r, \ldots, \ell$ : $d_j^{[r]} = (1 - \alpha_j^{[r]}) \, d_{j-1}^{[r-1]} + \alpha_j^{[r]} \, d_j^{[r-1]}, \quad \alpha_j^{[r]} = \frac{x - \tau_j}{\tau_{j+k-r+1} - \tau_j}.$
Return $d_\ell^{[k]} = s(x)$ .

After locating $\ell$ (a binary search over the knots), the algorithm costs $O(k^2)$ operations regardless of the knot count, evaluates without forming the basis functions explicitly, and is numerically stable: for every index used, $\tau_j \leq x < \tau_{j+k-r+1}$ , so every $\alpha_j^{[r]} \in [0, 1]$ and each step is a convex combination that introduces no cancellation.
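The three steps can be sketched directly. The function name `de_boor`, the clamping of $\ell$ at the right endpoint, and the test knots are assumptions made for illustration; scipy's `BSpline` serves only as a reference value.

```python
import numpy as np
from scipy.interpolate import BSpline

def de_boor(x, t, c, k):
    """Evaluate s(x) = sum_j c_j B_{j,k}(x) by repeated convex combinations."""
    # Locate ell with t[ell] <= x < t[ell + 1]; clamp so the right endpoint uses the last interval.
    ell = int(np.searchsorted(t, x, side="right")) - 1
    ell = min(max(ell, k), len(t) - k - 2)
    d = {j: float(c[j]) for j in range(ell - k, ell + 1)}
    for r in range(1, k + 1):
        # Sweep j downward so that d[j - 1] still holds the level r - 1 value.
        for j in range(ell, ell - k + r - 1, -1):
            alpha = (x - t[j]) / (t[j + k - r + 1] - t[j])
            d[j] = (1.0 - alpha) * d[j - 1] + alpha * d[j]
    return d[ell]

k = 3
t = np.concatenate([np.zeros(k + 1), [0.2, 0.5, 0.7], np.ones(k + 1)])
rng = np.random.default_rng(0)
c = rng.normal(size=len(t) - k - 1)

xs = np.linspace(0.0, 1.0, 50)
mine = np.array([de_boor(x, t, c, k) for x in xs])
ref = BSpline(t, c, k)(xs)
print(np.max(np.abs(mine - ref)))
```

Only the $k + 1$ coefficients $c_{\ell - k}, \ldots, c_\ell$ are touched, which is why the cost does not depend on the number of knots.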

The same recursion gives derivatives: $s'(x) = \sum_{j} c_j^{[1]} B_{j, k-1}(x)$ where $c_j^{[1]} = k (c_j - c_{j-1}) / (\tau_{j+k} - \tau_j)$ . Higher derivatives chain naturally.
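The derivative formula can be checked numerically. In this sketch the differenced coefficients are paired with the knot vector with its first and last entries removed, which is the index shift implied by moving from $B_{j, k}$ to $B_{j, k-1}$ ; the knots and coefficients are arbitrary illustrative values.

```python
import numpy as np
from scipy.interpolate import BSpline

k = 3
t = np.concatenate([np.zeros(k + 1), [0.3, 0.6], np.ones(k + 1)])
n = len(t) - k - 1
c = np.sin(np.arange(n))                              # arbitrary coefficients

# c1_j = k (c_j - c_{j-1}) / (tau_{j+k} - tau_j) for j = 1, ..., n - 1.
c1 = k * (c[1:] - c[:-1]) / (t[k + 1 : k + n] - t[1:n])

# Degree k - 1 spline on the knots with one boundary copy dropped at each end.
ds = BSpline(t[1:-1], c1, k - 1)

xs = np.linspace(0.0, 1.0, 41)
ref = BSpline(t, c, k).derivative()(xs)
print(np.max(np.abs(ds(xs) - ref)))
```

Applying the same differencing to `c1` with degree $k - 1$ gives the second derivative, and so on.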

Regression Splines

Use the B-spline basis to fit a piecewise polynomial of degree $k$ to data $(X_1, Y_1), \ldots, (X_n, Y_n)$ : $\hat{c} = \arg\min_{c \in \mathbb{R}^{K + k + 1}} \sum_{i=1}^n \left(Y_i - \sum_j c_j B_{j, k}(X_i)\right)^2.$ This is ordinary least squares with design matrix $\boldsymbol{B}$ having entries $B_{ij} = B_{j, k}(X_i)$ . Each row has at most $k + 1$ nonzero entries, in consecutive columns, so the design matrix is banded once the observations are sorted by $X_i$ . The normal equations $\boldsymbol{B}^\top \boldsymbol{B} \, \hat{c} = \boldsymbol{B}^\top \boldsymbol{Y}$ have a banded coefficient matrix: forming it costs $O(n k^2)$ and the banded solve costs $O((K + k) k^2)$ .
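A minimal sketch of the fit, assuming scipy and simulated data. The design matrix is built densely by evaluating a `BSpline` with identity coefficients, which is convenient for illustration but ignores the sparsity a real implementation would exploit; the seed, knot positions, and noise level are arbitrary.

```python
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(0)
n, k, K = 200, 3, 5
X = np.sort(rng.uniform(0.0, 2 * np.pi, n))
Y = np.sin(X) + 0.2 * rng.normal(size=n)

interior = np.linspace(0.0, 2 * np.pi, K + 2)[1:-1]
t = np.concatenate([np.zeros(k + 1), interior, np.full(k + 1, 2 * np.pi)])
p = len(t) - k - 1                                    # K + k + 1 columns

# Column j of B holds B_{j,k}(X_i): evaluate the spline whose coefficient vector is e_j.
B = BSpline(t, np.eye(p), k)(X)                       # shape (n, p)

c_hat, *_ = np.linalg.lstsq(B, Y, rcond=None)
fitted = B @ c_hat
resid_sd = np.sqrt(np.mean((Y - fitted) ** 2))

nnz_per_row = (np.abs(B) > 1e-12).sum(axis=1)         # at most k + 1 per row
G = B.T @ B                                           # banded Gram matrix
print(B.shape, nnz_per_row.max(), resid_sd)
```

Entries of `G` more than $k$ positions off the diagonal are zero, so a banded Cholesky routine could replace the dense least-squares call.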

Knot placement. The classic recipes:

Equal spacing on quantiles of $X$ : ESL's default. Places knots so that roughly equal numbers of observations fall between adjacent knots.
Equal spacing on $X$ : simpler, fine when the input distribution is roughly uniform.
Adaptive (MARS, BARS): choose knots by forward stagewise or Bayesian search. See MARS.
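The first two recipes can be sketched in a few lines. The helper names `quantile_knots` and `uniform_knots` are hypothetical, and the skewed exponential sample is chosen only to make the difference between the two placements visible.

```python
import numpy as np

def quantile_knots(x, K):
    """K interior knots at equally spaced quantiles of x."""
    probs = np.arange(1, K + 1) / (K + 1)
    return np.quantile(x, probs)

def uniform_knots(x, K):
    """K interior knots equally spaced between min(x) and max(x)."""
    return np.linspace(x.min(), x.max(), K + 2)[1:-1]

rng = np.random.default_rng(1)
x = rng.exponential(size=1000)                        # skewed inputs
kq = quantile_knots(x, 4)
ku = uniform_knots(x, 4)

def counts(knots):
    # Number of observations between adjacent knots, boundaries included.
    edges = np.concatenate([[x.min()], knots, [x.max()]])
    return np.histogram(x, bins=edges)[0]

counts_q, counts_u = counts(kq), counts(ku)
print(counts_q, counts_u)
```

Quantile placement balances the observation counts between knots, while equal spacing on a skewed input leaves the right-hand intervals nearly empty.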

The knot count controls model complexity. AIC, BIC, or cross-validation select it. A common heuristic: start with $K = 4$ to $7$ and add knots until the residual sum of squares stops improving by a meaningful margin.

Why Truncated Powers Fail

The truncated-power basis at degree 3 with knots $\tau_1 < \cdots < \tau_K$ is $\{1, x, x^2, x^3, (x - \tau_1)_+^3, \ldots, (x - \tau_K)_+^3\}.$ The four powers $1, x, x^2, x^3$ are nearly linearly dependent on any bounded interval, even after standardization. With $\boldsymbol{B}$ now denoting the truncated-power design matrix, the Vandermonde-like structure of $\boldsymbol{B}^\top \boldsymbol{B}$ has condition number exponential in the degree. At degree 5 with 10 knots, the condition number can exceed $10^{12}$ , which leaves only a few reliable digits in a double-precision solve.

B-splines fix this by replacing the whole truncated-power basis (globally supported, highly correlated columns) with $\{B_{j, 3}\}_{j=0}^{K+3}$ (a local-support basis whose condition number does not grow with the knot count). Same function space, vastly better numerics. ESL 2nd ed. pp. 186-189 makes this explicit.
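The gap in conditioning can be measured directly. This sketch assumes a cubic basis, 10 equally spaced interior knots on $[0, 1]$ , and a dense uniform grid of inputs; the exact condition numbers depend on those choices, so no specific values are claimed.

```python
import numpy as np
from scipy.interpolate import BSpline

k, K, n = 3, 10, 400
x = np.linspace(0.0, 1.0, n)
knots = np.linspace(0.0, 1.0, K + 2)[1:-1]

# Truncated-power design matrix: 1, x, ..., x^k, then (x - tau_m)_+^k.
powers = np.vander(x, k + 1, increasing=True)
trunc = np.clip(x[:, None] - knots[None, :], 0.0, None) ** k
TP = np.hstack([powers, trunc])

# B-spline design matrix for the same spline space.
t = np.concatenate([np.zeros(k + 1), knots, np.ones(k + 1)])
BS = BSpline(t, np.eye(K + k + 1), k)(x)

cond_tp = np.linalg.cond(TP)
cond_bs = np.linalg.cond(BS)

# Same function space: every truncated-power column is a combination of B-spline columns.
coef, *_ = np.linalg.lstsq(BS, TP, rcond=None)
span_gap = np.max(np.abs(BS @ coef - TP))
print(cond_tp, cond_bs, span_gap)
```

The two matrices have the same column space, so the fitted values agree in exact arithmetic; only the conditioning of the linear solve differs.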

Implementation Notes

The standard implementations.

R: splines::bs(x, df = ...) constructs the B-spline design matrix with knots at quantiles of x. The companion splines::ns(x, df = ...) produces a natural cubic spline basis (B-splines constrained to be linear beyond the boundary knots). mgcv::s() fits penalized regression splines; its default basis is a thin-plate regression spline, and B-spline and P-spline bases are available via bs = "bs" and bs = "ps".
Python: scipy.interpolate.BSpline for general B-splines; patsy.dmatrix("bs(x, df=...)") mirrors R's formula syntax.

For numerically demanding work (high $k$ , many knots), scipy.interpolate evaluates B-splines with de Boor-type recurrences in compiled code; its older splrep/splev interface wraps the Fortran library FITPACK.

Boundary knot repetition is the most common implementation pitfall: a careful library handles it automatically, but rolling your own basis construction requires explicit padding to multiplicity $k + 1$ at each boundary. Without that the basis is incomplete near the endpoints and cannot represent every spline there.
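When the knot vector is supplied by hand, the padding has to be written out. A minimal sketch with `scipy.interpolate.make_lsq_spline`, which expects the full knot vector including the repeated boundary knots; the helper `padded_knots`, the simulated data, and the quantile levels are illustrative assumptions.

```python
import numpy as np
from scipy.interpolate import make_lsq_spline

def padded_knots(interior, a, b, k):
    """Full knot vector: boundary knots a and b each repeated k + 1 times."""
    return np.concatenate([np.full(k + 1, a), np.sort(interior), np.full(k + 1, b)])

rng = np.random.default_rng(2)
k = 3
x = np.sort(rng.uniform(0.0, 2 * np.pi, 200))         # sorted, distinct abscissae
y = np.sin(x) + 0.2 * rng.normal(size=x.size)

interior = np.quantile(x, [0.2, 0.4, 0.6, 0.8])       # K = 4 interior knots
t = padded_knots(interior, 0.0, 2 * np.pi, k)

spl = make_lsq_spline(x, y, t, k=k)                   # least-squares B-spline fit
print(len(t), spl.c.shape, float(spl(np.pi / 2)))
```

The coefficient vector has length $K + k + 1$ , matching the basis size stated earlier.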

Canonical Example

Example

Cubic regression spline on a sinusoid

Generate $n = 200$ from $X_i \sim \mathrm{Uniform}([0, 2\pi])$ , $Y_i = \sin(X_i) + 0.2 \, \varepsilon_i$ . Fit a cubic regression spline with $K$ interior knots placed at equal quantiles.

$K$	$\mathrm{df}$ ( $K + k + 1$ )	residual SD	shape
2	6	0.32	smooth; underfit at peaks
5	9	0.22	close to optimal
15	19	0.21	tracks noise; slight overfit
50	54	0.20	obvious overfit

5-fold CV picks $K = 5$ or $6$ . AIC and BIC pick similar values. The B-spline design matrix at $K = 50$ has $50 + 4 = 54$ columns; the linear solve still takes a few milliseconds because of the bandedness.

The same fit with a truncated-power basis at $K = 50$ : condition number $\approx 10^{14}$ , plain OLS is unreliable, and ridge with a small penalty is required to stabilize the solve. The resulting fit is visually identical to the B-spline fit (same function space), but the conditioning is many orders of magnitude worse.

Common Confusions

Watch Out

B-splines are a basis, not a fitting method

"Fit a B-spline" is shorthand for "fit a piecewise polynomial regression using the B-spline basis". The basis choice is orthogonal to the estimator (OLS, ridge, lasso, smoothing penalty). Smoothing splines and regression splines both use B-splines; they differ in the penalty, not the basis.

Watch Out

The Greville abscissae are not knots

The knots $\tau_j$ define the basis. The Greville abscissae $\tau^*_j = (\tau_{j+1} + \cdots + \tau_{j+k})/k$ are plain averages of $k$ consecutive knots and sit approximately where the basis function $B_{j, k}$ is concentrated. They are useful for control-point interpolation in graphics, but they are not the breakpoints of the spline.

Watch Out

Knot multiplicity controls smoothness, not interpolation accuracy

Increasing the multiplicity of a knot $\tau_j$ from $1$ to $r$ reduces smoothness at $\tau_j$ from $C^{k-1}$ to $C^{k-r}$ . The function space becomes strictly larger; the design matrix gains $r - 1$ extra columns near $\tau_j$ . Use multiplicity to introduce intentional kinks or jumps; do not use it to "improve" a smooth fit.
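Both effects of raising multiplicity can be seen in a small case. The sketch assumes a cubic basis on $[0, 1]$ with one interior knot at $0.5$ , first simple and then of multiplicity 3, and uses arbitrary alternating coefficients.

```python
import numpy as np
from scipy.interpolate import BSpline

k = 3
t_simple = np.array([0, 0, 0, 0, 0.5, 1, 1, 1, 1], dtype=float)
t_triple = np.array([0, 0, 0, 0, 0.5, 0.5, 0.5, 1, 1, 1, 1], dtype=float)

n_simple = len(t_simple) - k - 1                      # columns with a simple knot
n_triple = len(t_triple) - k - 1                      # r - 1 = 2 extra columns

c = np.array([0, 1, 0, 1, 0, 1, 0], dtype=float)      # illustrative coefficients
s = BSpline(t_triple, c, k)
ds = s.derivative()

eps = 1e-9
left, right = float(ds(0.5 - eps)), float(ds(0.5 + eps))
value_gap = abs(float(s(0.5 - eps)) - float(s(0.5 + eps)))
print(n_simple, n_triple, left, right, value_gap)
```

With multiplicity 3 the cubic is only $C^0$ at the knot: the function value is continuous but the first derivative changes sign across $0.5$ , producing a kink.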

Exercises

ExerciseCore

Problem

Verify the de Boor recurrence for $k = 1$ : starting from $B_{j, 0}(x) = \mathbf{1}_{[\tau_j, \tau_{j+1})}(x)$ , compute $B_{j, 1}(x)$ explicitly and confirm it is the standard linear-tent basis function peaking at $\tau_{j+1}$ .

ExerciseAdvanced

Problem

Show that for a uniform knot sequence $\tau_j = j$ and degree $k$ , the B-spline $B_{0, k}(x)$ equals the $(k+1)$ -fold convolution of the indicator $\mathbf{1}_{[0, 1)}$ with itself. Hence relate B-splines to the densities of sums of independent uniform random variables.

ExerciseResearch

Problem

Construct a B-spline basis on a sphere $S^2$ with locally adaptive knot density. Discuss the obstructions: there is no canonical "knot sequence" on a manifold, no canonical boundary, and the partition-of-unity property must be enforced by hand via a finite cover and a smooth partition.

References

Canonical:

de Boor, C. (2001). A Practical Guide to Splines (revised ed.). Springer Applied Mathematical Sciences, vol. 27. The definitive reference; covers evaluation, knot insertion, B-spline arithmetic, and rigorous proofs.
Hastie, Tibshirani, Friedman. The Elements of Statistical Learning, 2nd ed. Springer (2009). Ch 5 "Basis Expansions and Regularization", §5.2 "Piecewise Polynomials and Splines" (pp. 141-148), Appendix on B-splines (pp. 186-189). The statistical-learning view.
Green, P. J. and Silverman, B. W. (1994). Nonparametric Regression and Generalized Linear Models. Chapman and Hall. Connects B-splines to the smoothing-spline penalty framework.

Foundational:

Schoenberg, I. J. (1946). "Contributions to the Problem of Approximation of Equidistant Data by Analytic Functions." Quarterly of Applied Mathematics 4, 45-99, 112-141. Original B-spline construction (for uniform knots).
Curry, H. B. and Schoenberg, I. J. (1966). "On Pólya Frequency Functions IV: The Fundamental Spline Functions and Their Limits." Journal d'Analyse Mathématique 17, 71-107. Non-uniform-knot extension.

Numerical analysis:

Lyche, T. and Mørken, K. (2008). Spline Methods. University of Oslo lecture notes. Modern numerical treatment with linear-algebra emphasis.
Piegl, L. and Tiller, W. (1997). The NURBS Book (2nd ed.). Springer. B-splines in the CAD/CAM setting; the standard reference for non-uniform rational B-splines and knot insertion.

Next Topics

Smoothing splines: penalized regression with B-spline basis and a roughness penalty.
Thin-plate splines: the multivariate generalization with a radial basis representation.
Generalized additive models: use B-splines as per-coordinate basis functions in a backfitting framework.
MARS: adaptive knot selection on piecewise-linear hinge (truncated-power) bases.

Last reviewed: May 13, 2026
