
Foundations

Inverse and Implicit Function Theorem

The inverse function theorem guarantees local invertibility when the Jacobian is nonsingular. The implicit function theorem guarantees that constraint surfaces are locally graphs. Both are essential for constrained optimization and implicit layers.


Why This Matters

The inverse function theorem tells you when a nonlinear function can be "undone" locally. The implicit function theorem tells you when a system of equations $F(x, y) = 0$ defines $y$ as a function of $x$, even when you cannot solve for $y$ explicitly. These theorems appear throughout ML: Lagrange multipliers for constrained optimization, implicit differentiation through fixed-point layers (deep equilibrium models), and the reparameterization trick in variational inference.

The Inverse Function Theorem

Definition

Local Diffeomorphism

A continuously differentiable map $f: \mathbb{R}^n \to \mathbb{R}^n$ is a local diffeomorphism at $a$ if there exist open sets $U \ni a$ and $V \ni f(a)$ such that $f: U \to V$ is bijective and both $f$ and $f^{-1}$ are continuously differentiable on their respective domains.

Theorem

Inverse Function Theorem

Statement

Let $f: \mathbb{R}^n \to \mathbb{R}^n$ be continuously differentiable ($C^1$) in a neighborhood of $a \in \mathbb{R}^n$. If the Jacobian matrix $Df(a)$ is invertible (i.e., $\det Df(a) \neq 0$), then $f$ is a local diffeomorphism at $a$. That is, there exist open sets $U \ni a$ and $V \ni f(a)$ such that:

  1. $f: U \to V$ is a bijection
  2. $f^{-1}: V \to U$ is continuously differentiable
  3. $D(f^{-1})(f(a)) = [Df(a)]^{-1}$

The derivative of the inverse is the inverse of the derivative.
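Conclusion (3) can be checked numerically: invert $f$ near a point with Newton's method, finite-difference the resulting inverse map, and compare with $[Df(a)]^{-1}$. A minimal sketch, assuming NumPy; the specific map $f$ and point $a$ below are arbitrary illustrative choices:

```python
import numpy as np

# Illustrative C^1 map f: R^2 -> R^2 and its Jacobian
def f(p):
    x, y = p
    return np.array([x + np.sin(y), y + x**2])

def Df(p):
    x, y = p
    return np.array([[1.0, np.cos(y)],
                     [2.0 * x, 1.0]])

a = np.array([0.5, 1.0])
# Hypothesis of the theorem: Df(a) is invertible
assert abs(np.linalg.det(Df(a))) > 1e-8

def f_inv(v, x0=a, iters=50):
    """Compute f^{-1}(v) near a by Newton's method."""
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        x = x - np.linalg.solve(Df(x), f(x) - v)
    return x

# Finite-difference Jacobian of f^{-1} at f(a)
h = 1e-6
cols = [(f_inv(f(a) + h * e) - f_inv(f(a) - h * e)) / (2 * h)
        for e in np.eye(2)]
D_finv = np.column_stack(cols)

# Conclusion (3): D(f^{-1})(f(a)) = [Df(a)]^{-1}
print(np.allclose(D_finv, np.linalg.inv(Df(a)), atol=1e-5))  # True
```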

Intuition

If the linear approximation $Df(a)$ is invertible, then the nonlinear function $f$ behaves like an invertible linear map near $a$. You can "undo" $f$ in a small neighborhood. The key word is local: the function may not be globally invertible (think of $f(x) = x^2$ near $x = 1$, which is locally invertible even though $f$ is not globally injective).

Proof Sketch

The standard proof uses the contraction mapping theorem. Define $g(x) = x - [Df(a)]^{-1}(f(x) - y)$ and show that $g$ is a contraction on a small ball around $a$ when $y$ is close to $f(a)$. The Banach fixed-point theorem gives a unique fixed point $x = f^{-1}(y)$. Differentiability of $f^{-1}$ follows from the chain rule applied to $f(f^{-1}(y)) = y$.
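Run literally, this construction already gives a numerical inversion scheme: freeze the Jacobian at $a$ and iterate $g$. A sketch with an arbitrary illustrative map and target $y$, assuming NumPy:

```python
import numpy as np

def f(p):
    x, y = p
    return np.array([x + 0.25 * np.sin(y), y + 0.25 * x**2])

a = np.array([0.0, 0.0])
# Df(a) at a = (0, 0), computed by hand for this f
Dfa_inv = np.linalg.inv(np.array([[1.0, 0.25],
                                  [0.0, 1.0]]))

y_target = np.array([0.1, 0.05])   # close to f(a) = (0, 0)

# Iterate the contraction g(x) = x - [Df(a)]^{-1} (f(x) - y)
x = a.copy()
for _ in range(100):
    x = x - Dfa_inv @ (f(x) - y_target)

# The fixed point is f^{-1}(y_target): applying f recovers y_target
print(np.allclose(f(x), y_target, atol=1e-12))  # True
```

Note that the Jacobian stays frozen at $a$, unlike full Newton's method; that is exactly why $g$ has a fixed point rather than a root, and why the proof only needs the contraction mapping theorem.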

Why It Matters

In ML, this theorem justifies: (1) the change-of-variables formula in normalizing flows, where you need the transformation to be locally invertible, (2) implicit differentiation through nonlinear layers, and (3) the fact that smooth bijective reparameterizations preserve optimization landscapes locally.

Failure Mode

The theorem is silent when $Df(a)$ is singular. At such points, the function may fold (like $f(x) = x^2$ at $x = 0$), and there is no local inverse. The theorem also says nothing about the size of the neighborhood $U$; it could be extremely small.

The Implicit Function Theorem

Theorem

Implicit Function Theorem

Statement

Let $F: \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^m$ be $C^1$ in a neighborhood of $(a, b)$ with $F(a, b) = 0$. If the $m \times m$ matrix $D_y F(a, b) = \frac{\partial F}{\partial y}(a, b)$ is invertible, then there exist open sets $U \ni a$ and $W \ni (a, b)$ and a unique $C^1$ function $g: U \to \mathbb{R}^m$ such that:

  1. $g(a) = b$
  2. $F(x, g(x)) = 0$ for all $x \in U$
  3. $Dg(a) = -[D_y F(a,b)]^{-1} D_x F(a,b)$

The equation $F(x, y) = 0$ implicitly defines $y = g(x)$ near $(a, b)$.
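A concrete case where no closed form exists: $F(x, y) = y^3 + y - x$. Here $D_y F = 3y^2 + 1$ is never zero, so $y = g(x)$ is defined (in fact globally), and conclusion (3) gives $g'(x) = 1/(3y^2 + 1)$. A numerical check of that slope, assuming NumPy:

```python
import numpy as np

# F(x, y) = y^3 + y - x: no closed form for y = g(x),
# but D_y F = 3y^2 + 1 > 0 everywhere, so g exists.
def F(x, y):
    return y**3 + y - x

def solve_y(x, y0=0.0, iters=50):
    """Newton's method on y -> F(x, y) = 0."""
    y = y0
    for _ in range(iters):
        y = y - F(x, y) / (3 * y**2 + 1)
    return y

x0 = 2.0
g_x0 = solve_y(x0)                 # g(x0), satisfying F(x0, g(x0)) = 0
assert abs(F(x0, g_x0)) < 1e-12

# Implicit function theorem: g'(x) = -[D_y F]^{-1} D_x F = 1 / (3 y^2 + 1)
ift_slope = 1.0 / (3 * g_x0**2 + 1)

# Compare against a finite difference of the numerically solved g
h = 1e-6
fd_slope = (solve_y(x0 + h) - solve_y(x0 - h)) / (2 * h)
assert abs(ift_slope - fd_slope) < 1e-8
```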

Intuition

If you have a system of $m$ equations in $n + m$ unknowns and the equations are "non-degenerate" with respect to the last $m$ variables (the Jacobian in $y$ is invertible), then you can solve for those $m$ variables as smooth functions of the remaining $n$ variables. You do not need an explicit formula; the theorem guarantees the function exists and gives its derivative.

Proof Sketch

Define $G(x, y) = (x, F(x, y))$ from $\mathbb{R}^{n+m}$ to $\mathbb{R}^{n+m}$. The Jacobian of $G$ at $(a, b)$ is block lower triangular with diagonal blocks $I_n$ and $D_y F(a, b)$, so $\det DG(a,b) = \det D_y F(a,b) \neq 0$. Apply the inverse function theorem to $G$ to get a local inverse, then extract $g$ from it.

Why It Matters

The implicit function theorem underlies: (1) Lagrange multipliers, where constraints $h(x) = 0$ implicitly define a manifold of feasible points, (2) implicit differentiation in deep equilibrium models (DEQs), where the output $z^*$ satisfies $f(z^*, x) = z^*$ and you differentiate through the fixed-point equation, and (3) the computation of gradients through any layer defined by a fixed-point or root-finding condition.

Failure Mode

The theorem fails when $D_y F(a, b)$ is singular. At such points, the solution set of $F(x, y) = 0$ may branch, have cusps, or be locally degenerate. The theorem also provides only local results. The implicit function $g$ may not extend to a global solution.

Applications in ML

Lagrange Multipliers

To minimize $f(x)$ subject to $h(x) = 0$ where $h: \mathbb{R}^n \to \mathbb{R}^k$, the method of Lagrange multipliers introduces the Lagrangian:

$$\mathcal{L}(x, \lambda) = f(x) + \lambda^T h(x)$$

At a constrained minimum, $\nabla_x \mathcal{L} = 0$ and $h(x) = 0$. The implicit function theorem guarantees that the constraint $h(x) = 0$ locally defines an $(n-k)$-dimensional manifold, provided $Dh(x)$ has full rank.
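When the objective is quadratic and the constraint affine, the stationarity conditions $\nabla_x \mathcal{L} = 0$, $h(x) = 0$ form a linear system that can be solved exactly. A toy instance (minimize $x_1^2 + x_2^2$ subject to $x_1 + x_2 = 1$; the numbers are illustrative), assuming NumPy:

```python
import numpy as np

# Minimize f(x) = x1^2 + x2^2 subject to h(x) = x1 + x2 - 1 = 0.
# Stationarity grad f + lambda * grad h = 0 plus the constraint gives:
#   2 x1        + lam = 0
#         2 x2  + lam = 0
#   x1  +  x2         = 1
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0]])
b = np.array([0.0, 0.0, 1.0])
x1, x2, lam = np.linalg.solve(A, b)

print(x1, x2, lam)  # constrained minimizer (0.5, 0.5), multiplier -1
assert np.allclose([x1, x2], [0.5, 0.5]) and np.isclose(lam, -1.0)
# Dh = (1, 1) has full rank 1, so the feasible set is a
# (2 - 1)-dimensional manifold: a line, as expected.
```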

Deep Equilibrium Models

A deep equilibrium model defines its output $z^*$ as the fixed point of $z^* = f_\theta(z^*, x)$. To compute $\partial z^* / \partial \theta$, define $F(\theta, z) = z - f_\theta(z, x)$. By the implicit function theorem (assuming $D_z F = I - D_z f_\theta$ is invertible), and since $D_\theta F = -\partial f_\theta / \partial \theta$, the two minus signs cancel:

$$\frac{\partial z^*}{\partial \theta} = [I - D_z f_\theta(z^*, x)]^{-1} \frac{\partial f_\theta}{\partial \theta}(z^*, x)$$

This avoids backpropagating through the entire fixed-point iteration.
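In one dimension this can be checked end to end. The toy "layer" below uses $f_w(z, x) = \tanh(wz + x)$ (an illustrative choice, not the DEQ architecture itself): run the fixed-point iteration, apply the implicit formula at the equilibrium, and compare with a finite difference through the whole iteration. Assuming NumPy:

```python
import numpy as np

# Scalar equilibrium layer: z* solves z = tanh(w * z + x)
def fixed_point(w, x, iters=200):
    z = 0.0
    for _ in range(iters):
        z = np.tanh(w * z + x)   # |w| < 1 makes this a contraction
    return z

w, x = 0.5, 1.0
z = fixed_point(w, x)
s = 1.0 - np.tanh(w * z + x)**2  # tanh' at the equilibrium

# Implicit function theorem on F(w, z) = z - tanh(w z + x):
#   dz*/dw = [1 - df/dz]^{-1} df/dw,  with df/dz = w*s, df/dw = z*s
implicit_grad = (z * s) / (1.0 - w * s)

# Finite difference through the full iteration for comparison
h = 1e-6
fd_grad = (fixed_point(w + h, x) - fixed_point(w - h, x)) / (2 * h)
assert abs(implicit_grad - fd_grad) < 1e-8
```

The implicit formula touches only the converged $z^*$, while the finite difference reruns all 200 iterations; that cost gap is the entire point of implicit differentiation in DEQs.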

Common Confusions

Watch Out

Local does not mean global

Both theorems give local results only. The inverse function theorem says $f$ is invertible near $a$, not everywhere. The map $f(x) = e^{ix}$ from $\mathbb{R}$ to the unit circle has invertible derivative everywhere but is not globally injective. Always check whether the local guarantee suffices for your application.

Watch Out

Invertible Jacobian is necessary at the point, not everywhere

The inverse function theorem requires $\det Df(a) \neq 0$ at the specific point $a$. The Jacobian may be singular at other points. Similarly, the implicit function theorem needs $D_y F$ invertible at the specific point $(a, b)$.

Summary

  • Inverse function theorem: if $Df(a)$ is invertible, then $f$ is locally invertible near $a$
  • The derivative of the local inverse is $[Df(a)]^{-1}$
  • Implicit function theorem: if $F(a,b) = 0$ and $D_y F(a,b)$ is invertible, then $y = g(x)$ near $(a,b)$
  • The implicit derivative is $Dg(a) = -[D_y F(a,b)]^{-1} D_x F(a,b)$
  • Both theorems are local; they say nothing about global invertibility
  • Applications: Lagrange multipliers, normalizing flows, deep equilibrium models

Exercises

ExerciseCore

Problem

Let $f: \mathbb{R}^2 \to \mathbb{R}^2$ be defined by $f(x, y) = (e^x \cos y, e^x \sin y)$. At which points is $f$ locally invertible? Compute the Jacobian and check its determinant.

ExerciseAdvanced

Problem

The equation $x^2 + y^2 + z^2 = 1$ defines the unit sphere. Near which points can you express $z$ as a $C^1$ function of $(x, y)$? Compute $\partial z / \partial x$ using the implicit function theorem where it applies.

References

Canonical:

  • Rudin, Principles of Mathematical Analysis, Chapter 9 (Theorems 9.24 and 9.28)
  • Spivak, Calculus on Manifolds, Chapter 2

Current:

  • Bai et al., "Deep Equilibrium Models" (NeurIPS 2019). Implicit differentiation through fixed points.
  • Krantz and Parks, The Implicit Function Theorem (2003). Comprehensive treatment.


Last reviewed: April 2026
