
Foundations

Inverse and Implicit Function Theorem

The inverse function theorem guarantees local invertibility when the Jacobian is nonsingular. The implicit function theorem guarantees that constraint surfaces are locally graphs. Both are essential for constrained optimization and implicit layers.


Why This Matters

The inverse function theorem tells you when a nonlinear function can be "undone" locally. The implicit function theorem tells you when a system of equations $F(x, y) = 0$ defines $y$ as a function of $x$, even when you cannot solve for $y$ explicitly. These theorems appear throughout ML: Lagrange multipliers for constrained optimization, implicit differentiation through fixed-point layers (deep equilibrium models), and the reparameterization trick in variational inference.

The Inverse Function Theorem

Definition

Local Diffeomorphism

A continuously differentiable map $f: \mathbb{R}^n \to \mathbb{R}^n$ is a local diffeomorphism at $a$ if there exist open sets $U \ni a$ and $V \ni f(a)$ such that $f: U \to V$ is bijective and both $f$ and $f^{-1}$ are continuously differentiable on their respective domains.

Theorem

Inverse Function Theorem

Statement

Let $f: \mathbb{R}^n \to \mathbb{R}^n$ be continuously differentiable ($C^1$) in a neighborhood of $a \in \mathbb{R}^n$. If the Jacobian matrix $Df(a)$ is invertible (i.e., $\det Df(a) \neq 0$), then $f$ is a local diffeomorphism at $a$. That is, there exist open sets $U \ni a$ and $V \ni f(a)$ such that:

  1. $f: U \to V$ is a bijection
  2. $f^{-1}: V \to U$ is continuously differentiable
  3. $D(f^{-1})(f(a)) = [Df(a)]^{-1}$

The derivative of the inverse is the inverse of the derivative.
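Conclusion (3) can be checked numerically: invert $f$ near a point with Newton's method, finite-difference the resulting inverse map, and compare with $[Df(a)]^{-1}$. A minimal sketch, assuming NumPy; the specific map $f$ and point $a$ below are arbitrary illustrative choices:

```python
import numpy as np

# Illustrative C^1 map f: R^2 -> R^2 and its Jacobian
def f(p):
    x, y = p
    return np.array([x + np.sin(y), y + x**2])

def Df(p):
    x, y = p
    return np.array([[1.0, np.cos(y)],
                     [2.0 * x, 1.0]])

a = np.array([0.5, 1.0])
# Hypothesis of the theorem: Df(a) is invertible
assert abs(np.linalg.det(Df(a))) > 1e-8

def f_inv(v, x0=a, iters=50):
    """Compute f^{-1}(v) near a by Newton's method."""
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        x = x - np.linalg.solve(Df(x), f(x) - v)
    return x

# Finite-difference Jacobian of f^{-1} at f(a)
h = 1e-6
cols = [(f_inv(f(a) + h * e) - f_inv(f(a) - h * e)) / (2 * h)
        for e in np.eye(2)]
D_finv = np.column_stack(cols)

# Conclusion (3): D(f^{-1})(f(a)) = [Df(a)]^{-1}
print(np.allclose(D_finv, np.linalg.inv(Df(a)), atol=1e-5))  # True
```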

Intuition

If the linear approximation $Df(a)$ is invertible, then the nonlinear function $f$ behaves like an invertible linear map near $a$. You can "undo" $f$ in a small neighborhood. The key word is local: the function may not be globally invertible (think of $f(x) = x^2$ near $x = 1$, which is locally invertible even though $f$ is not globally injective).

Proof Sketch

The standard proof uses the contraction mapping theorem. Define $g(x) = x - [Df(a)]^{-1}(f(x) - y)$ and show that $g$ is a contraction on a small ball around $a$ when $y$ is close to $f(a)$. The Banach fixed-point theorem gives a unique fixed point $x = f^{-1}(y)$. Differentiability of $f^{-1}$ follows from the chain rule applied to $f(f^{-1}(y)) = y$.
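Run literally, this construction already gives a numerical inversion scheme: freeze the Jacobian at $a$ and iterate $g$. A sketch with an arbitrary illustrative map and target $y$, assuming NumPy:

```python
import numpy as np

def f(p):
    x, y = p
    return np.array([x + 0.25 * np.sin(y), y + 0.25 * x**2])

a = np.array([0.0, 0.0])
# Df(a) at a = (0, 0), computed by hand for this f
Dfa_inv = np.linalg.inv(np.array([[1.0, 0.25],
                                  [0.0, 1.0]]))

y_target = np.array([0.1, 0.05])   # close to f(a) = (0, 0)

# Iterate the contraction g(x) = x - [Df(a)]^{-1} (f(x) - y)
x = a.copy()
for _ in range(100):
    x = x - Dfa_inv @ (f(x) - y_target)

# The fixed point is f^{-1}(y_target): applying f recovers y_target
print(np.allclose(f(x), y_target, atol=1e-12))  # True
```

Note that the Jacobian stays frozen at $a$, unlike full Newton's method; that is exactly why $g$ has a fixed point rather than a root, and why the proof only needs the contraction mapping theorem.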

Why It Matters

In ML, this theorem justifies: (1) the change-of-variables formula in normalizing flows, where you need the transformation to be locally invertible, (2) implicit differentiation through nonlinear layers, and (3) the fact that smooth bijective reparameterizations preserve optimization landscapes locally.

Failure Mode

The theorem is silent when $Df(a)$ is singular. At such points, the function may fold (like $f(x) = x^2$ at $x = 0$), and there is no local inverse. The theorem also says nothing about the size of the neighborhood $U$; it could be extremely small.

The Implicit Function Theorem

Theorem

Implicit Function Theorem

Statement

Let $F: \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^m$ be $C^1$ in a neighborhood of $(a, b)$ with $F(a, b) = 0$. If the $m \times m$ matrix $D_y F(a, b) = \frac{\partial F}{\partial y}(a, b)$ is invertible, then there exist open sets $U \ni a$ and $W \ni (a, b)$ and a unique $C^1$ function $g: U \to \mathbb{R}^m$ such that:

  1. $g(a) = b$
  2. $F(x, g(x)) = 0$ for all $x \in U$
  3. $Dg(a) = -[D_y F(a,b)]^{-1} D_x F(a,b)$

The equation $F(x, y) = 0$ implicitly defines $y = g(x)$ near $(a, b)$.
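A concrete case where no closed form exists: $F(x, y) = y^3 + y - x$. Here $D_y F = 3y^2 + 1$ is never zero, so $y = g(x)$ is defined (in fact globally), and conclusion (3) gives $g'(x) = 1/(3y^2 + 1)$. A numerical check of that slope, assuming NumPy:

```python
import numpy as np

# F(x, y) = y^3 + y - x: no closed form for y = g(x),
# but D_y F = 3y^2 + 1 > 0 everywhere, so g exists.
def F(x, y):
    return y**3 + y - x

def solve_y(x, y0=0.0, iters=50):
    """Newton's method on y -> F(x, y) = 0."""
    y = y0
    for _ in range(iters):
        y = y - F(x, y) / (3 * y**2 + 1)
    return y

x0 = 2.0
g_x0 = solve_y(x0)                 # g(x0), satisfying F(x0, g(x0)) = 0
assert abs(F(x0, g_x0)) < 1e-12

# Implicit function theorem: g'(x) = -[D_y F]^{-1} D_x F = 1 / (3 y^2 + 1)
ift_slope = 1.0 / (3 * g_x0**2 + 1)

# Compare against a finite difference of the numerically solved g
h = 1e-6
fd_slope = (solve_y(x0 + h) - solve_y(x0 - h)) / (2 * h)
assert abs(ift_slope - fd_slope) < 1e-8
```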

Intuition

If you have a system of $m$ equations in $n + m$ unknowns and the equations are "non-degenerate" with respect to the last $m$ variables (the Jacobian in $y$ is invertible), then you can solve for those $m$ variables as smooth functions of the remaining $n$ variables. You do not need an explicit formula; the theorem guarantees the function exists and gives its derivative.

Proof Sketch

Define $G(x, y) = (x, F(x, y))$ from $\mathbb{R}^{n+m}$ to $\mathbb{R}^{n+m}$. The Jacobian of $G$ at $(a, b)$ is block lower triangular with diagonal blocks $I_n$ and $D_y F(a, b)$, so $\det DG(a,b) = \det D_y F(a,b) \neq 0$. Apply the inverse function theorem to $G$ to get a local inverse, then extract $g$ from it.

Why It Matters

The implicit function theorem underlies: (1) Lagrange multipliers, where constraints $h(x) = 0$ implicitly define a manifold of feasible points, (2) implicit differentiation in deep equilibrium models (DEQs), where the output $z^*$ satisfies $f(z^*, x) = z^*$ and you differentiate through the fixed-point equation, and (3) the computation of gradients through any layer defined by a fixed-point or root-finding condition.

Failure Mode

The theorem fails when $D_y F(a, b)$ is singular. At such points, the solution set of $F(x, y) = 0$ may branch, have cusps, or be locally degenerate. The theorem also provides only local results. The implicit function $g$ may not extend to a global solution.

Applications in ML

Lagrange Multipliers

To minimize $f(x)$ subject to $h(x) = 0$ where $h: \mathbb{R}^n \to \mathbb{R}^k$, the method of Lagrange multipliers introduces the Lagrangian:

$$\mathcal{L}(x, \lambda) = f(x) + \lambda^T h(x)$$

At a constrained minimum, $\nabla_x \mathcal{L} = 0$ and $h(x) = 0$. The implicit function theorem guarantees that the constraint $h(x) = 0$ locally defines an $(n-k)$-dimensional manifold, provided $Dh(x)$ has full rank.
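When the objective is quadratic and the constraint affine, the stationarity conditions $\nabla_x \mathcal{L} = 0$, $h(x) = 0$ form a linear system that can be solved exactly. A toy instance (minimize $x_1^2 + x_2^2$ subject to $x_1 + x_2 = 1$; the numbers are illustrative), assuming NumPy:

```python
import numpy as np

# Minimize f(x) = x1^2 + x2^2 subject to h(x) = x1 + x2 - 1 = 0.
# Stationarity grad f + lambda * grad h = 0 plus the constraint gives:
#   2 x1        + lam = 0
#         2 x2  + lam = 0
#   x1  +  x2         = 1
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0]])
b = np.array([0.0, 0.0, 1.0])
x1, x2, lam = np.linalg.solve(A, b)

print(x1, x2, lam)  # constrained minimizer (0.5, 0.5), multiplier -1
assert np.allclose([x1, x2], [0.5, 0.5]) and np.isclose(lam, -1.0)
# Dh = (1, 1) has full rank 1, so the feasible set is a
# (2 - 1)-dimensional manifold: a line, as expected.
```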

Deep Equilibrium Models

A deep equilibrium model defines its output $z^*$ as the fixed point of $z^* = f_\theta(z^*, x)$. To compute $\partial z^* / \partial \theta$, define $F(\theta, z) = z - f_\theta(z, x)$. By the implicit function theorem (assuming $D_z F = I - D_z f_\theta$ is invertible), and since $D_\theta F = -\partial f_\theta / \partial \theta$, the two minus signs cancel:

$$\frac{\partial z^*}{\partial \theta} = [I - D_z f_\theta(z^*, x)]^{-1} \frac{\partial f_\theta}{\partial \theta}(z^*, x)$$

This avoids backpropagating through the entire fixed-point iteration.
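In one dimension this can be checked end to end. The toy "layer" below uses $f_w(z, x) = \tanh(wz + x)$ (an illustrative choice, not the DEQ architecture itself): run the fixed-point iteration, apply the implicit formula at the equilibrium, and compare with a finite difference through the whole iteration. Assuming NumPy:

```python
import numpy as np

# Scalar equilibrium layer: z* solves z = tanh(w * z + x)
def fixed_point(w, x, iters=200):
    z = 0.0
    for _ in range(iters):
        z = np.tanh(w * z + x)   # |w| < 1 makes this a contraction
    return z

w, x = 0.5, 1.0
z = fixed_point(w, x)
s = 1.0 - np.tanh(w * z + x)**2  # tanh' at the equilibrium

# Implicit function theorem on F(w, z) = z - tanh(w z + x):
#   dz*/dw = [1 - df/dz]^{-1} df/dw,  with df/dz = w*s, df/dw = z*s
implicit_grad = (z * s) / (1.0 - w * s)

# Finite difference through the full iteration for comparison
h = 1e-6
fd_grad = (fixed_point(w + h, x) - fixed_point(w - h, x)) / (2 * h)
assert abs(implicit_grad - fd_grad) < 1e-8
```

The implicit formula touches only the converged $z^*$, while the finite difference reruns all 200 iterations; that cost gap is the entire point of implicit differentiation in DEQs.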

Common Confusions

Watch Out

Local does not mean global

Both theorems give local results only. The inverse function theorem says $f$ is invertible near $a$, not everywhere. The map $f(x) = e^{ix}$ from $\mathbb{R}$ to the unit circle has invertible derivative everywhere but is not globally injective. Always check whether the local guarantee suffices for your application.

Watch Out

Invertible Jacobian is necessary at the point, not everywhere

The inverse function theorem requires $\det Df(a) \neq 0$ at the specific point $a$. The Jacobian may be singular at other points. Similarly, the implicit function theorem needs $D_y F$ invertible at the specific point $(a, b)$.

Summary

  • Inverse function theorem: if $Df(a)$ is invertible, then $f$ is locally invertible near $a$
  • The derivative of the local inverse is $[Df(a)]^{-1}$
  • Implicit function theorem: if $F(a,b) = 0$ and $D_y F(a,b)$ is invertible, then $y = g(x)$ near $(a,b)$
  • The implicit derivative is $Dg(a) = -[D_y F(a,b)]^{-1} D_x F(a,b)$
  • Both theorems are local; they say nothing about global invertibility
  • Applications: Lagrange multipliers, normalizing flows, deep equilibrium models

Exercises

ExerciseCore

Problem

Let $f: \mathbb{R}^2 \to \mathbb{R}^2$ be defined by $f(x, y) = (e^x \cos y, e^x \sin y)$. At which points is $f$ locally invertible? Compute the Jacobian and check its determinant.

ExerciseAdvanced

Problem

The equation $x^2 + y^2 + z^2 = 1$ defines the unit sphere. Near which points can you express $z$ as a $C^1$ function of $(x, y)$? Compute $\partial z / \partial x$ using the implicit function theorem where it applies.

References

Canonical:

  • Rudin, Principles of Mathematical Analysis, Chapter 9 (Theorems 9.24 and 9.28)
  • Spivak, Calculus on Manifolds, Chapter 2

Current:

  • Bai et al., "Deep Equilibrium Models" (NeurIPS 2019). Implicit differentiation through fixed points.
  • Krantz and Parks, The Implicit Function Theorem (2003). Comprehensive treatment.


Last reviewed: April 2026
