Beyond LLMs
Equivariant Deep Learning
Networks that respect symmetry: if the input transforms under a group action, the output transforms predictably. Equivariance generalizes translation equivariance in CNNs to rotations, permutations, and gauge symmetries, reducing sample complexity and improving generalization on structured data.
Why This Matters
A CNN detects a cat whether it appears on the left or right of the image. This is translation equivariance: shifting the input shifts the feature maps by the same amount. The CNN does not need to learn the cat pattern separately for each position because weight sharing enforces the symmetry.
Equivariant deep learning generalizes this idea to arbitrary symmetries. If your data has rotational symmetry (molecular structures, satellite imagery), permutation symmetry (sets, graphs, point clouds), or gauge symmetry (physical fields), you can build networks that respect these symmetries by construction. The payoff: fewer parameters, less training data, better generalization.
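The translation-equivariance claim can be checked directly. A minimal numpy sketch (circular 1-D convolution with a made-up filter, for illustration only): shifting the input and then convolving gives the same result as convolving and then shifting the output.

```python
import numpy as np

# Circular 1-D convolution: y[i] = sum_k w[k] * x[(i + k) % n].
def circ_conv(x, w):
    n = len(x)
    return np.array([sum(w[k] * x[(i + k) % n] for k in range(len(w)))
                     for i in range(n)])

x = np.random.randn(8)
w = np.array([0.5, -1.0, 0.25])   # hypothetical filter weights

shift = 3
shifted_then_conv = circ_conv(np.roll(x, shift), w)
conv_then_shifted = np.roll(circ_conv(x, w), shift)

# Equivariance: the two orders of operations agree exactly.
assert np.allclose(shifted_then_conv, conv_then_shifted)
```

The same weight vector `w` is applied at every position; that weight sharing is what makes the identity hold for every input, not just on average.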
This is the core idea of geometric deep learning (Bronstein et al., 2021): most successful architectures can be understood as equivariant networks for specific symmetry groups.
Core Definitions
Group Action
A group G acts on a space X through a map (g, x) ↦ g · x satisfying e · x = x (identity) and g · (h · x) = (gh) · x (composition). Examples: the translation group acts on images by shifting. The rotation group SO(3) acts on 3D point clouds by rotating. The permutation group S_n acts on sets by reordering.
Equivariance
Statement
A function f : X → Y is equivariant with respect to a group G acting on X and Y if:

f(g · x) = g · f(x) for all g ∈ G and x ∈ X.

Transforming the input, then applying f, gives the same result as applying f, then transforming the output. The function "commutes" with the group action.
Invariance is the special case where the action on the output is trivial: f(g · x) = f(x) for all g ∈ G. The output does not change at all.
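The definitions can be verified numerically. A small sketch for the permutation group S_5 acting on vectors by reordering coordinates: the sum is invariant (ignores ordering), while an elementwise ReLU is equivariant (commutes with the reordering).

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
perm = rng.permutation(5)          # a random element of S_5

# Invariance: f(g . x) = f(x). The sum ignores ordering entirely.
assert np.isclose(np.sum(x[perm]), np.sum(x))

# Equivariance: f(g . x) = g . f(x). An elementwise map commutes
# with any permutation of the coordinates.
relu = lambda v: np.maximum(v, 0.0)
assert np.allclose(relu(x[perm]), relu(x)[perm])
```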
Intuition
An equivariant function preserves the structure of transformations. If you rotate a molecule 90 degrees and then predict its energy, you should get the same energy as if you first predict and then (conceptually) rotate. If you rotate it and predict its dipole moment, the dipole should rotate by the same 90 degrees.
Invariance (energy does not change under rotation) and equivariance (dipole rotates with the molecule) are both useful, and which one you want depends on what you are predicting.
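The energy/dipole distinction can be checked with toy numbers. A sketch using an assumed random "molecule" (positions and charges are made up for illustration): a sum of pairwise distances plays the role of an invariant energy, and a charge-weighted position sum plays the role of an equivariant dipole.

```python
import numpy as np

rng = np.random.default_rng(1)
pos = rng.standard_normal((4, 3))    # toy "molecule": 4 atoms in 3-D
charge = rng.standard_normal(4)      # toy per-atom charges

# Random rotation R in SO(3) via QR decomposition (flip sign if det < 0).
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
R = Q if np.linalg.det(Q) > 0 else -Q

# Invariant quantity: sum of pairwise distances ("energy"-like).
def energy(p):
    d = np.linalg.norm(p[:, None, :] - p[None, :, :], axis=-1)
    return d.sum()

# Equivariant quantity: charge-weighted position sum ("dipole"-like).
def dipole(p):
    return charge @ p

assert np.isclose(energy(pos @ R.T), energy(pos))            # invariant
assert np.allclose(dipole(pos @ R.T), dipole(pos) @ R.T)     # rotates along
```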
Why It Matters
Equivariance is a hard constraint, not a soft regularizer. A network that is equivariant by construction will respect the symmetry perfectly on all inputs, not just approximately on training data. This is a strict generalization guarantee: the network cannot learn to violate the symmetry, even with adversarial data. This is why equivariant networks need dramatically less data than unconstrained networks for tasks with known symmetries.
Failure Mode
The symmetry must be exact. If your data has approximate symmetry (e.g., images are roughly but not exactly rotation-invariant because of gravity), enforcing exact equivariance can hurt. The network cannot learn that "up" and "down" are different if you force rotational invariance. In such cases, data augmentation (soft symmetry) may outperform equivariant architectures (hard symmetry).
Why Equivariance Reduces Parameters
Equivariance Implies Weight Sharing
Statement
A linear map W that is equivariant with respect to representations ρ_in and ρ_out of G satisfies:

W ρ_in(g) = ρ_out(g) W for all g ∈ G.

The set of matrices satisfying this constraint is a linear subspace of R^{m×n}. The dimension of this subspace (the number of free parameters) is at most mn, and for a finite group G it can be computed by character orthogonality; it is typically far smaller than mn.
Intuition
The equivariance constraint forces parameter sharing. In a CNN, translation equivariance forces the same filter weights at every position, reducing parameters from one independent weight per pair of input and output positions to the k² entries of a single shared k × k filter. For rotation equivariance, the constraint forces the filter to be "steerable" (a linear combination of a fixed set of basis filters), further reducing parameters.
Fewer free parameters means the function class is smaller, which improves generalization via the bias-variance tradeoff. The bias is increased (you cannot represent symmetry-breaking functions), but the variance decreases (less overfitting) by exactly the right amount when the symmetry holds.
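The weight-sharing claim can be made concrete for a small group. A sketch, assuming the cyclic group C_4 acting on R^4 by circular shift (the same representation on input and output): averaging conjugates of an arbitrary matrix over the group projects it onto the equivariant subspace, which here is the circulant matrices, and projecting a full basis shows 4 free parameters instead of 16.

```python
import numpy as np

n = 4
# Cyclic-shift representation of C_4 on R^4: P @ x == np.roll(x, 1).
P = np.roll(np.eye(n), 1, axis=0)
reps = [np.linalg.matrix_power(P, k) for k in range(n)]

# Project an arbitrary W onto the equivariant subspace by averaging
# the conjugates rho(g)^{-1} W rho(g) over the group (Reynolds operator).
def project(W):
    return sum(r.T @ W @ r for r in reps) / n   # r.T == r^{-1} for permutations

W = np.random.randn(n, n)
Wbar = project(W)
assert np.allclose(Wbar @ P, P @ Wbar)          # Wbar satisfies W P = P W

# Dimension of the subspace: project each of the n*n basis matrices
# and take the rank of the flattened images.
basis_images = np.stack([project(np.eye(n * n)[i].reshape(n, n)).ravel()
                         for i in range(n * n)])
print(np.linalg.matrix_rank(basis_images))      # prints 4: 4 free parameters, not 16
```

Group averaging works for any finite group once the representation matrices are known, but it scales with |G|; for continuous groups one instead solves for a steerable basis analytically.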
Why It Matters
This is why equivariant networks work with less data: the parameter sharing from equivariance is not arbitrary compression, it is compression that perfectly matches the data symmetry. The number of parameters scales inversely with the group size, so larger symmetry groups give more compression. The discrete rotation group C4 (90-degree rotations of a square) gives roughly 4x fewer parameters. The continuous rotation group SO(2) gives, in a sense, infinite compression: filter parameters depend only on the radial profile, not the angle.
Failure Mode
Computing the equivariant subspace requires solving the intertwiner condition W ρ_in(g) = ρ_out(g) W for all g ∈ G, which requires knowledge of the group representations. For simple groups (translations, rotations, permutations), the representations are well-known. For complex or non-standard symmetries, finding the representations is a research problem in itself.
Architectures as Equivariant Networks
| Architecture | Symmetry group | Equivariance type | Domain |
|---|---|---|---|
| CNN | Translation | Feature maps shift with input | Images |
| GNN | Permutation | Output permutes with node reordering | Graphs |
| Transformer | Permutation (on tokens) | Equivariant without positional encoding; positional encodings deliberately break the symmetry | Sequences |
| Steerable CNN | Rotation (C_n or SO(2)) | Feature maps rotate with input | Oriented images |
| SE(3)-Transformer | SE(3) (rotation + translation) | Equivariant on 3D coordinates | Molecules, proteins |
| SchNet / DimeNet | E(3) (Euclidean group) | Invariant predictions, equivariant internal features | Molecular dynamics |
| DeepSets | Permutation | Invariant to set element ordering | Point clouds, sets |
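The DeepSets row can be illustrated in a few lines. A minimal sketch with made-up, untrained weights (`W_phi` and `W_rho` are hypothetical): a per-element encoder followed by sum pooling and a readout yields an output that is invariant to the ordering of set elements.

```python
import numpy as np

rng = np.random.default_rng(0)
W_phi = rng.standard_normal((3, 8))    # hypothetical per-element encoder weights
W_rho = rng.standard_normal((8, 1))    # hypothetical readout weights

def deepsets(points):                  # points: (n_elements, 3)
    h = np.maximum(points @ W_phi, 0)  # phi applied independently to each element
    pooled = h.sum(axis=0)             # sum pooling: the invariant step
    return pooled @ W_rho              # rho on the pooled summary

x = rng.standard_normal((5, 3))
perm = rng.permutation(5)
assert np.allclose(deepsets(x[perm]), deepsets(x))   # order does not matter
```

Any symmetric pooling (sum, mean, max) would give invariance; sum pooling is the choice analyzed in the Deep Sets paper cited below.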
Common Confusions
Equivariance and invariance are different
Invariance means the output does not change under the group action (f(g · x) = f(x)). Equivariance means the output transforms predictably (f(g · x) = g · f(x)). Predicting molecular energy should be invariant to rotation. Predicting molecular forces should be equivariant (forces rotate with the molecule). Using the wrong one is a modeling error, not just a terminology issue.
Data augmentation is not the same as equivariance
Data augmentation (training on rotated/flipped copies of the data) encourages the network to learn approximate equivariance from data. An equivariant architecture enforces exact equivariance by construction. Augmentation needs more data and may not generalize to unseen transformations. Equivariance guarantees the symmetry holds everywhere. The tradeoff: augmentation is more flexible (works with approximate symmetries), equivariance is more efficient (works with exact symmetries).
Exercises
Problem
A function f : R^n → R is invariant to the permutation group S_n (any reordering of the input coordinates gives the same output). Give three examples of such functions and one example of a function that is not permutation-invariant.
Problem
Explain why a standard MLP (fully connected network) is not equivariant to any non-trivial group action on its inputs, while a CNN is equivariant to translations. What structural property of the CNN enforces this?
References
Canonical:
- Cohen & Welling, "Group Equivariant Convolutional Networks" (ICML 2016). The foundational paper.
- Bronstein et al., "Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges" (2021). The unifying survey.
Current:
- Weiler & Cesa, "General E(2)-Equivariant Steerable CNNs" (NeurIPS 2019)
- Batzner et al., "E(3)-Equivariant Graph Neural Networks for Data-Efficient and Accurate Interatomic Potentials" (Nature Communications, 2022)
- Zaheer et al., "Deep Sets" (NeurIPS 2017). Permutation invariance.
Next Topics
- Riemannian optimization: optimization on manifolds where equivariance constraints define the geometry
- Representation learning: how learned representations encode (or fail to encode) data symmetries
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.
- Convolutional Neural Networks (Layer 3)
- Feedforward Networks and Backpropagation (Layer 2)
- Differentiation in Rn (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)
- Matrix Calculus (Layer 1)
- The Jacobian Matrix (Layer 0A)
- The Hessian Matrix (Layer 0A)
- Activation Functions (Layer 1)
- Convex Optimization Basics (Layer 1)
- Matrix Operations and Properties (Layer 0A)
- Vectors, Matrices, and Linear Maps (Layer 0A)
- Graph Neural Networks (Layer 3)
- Eigenvalues and Eigenvectors (Layer 0A)