
Equivariant Deep Learning

Networks that respect symmetry: if the input transforms under a group action, the output transforms predictably. Equivariance generalizes translation equivariance in CNNs to rotations, permutations, and gauge symmetries, reducing sample complexity and improving generalization on structured data.


Why This Matters

A CNN detects a cat whether it appears on the left or right of the image. This is translation equivariance: shifting the input shifts the feature maps by the same amount. The CNN does not need to learn the cat pattern separately for each position because weight sharing enforces the symmetry.
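This commutation can be checked numerically. A minimal sketch using NumPy and SciPy, with circular (wrap-around) boundary conditions so the equality is exact even at the borders:

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
image = rng.standard_normal((16, 16))
kernel = rng.standard_normal((3, 3))

def conv(x):
    # Circular convolution: with wrap boundaries, translation equivariance is exact.
    return convolve2d(x, kernel, mode="same", boundary="wrap")

# Shift then convolve...
shifted_then_conv = conv(np.roll(image, shift=(2, 5), axis=(0, 1)))
# ...equals convolve then shift.
conv_then_shifted = np.roll(conv(image), shift=(2, 5), axis=(0, 1))
assert np.allclose(shifted_then_conv, conv_then_shifted)
```

With non-circular padding the equality holds only approximately near the image border, which is why boundary handling matters in practice.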

Equivariant deep learning generalizes this idea to arbitrary symmetries. If your data has rotational symmetry (molecular structures, satellite imagery), permutation symmetry (sets, graphs, point clouds), or gauge symmetry (physical fields), you can build networks that respect these symmetries by construction. The payoff: fewer parameters, less training data, better generalization.

This is the core idea of geometric deep learning (Bronstein et al., 2021): most successful architectures can be understood as equivariant networks for specific symmetry groups.

Core Definitions

Definition

Group Action

A group $G$ acts on a space $\mathcal{X}$ through a map $\rho: G \times \mathcal{X} \to \mathcal{X}$ satisfying $\rho(e, x) = x$ (identity) and $\rho(g_1, \rho(g_2, x)) = \rho(g_1 g_2, x)$ (composition). Examples: the translation group $(\mathbb{R}^2, +)$ acts on images by shifting, the rotation group $SO(3)$ acts on 3D point clouds by rotating, and the permutation group $S_n$ acts on sets by reordering.
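Both axioms can be checked concretely for the permutation action on vectors. A minimal NumPy sketch, with permutations represented as index arrays (one possible convention, chosen for illustration):

```python
import numpy as np

def act(perm, x):
    # rho(g, x): index the coordinates of x by the permutation array g.
    return x[perm]

x = np.array([10.0, 20.0, 30.0, 40.0])
e = np.arange(4)                 # identity permutation
g1 = np.array([1, 2, 3, 0])
g2 = np.array([1, 3, 0, 2])

# Identity axiom: rho(e, x) = x
assert np.array_equal(act(e, x), x)

# Composition axiom: rho(g1, rho(g2, x)) = rho(g1 g2, x),
# where under this indexing convention the product is (g1 g2)[i] = g2[g1[i]].
g1g2 = g2[g1]
assert np.array_equal(act(g1, act(g2, x)), act(g1g2, x))
```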

Proposition

Equivariance

Statement

A function $f: \mathcal{X} \to \mathcal{Y}$ is equivariant with respect to a group $G$ if:

$$f(\rho_X(g, x)) = \rho_Y(g, f(x)) \quad \forall g \in G, \; x \in \mathcal{X}$$

Transforming the input, then applying ff, gives the same result as applying ff, then transforming the output. The function "commutes" with the group action.

Invariance is the special case where $\rho_Y$ is trivial: $f(\rho_X(g, x)) = f(x)$ for all $g$. The output does not change at all.
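Both cases can be verified numerically for the permutation group; a small sketch where the choices of $f$ are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
g = rng.permutation(5)  # a random element of S_5, acting by x -> x[g]

# Invariant: the sum ignores ordering entirely, f(gx) = f(x).
f_inv = np.sum
assert np.isclose(f_inv(x[g]), f_inv(x))

# Equivariant: an elementwise map commutes with the action, f(gx) = g f(x).
f_eq = np.tanh
assert np.allclose(f_eq(x[g]), f_eq(x)[g])
```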

Intuition

An equivariant function preserves the structure of transformations. If you rotate a molecule 90 degrees and then predict its energy, you should get the same energy as if you first predict and then (conceptually) rotate. If you rotate it and predict its dipole moment, the dipole should rotate by the same 90 degrees.

Invariance (energy does not change under rotation) and equivariance (dipole rotates with the molecule) are both useful, and which one you want depends on what you are predicting.

Why It Matters

Equivariance is a hard constraint, not a soft regularizer. A network that is equivariant by construction will respect the symmetry perfectly on all inputs, not just approximately on training data. This is a strict generalization guarantee: the network cannot learn to violate the symmetry, even with adversarial data. This is why equivariant networks need dramatically less data than unconstrained networks for tasks with known symmetries.

Failure Mode

The symmetry must be exact. If your data has approximate symmetry (e.g., images are roughly but not exactly rotation-invariant because of gravity), enforcing exact equivariance can hurt. The network cannot learn that "up" and "down" are different if you force rotational invariance. In such cases, data augmentation (soft symmetry) may outperform equivariant architectures (hard symmetry).

Why Equivariance Reduces Parameters

Theorem

Equivariance Implies Weight Sharing

Statement

A linear map $W: \mathbb{R}^{d_{\text{in}}} \to \mathbb{R}^{d_{\text{out}}}$ that is equivariant with respect to representations $\rho_{\text{in}}$ and $\rho_{\text{out}}$ of $G$ satisfies:

$$W \rho_{\text{in}}(g) = \rho_{\text{out}}(g) W \quad \forall g \in G$$

The set of matrices satisfying this constraint is a linear subspace of $\mathbb{R}^{d_{\text{out}} \times d_{\text{in}}}$. The dimension of this subspace (the number of free parameters) is roughly $d_{\text{in}} d_{\text{out}} / |G|$ for a finite group $G$, and exactly that for regular representations.
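The dimension of this subspace can be computed numerically by vectorizing the constraint into a nullspace problem. A sketch for the cyclic group $C_4$ acting on $\mathbb{R}^4$ by cyclic shifts (the choice of group and representation is illustrative; since $C_4$ is generated by one shift, constraining the generator suffices):

```python
import numpy as np

n = 4
# Generator of C4 acting on R^4: the cyclic-shift permutation matrix.
P = np.roll(np.eye(n), 1, axis=0)

# Vectorize the intertwiner condition W P = P W using vec(AXB) = (B^T kron A) vec(X):
# (P^T kron I - I kron P) vec(W) = 0.
M = np.kron(P.T, np.eye(n)) - np.kron(np.eye(n), P)

# Equivariant maps form the nullspace of M; its dimension counts free parameters.
dim_equivariant = n * n - np.linalg.matrix_rank(M)
print(dim_equivariant)  # 4 = d_in * d_out / |G| = 16 / 4: the circulant matrices
```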

Intuition

The equivariance constraint forces parameter sharing. In a CNN, translation equivariance forces the same filter weights at every position, reducing parameters from $O(\text{image size} \times \text{filter size})$ to $O(\text{filter size})$. For rotation equivariance, the constraint forces the filter to be "steerable" (a linear combination of a fixed set of basis filters), further reducing parameters.
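For cyclic shifts on $\mathbb{R}^n$, the shared-weight structure is concrete: a shift-equivariant linear map is a circulant matrix, i.e., 1D circular convolution, with $n$ free parameters instead of $n^2$. A sketch:

```python
import numpy as np

n = 8
rng = np.random.default_rng(0)
w = rng.standard_normal(n)  # one shared filter: n parameters, not n^2

# Every row of a circulant matrix is the same filter, shifted by one position.
# This is exactly 1D circular convolution written as a matrix.
W = np.stack([np.roll(w, i) for i in range(n)])

x = rng.standard_normal(n)
# Equivariance check: applying W to a shifted input shifts the output.
assert np.allclose(W @ np.roll(x, 1), np.roll(W @ x, 1))
```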

Fewer free parameters means a smaller function class, which improves generalization via the bias-variance tradeoff. The bias increases (the network cannot represent symmetry-breaking functions), but the variance decreases (less overfitting); when the symmetry actually holds, the added bias costs nothing because the true function already lies in the constrained class.

Why It Matters

This is why equivariant networks work with less data: the parameter sharing from equivariance is not arbitrary compression, it is compression that exactly matches the data symmetry. The number of parameters scales inversely with the group size, so larger symmetry groups give more compression. An $|G| = 8$ discrete rotation group (rotations by multiples of $45°$) gives $8\times$ fewer parameters. A continuous rotation group compresses further still: filter parameters depend only on the radial profile, not the angle.

Failure Mode

Computing the equivariant subspace requires solving the intertwiner condition $W\rho_{\text{in}}(g) = \rho_{\text{out}}(g)W$ for all $g$, which requires knowledge of the group representations. For simple groups (translations, rotations, permutations), the representations are well-known. For complex or non-standard symmetries, finding the representations is a research problem in itself.

Architectures as Equivariant Networks

| Architecture | Symmetry group | Equivariance type | Domain |
| --- | --- | --- | --- |
| CNN | Translation $(\mathbb{Z}^2, +)$ | Feature maps shift with input | Images |
| GNN | Permutation $S_n$ | Output permutes with node reordering | Graphs |
| Transformer | Permutation $S_n$ (on tokens) | Equivariant without positional encoding (positional encoding breaks the symmetry) | Sequences |
| Steerable CNN | Rotation $SO(2)$ or $O(2)$ | Feature maps rotate with input | Oriented images |
| SE(3)-Transformer | Rotation + translation $SE(3)$ | Equivariant on 3D coordinates | Molecules, proteins |
| SchNet / DimeNet | Euclidean group $E(3)$ | Invariant predictions, equivariant internal features | Molecular dynamics |
| DeepSets | Permutation $S_n$ | Invariant to set element ordering | Point clouds, sets |
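As a concrete instance of the last row, a minimal DeepSets-style sketch: $f(X) = \rho\left(\sum_i \phi(x_i)\right)$. The architecture shape follows Zaheer et al.; the tiny random weights here are purely illustrative, not a trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

# phi: per-element encoder; rho: decoder applied after pooling (random weights).
W1 = rng.standard_normal((3, 16))
W2 = rng.standard_normal((16, 1))

def deep_set(X):
    phi = np.tanh(X @ W1)      # encode each set element independently
    pooled = phi.sum(axis=0)   # symmetric pooling: ordering information is destroyed
    return pooled @ W2         # decode the pooled representation

X = rng.standard_normal((5, 3))  # a set of 5 points in R^3
perm = rng.permutation(5)
# Invariant by construction: reordering the set cannot change the output.
assert np.allclose(deep_set(X), deep_set(X[perm]))
```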

Common Confusions

Watch Out

Equivariance and invariance are different

Invariance means the output does not change under the group action ($f(gx) = f(x)$). Equivariance means the output transforms predictably ($f(gx) = g\,f(x)$). Predicting molecular energy should be invariant to rotation. Predicting molecular forces should be equivariant (forces rotate with the molecule). Using the wrong one is a modeling error, not just a terminology issue.
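A toy version of this distinction, using the energy $E(x) = \|x\|^2$ of a single particle (an illustrative choice) and its force $-\nabla E$:

```python
import numpy as np

def energy(x):
    # Rotation-invariant scalar: depends only on the distance from the origin.
    return np.dot(x, x)

def force(x):
    # Negative gradient of the energy: a rotation-equivariant vector.
    return -2.0 * x

theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
x = np.array([1.0, 2.0])

assert np.isclose(energy(R @ x), energy(x))     # invariant: same scalar
assert np.allclose(force(R @ x), R @ force(x))  # equivariant: vector rotates
```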

Watch Out

Data augmentation is not the same as equivariance

Data augmentation (training on rotated/flipped copies of the data) encourages the network to learn approximate equivariance from data. An equivariant architecture enforces exact equivariance by construction. Augmentation needs more data and may not generalize to unseen transformations. Equivariance guarantees the symmetry holds everywhere. The tradeoff: augmentation is more flexible (works with approximate symmetries), equivariance is more efficient (works with exact symmetries).

Exercises

ExerciseCore

Problem

A function $f: \mathbb{R}^n \to \mathbb{R}$ is invariant to the permutation group $S_n$ (any reordering of the input coordinates gives the same output). Give three examples of such functions and one example of a function that is not permutation-invariant.

ExerciseAdvanced

Problem

Explain why a standard MLP (fully connected network) has no built-in equivariance to any non-trivial group action on its inputs, while a CNN is equivariant to translations by construction. What structural property of the CNN enforces this?

References

Canonical:

  • Cohen & Welling, "Group Equivariant Convolutional Networks" (ICML 2016). The foundational paper.
  • Bronstein et al., "Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges" (2021). The unifying survey.

Current:

  • Weiler & Cesa, "General E(2)-Equivariant Steerable CNNs" (NeurIPS 2019)
  • Batzner et al., "E(3)-Equivariant Graph Neural Networks for Data-Efficient and Accurate Interatomic Potentials" (Nature Communications, 2022)
  • Zaheer et al., "Deep Sets" (NeurIPS 2017). Permutation invariance.

Last reviewed: April 2026
