Hyperbolic Embeddings for Graphs
Negatively curved space embeds tree-like graphs with exponentially less distortion than Euclidean space at the same dimension. The Poincare ball model, Mobius gyrovector arithmetic, and Sala et al.'s sharp distortion bound make this concrete; Ganea et al. lift the construction to neural network layers.
Why This Matters
A perfect binary tree of depth $d$ has $2^{d+1}-1$ nodes, so the number of nodes within graph distance $r$ of the root grows like $2^r$. A Euclidean ball of radius $r$ in $\mathbb{R}^n$ has volume $\Theta(r^n)$, polynomial in $r$. To fit $2^r$ nodes into a polynomial volume while preserving distances, an embedding must compress: tree leaves get shoved together and pairwise distances are crushed. This is not a tuning issue. It is a volume mismatch between the data and the space.
Hyperbolic space has ball volume that grows like $e^{(n-1)r}$ (see non-Euclidean and hyperbolic geometry). The exponential matches the tree. Sala, De Sa, Gu, and Re (2018) made this quantitative: any tree embeds into the Poincare disk with distortion $1+\varepsilon$ using $O\big(\frac{\ell}{\varepsilon}\log d_{\max}\big)$ bits of precision, where $\ell$ is the longest path and $d_{\max}$ the maximum degree, while the best Euclidean embedding into any fixed dimension has distortion $\Omega(\sqrt{\log\log n})$ for some trees on $n$ nodes.
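The volume mismatch can be checked numerically; a minimal sketch, using the standard curvature $-1$ formula $2\pi(\cosh r - 1)$ for the area of a hyperbolic disk:

```python
import math

# Nodes within graph distance r of the root of an infinite binary tree: 2^(r+1) - 1.
# A Euclidean disk of radius r has area pi*r^2 (polynomial in r);
# a hyperbolic disk (curvature -1) has area 2*pi*(cosh(r) - 1) ~ pi*e^r (exponential in r).
for r in [5, 10, 20]:
    tree_nodes = 2 ** (r + 1) - 1
    euclidean_area = math.pi * r ** 2
    hyperbolic_area = 2 * math.pi * (math.cosh(r) - 1)
    print(r, tree_nodes, round(euclidean_area), round(hyperbolic_area))
```

At $r = 20$ the tree already holds over two million nodes while the Euclidean disk area is about 1257; the hyperbolic area keeps pace with the tree.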
This was not folklore. Nickel and Kiela (2017) reported 5-dimensional Poincare embeddings of WordNet hypernymy beating 200-dimensional Euclidean embeddings on link prediction. Ganea, Becigneul, and Hofmann (2018) then showed how to build neural-network layers (linear maps, biases, activations, attention) that operate inside rather than around it.
Setup: the Poincare ball
Poincare ball model
The Poincare ball is the open unit ball
$$\mathbb{B}^n = \{x \in \mathbb{R}^n : \|x\| < 1\},$$
equipped with the Riemannian metric
$$g_x = \lambda_x^2\, g^{E}, \qquad \lambda_x = \frac{2}{1-\|x\|^2},$$
where $g^E$ is the Euclidean metric. The induced distance between $x, y \in \mathbb{B}^n$ is
$$d_{\mathbb{B}}(x,y) = \operatorname{arccosh}\!\left(1 + \frac{2\|x-y\|^2}{(1-\|x\|^2)(1-\|y\|^2)}\right).$$
The boundary $\partial\mathbb{B}^n = \{\|x\| = 1\}$ is at infinite distance from any interior point. The model has constant sectional curvature $-1$.
Two facts make $\mathbb{B}^n$ usable as an embedding space:
- Geodesics through the origin are straight lines. The geodesic from $0$ to a point $x$ is the Euclidean segment, with hyperbolic arclength $d_{\mathbb{B}}(0,x) = 2\operatorname{artanh}(\|x\|)$.
- Other geodesics are circular arcs orthogonal to the boundary $\partial\mathbb{B}^n$. Picture: a "straight line" curves toward the center because pushing along the boundary is metrically expensive (the conformal factor $\lambda_x \to \infty$ as $\|x\| \to 1$).
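The distance formula is short enough to implement directly and check against the closed form for geodesics through the origin; a minimal numpy sketch:

```python
import numpy as np

def poincare_dist(x, y):
    """Geodesic distance in the Poincare ball, curvature -1."""
    sq = np.sum((x - y) ** 2)
    denom = (1 - np.sum(x * x)) * (1 - np.sum(y * y))
    return np.arccosh(1 + 2 * sq / denom)

origin = np.zeros(2)
x = np.array([0.5, 0.0])
# Geodesic through the origin: distance must equal 2*artanh(||x||).
print(poincare_dist(origin, x), 2 * np.arctanh(0.5))
```

Both expressions evaluate to $\ln 3 \approx 1.0986$, confirming the arccosh form and the artanh form agree on radial geodesics.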
The distortion bound for tree embeddings
The central result. Distortion of an embedding $f : (V, d_T) \to (\mathbb{B}^n, d_{\mathbb{B}})$ is the multiplicative gap between graph and embedded distances:
$$D(f) = \Big(\max_{u \neq v} \frac{d_{\mathbb{B}}(f(u), f(v))}{d_T(u,v)}\Big) \cdot \Big(\max_{u \neq v} \frac{d_T(u,v)}{d_{\mathbb{B}}(f(u), f(v))}\Big).$$
A perfect (isometric) embedding has distortion $1$.
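The definition is mechanical to compute from two distance matrices; a minimal sketch on a toy example (the function name `distortion` is illustrative, not from the paper):

```python
import numpy as np

def distortion(graph_d, emb_d):
    """Worst-case multiplicative distortion between two distance matrices:
    max expansion ratio times max contraction ratio over off-diagonal pairs."""
    mask = ~np.eye(len(graph_d), dtype=bool)
    ratios = emb_d[mask] / graph_d[mask]
    return ratios.max() / ratios.min()

# Path 0-1-2 embedded on a line with one edge stretched from 1.0 to 1.5:
graph_d = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]], dtype=float)
pts = np.array([0.0, 1.0, 2.5])
emb_d = np.abs(pts[:, None] - pts[None, :])
print(distortion(graph_d, emb_d))  # 1.5: one edge exact, one stretched by 1.5x
```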
Sala-De Sa-Gu-Re distortion bound for tree embeddings
For every tree $T$ with longest path $\ell$ and maximum degree $d_{\max}$, and every $\varepsilon > 0$, there is an embedding of $T$ into the Poincare disk $\mathbb{B}^2$ with distortion at most $1+\varepsilon$, representable with $O\big(\frac{\ell}{\varepsilon}\log d_{\max}\big)$ bits of precision per coordinate.
Intuition
Embed the root at the origin. For each node already placed, reserve an angular cone for each child and place the child at hyperbolic distance equal to the (scaled) edge weight along the bisector of its cone. Because the circumference of a hyperbolic circle of radius $r$ scales like $2\pi\sinh r \approx \pi e^r$, the arc available to each child is exponentially larger than the arc its parent had at the same angle, so even a high-branching tree fits without children colliding. Scaling the whole embedding by a large factor $\tau$ pushes nodes outward where there is exponential room, controlling angular interference.
Proof Sketch
Embedding construction (Sala et al., Algorithm 1). Place the root at $0$. Process nodes in BFS order. For a node $v$ at hyperbolic distance $r$ from the origin with $k$ children, partition the angular cone at $v$ (the full $2\pi$ minus the angle back to its parent) into $k$ sectors. Place child $c_i$ at hyperbolic distance $\tau\, w(v, c_i)$ along the bisector of its sector, where $w(v, c_i)$ is the edge weight and $\tau$ is the scale factor.
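The construction fits in a few lines of numpy; this is a simplified toy sketch (fixed scale $\tau$, a naive cone-shrinking rule, unit edge weights), not Sala et al.'s exact algorithm, which tunes cone widths and scale jointly:

```python
import numpy as np
from itertools import combinations

def mobius_add(a, b):
    # Mobius addition in the complex Poincare disk: a (+) b = (a+b)/(1+conj(a)*b)
    return (a + b) / (1 + np.conj(a) * b)

def pdist(a, b):
    # Poincare-disk distance in gyrovector form: d(a,b) = 2*artanh|(-a) (+) b|
    return 2.0 * np.arctanh(abs(mobius_add(-a, b)))

def embed_tree(children, tau):
    """Cone-splitting placement sketch: children maps node -> list of children."""
    pos, stack = {0: 0j}, [(0, 0.0, np.pi)]  # (node, cone center angle, cone half-width)
    while stack:
        v, center, half = stack.pop()
        kids = children.get(v, [])
        for i, c in enumerate(kids):
            ang = center - half + (2 * i + 1) * half / len(kids)  # sector bisector
            step = np.tanh(tau / 2) * np.exp(1j * ang)  # hyperbolic distance tau from 0
            pos[c] = mobius_add(pos[v], step)           # isometric translation out to v
            stack.append((c, ang, half / (2 * len(kids))))  # naive cone shrinking
    return pos

# Perfect binary tree of depth 2; tree distances via parent chains.
children = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
parent = {c: v for v, ks in children.items() for c in ks}
def chain(v):
    out = [v]
    while out[-1] in parent:
        out.append(parent[out[-1]])
    return out
def tree_dist(u, v):
    cu, cv = chain(u), chain(v)
    lca = max(set(cu) & set(cv), key=lambda a: len(chain(a)))  # deepest common node
    return cu.index(lca) + cv.index(lca)

def distortion(tau):
    pos = embed_tree(children, tau)
    ratios = [pdist(pos[u], pos[v]) / (tau * tree_dist(u, v))
              for u, v in combinations(range(7), 2)]
    return max(ratios) / min(ratios)

print(distortion(3.0), distortion(8.0))  # distortion shrinks as tau grows
```

Even this naive version shows the mechanism from the proof sketch: every edge is embedded at exactly length $\tau$, so all error is contraction between branches, and increasing $\tau$ drives the distortion toward $1$.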
Why the construction has low distortion. Two nodes $u, v$ in $T$ have tree distance $d_T(u,v) = d_T(u,a) + d_T(a,v)$, where $a$ is their lowest common ancestor. After scaling by $\tau$, the embedded points satisfy $\tau\, d_T(u,v) - O(\log d_{\max}) \le d_{\mathbb{B}}(f(u), f(v)) \le \tau\, d_T(u,v)$, where $d_{\max}$ is the max degree. The negative correction term is the angular wiggle from approximating "go up then down" by a hyperbolic geodesic. It is bounded uniformly in $n$, so for $\tau$ large enough relative to $\log(d_{\max})/\varepsilon$, every pairwise ratio lies in $[1-\varepsilon, 1]$.
Why $O\big(\frac{\ell}{\varepsilon}\log d_{\max}\big)$ precision bits are needed. Hyperbolic distances near the boundary saturate floating-point precision. A point at hyperbolic distance $r$ from the origin has Euclidean norm $\tanh(r/2) = 1 - 2e^{-r} + O(e^{-2r})$, so distinguishing two points $\Theta(1)$ apart in hyperbolic distance requires $\Omega(r)$ bits to keep $1 - \|x\|$ from rounding to zero. Scaling by $\tau$ pushes typical embedded distances to $\Theta(\tau\ell)$, hence the bound. Sala et al. give an exact ulp-level analysis.
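The saturation is easy to see in float64 (a sketch):

```python
import numpy as np

# A point at hyperbolic distance r from the origin has Euclidean norm tanh(r/2).
# Its gap to the boundary, 1 - tanh(r/2) ~ 2*exp(-r), rounds to exactly 0 in
# float64 once r exceeds roughly 38 (when 2*exp(-r) drops below half an ulp of 1).
for r in [10.0, 40.0, 80.0]:
    gap = 1.0 - np.tanh(r / 2)
    print(r, gap)
```

Past that point the stored coordinate is exactly on the boundary and all hyperbolic distances to it are infinite; higher precision (more mantissa bits) is the only fix, which is exactly where the $\Omega(r)$ bit requirement comes from.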
Euclidean lower bound. Bourgain (1985) and Linial-London-Rabinovich (1995) show any $n$-point metric embeds into $\ell_2$ with distortion $O(\log n)$, and this is tight for expanders: a random regular expander requires distortion $\Omega(\log n)$. For trees the relevant bound is Bourgain's (1986): the complete binary tree on $n$ nodes requires distortion $\Omega(\sqrt{\log\log n})$ in any Euclidean space. So scaling dimension at fixed distortion does not save Euclidean embeddings of trees: distortion $\Omega(\sqrt{\log\log n})$ is unavoidable in $\mathbb{R}^k$ for every $k$.
Why It Matters
This is the only result that cleanly justifies "use hyperbolic space for hierarchies." The constructive direction tells you the minimum dimension and precision needed. The Euclidean lower bound tells you that no clever Euclidean parametrization can match it asymptotically. Together they explain Nickel and Kiela's 5D-vs-200D result without appealing to mystery: the Euclidean baseline was paying unavoidable distortion, which a 200-dimensional bag-of-features partially absorbed but did not remove.
Failure Mode
The theorem is about trees and tree-like graphs (low Gromov hyperbolicity $\delta$). Grids, lattices, and dense graphs have $\delta = \Theta(\mathrm{diameter})$ and gain nothing. The same construction applied to a 2D grid (via a spanning tree) incurs distortion that grows with the grid side length, much worse than its trivial distortion-$1$ Euclidean embedding. Always estimate $\delta$ on a sample of 4-tuples before committing to a hyperbolic embedding (Adcock-Sullivan-Mahoney 2013 give a sampling estimator).
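The four-point-condition estimator can be sketched in a few lines; this is a simplified illustration of the sampling idea (the name `delta_hyp_sample` and the brute-force BFS are ours, not Adcock et al.'s code):

```python
import random
from collections import deque
from itertools import combinations

def bfs_dist(adj, src):
    """Unweighted shortest-path distances from src over an adjacency dict."""
    dist, q = {src: 0}, deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def delta_hyp_sample(adj, n_samples=2000, seed=0):
    """Four-point-condition estimate of Gromov delta over sampled quadruples:
    delta(w,x,y,z) = (largest pair-sum - second largest pair-sum) / 2."""
    rng = random.Random(seed)
    nodes = list(adj)
    dist = {u: bfs_dist(adj, u) for u in nodes}  # fine for small graphs
    best = 0.0
    for _ in range(n_samples):
        w, x, y, z = rng.sample(nodes, 4)
        sums = sorted([dist[w][x] + dist[y][z],
                       dist[w][y] + dist[x][z],
                       dist[w][z] + dist[x][y]])
        best = max(best, (sums[2] - sums[1]) / 2)
    return best

# A tree is 0-hyperbolic; a long cycle (a stand-in for grid-like geometry) is not.
tree = {i: [j for j in (2 * i + 1, 2 * i + 2) if j < 31]
           + ([(i - 1) // 2] if i else []) for i in range(31)}
cycle = {i: [(i - 1) % 24, (i + 1) % 24] for i in range(24)}
print(delta_hyp_sample(tree), delta_hyp_sample(cycle))
```

On the tree every quadruple gives exactly $0$; on the 24-cycle the estimate is large relative to the diameter, which is the signal to stay Euclidean.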
Mobius gyrovector arithmetic
To build neural networks in $\mathbb{B}^n$ you need an analogue of vector addition that respects the manifold. Ungar's (2008) gyrovector formalism provides one.
Mobius addition on the Poincare ball
For $x, y \in \mathbb{B}^n$, the Mobius sum is
$$x \oplus y = \frac{(1 + 2\langle x, y\rangle + \|y\|^2)\,x + (1 - \|x\|^2)\,y}{1 + 2\langle x, y\rangle + \|x\|^2\|y\|^2}.$$
It is not associative or commutative, but it is a smooth left-loop operation: $0 \oplus x = x$, $(-x) \oplus x = 0$, and the map $y \mapsto x \oplus y$ is a smooth bijection $\mathbb{B}^n \to \mathbb{B}^n$. The inverse is $\ominus x = -x$.
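A direct numpy transcription of the formula, with the left-loop properties checked on random points (a sketch, not a library implementation):

```python
import numpy as np

def mobius_add(x, y):
    """Mobius addition on the Poincare ball, vector form."""
    xy = np.dot(x, y)
    x2, y2 = np.dot(x, x), np.dot(y, y)
    num = (1 + 2 * xy + y2) * x + (1 - x2) * y
    return num / (1 + 2 * xy + x2 * y2)

rng = np.random.default_rng(0)
x, y, z = (rng.uniform(-0.4, 0.4, 3) for _ in range(3))

assert np.allclose(mobius_add(np.zeros(3), x), x)        # left identity
assert np.allclose(mobius_add(-x, x), np.zeros(3))       # left inverse
assert not np.allclose(mobius_add(mobius_add(x, y), z),
                       mobius_add(x, mobius_add(y, z)))  # not associative
assert np.linalg.norm(mobius_add(x, y)) < 1              # closed on the ball
```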
Exponential and logarithmic maps at the origin
The exponential map at the origin is
$$\exp_0(v) = \tanh(\|v\|)\,\frac{v}{\|v\|},$$
and the logarithm is its inverse:
$$\log_0(x) = \operatorname{artanh}(\|x\|)\,\frac{x}{\|x\|}.$$
$\exp_0$ takes a tangent vector $v$ at $0$ and returns the point in $\mathbb{B}^n$ reached by moving along the geodesic from $0$ in that direction for time equal to the tangent vector's Riemannian length.
A Ganea-style hyperbolic linear layer is then
$$f(x) = (M \otimes x) \oplus b, \qquad M \otimes x = \exp_0\!\big(M \log_0(x)\big),$$
where $M \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{B}^m$. The construction reads as: lift to the tangent space at the origin, apply a Euclidean linear map, project back to the manifold, then translate by Mobius addition. Hyperbolic activations, attention scores, and MLR classifiers follow the same lift-act-project pattern.
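The whole lift-act-project layer fits in a few lines; a sketch under the $c = 1$ convention used above (names like `hyp_linear` are illustrative, not Ganea et al.'s code):

```python
import numpy as np

def mobius_add(x, y):
    xy, x2, y2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    return ((1 + 2 * xy + y2) * x + (1 - x2) * y) / (1 + 2 * xy + x2 * y2)

def exp0(v):
    # exponential map at the origin, curvature -1
    n = np.linalg.norm(v)
    return np.tanh(n) * v / n if n > 0 else v

def log0(x):
    # its inverse
    n = np.linalg.norm(x)
    return np.arctanh(n) * x / n if n > 0 else x

def hyp_linear(M, b, x):
    """Lift to the tangent space at 0, act linearly, project back, translate."""
    return mobius_add(exp0(M @ log0(x)), b)

rng = np.random.default_rng(1)
M = rng.normal(size=(3, 3))
b = rng.uniform(-0.3, 0.3, 3)   # bias is a point in the ball
x = rng.uniform(-0.3, 0.3, 3)
y = hyp_linear(M, b, x)
assert np.linalg.norm(y) < 1    # output stays in the Poincare ball
```

Note that the output lands in the ball by construction ($\tanh$ in `exp0` plus closure of Mobius addition), with no clipping or re-projection step.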
Hyperbolic distance is not a rescaling of Euclidean distance
The Poincare distance formula is not just $\|x - y\|$ scaled by a function of position. Two points with small Euclidean separation near the boundary can be very far apart in $\mathbb{B}^n$, because $\lambda_x \to \infty$ as $\|x\| \to 1$. This is why hyperbolic embeddings have capacity: the boundary is at infinity, so there is always more room to push deeper subtrees outward.
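A quick numeric illustration: the same Euclidean gap of $0.02$ near the origin versus near the boundary.

```python
import numpy as np

def poincare_dist(x, y):
    sq = np.sum((x - y) ** 2)
    return np.arccosh(1 + 2 * sq / ((1 - x @ x) * (1 - y @ y)))

near_boundary = poincare_dist(np.array([0.999, 0.0]), np.array([0.999, 0.02]))
near_origin = poincare_dist(np.array([0.0, 0.0]), np.array([0.0, 0.02]))
print(near_boundary, near_origin)  # ~6.2 vs ~0.04: same 0.02 Euclidean gap
```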
Hyperbolic neural networks are not Euclidean networks with a tanh squash
The output of a final tanh squash lies in the unit cube, so a Euclidean MLP with a tanh head (plus rescaling) produces points in $\mathbb{B}^n$ as a set. That is not a hyperbolic neural network. The intermediate operations (matrix multiplication, addition, normalization) treat the points as Euclidean and ignore the metric. A hyperbolic network uses Mobius operations and Riemannian gradients computed via $\exp_0$ and $\log_0$, so each parameter step respects the geometry. Treating points as Euclidean, then re-projecting after each step, loses exactly the structure the model was supposed to exploit.
Curvature is a hyperparameter, not a fixed truth
Most papers set curvature $c = 1$ (i.e., sectional curvature $-1$) for convenience. Curvature scales distances and gradients in ways that interact with optimization and numerical precision. Chami et al. (2019) make $c$ learnable per layer; the learned values often drift toward zero on graphs with high local clustering but little global tree structure, indicating that the model wants Euclidean geometry in places. If your model insists on $c \approx 0$ everywhere, that is evidence the dataset is not actually tree-like.
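The curvature-$c$ family makes the hyperparameter concrete; a sketch under Ganea et al.'s convention, where $d_c(x,y) = \tfrac{2}{\sqrt{c}}\operatorname{artanh}\!\big(\sqrt{c}\,\|(-x) \oplus_c y\|\big)$ and the $c \to 0$ limit recovers (twice) the Euclidean distance:

```python
import numpy as np

def mobius_add_c(x, y, c):
    xy, x2, y2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    return num / (1 + 2 * c * xy + c ** 2 * x2 * y2)

def dist_c(x, y, c):
    """Distance on the ball of curvature -c."""
    return 2 / np.sqrt(c) * np.arctanh(np.sqrt(c) * np.linalg.norm(mobius_add_c(-x, y, c)))

x = np.array([0.3, 0.1])
y = np.array([-0.2, 0.4])
for c in [1.0, 0.1, 0.001]:
    print(c, dist_c(x, y, c))       # distance shrinks smoothly as c -> 0
print(2 * np.linalg.norm(x - y))    # the flat (Euclidean) limit
```

Making $c$ a learnable scalar per layer is then just a reparametrization of this family, which is why letting it drift toward $0$ amounts to the model asking for flat geometry.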
When hyperbolic embeddings help
The right diagnostic is Gromov hyperbolicity $\delta$, the smallest constant such that every geodesic triangle is $\delta$-thin (each side lies within $\delta$ of the union of the other two). Trees have $\delta = 0$; trees with extra cross-edges have small $\delta$; expanders and 2D grids have $\delta = \Theta(\mathrm{diameter})$.
Empirically:
| Graph family | Typical $\delta$ | Hyperbolic gain over Euclidean at fixed dim |
|---|---|---|
| WordNet hypernymy, Gene Ontology, citation chains | $\approx 0$ (tree-like) | Large; Nickel-Kiela report 30%+ MAP improvements at 5-50 dim |
| Social networks (Facebook, citation networks) | small to moderate | Moderate; Chami et al. show small but consistent gains |
| Web graphs, biological PPI | moderate to large | Marginal; depends on subgraph and task |
| Road networks, 2D meshes, regular grids | $\Theta(\mathrm{diameter})$ | None or negative; Euclidean is the right choice |
| Random Erdos-Renyi | grows with diameter for typical densities | None |
The takeaway: estimate $\delta$ before committing. It is a one-pass sampling computation.
Worked Example: a 7-node binary tree in $\mathbb{B}^2$
Take the perfect binary tree with root $r$, children $c_1, c_2$, grandchildren $g_{ij}$ for $i, j \in \{1, 2\}$. All edges have weight $1$.
Place $r$ at the origin. Place the two children on opposite rays at hyperbolic distance $1$:
$$c_1 = (\tanh(1/2),\, 0) \approx (0.4621,\, 0), \qquad c_2 \approx (-0.4621,\, 0).$$
For $c_1$, the angular cone facing away from $r$ is the right half-plane. Split it into two sectors and place the grandchildren on the bisectors (at $\pm 45°$ from the positive $x$-axis) at hyperbolic distance $1$ from $c_1$. This requires Mobius translation: $g_{1j} = c_1 \oplus \big(\tanh(1/2)\,(\cos(\pm 45°), \sin(\pm 45°))\big)$, and similarly for $c_2$.
The hyperbolic distance between grandchildren in opposite subtrees comes out to roughly $3.70$, and the tree distance is $4$. With scale factor $\tau = 3$, the same construction gives $\approx 11.68$ versus scaled tree distance $12$, distortion $\approx 1.03$. Pushing $\tau$ higher pushes the ratio further toward $1$, illustrating the construction in the proof sketch.
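The worked numbers can be checked directly with the complex-disk form of Mobius addition (a sketch; by the mirror symmetry of the construction only two leaves are needed):

```python
import numpy as np

def mobius_add(a, b):
    # Mobius addition in the complex Poincare disk: a (+) b = (a+b)/(1+conj(a)*b)
    return (a + b) / (1 + np.conj(a) * b)

def pdist(a, b):
    return 2 * np.arctanh(abs(mobius_add(-a, b)))

def opposite_leaves(tau):
    """Upper grandchildren of the two subtrees at scale tau (unit edge weights)."""
    c1 = np.tanh(tau / 2) + 0j                                      # right child, +x ray
    g = mobius_add(c1, np.tanh(tau / 2) * np.exp(1j * np.pi / 4))   # upper-right grandchild
    return g, -np.conj(g)                                           # mirror image on the left

g, g2 = opposite_leaves(1.0)
print(pdist(g, g2))   # ~3.70, vs tree distance 4
g, g2 = opposite_leaves(3.0)
print(pdist(g, g2))   # ~11.68, vs scaled tree distance 12
```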
Problem
Verify directly that the Poincare ball model has constant curvature $-1$ at the origin. Specifically: compute the geodesic distance from $0$ to a point $x$ with $\|x\| = s$ along the Euclidean segment, and show it equals $2\operatorname{artanh}(s)$. Then compute the circumference of the hyperbolic circle of radius $r$ centered at $0$ and show it equals $2\pi\sinh r$, the signature of curvature $-1$.
Problem
Contrast the path with the tree: show that the unweighted path on $n$ nodes embeds into Euclidean space with distortion $1$ (isometrically into $\mathbb{R}$), while any embedding of the complete binary tree on $n$ nodes into Euclidean space has distortion $\Omega(\sqrt{\log\log n})$ (Bourgain 1986; proving the weaker statement that the distortion is unbounded in $n$ is enough here).
Then explain why the complete binary tree is the right witness: what specific feature of it defeats Euclidean embedding that paths and stars do not have?
Problem
Implement Mobius addition in numpy and verify three properties on random points in $\mathbb{B}^n$:
- Left identity: $0 \oplus x = x$.
- Left inverse: $(-x) \oplus x = 0$.
- Non-associativity: there exist $x, y, z$ such that $(x \oplus y) \oplus z \neq x \oplus (y \oplus z)$.
Then describe what the "gyration" corrects for and where it appears in the Ganea hyperbolic-MLP code (in Mobius matrix-vector multiplication versus pointwise activation).
Related Topics
- Non-Euclidean and Hyperbolic Geometry — the geometric groundwork this page builds on
- Distance Metrics Compared — where hyperbolic distance fits among other metric choices
- Metric Spaces, Convergence, Completeness
- Riemannian Optimization — the optimizer side of hyperbolic NNs
- Graph Neural Networks — the Euclidean baseline
- Representation Learning Theory
Last reviewed: April 18, 2026
Prerequisites
Foundations this topic depends on.
- Non-Euclidean and Hyperbolic Geometry (Layer 1)
- Metric Spaces, Convergence, and Completeness (Layer 0A)
- Vectors, Matrices, and Linear Maps (Layer 0A)
- Graph Neural Networks (Layer 3)
- Convolutional Neural Networks (Layer 3)
- Feedforward Networks and Backpropagation (Layer 2)
- Differentiation in Rn (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)
- Matrix Calculus (Layer 1)
- The Jacobian Matrix (Layer 0A)
- The Hessian Matrix (Layer 0A)
- Matrix Operations and Properties (Layer 0A)
- Eigenvalues and Eigenvectors (Layer 0A)
- Activation Functions (Layer 1)
- Convex Optimization Basics (Layer 1)