ReLU Family Lab
Compare hard hinges, rescue slopes, smooth gates, and rounded thresholds. This lab is about the real tradeoff: what happens to negative evidence and what gradient survives the trip back.
The activation chooses what a neuron does with negative evidence and what gradient survives the trip back
ReLU-family choices are not cosmetic. They decide whether negative pre-activations die, leak, saturate smoothly, or get softly weighted. That changes gradient flow, sparsity, numerical behavior, and which architectures feel stable in practice.
Does the unit die, leak, saturate, or keep a soft weighted gradient?
This is the real comparison zone: hard threshold versus smooth shoulder versus probabilistic gate.
The lower plot tells you whether learning signal survives, not just whether the output looks nice.
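A minimal sketch of that comparison in NumPy, using assumed common defaults (0.01 leak, ELU alpha 1.0, the tanh form of GELU): the forward value and a finite-difference estimate of the local slope at one negative pre-activation.

```python
import numpy as np

# Forward value and local slope of a few ReLU-family activations at one
# negative pre-activation. Hyperparameters (0.01 leak, ELU alpha 1.0, the
# tanh form of GELU) are common defaults, assumed here for illustration.

def relu(x):     return np.maximum(0.0, x)
def leaky(x):    return np.where(x > 0, x, 0.01 * x)
def elu(x):      return np.where(x > 0, x, np.expm1(x))   # alpha = 1.0
def softplus(x): return np.log1p(np.exp(x))
def gelu(x):     # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def local_slope(f, x, eps=1e-5):
    # central finite difference: how much gradient survives the trip back
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

x = -2.0  # clearly negative evidence
for name, f in [("ReLU", relu), ("LeakyReLU", leaky), ("ELU", elu),
                ("GELU", gelu), ("Softplus", softplus)]:
    print(f"{name:>9}: value={float(f(x)):+.4f}  slope={float(local_slope(f, x)):+.4f}")
```

ReLU reports a slope of exactly 0 here, LeakyReLU a small constant, ELU and Softplus a smoothly shrinking one, and GELU a small soft weight: the die / leak / saturate / softly-weight split in four printed lines.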
ReLU keeps the active side brutally simple and kills the negative side completely
ReLU became the default fix for older sigmoid stacks because positive pre-activations pass through with slope 1 instead of saturating. The cost is that negative pre-activations can go entirely silent.
Classic CNNs, MLP baselines, many residual blocks, and any place you want a plain fast hinge before trying fancier modern variants.
On the active side the local derivative is exactly 1, so learning signal flows through this unit cleanly.
ReLU is the fastest baseline because its geometry is easy: off on the left, perfectly linear on the right.
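A tiny sketch of that geometry, assuming a plain NumPy setting: the forward pass is one elementwise max, and the backward pass reuses a 0/1 mask of which units were active.

```python
import numpy as np

# Minimal sketch of why the ReLU hinge is cheap: forward is a single
# elementwise max, backward reuses a 0/1 mask of "who was active".
# `upstream_grad` stands in for whatever gradient arrives from the next layer.

def relu_forward(x):
    mask = x > 0                 # off on the left, linear on the right
    return x * mask, mask

def relu_backward(upstream_grad, mask):
    return upstream_grad * mask  # gradient passes untouched where active, 0 elsewhere

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
out, mask = relu_forward(x)
print(out)                                    # [0.  0.  0.  0.5 3. ]
print(relu_backward(np.ones_like(x), mask))   # [0. 0. 0. 1. 1.]
```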
Active highway
Once the input is positive, ReLU stops being subtle: the unit is just a line of slope 1.
Compare this to ELU, GELU, and Softplus to feel how much geometry you trade away for that simplicity.
The ReLU positive branch preserves gradient magnitude exactly, which is why deep ReLU networks trained so much better than deep sigmoid stacks.
This is the cheapest way to keep deep positive activations trainable. When it works, the network gets sparse gating and clean gradient flow on the active side.
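A rough sketch of that claim, assuming 20 layers and a pre-activation of 0.5 at every layer (both made up for illustration): each active ReLU layer multiplies the backward signal by exactly 1, while each sigmoid layer multiplies it by at most 0.25.

```python
import numpy as np

# Product of per-layer derivatives along the backward path.
# ReLU (active side): d/dz max(0, z) = 1, so the chain stays at 1.
# Sigmoid: d/dz sigmoid(z) = s * (1 - s) <= 0.25, so the chain shrinks fast.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

depth, z = 20, 0.5                          # assumed depth and pre-activation
relu_chain = 1.0 ** depth
s = sigmoid(z)
sigmoid_chain = (s * (1.0 - s)) ** depth

print(f"ReLU chain ({depth} active layers): {relu_chain:.6f}")      # 1.000000
print(f"sigmoid chain ({depth} layers):     {sigmoid_chain:.2e}")   # ~2.6e-13
```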
Start here when you want the simplest strong baseline and you are not already seeing dead-unit problems.
If a unit spends every batch left of zero, its local slope is exactly 0 and it can stop learning entirely.
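A small sketch of such a dead unit, with made-up weights, a made-up batch, and a bias pushed far negative: the pre-activation is below zero for every example, so the mask, and with it every gradient through the unit, is exactly zero.

```python
import numpy as np

# A "dead" ReLU unit: the bias is so negative that the pre-activation is
# below zero for the whole batch, so the local slope is 0 everywhere and the
# weight/bias gradients vanish exactly. All values here are illustrative.

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 8))          # batch of 64 inputs, 8 features
w = rng.normal(size=8) * 0.1
b = -10.0                             # bias driven far negative

z = x @ w + b                         # pre-activations, all < 0 here
mask = (z > 0).astype(float)          # local slope of ReLU: 0 or 1

upstream = np.ones_like(z)            # stand-in gradient from the loss
grad_w = x.T @ (upstream * mask)      # chain rule through the dead hinge
grad_b = np.sum(upstream * mask)

print("active fraction:", mask.mean())                              # 0.0
print("grad_w norm:", np.linalg.norm(grad_w), " grad_b:", grad_b)   # both 0.0
```

With every gradient at exactly zero, no optimizer step can move the unit back across the threshold; that is the dead-unit failure mode the leaky and smooth variants exist to avoid.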