Prerequisite chain

Prerequisites for DPO vs GRPO vs RL for Reasoning

Topics you need before working through DPO vs GRPO vs RL for Reasoning. Direct prerequisites are listed first; transitive prerequisites (the chain reachable through them) follow.

Direct prerequisites (2)

  1. RLHF and Alignment (layer 4, tier 2)
  2. Policy Gradient Theorem (layer 3, tier 1)

Reachable through the chain (14)

These topics are not directly cited as prerequisites but are reached transitively by following the chain upward. Working through the direct prerequisites pulls these in.