Applied ML

Reinforcement Learning for Synthesis Planning

Retrosynthesis as tree search: AlphaChem-style MCTS over learned reaction templates, transformer-based template-free models, and reward shaping with synthetic accessibility heuristics.

Advanced · Tier 3 · Current · ~15 min

Why This Matters

Retrosynthesis is the inverse problem of organic synthesis: given a target molecule, propose a sequence of reactions that build it from purchasable starting materials. The search tree branches on every disconnection (thousands of templates apply to a typical target) and goes 5-15 steps deep before reaching commercial precursors. Exhaustive enumeration is infeasible. Expert chemists prune by chemical intuition; that intuition is exactly what learned policies and value functions can encode.

Synthesis planning is a real bottleneck in both drug discovery and process chemistry. A 20-step drug candidate is effectively unsynthesizable even with excellent predicted activity. In process chemistry, route cost is measured in weeks of bench work and kilograms of waste solvent. A planner that returns a 6-step route instead of an 11-step one is a direct win, not a benchmark game.

Core Ideas

The classic formulation treats retrosynthesis as a Markov decision process: the state is the current set of unresolved target molecules, the action is a (template, target) pair, the transition applies the template's disconnection, and the episode terminates when every leaf is in a stock catalog. The reward is sparse: 1 for a complete route, 0 otherwise, often shaped by route length and step plausibility.
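The MDP above can be sketched in a few lines of Python. This is a minimal illustration, not any published planner's implementation: the names `step`, `toy_apply_template`, and `TOY_DISCONNECTIONS` are hypothetical, molecules are plain SMILES strings, and template application is faked with a lookup table rather than real reaction chemistry.

```python
from typing import Callable, FrozenSet, Iterable, Set, Tuple

State = FrozenSet[str]      # unresolved molecules, written as SMILES strings
Action = Tuple[str, str]    # (template_id, target molecule)


def step(
    state: State,
    action: Action,
    apply_template: Callable[[str, str], Iterable[str]],
    stock: Set[str],
) -> Tuple[State, float, bool]:
    """Apply one disconnection and return (next_state, reward, done)."""
    template_id, target = action
    if target not in state:
        raise ValueError(f"{target!r} is not an unresolved molecule in this state")
    precursors = apply_template(template_id, target)
    # Precursors already in the stock catalog are resolved; the rest stay open.
    still_open = {m for m in precursors if m not in stock}
    next_state = frozenset((state - {target}) | still_open)
    done = len(next_state) == 0
    # Sparse reward: 1 only when every leaf is purchasable.
    reward = 1.0 if done else 0.0
    return next_state, reward, done


# Toy stand-in for template application: a lookup table, NOT real chemistry.
TOY_DISCONNECTIONS = {
    ("ester_disconnection", "CC(=O)OCC"): ("CC(=O)O", "CCO"),
}


def toy_apply_template(template_id: str, target: str) -> Tuple[str, ...]:
    return TOY_DISCONNECTIONS[(template_id, target)]


stock = {"CC(=O)O", "CCO"}
start = frozenset({"CC(=O)OCC"})
next_state, reward, done = step(
    start, ("ester_disconnection", "CC(=O)OCC"), toy_apply_template, stock
)
print(next_state, reward, done)
```

With only one of the two precursors in stock, the same call would return a non-empty state and a reward of 0, which is the sparsity that the shaping terms are meant to soften.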

Segler, Preuss, and Waller (2018, Nature 555) introduced this MCTS formulation, which the community calls AlphaChem by analogy with AlphaGo. Three networks do the work: an expansion policy $p_\phi(\text{template} \mid \text{molecule})$ trained on millions of literature reactions, a fast rollout policy used inside playouts, and an in-scope filter that predicts whether a (template, target) pair will react. On a held-out test set the planner solved 80% of targets within 5 seconds versus 4% for a best-first heuristic baseline.
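To make the role of the expansion policy inside tree search concrete, here is a minimal sketch of a PUCT-style selection rule of the kind used in AlphaGo-family MCTS. It is an assumption for illustration and not necessarily the exact formula of the 2018 paper; `Child`, `puct_score`, `select`, and the constant `c_puct` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class Child:
    prior: float              # expansion-policy probability of this template
    visits: int = 0
    total_value: float = 0.0

    @property
    def mean_value(self) -> float:
        return self.total_value / self.visits if self.visits else 0.0


def puct_score(child: Child, parent_visits: int, c_puct: float = 1.0) -> float:
    """Mean value plus a prior-weighted exploration bonus that decays with visits."""
    exploration = c_puct * child.prior * math.sqrt(parent_visits) / (1 + child.visits)
    return child.mean_value + exploration


def select(children: Dict[str, Child], c_puct: float = 1.0) -> str:
    """Return the key of the child with the highest PUCT score."""
    # Use at least 1 so the prior still breaks ties before any visit.
    parent_visits = max(1, sum(c.visits for c in children.values()))
    return max(children, key=lambda k: puct_score(children[k], parent_visits, c_puct))


children = {"template_a": Child(prior=0.9), "template_b": Child(prior=0.1)}
first_pick = select(children)

# After ten fruitless visits to template_a, the bonus shifts to template_b.
children["template_a"].visits = 10
later_pick = select(children)
print(first_pick, later_pick)
```

The prior steers early visits toward templates the policy considers likely, while the visit count in the denominator keeps the search from committing to a single disconnection that never pays off.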

Template-free models replace the symbolic template library with a sequence-to-sequence transformer that maps product SMILES to reactant SMILES directly. The architecture follows the Molecular Transformer (Schwaller et al. 2019, ACS Cent. Sci. 5), which was introduced for forward reaction prediction; Schwaller et al. 2020 (Chem. Sci. 11) applied transformer models to multi-step retrosynthesis with a hyper-graph exploration strategy. This removes the template-extraction bottleneck and handles long-tail chemistry that templates miss, at the cost of occasional invalid SMILES outputs and reduced interpretability of the proposed disconnection.
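Because a sequence model can emit strings that are not valid SMILES, candidate reactant sets are usually filtered before they enter the search. The sketch below is a deliberately crude, standard-library-only pre-filter (balanced brackets and parentheses, paired ring-closure labels). It is a necessary check, not a sufficient one: in practice a cheminformatics parser such as RDKit's `Chem.MolFromSmiles`, which returns `None` on failure, would do this job. The function names are hypothetical.

```python
import re
from typing import Iterable, List


def passes_crude_smiles_check(smiles: str) -> bool:
    """Reject strings with obvious syntax damage; does NOT prove chemical validity."""
    if not smiles:
        return False
    # Bracket atoms such as [NH4+] or [13C] hold digits that are not ring closures,
    # so remove them before counting ring labels.
    stripped = re.sub(r"\[[^\[\]]*\]", "", smiles)
    if "[" in stripped or "]" in stripped:
        return False  # unmatched square bracket
    depth = 0
    for ch in stripped:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # branch closed before it was opened
    if depth != 0:
        return False  # branch left open
    # Each ring-closure label (single digit or %nn) must open and close.
    open_labels = set()
    for label in re.findall(r"%\d{2}|\d", stripped):
        open_labels ^= {label}  # toggle: first sight opens, second closes
    return not open_labels


def filter_candidates(candidates: Iterable[str]) -> List[str]:
    """Keep only beam-search outputs that survive the crude check."""
    return [s for s in candidates if passes_crude_smiles_check(s)]


beam = ["CC(=O)O.OCC", "c1ccccc", "CC(=O", "c1ccccc1"]
print(filter_candidates(beam))
```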

Reward shaping is unavoidable. A binary "reached stock" signal is too sparse for policy gradient methods, so practitioners blend in SAscore (Ertl-Schuffenhauer 2009) for synthetic accessibility, route length, and in-scope confidence. The shaped reward changes the optimal policy, so the planner inherits whatever biases the shaping terms carry: SAscore was validated against chemists' ease-of-synthesis ratings in its 2009 paper and has known biases against fluorine-rich and macrocyclic chemistry.
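One way such a blend could look is sketched below. The functional form and every weight are assumptions chosen for illustration, not values from any of the cited papers. The only external fact used is that SAscore runs from 1 (easy to make) to 10 (hard to make); the scores themselves are passed in as numbers rather than computed.

```python
from typing import Sequence


def shaped_reward(
    reached_stock: bool,
    route_length: int,
    sascores: Sequence[float],
    step_confidences: Sequence[float],
    w_length: float = 0.05,   # hypothetical weight per reaction step
    w_sa: float = 0.1,        # hypothetical weight on synthetic accessibility
    w_conf: float = 0.2,      # hypothetical weight on in-scope confidence
) -> float:
    """Sparse success signal plus length, SAscore, and in-scope shaping terms."""
    base = 1.0 if reached_stock else 0.0
    length_penalty = w_length * route_length
    # Rescale SAscore from [1, 10] to [0, 1] so the weight is interpretable.
    rescaled = [(s - 1.0) / 9.0 for s in sascores]
    sa_penalty = w_sa * (sum(rescaled) / len(rescaled)) if rescaled else 0.0
    # A route is only as plausible as its weakest step.
    conf_bonus = w_conf * min(step_confidences) if step_confidences else 0.0
    return base - length_penalty - sa_penalty + conf_bonus


short_route = shaped_reward(True, 6, [2.5, 3.0], [0.9, 0.8])
long_route = shaped_reward(True, 11, [2.5, 3.0], [0.9, 0.8])
print(short_route, long_route)
```

Changing any weight changes which route the optimal policy prefers, which is why the biases of the shaping terms matter as much as the success signal itself.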

Common Confusions

Watch Out

Template policies are not retrieving from a fixed library at inference

Templates are extracted once from a reaction corpus (USPTO, Reaxys), but the policy network is a learned conditional distribution over templates given a target. New target molecules trigger generalization across templates, not a database lookup. A template that never co-occurred with the exact target functional group can still be ranked highly if the policy learned the underlying disconnection logic.
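The distinction between a lookup and a learned distribution can be illustrated with a toy scorer. The sketch below assumes a single linear layer over a three-bit "fingerprint" followed by a softmax; real expansion policies use deeper networks over much larger molecular fingerprints, and the template names and weights here are invented for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rank_templates(
    fingerprint: Sequence[float],
    template_weights: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every template for a molecule and return the top_k by probability."""
    names = list(template_weights)
    logits = [
        sum(w * x for w, x in zip(template_weights[name], fingerprint))
        for name in names
    ]
    probs = softmax(logits)
    ranked = sorted(zip(names, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


# Hypothetical learned weights: one row per template, one column per feature.
TOY_WEIGHTS = {
    "amide_coupling": [2.0, 0.0, 1.0],
    "suzuki_coupling": [0.0, 2.0, 0.5],
    "ester_hydrolysis": [1.0, 0.0, 0.0],
}

# A feature combination that need not have appeared in training still gets a ranking.
ranking = rank_templates([1.0, 0.0, 1.0], TOY_WEIGHTS)
print(ranking)
```

Every template receives a nonzero probability for every input, so a molecule the corpus never contained still produces a full ranking instead of a failed lookup.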

Watch Out

A high-confidence route is not a working route

Reported solve rates are computed against in-stock catalogs and learned in-scope filters, not against wet-lab outcomes. An 8-step route with every step at 85% predicted confidence has roughly $0.85^8 \approx 27\%$ expected end-to-end success even under the model's own assumptions, and real conditions (solvent, scale, impurities) further degrade the number.
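The compounding arithmetic is easy to reproduce. The sketch below assumes the steps succeed or fail independently, which is the same simplification behind the 27% figure; the function names are hypothetical.

```python
import math
from typing import Sequence


def route_success_probability(step_confidences: Sequence[float]) -> float:
    """Product of per-step confidences, assuming independent steps."""
    return math.prod(step_confidences)


def required_step_confidence(target_success: float, n_steps: int) -> float:
    """Uniform per-step confidence needed to reach a target end-to-end success."""
    return target_success ** (1.0 / n_steps)


eight_step = route_success_probability([0.85] * 8)
needed = required_step_confidence(0.5, 8)
print(f"8 steps at 85%: {eight_step:.1%}")
print(f"per-step confidence for 50% over 8 steps: {needed:.3f}")
```

Read in the other direction, an 8-step route needs about 92% confidence at every step just to reach even odds of working end to end.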

References

Segler 2018 AlphaChem

Segler, Preuss, Waller, "Planning chemical syntheses with deep neural networks and symbolic AI," Nature 555, 2018, pp. 604-610.

Schwaller 2019 Mol. Transformer

Schwaller et al., "Molecular Transformer: A Model for Uncertainty-Calibrated Chemical Reaction Prediction," ACS Cent. Sci. 5(9), 2019, pp. 1572-1583.

Schwaller 2020 Hyper-graph retrosynthesis

Schwaller et al., "Predicting retrosynthetic pathways using transformer-based models and a hyper-graph exploration strategy," Chem. Sci. 11, 2020, pp. 3316-3325.

Coley 2019 ASKCOS

Coley et al., "A robotic platform for flow synthesis of organic compounds informed by AI planning," Science 365(6453), 2019, eaax1566.

Ertl 2009 SAscore

Ertl, Schuffenhauer, "Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions," J. Cheminform. 1:8, 2009.

Genheden 2020 AiZynthFinder

Genheden et al., "AiZynthFinder: a fast, robust and flexible open-source software for retrosynthetic planning," J. Cheminform. 12:70, 2020.

Last reviewed: April 18, 2026
