Applied ML
Reinforcement Learning for Synthesis Planning
Retrosynthesis as tree search: AlphaChem-style MCTS over learned reaction templates, transformer-based template-free models, and reward shaping with synthetic accessibility heuristics.
Why This Matters
Retrosynthesis is the inverse problem of organic synthesis: given a target molecule, propose a sequence of reactions that build it from purchasable starting materials. The search tree branches on every disconnection (thousands of templates apply to a typical target) and goes 5-15 steps deep before reaching commercial precursors. Exhaustive enumeration is infeasible. Expert chemists prune by chemical intuition; that intuition is exactly what learned policies and value functions can encode.
Synthesis planning sits at two real bottlenecks. In drug discovery, a 20-step candidate is effectively unsynthesizable however good its predicted activity. In process chemistry, route cost is measured in weeks of bench work and kilograms of waste solvent. A planner that returns a 6-step route instead of an 11-step one is a direct win, not a benchmark game.
Core Ideas
The classic formulation treats retrosynthesis as a Markov decision process: the state is the current set of unresolved target molecules, the action is a (template, target) pair, the transition applies the template's disconnection, and the episode terminates when every leaf is in a stock catalog. The reward is sparse: 1 for a complete route, 0 otherwise, often shaped by route length and step plausibility.
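The MDP above can be sketched in a few lines. This is a toy illustration, not any planner's actual data model: molecules are plain string labels, and `Template`, `State`, `initial_state`, `step`, and `reward` are hypothetical names introduced here. A real system would apply reaction SMARTS with a cheminformatics toolkit rather than match labels.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional, Tuple

# Hypothetical stand-in for a reaction template: returns the precursors of a
# molecule, or None when the template does not apply to it.
Template = Callable[[str], Optional[Tuple[str, ...]]]


@dataclass(frozen=True)
class State:
    unresolved: FrozenSet[str]  # molecules that are not yet in stock
    depth: int = 0              # number of disconnections applied so far


def initial_state(target: str, stock: FrozenSet[str]) -> State:
    """Start with the target unresolved, unless it is already purchasable."""
    return State(frozenset() if target in stock else frozenset({target}))


def is_terminal(state: State) -> bool:
    """The episode ends when every leaf is in the stock catalog."""
    return not state.unresolved


def step(state: State, target: str, template: Template,
         stock: FrozenSet[str]) -> Optional[State]:
    """Apply the action (template, target); None means the template failed."""
    if target not in state.unresolved:
        raise ValueError(f"{target!r} is not an unresolved molecule")
    precursors = template(target)
    if precursors is None:
        return None
    still_needed = {p for p in precursors if p not in stock}
    return State((state.unresolved - {target}) | still_needed, state.depth + 1)


def reward(state: State) -> float:
    """Sparse reward: 1 for a complete route, 0 otherwise."""
    return 1.0 if is_terminal(state) else 0.0


# Toy example: one disconnection splits an "ester" into two stocked precursors.
def split_ester(mol: str) -> Optional[Tuple[str, ...]]:
    return ("acid", "alcohol") if mol == "ester" else None


stock = frozenset({"acid", "alcohol"})
s0 = initial_state("ester", stock)
s1 = step(s0, "ester", split_ester, stock)
print(is_terminal(s0), is_terminal(s1), reward(s1))
```

Keeping only the unresolved molecules in the state makes the termination test a single emptiness check, and stocked precursors drop out as soon as a disconnection produces them.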
Segler, Preuss, and Waller (2018, Nature 555) introduced this MCTS formulation, which the community calls AlphaChem by analogy with AlphaGo. Three networks do the work: an expansion policy trained on millions of literature reactions, which proposes candidate templates for a node; a fast rollout policy used inside playouts; and an in-scope filter that predicts whether a (template, target) pair will react. On a held-out test set the planner solved 80% of targets within 5 seconds versus 4% for a best-first heuristic baseline.
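A minimal sketch of how the three networks could interact inside one search iteration follows. It uses a PUCT-style selection rule of the kind common in AlphaGo-family planners; the exact formula and constants in Segler et al. may differ. The priors, in-scope probabilities, and rollout returns are hard-coded hypothetical values standing in for network outputs, and `Node`, `puct_score`, `expand`, and `backup` are names invented for this example.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    prior: float                 # P(action | state) from the expansion policy
    visits: int = 0
    value_sum: float = 0.0
    children: Dict[str, "Node"] = field(default_factory=dict)

    @property
    def q(self) -> float:
        """Mean return observed below this node (0 before any visit)."""
        return self.value_sum / self.visits if self.visits else 0.0


def puct_score(parent: Node, child: Node, c: float = 1.5) -> float:
    """Exploit the mean return, explore in proportion to the policy prior."""
    explore = c * child.prior * math.sqrt(parent.visits + 1) / (1 + child.visits)
    return child.q + explore


def select_child(parent: Node) -> Tuple[str, Node]:
    return max(parent.children.items(), key=lambda kv: puct_score(parent, kv[1]))


def expand(node: Node, priors: Dict[str, float], in_scope: Dict[str, float],
           threshold: float = 0.5) -> None:
    """Add one child per proposed action that passes the in-scope filter."""
    for action, p in priors.items():
        if in_scope.get(action, 0.0) >= threshold:
            node.children[action] = Node(prior=p)


def backup(path: List[Node], value: float) -> None:
    """Propagate a rollout return to every node on the selected path."""
    for node in path:
        node.visits += 1
        node.value_sum += value


# Hypothetical network outputs for a single target molecule.
root = Node(prior=1.0)
expand(root,
       priors={"amide_coupling": 0.7, "suzuki": 0.2, "exotic": 0.1},
       in_scope={"amide_coupling": 0.9, "suzuki": 0.8, "exotic": 0.2})

# Stand-in for the rollout policy: fixed returns instead of real playouts.
rollout_value = {"amide_coupling": 0.0, "suzuki": 1.0}

for _ in range(50):
    action, child = select_child(root)
    backup([root, child], rollout_value[action])

best = max(root.children, key=lambda a: root.children[a].visits)
print(best, {a: n.visits for a, n in root.children.items()})
```

In this toy run the prior initially favors `amide_coupling`, but rollouts that keep failing shift visits toward `suzuki`, and the action rejected by the in-scope filter is never added to the tree.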
Template-free models replace the symbolic template library with a sequence-to-sequence transformer that maps product SMILES to reactant SMILES directly. The architecture follows the Molecular Transformer (Schwaller et al. 2019, ACS Cent. Sci. 5), which was built for forward reaction prediction; Schwaller et al. (2020, Chem. Sci. 11) applied transformer models to multistep retrosynthesis with a hyper-graph exploration strategy. This removes the template-extraction bottleneck and handles long-tail chemistry that templates miss, at the cost of occasional invalid SMILES outputs and reduced interpretability of the proposed disconnection.
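To make the sequence view concrete, here is a sketch of regex-based SMILES tokenization of the kind commonly used to prepare transformer inputs, plus a cheap syntactic screen for decoder outputs. The pattern is a widely circulated one written from memory and may not match any specific paper's preprocessing exactly; `tokenize` and `looks_well_formed` are hypothetical helper names. The screen only checks parentheses and ring-closure labels, so it is a necessary condition at best; a real pipeline would parse candidates with a cheminformatics toolkit.

```python
import re
from collections import Counter
from typing import List

# Bracket atoms, two-letter halogens, organic-subset atoms, bonds, branches,
# and ring-closure labels each become one token.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize(smiles: str) -> List[str]:
    """Split a SMILES string into model tokens; reject unknown characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


def looks_well_formed(smiles: str) -> bool:
    """Cheap screen: balanced branches and paired ring-closure labels."""
    try:
        tokens = tokenize(smiles)
    except ValueError:
        return False
    if not tokens:
        return False
    depth = 0
    for tok in tokens:
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
            if depth < 0:  # closing a branch that was never opened
                return False
    ring_labels = Counter(t for t in tokens if t.isdigit() or t.startswith("%"))
    return depth == 0 and all(n % 2 == 0 for n in ring_labels.values())


print(tokenize("CC(=O)O"))
print(looks_well_formed("c1ccccc1Br"), looks_well_formed("c1ccccc"))
```

A decoder output that fails this screen can be discarded before any expensive scoring; one that passes still needs a full parse and a valence check.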
Reward shaping is unavoidable. A binary "reached stock" signal is too sparse for policy gradient methods, so practitioners blend in SAscore (Ertl-Schuffenhauer 2009) for synthetic accessibility, route length, and in-scope confidence. The shaped reward changes the optimal policy, so the planner inherits the biases of whatever heuristic is blended in: SAscore was calibrated against a 2009 chemist survey and has known biases against fluorine-rich and macrocyclic chemistry.
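One way such a blend might look is sketched below. The functional form and the weights `w_len`, `w_sa`, and `w_conf` are assumptions made for illustration, not values from any cited paper. The only external fact relied on is that SAscore ranges from 1 (easy) to 10 (hard).

```python
from statistics import mean
from typing import Sequence


def shaped_reward(solved: bool,
                  n_steps: int,
                  unresolved_sa: Sequence[float],
                  step_confidences: Sequence[float],
                  w_len: float = 0.05,
                  w_sa: float = 0.1,
                  w_conf: float = 0.5) -> float:
    """Blend the sparse success signal with three shaping terms.

    unresolved_sa: SAscore (1 = easy, 10 = hard) of each leaf not in stock.
    step_confidences: in-scope probability of each step taken so far.
    All weights are hypothetical and would need tuning.
    """
    success = 1.0 if solved else 0.0
    length_penalty = w_len * n_steps
    if unresolved_sa:
        # Rescale SAscore to [0, 1] so the weight is easier to interpret.
        sa_penalty = w_sa * mean([(s - 1.0) / 9.0 for s in unresolved_sa])
    else:
        sa_penalty = 0.0
    plausibility = w_conf * mean(step_confidences) if step_confidences else 0.0
    return success - length_penalty - sa_penalty + plausibility


short = shaped_reward(True, 6, [], [0.9] * 6)
long_ = shaped_reward(True, 11, [], [0.9] * 11)
easy_leaf = shaped_reward(False, 3, [2.0], [0.9] * 3)
hard_leaf = shaped_reward(False, 3, [8.0], [0.9] * 3)
print(short > long_, easy_leaf > hard_leaf)
```

Because the accessibility term enters the return directly, any systematic error in SAscore for a compound class becomes a systematic preference in the learned policy.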
Common Confusions
Template policies are not retrieving from a fixed library at inference
Templates are extracted once from a reaction corpus (USPTO, Reaxys), but the policy network is a learned conditional distribution over templates given a target. New target molecules trigger generalization across templates, not a database lookup. A template that never co-occurred with the exact target functional group can still be ranked highly if the policy learned the underlying disconnection logic.
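The distinction can be illustrated with a deliberately tiny stand-in for the policy: a linear layer and a softmax over three templates. The feature vector, the template names, and the weights are all hypothetical; a real expansion policy is a deep network over molecular fingerprints scoring thousands of templates.

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rank_templates(features: Sequence[float],
                   weights: Dict[str, Sequence[float]],
                   k: int = 2) -> List[Tuple[str, float]]:
    """Return the top-k templates under P(template | target features)."""
    names = list(weights)
    logits = [sum(w * x for w, x in zip(weights[n], features)) for n in names]
    probs = softmax(logits)
    return sorted(zip(names, probs), key=lambda pair: -pair[1])[:k]


# Hypothetical features: [has_amide, has_ester, has_aryl_halide]
weights = {
    "amide_disconnection": [2.0, 0.5, 0.0],
    "ester_hydrolysis":    [0.3, 2.0, 0.0],
    "suzuki_coupling":     [0.0, 0.0, 2.5],
}

# A target with an amide only, then one combining an amide and an aryl halide.
print(rank_templates([1.0, 0.0, 0.0], weights))
print(rank_templates([1.0, 0.0, 1.0], weights))
```

Nothing is looked up per molecule: the ranking for the second target falls out of the learned weights even if that feature combination never appeared in training.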
A high-confidence route is not a working route
Reported solve rates are computed against in-stock catalogs and learned in-scope filters, not against wet-lab outcomes. If every step is predicted at 85% confidence and steps are treated as independent, an n-step route has an expected end-to-end success of roughly 0.85^n, about 38% for six steps and about 20% for ten, even under the model's own assumptions. Real conditions (solvent, scale, impurities) degrade the number further.
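The compounding is easy to check numerically. The sketch below assumes step outcomes are independent, which is a simplification, and the helper names `route_success` and `steps_until_below` are introduced only for this example.

```python
import math
from typing import Sequence


def route_success(step_confidences: Sequence[float]) -> float:
    """End-to-end success if every step must work and steps are independent."""
    return math.prod(step_confidences)


def steps_until_below(p: float, threshold: float) -> int:
    """Smallest route length whose success probability drops below threshold."""
    return math.ceil(math.log(threshold) / math.log(p))


for n in (3, 6, 10):
    print(n, round(route_success([0.85] * n), 3))

print(steps_until_below(0.85, 0.5))
```

At 85% per step, a route is already less likely than not to succeed by the fifth step.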
References
Segler 2018 AlphaChem
Segler, Preuss, Waller, "Planning chemical syntheses with deep neural networks and symbolic AI," Nature 555, 2018, pp. 604-610.
Schwaller 2019 Mol. Transformer
Schwaller et al., "Molecular Transformer: A Model for Uncertainty-Calibrated Chemical Reaction Prediction," ACS Cent. Sci. 5(9), 2019, pp. 1572-1583.
Schwaller 2020 Hyper-graph exploration
Schwaller et al., "Predicting retrosynthetic pathways using transformer-based models and a hyper-graph exploration strategy," Chem. Sci. 11, 2020, pp. 3316-3325.
Coley 2019 ASKCOS
Coley et al., "A robotic platform for flow synthesis of organic compounds informed by AI planning," Science 365(6453), 2019, eaax1566.
Ertl 2009 SAscore
Ertl, Schuffenhauer, "Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions," J. Cheminform. 1:8, 2009.
Genheden 2020 AiZynthFinder
Genheden et al., "AiZynthFinder: a fast, robust and flexible open-source software for retrosynthetic planning," J. Cheminform. 12:70, 2020.
Last reviewed: April 18, 2026
Prerequisites
Foundations this topic depends on.
- Markov Decision Processes (Layer 2)
- Convex Optimization Basics (Layer 1)
- Differentiation in R^n (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)
- Matrix Operations and Properties (Layer 0A)
- Concentration Inequalities (Layer 1)
- Common Probability Distributions (Layer 0A)
- Expectation, Variance, Covariance, and Moments (Layer 0A)
- Policy Gradient Theorem (Layer 3)