Planning Methods
A comprehensive taxonomy of 20 retrosynthetic planning architectures from 2018 to 2026, organized by planning paradigm. Based on Tables 5 and 6 of the review.
Explicit Graph Search
11 methods. Routes assembled step-by-step via tree search (MCTS, A*) with single-step expansion models.
Hybrid / Neurosymbolic
6 methods. Neural policies combined with symbolic search, LLM critics, or ensemble verification.
Direct Sequence Generation
3 methods. Full retrosynthetic routes generated as a single sequence via autoregressive decoding.
Architecture Taxonomy
| Method | Year | Paradigm | Policy | Search | Tier |
|---|---|---|---|---|---|
| 3N-MCTS | 2018 | Explicit Graph Search | MLP (Templates) | MCTS (Rollout) | 0–1 by construction |
| DFPN-E | 2019 | Explicit Graph Search | MLP (Templates) | Proof-Number Search | 0–1 by construction |
| Retro* | 2020 | Explicit Graph Search | MLP (Templates) | Neural A* | 0–1 by construction |
| AiZynthFinder | 2020 | Explicit Graph Search | MLP (Templates) | MCTS (UCB) | 0–1 by construction |
| AutoSynRoute | 2020 | Explicit Graph Search | Transformer | MCTS (Heuristic) | None formal |
| GRASP | 2022 | Explicit Graph Search | Actor-Critic (TD3) | MCTS (Goal-driven) | 0–1 by construction |
| RetroGraph | 2022 | Explicit Graph Search | GNN (Graph-Aware) | AND-OR Graph Search | 0–1 by construction |
| MEEA* | 2024 | Explicit Graph Search | MLP (Templates) | Hybrid MCTS + A* | 0–1 by construction |
| Llamole | 2024 | Hybrid / Neurosymbolic | LLM (Heuristic) | A* Search | 0–1 (template steps) |
| DESP | 2024 | Hybrid / Neurosymbolic | MLP + Distance Net | Bidirectional Search | 0–1 by construction |
| Higher-Level | 2025 | Explicit Graph Search | MLP (Abstract Templates) | MCTS | 0; abstract Tier 1 |
| SynPlanner | 2025 | Explicit Graph Search | GCN (Templates) | MCTS / A* | 0–1 by construction |
| InterRetro | 2025 | Explicit Graph Search | Graph2Edits | Greedy (Compiled) | None formal |
| DirectMultiStep | 2025 | Direct Sequence Generation | Transformer (Seq2Seq) | Beam Decoding | None formal |
| SynLlama | 2025 | Direct Sequence Generation | Llama-3 (LLM) | Greedy Decoding | None formal |
| RetroChimera | 2025 | Hybrid / Neurosymbolic | GNN + Transformer | Retro* | 0–1 (template branch) |
| LARC | 2025 | Hybrid / Neurosymbolic | MEEA* + LLM Critic | MCTS (Agentic) | 0–1; partial Tier 2 via LLM |
| AOT* | 2025 | Hybrid / Neurosymbolic | LLM (Generative) | AND-OR Tree | None formal (LLM macro-steps) |
| TempRe | 2025 | Hybrid / Neurosymbolic | Transformer (Template) | MCTS / Direct | 0–1 (constrained mode) |
| RetroSynFormer | 2026 | Direct Sequence Generation | Decision Transformer | Beam Search | 0–1 |
Method Families
A qualitative comparison of the four major method families, from Table 6 of the review.
Expert-Encoded Planners
Paradigm: Explicit Graph Search. Examples: LHASA, Chematica, and other systems built on hand-curated rule sets.
Knowledge source: Human expertise encoded as reaction rules, often with explicit steric and electronic guards.
Explicit search: Yes (typically best-first or proof-number search over a symbolic graph).
Solvability tier: Can formally guarantee Solv-2 constraints if they are explicitly encoded in the rules; otherwise defaults to Solv-1.
Strengths: High interpretability; direct linkage to mechanistic precedent. Crucially, this is the only paradigm where Solv-2 constraints (selectivity) can be formally guaranteed by construction, provided the rules are sufficiently detailed.
Limitations: Coverage is strictly bounded by the rule library; expensive and slow to update; brittle when faced with novel scaffolds or reaction classes.
Failure modes: Fails silently when a required transformation is absent from its knowledge base; misapplies over-generalized rules in novel contexts.
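The rule-library failure mode above can be made concrete with a toy sketch. Plain substring patterns stand in for real SMARTS templates, and every rule, fragment, and guard below is illustrative, not taken from any actual expert system:

```python
from typing import Callable, Optional

# One hand-encoded disconnection rule: (substructure pattern, precursor
# fragments, guard predicate standing in for steric/electronic checks).
Rule = tuple[str, list[str], Callable[[str], bool]]

RULES: list[Rule] = [
    # Toy amide disconnection; the guard blocks substrates that already
    # carry a free acid (a crude chemoselectivity check).
    ("C(=O)N", ["C(=O)O", "N"], lambda mol: mol.count("C(=O)O") == 0),
    # Toy ester disconnection with a trivially permissive guard.
    ("C(=O)OC", ["C(=O)O", "OC"], lambda mol: True),
]

def expert_step(molecule: str) -> Optional[list[str]]:
    """Return precursors from the first applicable rule, or None.

    Returning None is the 'silent failure' mode described above: a feasible
    disconnection missing from RULES is simply never proposed.
    """
    for pattern, precursors, guard in RULES:
        if pattern in molecule and guard(molecule):
            return precursors
    return None

print(expert_step("CC(=O)NC"))    # toy amide rule fires
print(expert_step("c1ccccc1Br"))  # no rule covers aryl halides -> None
```

The guard predicate is where an expert system earns its Solv-2 claim: the constraint is checked by construction, but only for chemistry someone thought to encode.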
Data-Driven Search Planners
Paradigm: Explicit Graph Search. Examples: 3N-MCTS, Retro*, AiZynthFinder, MEEA*. The workhorses of the "Navigability Era".
Knowledge source: Supervised single-step policies and/or value functions learned from large reaction corpora (e.g., USPTO).
Explicit search: Yes (MCTS, A*, or hybrid tree search).
Solvability tier: Primarily operates at Solv-1 (topological connectivity); any performance at Solv-2 is implicit and statistical, not guaranteed.
Strengths: Systematic exploration of vast search spaces; modular architecture (the chemical model can be upgraded independently of the search algorithm). The definitive Solv-1 machines.
Limitations: High inference cost; suffers from horizon effects on deep routes; chemical knowledge is purely statistical and limited to patterns in the training data.
Failure modes: Finds topologically valid but chemically naive routes; high STR masking low route quality; performance collapses on deep routes; over-reliance on statistically common but strategically poor disconnections.
Hybrid / Neurosymbolic Planners
Paradigm: Hybrid / Neurosymbolic. Examples: Llamole, LARC, RetroChimera, AOT*. The emerging frontier aiming to bridge the validity gap.
Knowledge source: Mixed; combines learned policies with symbolic verifiers, or uses LLMs as heuristic guides, critics, or macro-planners.
Explicit search: Yes (symbolic search is typically retained as the scaffold for ensuring rigor).
Solvability tier: Explicitly targets the gap between Solv-1 and Solv-2 by adding layers of validation or heuristic guidance.
Strengths: Attempts to combine the rigor of symbolic methods with the coverage of data-driven models; LLMs can inject non-topological constraints such as safety or cost.
Limitations: High architectural complexity; performance is sensitive to the calibration between neural and symbolic components; LLMs introduce reproducibility and hallucination challenges.
Failure modes: Miscalibrated neural heuristics pruning valid symbolic branches; LLM hallucinations leading the search astray; inconsistent logic between components causing dead ends.
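The critic pattern can be sketched as a verification gate between a symbolic expander and the search frontier. The critic below is a stub returning fixed scores; in a system like LARC that role would be played by a prompted LLM, and every name, proposal, and threshold here is an illustrative assumption:

```python
# A symbolic expander proposes candidate disconnections; a critic vetoes
# implausible ones before they enter the frontier. Toy data throughout.

def symbolic_expand(molecule: str) -> list[tuple[str, ...]]:
    """Template-style proposals (toy lookup table)."""
    proposals = {
        "target": [("frag_ok", "frag_ok"), ("frag_risky",)],
    }
    return proposals.get(molecule, [])

def critic(molecule: str, precursors: tuple[str, ...]) -> float:
    """Plausibility score in [0, 1]; a stand-in for an LLM critic call."""
    return 0.1 if any("risky" in p for p in precursors) else 0.9

def verified_expand(molecule: str, threshold: float = 0.5):
    accepted = []
    for precursors in symbolic_expand(molecule):
        score = critic(molecule, precursors)
        if score >= threshold:            # below threshold: branch is pruned
            accepted.append((precursors, score))
    return accepted

print(verified_expand("target"))  # only the plausible disconnection survives
```

The calibration risk named above lives entirely in `threshold` and the critic's score distribution: set either badly and the gate prunes valid symbolic branches or lets hallucinated ones through.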
Direct Full-Route Generators
Paradigm: Direct Sequence Generation. Examples: DirectMultiStep, SynLlama, RetroSynFormer.
Knowledge source: Supervised learning on complete, serialized route trajectories; learns the joint probability of the entire synthesis plan.
Explicit search: No (inference is via autoregressive decoding, e.g., beam search).
Solvability tier: Lacks formal guarantees at any tier; validity (from Solv-0 upward) must be externally verified after a route is generated.
Strengths: Extremely fast inference; learns global, route-level conditioning (e.g., convergent strategies) that myopic search misses; more robust performance on deep routes.
Limitations: No formal validity guarantees at any tier, so rigorous post-hoc validation is required; boundary conditions (e.g., the stock set) are baked into model weights, hindering adaptation.
Failure modes: Hallucinates chemically impossible steps or invalid SMILES (a Solv-0 failure); proposes syntactically valid but chemically incoherent sequences; generates routes disconnected from the specified stock set.
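As a toy illustration of this paradigm, the sketch below runs beam decoding over a hand-specified next-token table (standing in for a trained route transformer), then applies the post-hoc stock check the text calls for. The token format `T>X.C` ("T disconnects to X and C"), the probabilities, and the stock set are all invented:

```python
import math

# P(next token | previous token); "<end>" terminates a route.
MODEL = {
    "<start>": {"T>X.C": 0.7, "T>Y": 0.3},
    "T>X.C":   {"X>A.B": 0.8, "<end>": 0.2},
    "T>Y":     {"<end>": 1.0},
    "X>A.B":   {"<end>": 1.0},
}
STOCK = {"A", "B", "C"}

def beam_decode(width: int = 2, max_len: int = 5):
    """Return finished routes, best (lowest negative log-prob) first."""
    beams = [(0.0, ["<start>"])]                 # (neg log-prob, tokens)
    done = []
    for _ in range(max_len):
        nxt = []
        for nll, toks in beams:
            for tok, p in MODEL.get(toks[-1], {}).items():
                cand = (nll - math.log(p), toks + [tok])
                (done if tok == "<end>" else nxt).append(cand)
        beams = sorted(nxt)[:width]
        if not beams:
            break
    return [t[1:-1] for _, t in sorted(done)]    # strip <start>/<end>

def leaves_in_stock(route):
    """Post-hoc validator: every precursor not made by another step is in stock."""
    produced = {step.split(">")[0] for step in route}
    leaves = {m for step in route for m in step.split(">")[1].split(".")} - produced
    return leaves <= STOCK

for r in beam_decode():
    print(r, "valid" if leaves_in_stock(r) else "invalid")
```

Note that the decoder happily emits routes like `["T>Y"]` whose leaf `Y` is not in stock; nothing in generation itself enforces the boundary condition, which is exactly why external verification is mandatory here.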
The Endgame: A Strategic Outlook
The following is a personal interpretation of where the field is heading.
The diversity of planning methods presents a confusing landscape, but it is likely a transient one. The apparent competition between paradigms obscures a clear trajectory governed by a single, overriding constraint.
1. The Scalability Lesson from Expert Systems
The history of expert-encoded systems demonstrates the fundamental limits of manual curation. Their inability to achieve comprehensive coverage, despite decades of effort, reveals that any paradigm reliant on hand-encoded knowledge will be outpaced by data-driven methods that can scale with automated data acquisition.
2. The Pragmatism of Hybrids and the Bitter Lesson
Hybrid and neurosymbolic architectures are pragmatic, effective solutions for the current landscape, where high-fidelity data is scarce. They leverage symbolic logic to compensate for noisy, incomplete training sets. However, the long-term trend in computationally driven fields, as described by Sutton's Bitter Lesson, favors general architectures that scale with data and compute over complex, hand-engineered integrations.
3. The Inevitable Convergence
This points toward a unified architecture where a rigorous, physics-aware search process serves as the high-fidelity data generator, and a large-scale sequence model serves as the inference policy. The computationally expensive work of ensuring chemical validity is performed once to create a training corpus, then amortized into a fast, generalizable model. This is the logical endpoint of search-augmented generation.
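The amortization loop described above can be sketched in a few lines: an expensive, validated search is run once per target, and its solved routes are serialized into supervised pairs for a fast sequence policy. Every function and name here is a hypothetical stand-in, not any published system's API:

```python
# Amortizing search into data: slow validated planning -> fast policy corpus.

def validated_search(target: str):
    """Stand-in for an expensive physics-aware planner plus Solv-2 validator.

    Returns a route as a list of step strings, or None if unsolved.
    """
    oracle = {"T": ["T>X.C", "X>A.B"]}        # precomputed toy routes
    return oracle.get(target)

def build_corpus(targets: list[str]) -> list[tuple[str, str]]:
    """Serialize each solved route into a (target, route string) training pair."""
    corpus = []
    for t in targets:
        route = validated_search(t)
        if route is not None:                 # unsolved targets are skipped
            corpus.append((t, " ; ".join(route)))
    return corpus

print(build_corpus(["T", "U"]))  # only the solved target yields a pair
```

The sketch makes the bottleneck in the following paragraph concrete: if `validated_search` cannot certify chemical plausibility at scale, the corpus it emits is noisy and the downstream policy inherits that noise.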
This reframes the central challenge for the field. The primary obstacle to progress is no longer a deficit in planning algorithms; it is the absence of a scalable, automated Solv-2 validator. Without a reliable oracle to certify chemical plausibility at scale, the data generator cannot run, and policy models are left to train on noisy, low-validity data.
Therefore, developing this validator is the highest-leverage research problem. The team that solves it will not just create a better planner; they will unlock the data required to train the first true chemical foundation models.