Planning Methods
A comprehensive taxonomy of 20 retrosynthetic planning architectures from 2018 to 2026, organized by planning paradigm. Based on Tables 5 and 6 of the review.
Explicit Graph Search
11 methods. Routes assembled step-by-step via tree search (MCTS, A*) with single-step expansion models.
Hybrid / Neurosymbolic
6 methods. Neural policies combined with symbolic search, LLM critics, or ensemble verification.
Direct Sequence Generation
3 methods. Full retrosynthetic routes generated as a single sequence via autoregressive decoding.
Architecture Taxonomy
| Method | Year | Paradigm | Policy | Search | Tier |
|---|---|---|---|---|---|
| 3N-MCTS | 2018 | Explicit Graph Search | MLP (Templates) | MCTS (Rollout) | 0–1 by construction |
| DFPN-E | 2019 | Explicit Graph Search | MLP (Templates) | Proof-Number Search | 0–1 by construction |
| Retro* | 2020 | Explicit Graph Search | MLP (Templates) | Neural A* | 0–1 by construction |
| AiZynthFinder | 2020 | Explicit Graph Search | MLP (Templates) | MCTS (UCB) | 0–1 by construction |
| AutoSynRoute | 2020 | Explicit Graph Search | Transformer | MCTS (Heuristic) | None formal |
| GRASP | 2022 | Explicit Graph Search | Actor-Critic (TD3) | MCTS (Goal-driven) | 0–1 by construction |
| RetroGraph | 2022 | Explicit Graph Search | GNN (Graph-Aware) | AND-OR Graph Search | 0–1 by construction |
| MEEA* | 2024 | Explicit Graph Search | MLP (Templates) | Hybrid MCTS + A* | 0–1 by construction |
| Llamole | 2024 | Hybrid / Neurosymbolic | LLM (Heuristic) | A* Search | 0–1 (template steps) |
| DESP | 2024 | Hybrid / Neurosymbolic | MLP + Distance Net | Bidirectional Search | 0–1 by construction |
| Higher-Level | 2025 | Explicit Graph Search | MLP (Abstract Templates) | MCTS | 0; abstract Tier 1 |
| SynPlanner | 2025 | Explicit Graph Search | GCN (Templates) | MCTS / A* | 0–1 by construction |
| InterRetro | 2025 | Explicit Graph Search | Graph2Edits | Greedy (Compiled) | None formal |
| DirectMultiStep | 2025 | Direct Sequence Generation | Transformer (Seq2Seq) | Beam Decoding | None formal |
| SynLlama | 2025 | Direct Sequence Generation | Llama-3 (LLM) | Greedy Decoding | None formal |
| RetroChimera | 2025 | Hybrid / Neurosymbolic | GNN + Transformer | Retro* | 0–1 (template branch) |
| LARC | 2025 | Hybrid / Neurosymbolic | MEEA* + LLM Critic | MCTS (Agentic) | 0–1; partial Tier 2 via LLM |
| AOT* | 2025 | Hybrid / Neurosymbolic | LLM (Generative) | AND-OR Tree | None formal (LLM macro-steps) |
| TempRe | 2025 | Hybrid / Neurosymbolic | Transformer (Template) | MCTS / Direct | 0–1 (constrained mode) |
| RetroSynFormer | 2026 | Direct Sequence Generation | Decision Transformer | Beam Search | 0–1 |
Method Families
A qualitative comparison of the four major method families, from Table 6 of the review.
Expert-Encoded Planners
Paradigm: Explicit Graph Search. Examples: LHASA, Chematica, and other systems built on hand-curated rule sets.
Knowledge source: Human expertise encoded as reaction rules, often with explicit steric and electronic guards.
Explicit search: Yes (typically best-first or proof-number search over a symbolic graph).
Solvability tier: Can formally guarantee Solv-2 constraints if they are explicitly encoded in the rules; otherwise defaults to Solv-1.
Strengths: High interpretability; direct linkage to mechanistic precedent. Crucially, this is the only paradigm where Solv-2 constraints (selectivity) can be formally guaranteed by construction, provided the rules are sufficiently detailed.
Limitations: Coverage is strictly bounded by the rule library; expensive and slow to update; brittle when faced with novel scaffolds or reaction classes.
Failure modes: Fails silently when a required transformation is absent from its knowledge base; misapplies over-generalized rules in novel contexts.
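The rule-library failure mode above can be made concrete with a toy sketch. Plain substring patterns stand in for real SMARTS templates, and every rule, fragment, and guard below is illustrative, not taken from any actual expert system:

```python
from typing import Callable, Optional

# One hand-encoded disconnection rule: (substructure pattern, precursor
# fragments, guard predicate standing in for steric/electronic checks).
Rule = tuple[str, list[str], Callable[[str], bool]]

RULES: list[Rule] = [
    # Toy amide disconnection; the guard blocks substrates that already
    # carry a free acid (a crude chemoselectivity check).
    ("C(=O)N", ["C(=O)O", "N"], lambda mol: mol.count("C(=O)O") == 0),
    # Toy ester disconnection with a trivially permissive guard.
    ("C(=O)OC", ["C(=O)O", "OC"], lambda mol: True),
]

def expert_step(molecule: str) -> Optional[list[str]]:
    """Return precursors from the first applicable rule, or None.

    Returning None is the 'silent failure' mode described above: a feasible
    disconnection missing from RULES is simply never proposed.
    """
    for pattern, precursors, guard in RULES:
        if pattern in molecule and guard(molecule):
            return precursors
    return None

print(expert_step("CC(=O)NC"))    # toy amide rule fires
print(expert_step("c1ccccc1Br"))  # no rule covers aryl halides -> None
```

The guard predicate is where an expert system earns its Solv-2 claim: the constraint is checked by construction, but only for chemistry someone thought to encode.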
Data-Driven Search Planners
Paradigm: Explicit Graph Search. Examples: 3N-MCTS, Retro*, AiZynthFinder, MEEA*. The workhorses of the "Navigability Era".
Knowledge source: Supervised single-step policies and/or value functions learned from large reaction corpora (e.g., USPTO).
Explicit search: Yes (MCTS, A*, or hybrid tree search).
Solvability tier: Primarily operates at Solv-1 (topological connectivity); any performance at Solv-2 is implicit and statistical, not guaranteed.
Strengths: Systematic exploration of vast search spaces; modular architecture (the chemical model can be upgraded independently of the search algorithm). The definitive Solv-1 machines.
Limitations: High inference cost; suffers from horizon effects on deep routes; chemical knowledge is purely statistical and limited to patterns in the training data.
Failure modes: Finds topologically valid but chemically naive routes; high STR masking low route quality; performance collapses on deep routes; over-reliance on statistically common but strategically poor disconnections.
Hybrid / Neurosymbolic Planners
Paradigm: Hybrid / Neurosymbolic. Examples: Llamole, LARC, RetroChimera, AOT*. The emerging frontier aiming to bridge the validity gap.
Knowledge source: Mixed; combines learned policies with symbolic verifiers, or uses LLMs as heuristic guides, critics, or macro-planners.
Explicit search: Yes (symbolic search is typically retained as the scaffold for ensuring rigor).
Solvability tier: Explicitly targets the gap between Solv-1 and Solv-2 by adding layers of validation or heuristic guidance.
Strengths: Attempts to combine the rigor of symbolic methods with the coverage of data-driven models; LLMs can inject non-topological constraints such as safety or cost.
Limitations: High architectural complexity; performance is sensitive to the calibration between neural and symbolic components; LLMs introduce reproducibility and hallucination challenges.
Failure modes: Miscalibrated neural heuristics pruning valid symbolic branches; LLM hallucinations leading the search astray; inconsistent logic between components causing dead ends.
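The critic pattern can be sketched as a verification gate between a symbolic expander and the search frontier. The critic below is a stub returning fixed scores; in a system like LARC that role would be played by a prompted LLM, and every name, proposal, and threshold here is an illustrative assumption:

```python
# A symbolic expander proposes candidate disconnections; a critic vetoes
# implausible ones before they enter the frontier. Toy data throughout.

def symbolic_expand(molecule: str) -> list[tuple[str, ...]]:
    """Template-style proposals (toy lookup table)."""
    proposals = {
        "target": [("frag_ok", "frag_ok"), ("frag_risky",)],
    }
    return proposals.get(molecule, [])

def critic(molecule: str, precursors: tuple[str, ...]) -> float:
    """Plausibility score in [0, 1]; a stand-in for an LLM critic call."""
    return 0.1 if any("risky" in p for p in precursors) else 0.9

def verified_expand(molecule: str, threshold: float = 0.5):
    accepted = []
    for precursors in symbolic_expand(molecule):
        score = critic(molecule, precursors)
        if score >= threshold:            # below threshold: branch is pruned
            accepted.append((precursors, score))
    return accepted

print(verified_expand("target"))  # only the plausible disconnection survives
```

The calibration risk named above lives entirely in `threshold` and the critic's score distribution: set either badly and the gate prunes valid symbolic branches or lets hallucinated ones through.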
Direct Full-Route Generators
Paradigm: Direct Sequence Generation. Examples: DirectMultiStep, SynLlama, RetroSynFormer.
Knowledge source: Supervised learning on complete, serialized route trajectories; learns the joint probability of the entire synthesis plan.
Explicit search: No (inference is via autoregressive decoding, e.g., beam search).
Solvability tier: Lacks formal guarantees at any tier; validity (from Solv-0 upward) must be externally verified after a route is generated.
Strengths: Extremely fast inference; learns global, route-level conditioning (e.g., convergent strategies) that myopic search misses; more robust performance on deep routes.
Limitations: No formal validity guarantees at any tier, so rigorous post-hoc validation is required; boundary conditions (e.g., the stock set) are baked into model weights, hindering adaptation.
Failure modes: Hallucinates chemically impossible steps or invalid SMILES (a Solv-0 failure); proposes syntactically valid but chemically incoherent sequences; generates routes disconnected from the specified stock set.
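As a toy illustration of this paradigm, the sketch below runs beam decoding over a hand-specified next-token table (standing in for a trained route transformer), then applies the post-hoc stock check the text calls for. The token format `T>X.C` ("T disconnects to X and C"), the probabilities, and the stock set are all invented:

```python
import math

# P(next token | previous token); "<end>" terminates a route.
MODEL = {
    "<start>": {"T>X.C": 0.7, "T>Y": 0.3},
    "T>X.C":   {"X>A.B": 0.8, "<end>": 0.2},
    "T>Y":     {"<end>": 1.0},
    "X>A.B":   {"<end>": 1.0},
}
STOCK = {"A", "B", "C"}

def beam_decode(width: int = 2, max_len: int = 5):
    """Return finished routes, best (lowest negative log-prob) first."""
    beams = [(0.0, ["<start>"])]                 # (neg log-prob, tokens)
    done = []
    for _ in range(max_len):
        nxt = []
        for nll, toks in beams:
            for tok, p in MODEL.get(toks[-1], {}).items():
                cand = (nll - math.log(p), toks + [tok])
                (done if tok == "<end>" else nxt).append(cand)
        beams = sorted(nxt)[:width]
        if not beams:
            break
    return [t[1:-1] for _, t in sorted(done)]    # strip <start>/<end>

def leaves_in_stock(route):
    """Post-hoc validator: every precursor not made by another step is in stock."""
    produced = {step.split(">")[0] for step in route}
    leaves = {m for step in route for m in step.split(">")[1].split(".")} - produced
    return leaves <= STOCK

for r in beam_decode():
    print(r, "valid" if leaves_in_stock(r) else "invalid")
```

Note that the decoder happily emits routes like `["T>Y"]` whose leaf `Y` is not in stock; nothing in generation itself enforces the boundary condition, which is exactly why external verification is mandatory here.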
The Endgame: A Strategic Outlook
The following is a personal interpretation of where the field is heading.
The diversity of planning methods presents a confusing landscape, but it is likely a transient one. The apparent competition between paradigms obscures a clear trajectory governed by a single, overriding constraint.
1. The Scalability Lesson from Expert Systems
The history of expert-encoded systems demonstrates the fundamental limits of manual curation. Their inability to achieve comprehensive coverage, despite decades of effort, reveals that any paradigm reliant on hand-encoded knowledge will be outpaced by data-driven methods that can scale with automated data acquisition.
2. The Pragmatism of Hybrids and the Bitter Lesson
Hybrid and neurosymbolic architectures are pragmatic, effective solutions for the current landscape, where high-fidelity data is scarce. They leverage symbolic logic to compensate for noisy, incomplete training sets. However, the long-term trend in computationally driven fields, as described by Sutton's Bitter Lesson, favors general architectures that scale with data and compute over complex, hand-engineered integrations.
3. The Inevitable Convergence
This points toward a unified architecture where a rigorous, physics-aware search process serves as the high-fidelity data generator, and a large-scale sequence model serves as the inference policy. The computationally expensive work of ensuring chemical validity is performed once to create a training corpus, then amortized into a fast, generalizable model. This is the logical endpoint of search-augmented generation.
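The amortization loop described above can be sketched in a few lines: an expensive, validated search is run once per target, and its solved routes are serialized into supervised pairs for a fast sequence policy. Every function and name here is a hypothetical stand-in, not any published system's API:

```python
# Amortizing search into data: slow validated planning -> fast policy corpus.

def validated_search(target: str):
    """Stand-in for an expensive physics-aware planner plus Solv-2 validator.

    Returns a route as a list of step strings, or None if unsolved.
    """
    oracle = {"T": ["T>X.C", "X>A.B"]}        # precomputed toy routes
    return oracle.get(target)

def build_corpus(targets: list[str]) -> list[tuple[str, str]]:
    """Serialize each solved route into a (target, route string) training pair."""
    corpus = []
    for t in targets:
        route = validated_search(t)
        if route is not None:                 # unsolved targets are skipped
            corpus.append((t, " ; ".join(route)))
    return corpus

print(build_corpus(["T", "U"]))  # only the solved target yields a pair
```

The sketch makes the bottleneck in the following paragraph concrete: if `validated_search` cannot certify chemical plausibility at scale, the corpus it emits is noisy and the downstream policy inherits that noise.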
This reframes the central challenge for the field. The primary obstacle to progress is no longer a deficit in planning algorithms; it is the absence of a scalable, automated Solv-2 validator. Without a reliable oracle to certify chemical plausibility at scale, the data generator cannot run, and policy models are left to train on noisy, low-validity data.
Therefore, developing this validator is the highest-leverage research problem. The team that solves it will not just create a better planner; they will unlock the data required to train the first true chemical foundation models.