Evaluation

A critical analysis of how current benchmarks inflate reported performance. Based on Section 7 of the review, these interactive tables demonstrate that inventory scope and metric choice are at least as important as algorithmic innovation.

Inventory Controls Difficulty

92% vs 48%

Average Solv-1 for virtual vs. physical inventory tiers. Expanding the stock set inflates success rates without improving the planner.

STR ≠ Chemical Validity

99.7% → 11.9%

Chemformer achieves near-perfect stock-termination but recovers only 11.9% of ground-truth routes. High Solv-1 does not imply correct chemistry.

The Complexity Cliff

81% → 9%

AiZynthFinder's reconstruction accuracy drops from 81% on 2-step routes to 9% on 6-step routes. Performance collapses at depth.

Stock Inflation

Impact of Stock Set Scope on Reported Solvability. Models evaluated against extended virtual libraries (reported sizes of roughly 10⁷ to 10⁸ compounds) frequently report near-perfect stock-termination (Solv-1), because the larger stock set lets routes terminate at a shallower search depth. Evaluations constrained to physically in-stock inventories (~10⁵ compounds) yield substantially lower termination rates.
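The mechanism can be illustrated with a minimal sketch, assuming Solv-1 is the fraction of targets with at least one route whose leaves are all in stock. The `solv1` helper, the molecule labels, and the toy routes are hypothetical placeholders, not data from any of the benchmarks discussed here.

```python
from typing import Dict, List, Set


def solv1(routes: Dict[str, List[List[str]]], stock: Set[str]) -> float:
    """Fraction of targets with at least one route whose leaves are all in stock."""
    solved = sum(
        any(all(leaf in stock for leaf in route) for route in candidate_routes)
        for candidate_routes in routes.values()
    )
    return solved / len(routes)


# Hypothetical toy data: each target maps to candidate routes, given as leaf lists.
toy_routes = {
    "target_1": [["bb_a", "bb_b"]],
    "target_2": [["bb_a", "intermediate_x"]],
}

# The "virtual" stock is the physical stock plus one extra purchasable compound.
physical_stock = {"bb_a", "bb_b"}
virtual_stock = physical_stock | {"intermediate_x"}

physical_rate = solv1(toy_routes, physical_stock)
virtual_rate = solv1(toy_routes, virtual_stock)
print(physical_rate, virtual_rate)
```

The planner output is identical in both calls; only the stock set changes, and the termination rate changes with it.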


Virtual Tier (10)

| Model | Solv-1 | Benchmark | Stock Source | Inventory |
|---|---|---|---|---|
| Retro* | 86.84% | USPTO-190 | eMolecules | ~231 M |
| MEEA* | 100.0% | USPTO-190 | eMolecules | ~231 M |
| RetroGraph | 99.47% | USPTO-190 | eMolecules | ~231 M |
| InterRetro | 100.0% | Retro*-190 | eMolecules | >100 M |
| Enh-MCTS | 99.20% | ChEMBL (Rand) | ZINC + eMolecules | ~35 M |
| PDVN | 99.47% | USPTO-190 | eMolecules | ~23.1 M |
| DirectMultiStep | 75.58% | ChEMBL-5000 | eMolecules | ~23.1 M |
| LLM-Syn-Planner | 92.60% | USPTO-190 | eMolecules | ~23 M |
| ReSynZ | 73.54% | Retro*-190 | Sigma + eMol | ~18 M |
| Chemformer (Search) | 94.10% | Caspyrus10k | ZINC Full | ~17.4 M |

Physical Tier (6)

| Model | Solv-1 | Benchmark | Stock Source | Inventory |
|---|---|---|---|---|
| DirectMultiStep | 68.66% | ChEMBL-5000 | ASKCOS Buyables | ~330 k |
| SynLlama | 19.70% | ChEMBL | Enamine BB | ~230 k |
| SynPlanner (MCTS) | 56.24% | PaRoutes (n=5) | ASKCOS Buyables | ~186 k |
| SynPlanner (MCTS) | 18.71% | SAScore >5 | ASKCOS Buyables | ~186 k |
| µMCT-dc | 76.20% | Reaxys (Rand) | Sigma + eMol | ~107 k |
| ReSynZ | 50.26% | Retro*-190 | Sigma-Aldrich | ~85 k |

a The ~231 M values are the inventory sizes as reported. The data file in the original repository contains ~23.1 M compounds, a number consistent with more recent literature.

b The USPTO-190 and Retro*-190 benchmarks refer to the same set of 190 molecules, pre-filtered for high single-step model performance, introduced in Chen et al. (2020).
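The 92% vs 48% headline appears to be the unweighted mean of the Solv-1 values in each tier. A quick check, assuming every entry counts equally regardless of benchmark or test-set size (the variable names are illustrative):

```python
from statistics import mean

# Solv-1 values (%) transcribed from the Virtual Tier entries above.
virtual_solv1 = [86.84, 100.0, 99.47, 100.0, 99.20, 99.47, 75.58, 92.60, 73.54, 94.10]

# Solv-1 values (%) transcribed from the Physical Tier entries above.
physical_solv1 = [68.66, 19.70, 56.24, 18.71, 76.20, 50.26]

virtual_avg = mean(virtual_solv1)
physical_avg = mean(physical_solv1)

print(f"virtual: {virtual_avg:.0f}%  physical: {physical_avg:.0f}%")
```

Because the entries span different benchmarks and planners, the two averages summarize the reported numbers as published rather than a controlled comparison.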

Where's the 231M?

Several entries in the stock inflation table above report an eMolecules inventory of 231 million compounds. Where does this number come from? The figure originates in Retro* (ICML 2020) and has since been reprinted across multiple peer-reviewed publications. There's just one small problem with it.

The Citation Chain

Six papers report using the ~231M eMolecules inventory. Of these, three publish the actual stock file. The remaining three cite one of the others without publishing their own copy.

| Method | Reference | Reported inventory | eMolecules snapshot | Cites |
|---|---|---|---|---|
| Retro* | Chen et al. · ICML 2020 | ~231 M | 2019-11-01 | |
| Self-Improved Retro | Kim et al. · ICML 2021 | ~231 M | 2019-11-01 | |
| RetroGraph | Xie et al. · KDD 2022 | ~231 M | 2019-11-01 | Retro*, Self-Improved Retro |
| GRASP | Yu et al. · NeurIPS 2022 | ~231 M | 2019-11-01 | |
| EG-MCTS | Hong et al. · Commun Chem 2023 | ~231 M | 2019-11-01 | |
| DreamRetroEr | Zhang et al. · Nat Commun 2025 | ~231 M | eMolecules (no date) | Retro* |

All three published stock files contain exactly 23,081,629 entries, that is, ~23.1 M compounds, a factor of ten fewer than the reported ~231 M. They share an identical SHA-256 hash:

92b490c76e741212c82e177b22a89687aab0e53c4c78772aed333e540d0e38d8
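A check of this kind can be repeated on any downloaded copy of the stock file. The sketch below uses only Python's standard `hashlib`; the helper names are hypothetical, and counting entries as lines assumes a one-entry-per-line file, which may not match the actual file layout (a header row, for instance, would add one line).

```python
import hashlib

# Hash quoted in the text above.
EXPECTED_SHA256 = "92b490c76e741212c82e177b22a89687aab0e53c4c78772aed333e540d0e38d8"


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file in chunks so a large stock file never sits fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def count_lines(path):
    """Count lines; equals the entry count only under the one-entry-per-line assumption."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def check_stock_file(path):
    """Report whether the file matches the published hash, plus its line count."""
    return {
        "sha256_matches": sha256_of_file(path) == EXPECTED_SHA256,
        "line_count": count_lines(path),
    }
```

Calling `check_stock_file` on a local copy (the path is whatever the reader saved the file as) returns a small dictionary that can be compared against the 23,081,629 figure.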

In Print

The ~231M figure as it appears in each publication.

Retro*. Chen et al. (ICML 2020), origin of the ~231M figure · Peer-reviewed

Self-Improved Retro. Kim et al. (ICML 2021) · Peer-reviewed

RetroGraph. Xie et al. (KDD 2022) · Peer-reviewed

GRASP. Yu et al. (NeurIPS 2022) · Peer-reviewed

EG-MCTS. Hong et al. (Commun Chem 2023) · Peer-reviewed

DreamRetroEr. Zhang et al. (Nat Commun 2025) · Peer-reviewed

Whether an order-of-magnitude discrepancy in the single most consequential experimental variable—one whose value, as shown above, can swing reported success rates by over 50 percentage points—could persist through peer review at ICML, KDD, NeurIPS, and Nature Communications is left as an exercise for the reader.

The Validity Gap and the Complexity Cliff

This section examines the disconnect between Tier 1 success (topological stock-termination) and the higher-tier criteria of chemical plausibility (Tiers 2-3), through two complementary views from recent audits.

Panel A: The Inverse Correlation of Solvability, Accuracy, and Speed

Models Matter (Torren-Peraire et al. 2024)

Chemformer achieves near-perfect stock-termination (99.70%) but recovers only 11.9% of ground-truth routes and requires ~8 hours per target. AiZynthFinder shows the opposite trade-off: the lowest termination rate (66.30%) but the highest route accuracy (61.80%), at a search time of ~160 s that it shares with LocalRetro.

| Model | Approach | Solvability | Route Acc. | Search Time |
|---|---|---|---|---|
| Chemformer | Direct Sequence | 99.70% | 11.90% | ~7.8h |
| LocalRetro | Explicit Graph | 86.0% | 36.10% | ~160s |
| AiZynthFinder | Explicit Graph | 66.30% | 61.80% | ~160s |
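The inverse relationship in Panel A can be checked directly from the three reported rows. The sketch below only compares rank orderings; with three models this is an illustration of the pattern rather than a statistical test, and the dictionary layout is an illustrative choice.

```python
# Solvability and route accuracy (%) transcribed from Panel A above.
panel_a = {
    "Chemformer": {"solvability": 99.70, "route_acc": 11.90},
    "LocalRetro": {"solvability": 86.0, "route_acc": 36.10},
    "AiZynthFinder": {"solvability": 66.30, "route_acc": 61.80},
}

# Rank models from highest to lowest solvability.
by_solvability = sorted(panel_a, key=lambda m: panel_a[m]["solvability"], reverse=True)

# Rank models from lowest to highest route accuracy.
by_route_acc = sorted(panel_a, key=lambda m: panel_a[m]["route_acc"])

# If the two orderings coincide, the solvability ranking is exactly
# the reverse of the route-accuracy ranking.
rankings_inverted = by_solvability == by_route_acc
print(by_solvability, rankings_inverted)
```

On these three rows the orderings coincide, so the model ranked first on stock-termination is ranked last on route recovery.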