ChemRxiv 2026 · Preprint

The Syntax of Matter

Synthesis Planning as the Foundation of Generative Chemistry

Anton Morgunov*, Yu Shee, Alexander V. Soudackov, Victor S. Batista* · Department of Chemistry, Yale University

The Argument

We have been training models to predict what molecules do before teaching them how molecules are made. This review argues that synthesis planning — the step-by-step logic of constructing a molecule — is the chemical equivalent of next-token prediction: the foundational objective that will unlock genuine chemical reasoning. We survey the field from 2020 to 2026, diagnose why current benchmarks are misleading, and propose a new validity framework (Solv-N) to measure what actually matters.

The Paradigm Shift Toward Artificial Chemical Intelligence. Analogous to the evolution of NLP and structural biology — where optimizing for structural grammar yielded emergent reasoning — chemical AI must transition from static graph representations to causal reaction trajectories, rigorously evaluated via the Solv-N hierarchy.

Explore the Review

Core arguments, frameworks, and open problems — distilled into web-native, directly-linkable references.

Reading Paths

Not sure where to start? Pick a guided path tailored to your background.

Quick Overview

15 min·8 steps

The core argument in 15 minutes. For busy researchers who want the key takeaways without reading the full paper.


Chemist's Path

35 min·15 steps

For synthetic and computational chemists. Focuses on the domain arguments, method landscape, and practical implications for bench chemistry.


ML Researcher Path

35 min·15 steps

For ML/AI researchers. Focuses on the computational arguments, benchmark critique, and algorithmic open problems.


Complete Deep Dive

90 min·16 steps

The full paper in recommended reading order. For readers who want comprehensive understanding of every argument and its evidence.


Central Claims

1. Structure precedes quantity.

We are attempting to solve chemistry's quantitative problems (toxicity, binding affinity) before we have mastered its structural grammar. The history of AI shows this order is backwards.

2. Navigability is solved. Validity is not.

Modern planners achieve 99%+ stock-termination rates. But stock-termination measures graph connectivity, not chemical correctness. The field is celebrating victory on the wrong metric.

3. Synthesis planning is the chemical analogue of next-token prediction.

Just as language models acquired generalizable reasoning by mastering the grammar of text, chemical models may acquire robust physical reasoning by training on the causal logic of molecular transformation.

4. Evaluation requires a validity hierarchy.

The proposed Solv-N framework separates syntactic correctness (Solv-0) and topological connectivity (Solv-1) from selectivity (Solv-2) and executability (Solv-3). Most published results only measure Solv-1.
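To make claims 2 and 4 concrete, here is a minimal Python sketch of the difference between stock termination and a tiered validity score. It is an illustration under stated assumptions, not the authors' implementation: the names `SolvLevel`, `RouteNode`, `is_stock_terminated`, and `solv_level` are hypothetical, the route tree is a simplified stand-in for a planner's output, and the rule that each level requires every lower level to pass is our reading of the word "hierarchy".

```python
from collections.abc import Iterator
from dataclasses import dataclass, field
from enum import IntEnum


class SolvLevel(IntEnum):
    """Hypothetical encoding of the Solv-N levels named in claim 4."""
    NONE = -1    # fails even the syntactic check
    SOLV_0 = 0   # syntactic correctness
    SOLV_1 = 1   # topological connectivity (stock termination)
    SOLV_2 = 2   # selectivity
    SOLV_3 = 3   # executability


@dataclass
class RouteNode:
    """One molecule in a retrosynthetic route tree (illustrative structure)."""
    smiles: str
    precursors: list["RouteNode"] = field(default_factory=list)


def leaves(node: RouteNode) -> Iterator[RouteNode]:
    """Yield the starting materials, i.e. nodes with no precursors."""
    if not node.precursors:
        yield node
    else:
        for child in node.precursors:
            yield from leaves(child)


def is_stock_terminated(route: RouteNode, stock: set[str]) -> bool:
    """True if every leaf is purchasable. This is a pure graph property:
    it says nothing about whether any individual step would actually work."""
    return all(leaf.smiles in stock for leaf in leaves(route))


def solv_level(checks: dict[SolvLevel, bool]) -> SolvLevel:
    """Highest level reached, assuming (our assumption) that a route must
    pass every lower level first. Missing checks count as not passed."""
    level = SolvLevel.NONE
    for candidate in (SolvLevel.SOLV_0, SolvLevel.SOLV_1,
                      SolvLevel.SOLV_2, SolvLevel.SOLV_3):
        if not checks.get(candidate, False):
            break
        level = candidate
    return level


# One-step route: aspirin from salicylic acid and acetic anhydride.
route = RouteNode(
    "CC(=O)Oc1ccccc1C(=O)O",
    [RouteNode("O=C(O)c1ccccc1O"), RouteNode("CC(=O)OC(C)=O")],
)
stock = {"O=C(O)c1ccccc1O", "CC(=O)OC(C)=O"}

checks = {
    SolvLevel.SOLV_0: True,                               # assumed: SMILES parse
    SolvLevel.SOLV_1: is_stock_terminated(route, stock),  # measured
    # Solv-2 and Solv-3 are left out: they were never assessed.
}
print(solv_level(checks).name)
```

A benchmark that reports only stock termination is reporting the `SOLV_1` check alone. In this sketch, a route with unassessed selectivity and executability can never score above `SOLV_1`, which is the gap the claims above point to.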


Cite this work

Anton Morgunov, Yu Shee, Alexander V. Soudackov, Victor S. Batista. “The Syntax of Matter: Synthesis Planning as the Foundation of Generative Chemistry.” ChemRxiv (2026). Preprint.