Benchmarks

Performance and validation benchmarks across molecules, QM reference sources, and MM backends. All times are wall-clock on an AMD/Intel desktop with 32 GB RAM, Python 3.12.

Head-to-Head Summary

System	QM Source	Params	Freq Refs	Seminario	Optimizer	Score Δ	Time
CH₃F	Gaussian B3LYP/6-31+G(d)	8	9	0.001 s	L-BFGS-B	0.221 → 0.018 (92%)	4.2 s
CH₃F	Gaussian B3LYP/6-31+G(d)	8	9	0.001 s	Nelder-Mead	0.221 → 0.001 (99%)	2.0 s
CH₃F	Gaussian B3LYP/6-31+G(d)	8	9	0.001 s	Powell	0.221 → 0.001 (99%)	3.8 s
Rh-enamide (9 mol)	Jaguar B3LYP/LACVP**	182	1,030	0.03 s	Nelder-Mead	434,172 → 101,077 (76.7%)	369 s
Rh-enamide (9 mol)	Psi4 B3LYP/def2-SVP	182	—	0.03 s	Nelder-Mead	—	—

The Rh-enamide Psi4 row will be populated once the Psi4 reference data has been generated.
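The percentages in the Score Δ column are consistent with a relative reduction of the objective, (initial − final) / initial. A minimal sketch of that arithmetic, assuming this definition; the helper name `improvement_pct` is hypothetical and not part of q2mm:

```python
def improvement_pct(initial: float, final: float) -> float:
    """Relative score reduction in percent: 100 * (initial - final) / initial."""
    if initial <= 0:
        raise ValueError("initial score must be positive")
    return 100.0 * (initial - final) / initial


# Rh-enamide (Jaguar) row from the summary table
rh = improvement_pct(434_172, 101_077)
# CH3F with L-BFGS-B from the summary table
ch3f = improvement_pct(0.221, 0.018)

print(f"{rh:.1f}% {ch3f:.0f}%")  # prints: 76.7% 92%
```

Because the measure is relative, the 76.7% on the Rh-enamide set and the 92% on CH₃F are not directly comparable in absolute terms: the starting scores differ by six orders of magnitude.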

Key Takeaways

JAX backends are fastest — JAX (~1500 eval/s) and JAX-MD (~1000 eval/s) are 5–10× faster than OpenMM and 500–1000× faster than Tinker. Both JIT-compile energy functions as pure JAX, eliminating Python ↔ C++ marshalling overhead.
All engines agree to machine precision: JAX, JAX-MD, and OpenMM produce energies that differ by < 10⁻¹⁸ kcal/mol and frequencies that differ by < 0.001 cm⁻¹ for the same force field and functional form. This validates implementation correctness across backends. Note that parity holds only when the engines share the same functional form and non-bonded treatment (combining rules, 1-4 scaling, cutoffs).
Nelder-Mead and Powell converge to near-zero scores (0.001) on small molecules. L-BFGS-B with finite-difference gradients gets stuck at a suboptimal point (0.018); analytical gradients via jax.grad are expected to fix this.
JAX and JAX-MD support analytical parameter gradients via jax.grad, which will eliminate the 2N+1 finite-difference overhead once the optimizer is wired to use energy_and_param_grad().
L-BFGS-B diverges on high-dimensional frequency objectives. With 182 parameters and 9 molecules, finite-difference gradients are unstable — the optimizer worsened the score. Nelder-Mead is the reliable choice for derivative-free frequency optimization on complex systems.
The Seminario method is effectively free — even 182-parameter organometallic systems complete in < 50 ms.
Scaling: With Nelder-Mead and 9 molecules (1,030 frequency references), the full optimization converges in ~370 s (~6 min), achieving 76.7% score improvement. Each evaluation computes frequencies for all training molecules.
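The 2N+1 figure mentioned above follows from central differences: one evaluation for the objective value plus two per parameter for the gradient. The sketch below counts evaluations on a toy quadratic that stands in for the frequency score. `CountingObjective` and `central_diff_grad` are hypothetical names for illustration, not q2mm or SciPy APIs, and the step size is an arbitrary choice.

```python
from typing import Callable, List, Sequence


class CountingObjective:
    """Wraps an objective and counts how many times it is evaluated."""

    def __init__(self, fn: Callable[[Sequence[float]], float]) -> None:
        self.fn = fn
        self.calls = 0

    def __call__(self, x: Sequence[float]) -> float:
        self.calls += 1
        return self.fn(x)


def central_diff_grad(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    h: float = 1e-5,
) -> List[float]:
    """Central-difference gradient: two objective evaluations per parameter."""
    grad = []
    for i in range(len(x)):
        plus = list(x)
        minus = list(x)
        plus[i] += h
        minus[i] -= h
        grad.append((f(plus) - f(minus)) / (2.0 * h))
    return grad


# Toy objective (sum of squares), not the real frequency score.
objective = CountingObjective(lambda x: sum(v * v for v in x))

n_params = 182  # parameter count of the Rh-enamide system
x0 = [0.5] * n_params

value = objective(x0)                    # 1 evaluation
grad = central_diff_grad(objective, x0)  # 2N evaluations

print(objective.calls)  # 365 = 2 * 182 + 1
```

In the real benchmark each of those evaluations computes frequencies for all training molecules, so the per-step cost grows linearly with the parameter count. A reverse-mode gradient such as the one jax.grad produces typically costs a small constant multiple of one evaluation, independent of N.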

Detailed Results

Small Molecules — CH₃F: combined speed + accuracy leaderboard, cross-engine parity, frequency accuracy analysis
Rh-Enamide (Jaguar) — 9-structure organometallic training set with Jaguar B3LYP/LACVP** reference data
Rh-Enamide (Psi4) — Same system with Psi4 B3LYP/def2-SVP reference data

Benchmarks were generated with the q2mm-benchmark CLI. Run q2mm-benchmark --list to see the available backends and optimizers. Last updated: March 2026.