Benchmarks
Performance and validation benchmarks across molecules, QM reference sources, and MM backends. All times are wall-clock on an AMD/Intel desktop with 32 GB RAM, Python 3.12.
Head-to-Head Summary
| System | QM Source | Params | Freq Refs | Seminario | Optimizer | Score Δ | Time |
|---|---|---|---|---|---|---|---|
| CH₃F | Gaussian B3LYP/6-31+G(d) | 8 | 9 | 0.001 s | L-BFGS-B | 0.221 → 0.018 (92%) | 4.2 s |
| CH₃F | Gaussian B3LYP/6-31+G(d) | 8 | 9 | 0.001 s | Nelder-Mead | 0.221 → 0.001 (99%) | 2.0 s |
| CH₃F | Gaussian B3LYP/6-31+G(d) | 8 | 9 | 0.001 s | Powell | 0.221 → 0.001 (99%) | 3.8 s |
| Rh-enamide (9 mol) | Jaguar B3LYP/LACVP** | 182 | 1,030 | 0.03 s | Nelder-Mead | 434,172 → 101,077 (76.7%) | 369 s |
| Rh-enamide (9 mol) | Psi4 B3LYP/def2-SVP | 182 | — | 0.03 s | Nelder-Mead | — | — |
The Rh-enamide Psi4 row will be populated once Psi4 reference-data generation completes.
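The percentages in the Score Δ column appear to be the relative reduction from the initial to the final objective score. The sketch below reproduces that arithmetic from the table values, assuming the formula (initial − final) / initial. The helper name `score_improvement` is hypothetical and is not part of q2mm.

```python
def score_improvement(initial: float, final: float) -> float:
    """Relative score reduction in percent (assumed formula, not taken from q2mm)."""
    if initial <= 0:
        raise ValueError("initial score must be positive")
    return 100.0 * (initial - final) / initial


# Initial and final scores copied from the summary table.
rows = {
    "CH3F / L-BFGS-B": (0.221, 0.018),
    "CH3F / Nelder-Mead": (0.221, 0.001),
    "Rh-enamide / Nelder-Mead": (434_172, 101_077),
}

for label, (initial, final) in rows.items():
    print(f"{label}: {score_improvement(initial, final):.1f}%")
```

From the rounded table entries this gives 91.9%, 99.5%, and 76.7%. The Rh-enamide value matches the table exactly. The CH₃F percentages in the table were presumably computed from unrounded scores, so small differences in the last digit are expected.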
Key Takeaways
- JAX backends are fastest: JAX (~1500 eval/s) and JAX-MD (~1000 eval/s) are 5–10× faster than OpenMM and 500–1000× faster than Tinker. Both JIT-compile energy functions as pure JAX, eliminating Python ↔ C++ marshalling overhead.
- All engines agree to machine precision: JAX, JAX-MD, and OpenMM produce identical energies (< 10⁻¹⁸ kcal/mol) and frequencies (< 0.001 cm⁻¹) for the same force field and functional form. This validates implementation correctness across backends. Note: parity only holds when engines share the same functional form and non-bonded treatment (combining rules, 1-4 scaling, cutoffs).
- Nelder-Mead and Powell converge to near-zero scores on small molecules (0.001 for CH₃F in the summary table). L-BFGS-B with finite-difference gradients gets stuck at a suboptimal point (0.018 for CH₃F); analytical gradients via `jax.grad` would fix this.
- JAX and JAX-MD support analytical parameter gradients via `jax.grad`, which will eliminate the 2N+1 finite-difference overhead (2N+1 objective evaluations per gradient for N parameters) once the optimizer is wired to use `energy_and_param_grad()`.
- L-BFGS-B diverges on high-dimensional frequency objectives. With 182 parameters and 9 molecules, finite-difference gradients are unstable, and the optimizer worsened the score. Nelder-Mead is the reliable choice for derivative-free frequency optimization on complex systems.
- The Seminario method is effectively free: even 182-parameter organometallic systems complete in < 50 ms.
- Scaling: With Nelder-Mead and 9 molecules (1,030 frequency references), the full optimization converges in ~370 s (~6 min), achieving a 76.7% score improvement. Each evaluation computes frequencies for all training molecules.
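To make the 2N+1 cost concrete: assuming central differences (which is what the 2N+1 count suggests), a finite-difference gradient over N parameters needs two objective evaluations per parameter, plus one for the current score. The sketch below counts evaluations on a toy harmonic-bond objective and checks the result against a hand-derived gradient, which stands in for what `jax.grad` would provide. Everything here (`toy_objective`, `fd_gradient`, `analytic_gradient`, and the sample data) is hypothetical and uses only the standard library; it is not q2mm's objective or optimizer code.

```python
# Toy data, made up for illustration: bond lengths (Å) and target energies (kcal/mol).
LENGTHS = [1.30, 1.38, 1.45]
K_TRUE, R0_TRUE = 300.0, 1.38
TARGETS = [K_TRUE * (r - R0_TRUE) ** 2 for r in LENGTHS]

calls = 0  # number of objective evaluations so far


def toy_objective(params):
    """Sum of squared errors between harmonic energies k*(r - r0)^2 and the targets."""
    global calls
    calls += 1
    k, r0 = params
    return sum((k * (r - r0) ** 2 - t) ** 2 for r, t in zip(LENGTHS, TARGETS))


def fd_gradient(f, params, rel_step=1e-6):
    """Central-difference gradient: two evaluations of f per parameter."""
    grad = []
    for i, p in enumerate(params):
        h = rel_step * max(abs(p), 1.0)
        up = list(params)
        up[i] = p + h
        down = list(params)
        down[i] = p - h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad


def analytic_gradient(params):
    """Hand-derived gradient of toy_objective; no extra objective evaluations."""
    k, r0 = params
    dk = 0.0
    dr0 = 0.0
    for r, t in zip(LENGTHS, TARGETS):
        d = r - r0
        resid = k * d * d - t
        dk += 2.0 * resid * d * d
        dr0 += 2.0 * resid * (-2.0 * k * d)
    return [dk, dr0]


params = [250.0, 1.40]
score = toy_objective(params)            # 1 evaluation for the current score
fd = fd_gradient(toy_objective, params)  # 2N evaluations for the gradient
exact = analytic_gradient(params)

n = len(params)
print(f"evaluations used: {calls} (2N+1 = {2 * n + 1})")
print(f"N = 182 parameters: {2 * 182 + 1} evaluations per gradient")
```

For the toy case (N = 2) this uses 5 evaluations. Under the same scheme, the 182-parameter Rh-enamide set needs 365 objective evaluations per gradient, each of which computes frequencies for all training molecules.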
Detailed Results
- Small Molecules — CH₃F: combined speed + accuracy leaderboard, cross-engine parity, frequency accuracy analysis
- Rh-Enamide (Jaguar) — 9-structure organometallic training set with Jaguar B3LYP/LACVP** reference data
- Rh-Enamide (Psi4) — Same system with Psi4 B3LYP/def2-SVP reference data
Benchmarks were generated by the `q2mm-benchmark` CLI. Run `q2mm-benchmark --list` to see available backends and optimizers. Last updated: March 2026.