
Benchmarks

Performance and validation benchmarks across molecules, QM reference sources, and MM backends. All times are wall-clock on an AMD/Intel desktop with 32 GB RAM, Python 3.12.
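Wall-clock throughput, the eval/s figure quoted on this page, can be measured roughly as follows. This is a minimal sketch and not the q2mm-benchmark implementation: `evals_per_second` and `toy_energy` are hypothetical names, and the toy harmonic energy only stands in for a real backend call. The warm-up call matters for JIT-compiled backends, where the first call includes compilation time.

```python
import time


def evals_per_second(fn, n_calls=1000):
    """Return the wall-clock throughput of fn in calls per second."""
    fn()  # warm-up call, so one-time costs (e.g. JIT compilation) are not timed
    start = time.perf_counter()
    for _ in range(n_calls):
        fn()
    elapsed = time.perf_counter() - start
    return n_calls / elapsed


def toy_energy():
    # Hypothetical stand-in for one MM energy evaluation:
    # three harmonic bonds with k = 300 and r0 = 1.1 (arbitrary units).
    return sum(0.5 * 300.0 * (r - 1.1) ** 2 for r in (1.0, 1.1, 1.2))


rate = evals_per_second(toy_energy)
print(f"{rate:.0f} eval/s")
```

The printed rate depends on the machine, so no expected value is shown.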


Head-to-Head Summary

| System | QM Source | Params | Freq Refs | Seminario time | Optimizer | Score Δ | Time |
|---|---|---|---|---|---|---|---|
| CH₃F | Gaussian B3LYP/6-31+G(d) | 8 | 9 | 0.001 s | L-BFGS-B | 0.221 → 0.018 (92%) | 4.2 s |
| CH₃F | Gaussian B3LYP/6-31+G(d) | 8 | 9 | 0.001 s | Nelder-Mead | 0.221 → 0.001 (99%) | 2.0 s |
| CH₃F | Gaussian B3LYP/6-31+G(d) | 8 | 9 | 0.001 s | Powell | 0.221 → 0.001 (99%) | 3.8 s |
| Rh-enamide (9 mol) | Jaguar B3LYP/LACVP** | 182 | 1,030 | 0.03 s | Nelder-Mead | 434,172 → 101,077 (76.7%) | 369 s |
| Rh-enamide (9 mol) | Psi4 B3LYP/def2-SVP | 182 | | 0.03 s | Nelder-Mead | | |

The empty cells in the Rh-enamide Psi4 row will be populated after Psi4 generation completes.
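The percentages in the Score Δ column are consistent with a relative reduction of the objective, (initial − final) / initial. The sketch below recomputes them from the values printed in the table. It assumes that definition and that a lower score is better; `percent_improvement` is a hypothetical helper, not part of q2mm.

```python
def percent_improvement(initial, final):
    """Relative score reduction in percent (assumes lower score is better)."""
    return 100.0 * (initial - final) / initial


# (initial, final) scores copied from the summary table.
rows = {
    "CH3F / L-BFGS-B": (0.221, 0.018),
    "CH3F / Nelder-Mead": (0.221, 0.001),
    "Rh-enamide / Nelder-Mead": (434172.0, 101077.0),
}

for label, (initial, final) in rows.items():
    print(f"{label}: {percent_improvement(initial, final):.1f}%")
# CH3F / L-BFGS-B: 91.9%
# CH3F / Nelder-Mead: 99.5%
# Rh-enamide / Nelder-Mead: 76.7%
```

The table scores are themselves rounded, so percentages recomputed from them can differ from the reported ones in the last digit.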


Key Takeaways

  1. JAX backends are fastest — JAX (~1500 eval/s) and JAX-MD (~1000 eval/s) are 5–10× faster than OpenMM and 500–1000× faster than Tinker. Both JIT-compile energy functions as pure JAX, eliminating Python ↔ C++ marshalling overhead.

  2. All engines agree to machine precision: JAX, JAX-MD, and OpenMM produce energies that differ by < 10⁻¹⁸ kcal/mol and frequencies that differ by < 0.001 cm⁻¹ for the same force field and functional form. This validates implementation correctness across these backends. Note: parity holds only when engines share the same functional form and non-bonded treatment (combining rules, 1-4 scaling, cutoffs).

  3. Nelder-Mead and Powell converge to near-zero scores on small molecules (0.001 for CH₃F in the summary table). L-BFGS-B with finite-difference gradients stalls at a suboptimal point (0.018 for CH₃F); analytical gradients via jax.grad are expected to fix this.

  4. JAX and JAX-MD support analytical parameter gradients via jax.grad, which will eliminate the 2N+1 finite-difference overhead once the optimizer is wired to use energy_and_param_grad().

  5. L-BFGS-B diverges on high-dimensional frequency objectives. With 182 parameters and 9 molecules, finite-difference gradients are unstable — the optimizer worsened the score. Nelder-Mead is the reliable choice for derivative-free frequency optimization on complex systems.

  6. The Seminario method is effectively free — even 182-parameter organometallic systems complete in < 50 ms.

  7. Scaling: With Nelder-Mead and 9 molecules (1,030 frequency references), the full optimization converges in ~370 s (~6 min), achieving 76.7% score improvement. Each evaluation computes frequencies for all training molecules.
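The 2N+1 finite-difference overhead from item 4 can be illustrated with a toy example. The sketch below is not q2mm code: the harmonic bond energy, the parameter layout, and the function names are assumptions made for illustration, and the hand-written `analytical_grad` stands in for what jax.grad would derive automatically from the energy function.

```python
def energy(params, bond_lengths):
    """Toy harmonic bond energy: sum over bonds of 0.5 * k * (r - r0)**2.

    params is a flat list [k_0, r0_0, k_1, r0_1, ...] (hypothetical layout).
    """
    total = 0.0
    for i, r in enumerate(bond_lengths):
        k, r0 = params[2 * i], params[2 * i + 1]
        total += 0.5 * k * (r - r0) ** 2
    return total


def analytical_grad(params, bond_lengths):
    """Exact dE/dparams, written by hand (stand-in for automatic differentiation)."""
    grad = []
    for i, r in enumerate(bond_lengths):
        k, r0 = params[2 * i], params[2 * i + 1]
        grad.append(0.5 * (r - r0) ** 2)  # dE/dk
        grad.append(-k * (r - r0))        # dE/dr0
    return grad


def central_difference_grad(f, params, h=1e-5):
    """Return (value, gradient, n_evals); costs 2N + 1 evaluations of f."""
    value = f(params)
    n_evals = 1
    grad = []
    for i in range(len(params)):
        up = list(params)
        down = list(params)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
        n_evals += 2
    return value, grad, n_evals


# Arbitrary illustrative values: 4 bonds, so N = 8 parameters.
bond_lengths = [1.09, 1.10, 1.38, 1.12]
params = [340.0, 1.09, 340.0, 1.09, 320.0, 1.41, 340.0, 1.09]

value, fd_grad, n_evals = central_difference_grad(
    lambda p: energy(p, bond_lengths), params
)
exact_grad = analytical_grad(params, bond_lengths)
max_diff = max(abs(a - b) for a, b in zip(fd_grad, exact_grad))

print(n_evals)         # 17 evaluations for N = 8
print(2 * 182 + 1)     # 365 evaluations per gradient for N = 182
print(max_diff < 1e-6)  # True
```

For this smooth toy energy the two gradients agree closely. The point of the comparison is cost: one finite-difference gradient for 182 parameters needs 365 objective evaluations, and each evaluation recomputes frequencies for all training molecules.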


Detailed Results

  • Small Molecules — CH₃F: combined speed + accuracy leaderboard, cross-engine parity, frequency accuracy analysis
  • Rh-Enamide (Jaguar) — 9-structure organometallic training set with Jaguar B3LYP/LACVP** reference data
  • Rh-Enamide (Psi4) — Same system with Psi4 B3LYP/def2-SVP reference data

Benchmarks generated by q2mm-benchmark CLI. Run q2mm-benchmark --list to see available backends and optimizers. Last updated: March 2026.