Small Molecules
Benchmarks on CH₃F (5 atoms, 8 parameters) optimized against B3LYP/6-31+G(d) QM frequencies. All methods start from identical Seminario-estimated parameters. Results cover speed, accuracy, and cross-engine agreement.
Combined Leaderboard
Every backend × optimizer combination, ranked by final score (lowest first).
| Backend | Optimizer | Final RMSD (cm⁻¹) | Final MAE (cm⁻¹) | Score | Evals | Time | Evals/s |
|---|---|---|---|---|---|---|---|
| JAX | Powell | 0.0 | 0.0 | 0.000 | 2565 | 1.3 s | 1973 |
| JAX-MD | Powell | 0.0 | 0.0 | 0.000 | 2612 | 1.8 s | 1451 |
| JAX | Nelder-Mead | 1037.9 | 888.8 | 0.000 | 1193 | 0.8 s | 1491 |
| JAX-MD | Nelder-Mead | 1037.9 | 888.8 | 0.000 | 1205 | 1.2 s | 1004 |
| OpenMM | Nelder-Mead | — | — | 0.001 | 378 | 2.0 s | 190 |
| OpenMM | Powell | — | — | 0.001 | 722 | 3.8 s | 190 |
| Tinker | Nelder-Mead | — | — | 0.001 | 376 | 286.9 s | 1.3 |
| JAX | L-BFGS-B | 813.4 | 610.1 | 0.077 | 406 | 0.6 s | 677 |
| JAX-MD | L-BFGS-B | 813.4 | 610.1 | 0.077 | 370 | 1.0 s | 370 |
| OpenMM | L-BFGS-B | 114.1 | 93.6 | 0.117 | 424 | 11.7 s | 36 |
All runs start from RMSD = 156.9 cm⁻¹ (score = 0.221).
Reading the table
RMSD = root-mean-square deviation of the optimized MM frequencies from the QM reference; MAE = mean absolute error of the same frequencies (both in cm⁻¹, lower is better). Score = normalized objective function (lower is better; 0.000 = perfect match). Evals/s = energy evaluations per second (higher is better).
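A minimal sketch of how the RMSD, MAE, and Evals/s columns relate to raw data, in plain Python. It assumes the MM and QM frequency lists are already matched mode by mode and given in cm⁻¹; the function names are illustrative rather than Q2MM's own API, and the Score column is left out because its normalization is not specified here.

```python
import math

def rmsd(mm, qm):
    """Root-mean-square deviation between matched MM and QM frequencies (cm^-1)."""
    if len(mm) != len(qm):
        raise ValueError("frequency lists must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mm, qm)) / len(mm))

def mae(mm, qm):
    """Mean absolute error between matched MM and QM frequencies (cm^-1)."""
    if len(mm) != len(qm):
        raise ValueError("frequency lists must have the same length")
    return sum(abs(a - b) for a, b in zip(mm, qm)) / len(mm)

def evals_per_second(n_evals, seconds):
    """Throughput as reported in the Evals/s column."""
    return n_evals / seconds

# JAX + Powell row: 2565 evaluations in 1.3 s
print(round(evals_per_second(2565, 1.3)))  # 1973
```

Because a root-mean-square is never smaller than a mean of absolute values, RMSD ≥ MAE for any run, which holds in every row of the leaderboard.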
Key Observations
- Powell and Nelder-Mead both drive the score to 0.000 on JAX and JAX-MD, and Powell also reaches an RMSD of 0.0 cm⁻¹. These derivative-free methods are robust for small parameter spaces.
- L-BFGS-B underperforms with finite-difference gradients: it converges to a suboptimal point on all backends. Connecting `energy_and_param_grad()` (analytical gradients via `jax.grad`) would likely fix this.
- JAX backends are 5–10× faster than OpenMM per evaluation with Powell and Nelder-Mead (about 1000–2000 vs 190 evals/s). JIT-compiled pure JAX eliminates Python ↔ C++ marshalling overhead.
- JAX-MD is ~25–35% slower than JAX with Powell and Nelder-Mead due to neighbor list management and periodic boundary bookkeeping, but both are far faster than OpenMM/Tinker.
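The throughput comparisons above can be checked directly against the leaderboard. This sketch divides evaluations by wall time for the Powell and Nelder-Mead rows (numbers copied from the table); it assumes the reported Time is the total wall time of each run.

```python
# (evaluations, wall time in s) for the Powell and Nelder-Mead rows of the leaderboard
runs = {
    ("JAX", "Powell"): (2565, 1.3),
    ("JAX-MD", "Powell"): (2612, 1.8),
    ("OpenMM", "Powell"): (722, 3.8),
    ("JAX", "Nelder-Mead"): (1193, 0.8),
    ("JAX-MD", "Nelder-Mead"): (1205, 1.2),
    ("OpenMM", "Nelder-Mead"): (378, 2.0),
}

def throughput(backend, optimizer):
    """Evaluations per second for one leaderboard row."""
    evals, seconds = runs[(backend, optimizer)]
    return evals / seconds

for optimizer in ("Powell", "Nelder-Mead"):
    speedup = throughput("JAX", optimizer) / throughput("OpenMM", optimizer)
    md_penalty = 1.0 - throughput("JAX-MD", optimizer) / throughput("JAX", optimizer)
    print(f"{optimizer}: JAX {speedup:.1f}x OpenMM, JAX-MD {md_penalty:.0%} slower than JAX")
```

With these numbers JAX comes out about 10.4× faster than OpenMM for Powell and 7.9× for Nelder-Mead, and JAX-MD about 26% and 33% slower than JAX, respectively.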
Cross-Engine Parity
Do different engines agree on the same answer? These comparisons use the CH₃F molecule at equilibrium geometry with identical Seminario-estimated force field parameters.
Energy Agreement
| Comparison | Energy Difference |
|---|---|
| JAX vs JAX-MD | 3 × 10⁻²⁰ kcal/mol |
| JAX vs OpenMM | 3 × 10⁻¹⁸ kcal/mol |
| JAX-MD vs OpenMM | 3 × 10⁻¹⁸ kcal/mol |
All three engines agree to machine precision.
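A parity check like this can be automated as a regression test. The sketch below is a hypothetical helper (not part of Q2MM) that compares every pair of engines; it uses an absolute tolerance in kcal/mol because relative tolerances are unreliable when the energies themselves are close to zero. The default `atol` is an assumption.

```python
from itertools import combinations

def parity_report(energies, atol=1e-10):
    """Pairwise absolute energy differences between engines.

    energies: mapping engine name -> energy (kcal/mol) for the same geometry
    and force field. Returns {(a, b): |E_a - E_b|} and raises AssertionError
    if any pair differs by more than atol.
    """
    report = {}
    for (name_a, e_a), (name_b, e_b) in combinations(energies.items(), 2):
        gap = abs(e_a - e_b)
        if gap > atol:
            raise AssertionError(f"{name_a} vs {name_b}: {gap:.2e} kcal/mol > {atol:.0e}")
        report[(name_a, name_b)] = gap
    return report
```

In a test suite, the energies would come from evaluating the same structure and parameters with each backend; how those engines are called is not shown here.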
Frequency Agreement
| Mode | OpenMM (cm⁻¹) | JAX (cm⁻¹) | JAX-MD (cm⁻¹) | Max Δ (cm⁻¹) |
|---|---|---|---|---|
| 1 | 104.8102 | 104.8102 | 104.8102 | 4 × 10⁻⁵ |
| 2 | 104.8102 | 104.8106 | 104.8106 | 4 × 10⁻⁴ |
| 3 | 110.0376 | 110.0373 | 110.0373 | 3 × 10⁻⁴ |
| 4 | 162.9583 | 162.9583 | 162.9583 | 4 × 10⁻⁵ |
| 5 | 165.5864 | 165.5867 | 165.5867 | 3 × 10⁻⁴ |
| 6 | 165.5866 | 165.5868 | 165.5868 | 2 × 10⁻⁴ |
| 7 | 346.9681 | 346.9676 | 346.9676 | 5 × 10⁻⁴ |
| 8 | 361.5531 | 361.5529 | 361.5529 | 2 × 10⁻⁴ |
| 9 | 361.5539 | 361.5530 | 361.5530 | 9 × 10⁻⁴ |
- JAX vs JAX-MD agree to < 10⁻¹² cm⁻¹ (machine precision).
- JAX/JAX-MD vs OpenMM agree to < 0.001 cm⁻¹. The tiny differences arise from the different Hessian methods: analytical `jax.hessian` vs OpenMM's finite-difference Hessian.
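The Max Δ column is the spread across the three engines for each mode. Recomputing it from the four-decimal values in the table reproduces the column, except that modes 1 and 4 come out as exactly 0 because their differences (4 × 10⁻⁵) fall below the displayed precision.

```python
# (OpenMM, JAX, JAX-MD) frequencies in cm^-1, copied from the table above
freqs = {
    1: (104.8102, 104.8102, 104.8102),
    2: (104.8102, 104.8106, 104.8106),
    3: (110.0376, 110.0373, 110.0373),
    4: (162.9583, 162.9583, 162.9583),
    5: (165.5864, 165.5867, 165.5867),
    6: (165.5866, 165.5868, 165.5868),
    7: (346.9681, 346.9676, 346.9676),
    8: (361.5531, 361.5529, 361.5529),
    9: (361.5539, 361.5530, 361.5530),
}

def max_delta(values):
    """Spread between the largest and smallest engine value for one mode."""
    return max(values) - min(values)

worst_mode = max(freqs, key=lambda m: max_delta(freqs[m]))
print(worst_mode, f"{max_delta(freqs[worst_mode]):.1e}")  # 9 9.0e-04
```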
Why exact parity matters
If two engines produce different energies for the same force field, at least one implementation is wrong, and you cannot tell which. Machine-precision agreement validates that JAX, JAX-MD, and OpenMM all compute the same math for the same functional form. Note: this parity only holds when engines share the same functional form and non-bonded treatment (combining rules, 1-4 scaling, cutoffs). Engines with different force field equations or different non-bonded parameters will naturally produce different results.
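To see why the non-bonded treatment has to match, the sketch below evaluates one 12-6 Lennard-Jones pair under two common combining rules (Lorentz-Berthelot arithmetic σ versus geometric σ, both with geometric ε). The atom parameters are hypothetical values chosen only for illustration, not Q2MM or engine defaults, and the functional forms each engine actually uses may differ.

```python
import math

def lj_pair(r, sigma, epsilon):
    """12-6 Lennard-Jones energy for one atom pair (units follow the inputs)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def combine(atom_i, atom_j, rule):
    """Mix per-atom (sigma, epsilon) into pair parameters."""
    (sig_i, eps_i), (sig_j, eps_j) = atom_i, atom_j
    eps_ij = math.sqrt(eps_i * eps_j)  # geometric epsilon in both rules
    if rule == "lorentz-berthelot":
        sig_ij = 0.5 * (sig_i + sig_j)  # arithmetic sigma
    elif rule == "geometric":
        sig_ij = math.sqrt(sig_i * sig_j)  # geometric sigma
    else:
        raise ValueError(f"unknown combining rule: {rule}")
    return sig_ij, eps_ij

# Hypothetical per-atom parameters: (sigma in Angstrom, epsilon in kcal/mol)
atom_c = (3.4, 0.086)
atom_f = (3.0, 0.061)
r = 3.5  # Angstrom

energies = {rule: lj_pair(r, *combine(atom_c, atom_f, rule))
            for rule in ("lorentz-berthelot", "geometric")}
print(energies)
```

The two rules give pair energies that differ on the order of 10⁻⁴ kcal/mol here, many orders of magnitude larger than the engine differences in the Energy Agreement table. A 1-4 scale factor would multiply such a pair energy by yet another engine-specific constant.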
Frequency Accuracy After Optimization
How well do the optimized MM frequencies match the QM reference? This is the primary accuracy metric — the whole point of Q2MM.
Best Result: JAX + Powell (Score = 0.000)
Powell on both JAX-based backends (JAX and JAX-MD) converges to a perfect score (0.000), meaning the optimized MM frequencies match the QM reference. An exact fit is achievable here even though having 9 frequency targets for 8 free parameters looks overdetermined: CH₃F's C₃ᵥ symmetry makes three pairs of modes degenerate, leaving only 6 independent frequencies to fit.
Starting from Seminario estimates (RMSD = 156.9 cm⁻¹), the optimizer adjusts the force constants until they reproduce the B3LYP/6-31+G(d) harmonic frequencies to within the reported precision (RMSD 0.0 cm⁻¹).
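The degeneracy argument can be checked against the Frequency Agreement table, which lists the MM frequencies at the Seminario starting parameters. This sketch groups values that lie within a tolerance of each other; the 0.01 cm⁻¹ tolerance is an assumption, chosen to sit well above the engine-to-engine noise and well below the gaps between distinct modes.

```python
# OpenMM column of the Frequency Agreement table (cm^-1), modes 1-9
freqs = [104.8102, 104.8102, 110.0376, 162.9583, 165.5864,
         165.5866, 346.9681, 361.5531, 361.5539]

def distinct_levels(values, tol=1e-2):
    """Group sorted values that lie within tol of the previous value."""
    levels = []
    for v in sorted(values):
        if levels and v - levels[-1][-1] <= tol:
            levels[-1].append(v)
        else:
            levels.append([v])
    return levels

levels = distinct_levels(freqs)
print(len(freqs), "modes,", len(levels), "distinct frequencies")  # 9 modes, 6 distinct frequencies
print("degeneracies:", [len(g) for g in levels])                  # [2, 1, 1, 2, 1, 2]
```

The nine modes collapse into six levels, three of them doubly degenerate, matching the three E pairs and three A₁ modes expected for a C₃ᵥ molecule.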
Worst Result: L-BFGS-B with Finite Differences (Score = 0.077–0.117)
L-BFGS-B converges to a suboptimal local minimum on all backends. With finite-difference gradients (`eps=1e-3`), it cannot navigate the shallow objective landscape, particularly for coupled bending/stretching modes. Connecting `energy_and_param_grad()` (analytical gradients) would likely fix this.
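To make the cost and accuracy trade-off concrete, the sketch below compares forward and central finite-difference gradients at the `eps=1e-3` step used here, on a toy quartic objective with 8 parameters. The toy function and helper names are illustrative, and which scheme Q2MM uses internally is not stated on this page. Forward differences cost N + 1 calls with O(eps) error; central differences cost 2N + 1 calls with O(eps²) error.

```python
def counted(f):
    """Wrap an objective so every call is tallied in wrapper.calls."""
    def wrapper(x):
        wrapper.calls += 1
        return f(x)
    wrapper.calls = 0
    return wrapper

def forward_diff_grad(f, x, eps=1e-3):
    """Value and forward-difference gradient: N + 1 calls, O(eps) error."""
    fx = f(x)
    grad = []
    for i in range(len(x)):
        step = list(x)
        step[i] += eps
        grad.append((f(step) - fx) / eps)
    return fx, grad

def central_diff_grad(f, x, eps=1e-3):
    """Value and central-difference gradient: 2N + 1 calls, O(eps**2) error."""
    fx = f(x)
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return fx, grad

def quartic(x):
    """Toy objective; its exact gradient is 4 * x_i**3."""
    return sum(v ** 4 for v in x)

x0 = [1.0] * 8  # 8 parameters, matching the CH3F benchmark

f_fwd = counted(quartic)
_, g_fwd = forward_diff_grad(f_fwd, x0)
f_cen = counted(quartic)
_, g_cen = central_diff_grad(f_cen, x0)

print(f_fwd.calls, f_cen.calls)                      # 9 17
print(f"{g_fwd[0] - 4.0:.1e} {g_cen[0] - 4.0:.1e}")  # 6.0e-03 4.0e-06
```

Either way the gradient costs 9 to 17 objective calls for 8 parameters and carries step-size error, whereas an analytical gradient returns the exact derivative from a single forward-and-backward pass.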
Seminario Method
Extracting bond/angle force constants from a QM Hessian matrix.
| Molecule | Atoms | Time |
|---|---|---|
| Water | 3 | 0.4 ms |
| CH₃F | 5 | 1.2 ms |
The Seminario method is pure NumPy linear algebra (eigenvalue decomposition of 3×3 Hessian sub-blocks). It is effectively instant compared to everything else in the pipeline. It provides a good starting point (RMSD 156.9 cm⁻¹ vs default 1870.1 cm⁻¹) but further optimization is needed for high accuracy.
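The core of the method for a bond term fits in a few lines of NumPy. This is a sketch of the standard Seminario projection for bond stretches only (angle terms need a further projection onto the bending directions and are omitted); the function name and argument layout are hypothetical, not Q2MM's API, and the result comes out in whatever units the Hessian uses.

```python
import numpy as np

def seminario_bond_k(hessian, coords, a, b):
    """Seminario estimate of the A-B bond stretching force constant.

    hessian: (3N, 3N) Cartesian Hessian; coords: (N, 3) positions.
    Takes the 3x3 interatomic block -H[A, B], diagonalizes it, and projects
    each eigenvector onto the A->B unit vector: k = sum_i lambda_i * |u . v_i|.
    """
    block = -hessian[3 * a:3 * a + 3, 3 * b:3 * b + 3]
    eigvals, eigvecs = np.linalg.eig(block)  # block need not be symmetric
    u = coords[b] - coords[a]
    u = u / np.linalg.norm(u)
    # eig may return complex dtype for a non-symmetric block; keep real parts
    projections = np.abs(eigvecs.real.T @ u)  # |u . v_i| for each column v_i
    return float(np.sum(eigvals.real * projections))
```

For a pure harmonic spring of stiffness 500 along the bond, the projection recovers 500: the 3×3 block has a single nonzero eigenvalue whose eigenvector is the bond axis.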
QM Calculations (Psi4)
These are run once per molecule to generate reference data, not during the optimization loop.
| Calculation | Level | Molecule | Time |
|---|---|---|---|
| Energy | B3LYP/6-31G* | Water (3 atoms) | 1.1 s |
| Hessian | B3LYP/6-31G* | Water (3 atoms) | 7.8 s |
Scaling notes:
- QM cost scales as O(N³)–O(N⁴) in the number of basis functions N. A 30-atom organic molecule with 6-31G* takes ~5–30 minutes for a Hessian.
- Transition state Hessians (for TSFF work) take the same time but contain one negative eigenvalue along the reaction coordinate.
- Psi4 parallelizes well: set `psi4.set_num_threads(N)` for a multi-core speedup.
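A sketch of generating and sanity-checking a reference Hessian. `psi4.set_num_threads`, `psi4.set_memory`, `psi4.geometry`, and `psi4.hessian` are Psi4 functions; the water Z-matrix, the 2 GB memory limit, and the eigenvalue tolerance (assumed to be in Psi4's atomic units) are illustrative assumptions. The check counts negative Hessian eigenvalues: zero for a minimum, one for a transition state. Counting on the Cartesian Hessian gives the same number as on the mass-weighted one, since mass weighting is a congruence transform that preserves eigenvalue signs (Sylvester's law of inertia).

```python
import numpy as np

def count_negative_modes(hessian, tol=1e-4):
    """Number of Hessian eigenvalues below -tol.

    A minimum should give 0 and a first-order saddle point (TS) should give 1.
    tol absorbs the near-zero translational/rotational eigenvalues, which can
    come out slightly negative from numerical noise.
    """
    h = np.asarray(hessian, dtype=float)
    h = 0.5 * (h + h.T)  # symmetrize before calling eigvalsh
    return int(np.sum(np.linalg.eigvalsh(h) < -tol))

# Hypothetical input: a standard water Z-matrix
WATER = """
O
H 1 0.96
H 1 0.96 2 104.5
"""

def psi4_reference_hessian(geometry=WATER, method="b3lyp/6-31g*", threads=4):
    """Compute a QM Hessian with Psi4 (imported lazily so this module loads without it)."""
    import psi4
    psi4.set_num_threads(threads)
    psi4.set_memory("2 GB")
    mol = psi4.geometry(geometry)
    hess = psi4.hessian(method, molecule=mol)
    return hess.to_array()
```

A typical use would be `count_negative_modes(psi4_reference_hessian())`, expecting 0 for a ground-state reference and 1 for a TSFF transition state.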
Bottleneck Analysis
For a typical small-molecule optimization workflow:
| Stage | How often | Time |
|---|---|---|
| QM Hessian | one-time | 7.8 s |
| Seminario | one-time | 0.001 s |
| Optimization loop | per optimization | 0.8–2 s (JAX) |
| Energy call | every evaluation | 0.0005 s (JAX) |
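The energy evaluation is the bottleneck: the optimization loop repeats it hundreds to thousands of times (370–2612 evaluations in the Combined Leaderboard). Strategies to speed up: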
- Use JAX or JAX-MD: ~1000–2000 evals/s, 5–10× faster than OpenMM
- Use OpenMM over Tinker: ~190 evals/s vs ~1.3 evals/s
- Reduce evaluations: Nelder-Mead converged in ~380 evaluations on OpenMM and Tinker, and used about half as many evaluations as Powell on every backend where both ran
- Fewer molecules/geometries: each adds one energy call per evaluation
- Analytical gradients: JAX and JAX-MD support `energy_and_param_grad()` via `jax.grad`, eliminating the 2N+1 finite-difference energy calls per gradient (N = number of parameters)
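A back-of-the-envelope cost model ties these strategies together. This hypothetical helper (not part of Q2MM) multiplies evaluations, geometries, energy calls per evaluation, and the time per energy call, ignoring JIT compilation and optimizer overhead.

```python
def estimated_loop_time(n_evals, n_geometries, seconds_per_energy, calls_per_eval=1):
    """Rough wall-time estimate (s) for the optimization loop.

    Assumes every objective evaluation makes one energy call per geometry;
    calls_per_eval > 1 models extra calls, e.g. 2N + 1 for a central-difference
    gradient over N parameters. JIT and optimizer overhead are ignored.
    """
    return n_evals * n_geometries * calls_per_eval * seconds_per_energy

# JAX + Powell: 2565 evaluations, one geometry, ~0.0005 s per energy call
print(estimated_loop_time(2565, 1, 0.0005))  # about 1.28 s
```

With the JAX Powell run's numbers it predicts about 1.28 s, close to the 1.3 s in the leaderboard; doubling the geometries or adding finite-difference gradient calls scales the estimate linearly.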
Benchmarks generated by the `q2mm-benchmark` CLI. All methods start from the same Seminario-estimated parameters.
Run `q2mm-benchmark --list` to see available backends and optimizers.