Benchmarks
Reading guide
If you're new to q2mm, start with the Tutorial — this section is evidence and analysis, not a learning path. For per-system results and literature context, see the Systems section.
q2mm's benchmark program answers three questions:
- Does the engine work? Can q2mm load published force fields and reproduce their fit quality? → Published FF Validation
- Does the optimizer work? Given QM reference data and a starting FF, can q2mm reduce the penalty function? → Optimizer Comparison, Small Molecules
- Is it fast enough? When does GPU or the analytical-gradient path pay for itself? → GPU Acceleration, When Analytical Wins
Benchmark sections
| Section | What it covers | Key result |
|---|---|---|
| Published FF Validation | Load published FFs and document the MacroModel MM3* transfer boundary | Rh-enamide Check 1 reaches R² = 0.60 under q2mm; beating QFUERZA is out of scope without MacroModel parity |
| QFUERZA Validation | Verify our QFUERZA implementation against the paper and Zenodo data; evaluate starting-point quality across 5 systems | Cisplatin matches Zenodo exactly for QFUERZA angles; all five TS systems have positive mass-weighted starting-point R² |
| Optimizer Comparison | Compare the production SciPy L-BFGS-B + JaxLoss path on literature-scale TS systems | 4 of 5 systems show significant q2mm-objective improvement; pd-allyl is a confirmed local minimum |
| GPU Acceleration | CPU-vs-GPU wall-clock comparisons | JAX GPU ~5.6× faster on Rh-enamide |
| When Analytical Wins | System-size crossover for analytical vs finite-difference gradients | Analytical gradients dominate above ~50 parameters |
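One reason a parameter-count crossover appears can be sketched with a toy loss. Central finite differences need two loss evaluations per parameter, so the cost of one gradient grows linearly with the number of parameters, while an analytical gradient is obtained in a single pass. The quadratic loss and all function names below are hypothetical stand-ins and are not q2mm's objective or API.

```python
def make_counted_loss(targets):
    """Return a toy quadratic loss plus a counter of how often it is called."""
    calls = {"n": 0}

    def loss(params):
        calls["n"] += 1
        return sum((p - t) ** 2 for p, t in zip(params, targets))

    return loss, calls


def finite_difference_gradient(loss, params, step=1e-4):
    """Central differences: two loss evaluations per parameter."""
    grad = []
    for i in range(len(params)):
        plus = list(params)
        minus = list(params)
        plus[i] += step
        minus[i] -= step
        grad.append((loss(plus) - loss(minus)) / (2 * step))
    return grad


def analytical_gradient(params, targets):
    """Closed-form gradient of the toy loss, computed in one pass."""
    return [2 * (p - t) for p, t in zip(params, targets)]


n_params = 50  # assumed size, chosen to mirror the ~50-parameter crossover
targets = [float(i) for i in range(n_params)]
params = [0.0] * n_params

loss, calls = make_counted_loss(targets)
fd = finite_difference_gradient(loss, params)
exact = analytical_gradient(params, targets)
max_error = max(abs(a - b) for a, b in zip(fd, exact))

print(f"loss evaluations for one finite-difference gradient: {calls['n']}")
print(f"largest disagreement with the analytical gradient: {max_error:.2e}")
```

In a real force-field fit each loss evaluation is far more expensive than this toy, so the per-parameter cost of finite differences matters more as systems grow. The actual crossover point depends on the engine and hardware.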
Systems
| System | Molecules | Parameters | Status |
|---|---|---|---|
| Small Molecules | 1 (CH₃F) | 6–12 | 82-combination matrix complete |
| Rh-enamide | 9 | 182 | Validated against Donoghue JCTC 2008 |
| Heck relay | 23 | ~180 | QFUERZA evaluated; cross-engine gap |
| Pd-allyl | 21 | ~200 | QFUERZA evaluated; cross-engine gap |
| Pd 1,4-conjugate | 10 | ~200 | QFUERZA evaluated; cross-engine gap |
| Rh 1,4-conjugate | 10 | ~200 | QFUERZA evaluated; cross-engine gap |
What is not covered yet
- Type-normalized penalties: the upstream Q2MM divides each data type's score by its count (score / N_type). Our un-normalized eigenmatrix loss creates a flat landscape that limits L-BFGS-B convergence.
- Selectivity prediction: q2mm does not yet have the conformational-search infrastructure needed for end-to-end enantioselectivity benchmarks.
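The effect of dividing by N_type can be illustrated with a minimal sketch. It assumes each data type's score is a plain sum of squared residuals; the data-type names, residual values, and the `penalty` function are hypothetical and do not reproduce the upstream Q2MM or q2mm code.

```python
from collections import defaultdict


def penalty(residuals, normalize_by_type=False):
    """Sum squared residuals per data type, optionally dividing by the type's count.

    residuals: iterable of (data_type, residual) pairs.
    """
    per_type = defaultdict(list)
    for data_type, value in residuals:
        per_type[data_type].append(value)

    total = 0.0
    for values in per_type.values():
        score = sum(v * v for v in values)
        # Type normalization: score / N_type
        total += score / len(values) if normalize_by_type else score
    return total


# Hypothetical data set: many small eigenmatrix residuals, few larger bond residuals.
data = [("eigenmatrix", 1.0)] * 100 + [("bond", 2.0)] * 4

raw = penalty(data)                                 # 100 * 1.0 + 4 * 4.0
normalized = penalty(data, normalize_by_type=True)  # 100 / 100 + 16 / 4

print(f"un-normalized penalty: {raw}")
print(f"type-normalized penalty: {normalized}")
```

In this toy case the four bond residuals contribute 16 of 116 to the un-normalized penalty but 4 of 5 to the normalized one, so a data type with many entries no longer swamps the others.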
Metrics used in this section
| Metric | Meaning |
|---|---|
| R² | Coefficient of determination on eigenvalue scatter: 1.0 = perfect, 0 = no better than predicting the mean, negative = worse than the mean. |
| RMSD | Root-mean-square deviation between QM and MM frequencies (cm⁻¹). Lower is better. |
| TS | Transition state — the highest-energy geometry along a reaction pathway. |
| FF | Force field — the set of parameters defining how atoms interact. |
| TSFF | Transition state force field — parameterized specifically for TS geometries. |
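As a rough illustration of the two numeric metrics, the sketch below computes R² as 1 − SS_res / SS_tot, which is the form that can go negative, and RMSD as the root of the mean squared difference. The function names and sample values are hypothetical, and q2mm's own implementation (for example any mass weighting) may differ.

```python
import math


def r_squared(reference, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot


def rmsd(reference, predicted):
    """Root-mean-square deviation between paired values."""
    squared = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical QM reference values and two sets of MM predictions.
qm = [100.0, 200.0, 300.0, 400.0]
mm_close = [110.0, 190.0, 310.0, 390.0]
mm_reversed = [400.0, 300.0, 200.0, 100.0]

print(f"close fit:    R2 = {r_squared(qm, mm_close):.3f}, RMSD = {rmsd(qm, mm_close):.1f}")
print(f"reversed fit: R2 = {r_squared(qm, mm_reversed):.3f}, RMSD = {rmsd(qm, mm_reversed):.1f}")
```

The reversed predictions give a negative R² because they fit the reference worse than simply predicting its mean would.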
Artifacts
Published-force-field re-evaluations are archived in test/fixtures/published_ff/. Historical benchmark data (from earlier frequency-only runs) is in the q2mm-data repo.