Published Force Field Validation¶
What this is¶
Before trusting q2mm's optimizers to produce new force fields, we need to prove it can correctly evaluate force fields that have already been published and validated against experiment. This page documents that proof.
The validation program has two checks, run in order:
| Check | Question | Status |
|---|---|---|
| Check 1 | Can q2mm load a published force field and reproduce its fit quality against the original QM data? | ✅ R² = 0.60 (JAX engine + near-linear torsion damping + RC exclusion). Published FF < Seminario baseline due to cross-engine gap. |
| Check 2 | Can q2mm re-derive the published force field from scratch using its own optimizers? | ✅ JaxOpt L-BFGS on Rh-enamide: RMSD 260 → 153 cm⁻¹ (50 iters, GPU) |
Check 1 must pass before Check 2 makes sense — if we can't even evaluate a known-good force field correctly, there's no point trying to re-derive it.
Check 1: published force field evaluation¶
System: Rh-enamide hydrogenation (Donoghue et al. J. Chem. Theory Comput. 2008, 4, 1313–1323; DOI)
What we did: loaded the published MM3* force field (originally optimized with Q2MM + MacroModel) and evaluated it with q2mm's JAX MM3 engine against the same 9 transition-state structures and QM frequencies.
Root cause found and fixed: all 9 TS structures have near-linear C-C-N angles (~179°). The dihedral gradient diverges as 1/sin²(θ) at 180°, amplifying even reasonable torsion force constants into 10,000+ kcal/(mol·Å) forces. Production MM3 codes (TINKER, MacroModel) universally handle this by skipping/damping near-linear torsions. We added a Hermite smoothstep that smoothly suppresses torsion terms when either central angle exceeds 170°.
Additionally, the reaction-coordinate MM frequency (>4000 cm⁻¹ from the
deliberately stiffened TS bond) was excluded from comparison, matching the
Q2MM convention of weight 0.00 for the first eigenvalue (eig_i).
Results after fix (JAX engine, near-linear damping, RC exclusion):
| Molecule | Atoms | Freq refs | RMSD (cm⁻¹) | MAE (cm⁻¹) | R² |
|---|---|---|---|---|---|
| TS 1 | 36 | 88 | 583 | 306 | 0.428 |
| TS 2 | 38 | 95 | 527 | 277 | 0.597 |
| TS 3 | 38 | 95 | 526 | 276 | 0.598 |
| TS 4 | 62 | 157 | 566 | 312 | 0.438 |
| TS 5 | 62 | 156 | 603 | 347 | 0.311 |
| TS 6 | 58 | 153 | 422 | 201 | 0.767 |
| TS 7 | 58 | 153 | 401 | 182 | 0.794 |
| TS 8 | 58 | 151 | 445 | 217 | 0.732 |
| TS 9 | 58 | 151 | 445 | 217 | 0.733 |
| Average | 1199 | 502 | 262 | 0.600 |
Status of Check 1 promotion gates:
| Gate | Issue | Status |
|---|---|---|
| All per-molecule R² > 0 | Eigenvalue R² positive for every molecule | ✅ Passes |
| Average R² > 0.40 | Mean eigenvalue R² exceeds 0.40 threshold | ✅ Passes |
| Published FF beats Seminario | Fitted FF outperforms QFUERZA baseline | ⚠️ xfail — engine gap |
The published FF does not beat the Seminario (QFUERZA) baseline because it was optimized for MacroModel's MM3* engine, which includes functional-form features (metal-center torsion rules, possibly stretch-bend cross terms) that our engine doesn't replicate. The Seminario method projects QM Hessian eigenvalues directly (engine-independent), so it naturally outperforms a cross-engine evaluation. This is documented as an inherent limitation, not a bug.
Check 2: force field re-derivation¶
Frequency-only methodology
This check used a frequency-only objective with the old full-FF parameter scope (~2,742 params). The paper's actual methodology uses a multi-target penalty (eigenmatrix + geometry) with only ~182 OPT-substructure params. Results below are valid as a proof-of-concept for the optimization pipeline but do not represent publication reproduction.
JaxOpt L-BFGS (analytical gradients, 50 iterations) on the Rh-enamide 9-molecule training set (frequency-only objective):
- RMSD: 260 → 153 cm⁻¹
- Score: 91.5 → 77.0
- Time: 341 s (GPU, RTX 5090)
- Objective: frequency-only (not the paper's multi-target penalty)
For multi-target results with the correct methodology (frozen params, eigenmatrix + geometry refs), see Rh-enamide benchmark results.
How to reproduce this¶
The Check 1 evaluation is an automated integration test. It loads the published force field, evaluates it against the QM reference data, and compares the results to a saved snapshot (a JSON file containing the expected per-molecule metrics). If the results change — because of a code fix, a bug, or a parameter reinterpretation — the test fails, which is the point: it guards against silent regressions.
# Run the evaluation (requires JAX or OpenMM)
python3 -m pytest test/integration/test_published_ff_validation.py --run-validation -v
# Update the saved snapshot after a verified change
Q2MM_UPDATE_GOLDEN=1 python3 -m pytest test/integration/test_published_ff_validation.py --run-validation -v
Optimization Comparison¶
q2mm v5 achieves lower frequency RMSD than the published FFs when both are evaluated under JaxEngine. However, the published FFs were optimized for a different engine (MacroModel MM3*) and a broader objective (geometries + Hessian + charges + energies). On the papers' own metrics (eigenvalue R², selectivity predictions), the published FFs perform well. See the QFUERZA starting-point quality and the merged system pages for detailed, honest comparisons:
- Rh-enamide — published eigenvalue R² ≈ 0.998; our QFUERZA R² = 0.991
- Heck relay — published selectivity RMSD = 2.3 kJ/mol
- Pd-allyl — published selectivity MUE = 4.4 kJ/mol (77 predictions)
- Pd 1,4-conjugate — published selectivity MUE = 1.8 kJ/mol (82 predictions)
- Rh 1,4-conjugate — published bisphosphine MUE = 4.1 kJ/mol (67 structures)
Where things live:
- The published force field came from the
Q2MM/q2mm
repository and is stored at
examples/rh-enamide/ff/rh_hyd_enamide_final.fld. - The saved snapshot is at
test/fixtures/published_ff/rh_enamide_donoghue2008.json. - The test itself is
test/integration/test_published_ff_validation.py. - Provenance notes (where the force field came from and how it was
extracted) are in
validation/published_ffs/README.md.