Optimizer Comparison

What this page answers

This page compares q2mm's current production optimizer path on five transition-state force-field (TSFF) systems from the Q2MM literature. The question is not "can we reproduce MacroModel MM3 exactly?" The answer to that is no: the published TSFFs were optimized under MacroModel-specific MM3 semantics, and q2mm does not include a licensed MacroModel compatibility layer. The question here is narrower and testable:

Given the published OPT-substructure parameters as a starting point, can q2mm's JAX engine and analytical-gradient optimizer reduce q2mm's own multi-target objective without corrupting the force field?

For four of the five systems the answer is yes. Pd-allyl is the exception: it passes the surrogate-ratio gate, but the published Wahlers parameters already sit at a local minimum for the current q2mm objective.

Methodology

All multi-target benchmarks use the same production setup:

- Objective: eigenmatrix-diagonal + geometry references built by ReferenceData.from_molecules() with invert_ts_curvature=True.
- Parameter scope: frozen base force field; only OPT-substructure parameters are active, matching the published Q2MM workflow.
- Starting force field: the literature OPT values are preserved as published (starting_point="published"); the loader does not overwrite them with QFUERZA projections. This page is the published-start baseline; for the canonical QFUERZA-start results (the default since q2mm#290), see the QFUERZA-recovery doc.
- Optimizer: SciPy L-BFGS-B with jac="auto".
- Gradient source: jac="auto" resolves to JaxLoss analytical gradients when the JaxLoss/ObjectiveFunction ratio is within the default ±15% band.
- Validation: the real Python ObjectiveFunction is evaluated before and after the JaxLoss-guided optimization. For noisy systems, the reported improvement is the mean over repeated initial and final evaluations (10 each for most systems, 5 for Rh-enamide) with a 95% confidence interval.
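"Frozen base force field, only OPT parameters active" can be pictured as a boolean mask over a flat parameter vector: the optimizer only ever sees the masked subset, and its result is scattered back into a copy of the full vector. This is a minimal NumPy sketch under that assumption; the helper names `active_view` and `scatter_active` are hypothetical and not q2mm's API, and the parameter values are placeholders.

```python
import numpy as np

def active_view(full_params, active_mask):
    """Return a copy of only the active (OPT) entries of the parameter vector."""
    full = np.asarray(full_params, dtype=float)
    mask = np.asarray(active_mask, dtype=bool)
    return full[mask].copy()

def scatter_active(full_params, active_mask, active_values):
    """Write optimized active values into a copy of the full vector.

    Frozen (non-OPT) entries keep their original values.
    """
    updated = np.array(full_params, dtype=float)  # np.array copies by default
    mask = np.asarray(active_mask, dtype=bool)
    updated[mask] = np.asarray(active_values, dtype=float)
    return updated

# Hypothetical 5-parameter force field: three frozen base entries, two OPT entries.
base = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
opt_mask = np.array([False, True, False, True, False])

x0 = active_view(base, opt_mask)      # what the optimizer would see: [2.0, 4.0]
x_opt = x0 * 1.1                      # stand-in for an optimizer result
new_ff = scatter_active(base, opt_mask, x_opt)
print(new_ff)
```

Keeping the mask outside the optimizer means the frozen base values cannot drift, regardless of which optimizer is plugged in.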

The raw JSON outputs and optimized force fields for these published-start runs live in ericchansen/q2mm-data/benchmarks/<system>/from-published/. They include provenance such as q2mm git SHA, device, ratio tolerance, and run timestamp. (Sibling convergence/ directories hold the canonical QFUERZA-start runs covered by the QFUERZA-recovery doc.)

Surrogate ratio gate

Before using JaxLoss gradients, q2mm compares the JaxLoss value with the real ObjectiveFunction value. Ratios inside the default [0.85, 1.15] band are accepted; outside the band, the analytical surrogate is considered unreliable for that parameter regime.
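The gate reduces to a single ratio check at the starting parameters. A minimal sketch, assuming the ratio is formed as surrogate (JaxLoss) value divided by real ObjectiveFunction value and that the band edges are inclusive; `ratio_gate` is a hypothetical helper, not q2mm's implementation.

```python
def ratio_gate(surrogate_value, objective_value, rel_tol=0.15):
    """Compare a surrogate loss with the real objective at the same parameters.

    Returns (ratio, accepted). accepted is True when the ratio lies in
    [1 - rel_tol, 1 + rel_tol], i.e. [0.85, 1.15] for the default tolerance.
    """
    if objective_value == 0:
        raise ValueError("objective value must be nonzero to form a ratio")
    ratio = surrogate_value / objective_value
    accepted = (1.0 - rel_tol) <= ratio <= (1.0 + rel_tol)
    return ratio, accepted

# Ratios from the benchmark table, expressed against an objective value of 1.0.
for name, r in [("Heck relay", 1.085), ("Pd-allyl", 1.091), ("Pd 1,4-conj", 0.985)]:
    print(name, ratio_gate(r, 1.0))
```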

After the loader API refactor and the MM3 angle-gradient fix, every system in this table is inside the default band.

| System | Molecules | Active params | Ratio | Gate |
|---|---|---|---|---|
| Rh-enamide | 9 | 182 | 1.07 | ✓ |
| Heck relay | 23 | 462 | 1.085 | ✓ |
| Pd-allyl | 21 | 482 | 1.091 | ✓ |
| Pd 1,4-conj | 10 | 340 | 0.985 | ✓ |
| Rh 1,4-conj | 10 | 488 | 0.996 | ✓ |

Two fixes changed the interpretation of this table:

- Loader API refactor: published OPT values are now used as published; QFUERZA no longer silently overwrites them during system loading.
- MM3 angle-gradient fix: the JAX angle term now uses a custom-VJP, atan2-based angle function instead of arccos(clip()), which loses its gradient near collinear geometries.
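The difference between the two angle formulations shows up already in forward values. The NumPy sketch below compares arccos of a clipped cosine with atan2(|a × b|, a · b) for a nearly linear i-j-k angle. It only illustrates the numerics; it does not reproduce q2mm's custom-VJP JAX code, and the function names are hypothetical. When the cosine rounds to exactly -1, the arccos path returns exactly π and the small deviation is gone, which leaves autodiff with no useful signal; the atan2 form keeps it.

```python
import numpy as np

def angle_arccos_clip(xi, xj, xk):
    """Bond angle i-j-k via arccos of the clipped cosine (the old formulation)."""
    a = np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)
    b = np.asarray(xk, dtype=float) - np.asarray(xj, dtype=float)
    cos_theta = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

def angle_atan2(xi, xj, xk):
    """Bond angle i-j-k via atan2(|a x b|, a . b), well conditioned near 0 and 180 deg."""
    a = np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)
    b = np.asarray(xk, dtype=float) - np.asarray(xj, dtype=float)
    return float(np.arctan2(np.linalg.norm(np.cross(a, b)), a @ b))

# Nearly collinear i-j-k: atom k is displaced 1e-9 off the i-j line.
xi, xj, xk = [-1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [1.0, 1e-9, 0.0]
print(np.pi - angle_arccos_clip(xi, xj, xk))  # deviation from 180 deg is lost
print(np.pi - angle_atan2(xi, xj, xk))        # deviation of about 1e-9 rad survives
```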

Heck relay is the clearest example: its ratio moved from outside the default band to 1.085 after the angle-gradient fix, and JaxLoss-guided optimization now transfers to the real objective.

Optimization results

| System | Initial score | Final score | Mean Δ | 95% CI on Δ | L-BFGS-B iterations | Real ObjectiveFunction evals | Wall time |
|---|---|---|---|---|---|---|---|
| Rh-enamide | 4.885 × 10⁵ | 2.700 × 10⁵ | −44.73% | ±0.29% | 13 | 2 | 710 s opt + post-evals |
| Heck relay | 3.098 × 10⁶ | 1.461 × 10⁶ | −52.82% | ±1.54% | 7 | 2 | 1,825 s opt + post-evals |
| Pd-allyl | 8.036 × 10⁶ | 8.037 × 10⁶ | −0.010% | ±0.40% | 2 | 2 | 1,289 s opt + post-evals |
| Pd 1,4-conj | 8.608 × 10⁶ | 7.235 × 10⁶ | −15.96% | not sampled | 3 | 2 | 700 s |
| Rh 1,4-conj | 6.293 × 10⁶ | 5.160 × 10⁶ | −18.00% | ±4.17% | 4 | 2 | 691 s opt + post-evals |

Score and CI values come from benchmarks/<system>/from-published/validation_results.json in ericchansen/q2mm-data (refreshed under #288 / q2mm-data#10 after the MM3 angle-gradient fix; the canonical/opt-out subdirectory rename in q2mm-data#11 moved these published-start files from convergence/ to from-published/). The 95% CI on Δ is the conservative bound (initial_obj_score_ci95 + final_obj_score_ci95) / initial_obj_score_mean × 100, the same combination used by the JSON's improvement_significant flag. Rh-enamide and ch3f (not in this table) were re-evaluated with --n-evals 5; the others with --n-evals 10. Pd 1,4-conj is a single-call run, so no CI was sampled.
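The Δ and CI columns follow from the formula above with a few lines of arithmetic. This sketch takes already-computed means and 95% half-widths as plain numbers; `improvement_stats` is a hypothetical helper, and the significance rule (the decrease must exceed the combined bound) is an assumption about how improvement_significant is decided, not a confirmed reading of q2mm's code.

```python
def improvement_stats(initial_mean, final_mean, initial_ci95, final_ci95):
    """Percent change in the objective and the conservative 95% bound on it.

    delta_pct:   (final - initial) / initial * 100, negative means improvement
    ci_pct:      (initial_ci95 + final_ci95) / initial * 100
    significant: assumed rule, the decrease must exceed the combined bound
    """
    delta_pct = (final_mean - initial_mean) / initial_mean * 100.0
    ci_pct = (initial_ci95 + final_ci95) / initial_mean * 100.0
    significant = -delta_pct > ci_pct
    return delta_pct, ci_pct, significant

# Rh-enamide means from the table; CI half-widths set to 0 just to check Δ.
delta, _, _ = improvement_stats(4.885e5, 2.700e5, 0.0, 0.0)
print(f"{delta:.2f}%")  # prints -44.73%

# Round placeholder numbers shaped like the Pd-allyl row: a -0.010% change
# inside a ±0.40% bound is not significant.
print(improvement_stats(100.0, 99.99, 0.2, 0.2))
```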

Interpretation:

- Rh-enamide, Heck relay, Pd 1,4-conj, and Rh 1,4-conj improve substantially under the q2mm JAX objective.
- Pd-allyl does not improve in a statistically meaningful way: its mean change (−0.010%) sits well inside the ±0.40% bound. The optimizer converges quickly and the ratio gate is healthy, and the 10-sample 95% confidence interval rules out a hidden improvement larger than about 0.4%. This is a local minimum of the current objective, not a failed run.
- Small L-BFGS-B iteration counts are expected. In the JaxLoss path, SciPy evaluates the surrogate many times internally; the real ObjectiveFunction is called only for the initial baseline and the final validation, which is why every row shows two real evaluations.
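The call pattern behind the "two real evaluations" column can be reproduced with plain SciPy. In this sketch, `surrogate_value_and_grad` and `real_objective` are toy, hypothetical stand-ins for JaxLoss and ObjectiveFunction: L-BFGS-B queries the cheap surrogate (value plus analytical gradient) as often as it needs, while the expensive objective is called once before and once after.

```python
import numpy as np
from scipy.optimize import minimize

TARGET = np.array([3.0, -1.0, 0.5])  # toy optimum for the illustration

def surrogate_value_and_grad(x):
    """Cheap surrogate (stand-in for JaxLoss): returns (value, analytical gradient)."""
    diff = x - TARGET
    return float(diff @ diff), 2.0 * diff

real_calls = 0

def real_objective(x):
    """Expensive objective (stand-in for ObjectiveFunction); counts its calls."""
    global real_calls
    real_calls += 1
    diff = x - TARGET
    return 1.05 * float(diff @ diff)  # 5% scale mismatch, inside a ±15% band

x0 = np.zeros(3)
initial_score = real_objective(x0)  # real call 1: baseline

# jac=True: the callable returns (value, gradient), so SciPy does not finite-difference.
result = minimize(surrogate_value_and_grad, x0, jac=True, method="L-BFGS-B")

final_score = real_objective(result.x)  # real call 2: validation
print(result.nit, result.nfev, real_calls)
```

`result.nfev` counts surrogate evaluations (including line-search steps), which is where the optimizer's real work happens; `real_calls` stays at 2 however many iterations L-BFGS-B takes.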

Per-category fit after optimization

The objective combines geometry references and eigenmatrix-diagonal references. R² is reported by category so geometry improvements are not hidden by the much larger eigenmatrix term.

| System | R²(bond_length) | R²(bond_angle) | R²(eig_diag) | Takeaway |
|---|---|---|---|---|
| Rh-enamide | 0.989 | 0.954 | 0.968 | Strong fit across all target classes |
| Heck relay | 0.983 | 0.909 | −14.28 | Geometry excellent; eigenmatrix gap remains |
| Pd-allyl | 0.046 | 0.331 | −2.82 | Published values are a q2mm local minimum but not a good transfer fit |
| Pd 1,4-conj | 0.950 | 0.037 | −9.642 | Bond geometry strong; eigenmatrix gap remains |
| Rh 1,4-conj | 0.822 | 0.540 | −12.85 | Real objective improves; eigenmatrix gap remains |

These R² values should not be read as claims about the original papers' performance. The papers used MacroModel MM3* and often fit the full lower-triangle eigenmatrix and charges, and/or validated against selectivity. The table reports how the same published OPT values and their q2mm-optimized descendants behave under q2mm's current JAX engine and objective.
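Negative entries in the R²(eig_diag) column are not a typo: with the usual coefficient of determination, R² = 1 − SS_res/SS_tot, any fit worse than predicting the mean of the reference values scores below zero. The sketch below assumes that standard, unweighted definition (q2mm's exact R² formula, including any weighting, is not stated here) and uses made-up numbers; `r2_by_category` is a hypothetical helper.

```python
from collections import defaultdict

def r2_by_category(rows):
    """Coefficient of determination per reference category.

    rows: iterable of (category, reference_value, calculated_value).
    R^2 = 1 - SS_res / SS_tot, with SS_tot taken about the mean of the
    reference values in that category. R^2 < 0 means the fit is worse than
    predicting that mean for every point.
    """
    groups = defaultdict(lambda: ([], []))
    for category, ref, calc in rows:
        refs, calcs = groups[category]
        refs.append(ref)
        calcs.append(calc)

    scores = {}
    for category, (refs, calcs) in groups.items():
        mean_ref = sum(refs) / len(refs)
        ss_tot = sum((r - mean_ref) ** 2 for r in refs)
        ss_res = sum((r - c) ** 2 for r, c in zip(refs, calcs))
        scores[category] = float("nan") if ss_tot == 0 else 1.0 - ss_res / ss_tot
    return scores

# Made-up numbers: a tight geometry fit next to a poor eigenmatrix fit.
rows = [
    ("bond_length", 1.50, 1.51), ("bond_length", 2.10, 2.08), ("bond_length", 1.95, 1.96),
    ("eig_diag", 0.20, 1.40), ("eig_diag", 0.50, -0.90), ("eig_diag", 0.80, 2.10),
]
print(r2_by_category(rows))
```

Reporting per category keeps a strong geometry score from being averaged away by, or hiding, a large eigenmatrix residual.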

MacroModel MM3* transfer boundary

The published TSFFs remain scientifically valid in their original setting, but several do not transfer their internal Hessian/eigenmatrix quality into q2mm's JAX engine. This is not a release blocker for q2mm because exact MacroModel MM3* reproduction is outside the current alpha scope.

Known transfer gaps include:

- metal-center torsion behavior that may be suppressed or attenuated by MacroModel-specific rules,
- wildcard MM3 atom-type matching such as 00,
- cross terms beyond the currently implemented JAX stretch-bend term,
- composed-force-field semantics for base MM3 + OPT overlays,
- the absence of a licensed MacroModel validation loop for confirming any compatibility-layer guesses.

q2mm's supported path is therefore:

- load the published or QFUERZA starting force field without corrupting it,
- freeze non-OPT parameters when reproducing literature-scale TS systems,
- optimize under the q2mm engine/objective being used,
- report the remaining cross-engine gap honestly.

Recommendations

- Use scipy-lbfgsb-jax on the CLI or ScipyOptimizer(method="L-BFGS-B", jac="auto") in Python for multi-molecule TS systems.
- Keep the default ratio gate enabled. It now admits all five benchmark systems after the loader and angle-gradient fixes, and it remains useful as a guard against future surrogate/objective divergence.
- Do not use JaxOptOptimizer as the default for multi-molecule TS systems. Its monolithic optimization path is useful on small systems, but the per-molecule JaxLoss + SciPy L-BFGS-B path is the production route for the literature-scale benchmarks.
- Do not treat failure to beat a MacroModel-published FF under q2mm as a bug by itself. Treat it as evidence of the documented MM3* transfer boundary unless a q2mm-native invariant or parity test fails.

Reproduce

```bash
# Full convergence regeneration for all systems; writes results under results/
python scripts/regenerate_convergence_results.py

# Example: statistically sampled pd-allyl verdict
python scripts/regenerate_convergence_results.py --system pd-allyl --n-evals 10
```

Archive any result JSON or optimized force field used in documentation in the separate q2mm-data repository; local results/ output is intentionally gitignored in this code repo.