Skip to content

Optimization Guide

Force field optimization is hard because the objective landscape — the surface defined by how well MM frequencies match QM reference data — is rarely smooth. MM3 landscapes are rugged with many local minima (Buckingham potentials, coupled torsions). Harmonic landscapes are smoother but still non-trivial from a poor starting point. No single optimizer handles both cases well.

This guide tells you what to do for your system. It recommends concrete workflows grounded in CH₃F benchmark evidence, then links to source code for API details.


Quick-start: which workflow?

Your system Recommended workflow Why
≤ 10 params, harmonic form Workflow A L-BFGS-B alone reaches the best basin (529 cm⁻¹ RMSD on CH₃F)
≤ 10 params, MM3 form Workflow B Multi-start finds basins single-start misses (28.7 vs 579 RMSD)
10–50 params, any form Workflow C Grad-simp cycling combines gradient speed with simplex robustness
50+ params Workflow C + L2 Regularization prevents parameter drift in under-determined systems
JAX engine, multi-molecule TS systems Workflow D Per-molecule JaxLoss analytical gradients via scipy L-BFGS-B

Every workflow assumes QFUERZA initialization — run qfuerza_fresh() (single molecule) or qfuerza_into() (template-based, multi-molecule averaging) before optimization. QFUERZA puts you in the right neighbourhood; the optimizer refines from there.


Workflow A: Small + Smooth

When: ≤ 10 parameters, harmonic functional form, analytical gradients available.

Recipe: One call to L-BFGS-B with analytical gradients. Done.

from q2mm.optimizers.scipy_opt import ScipyOptimizer

optimizer = ScipyOptimizer(method="L-BFGS-B", maxiter=500, jac="auto")
result = optimizer.optimize(objective)
print(result.summary())

Why this works: On smooth harmonic landscapes, L-BFGS-B's curvature information reaches the best basin in the fewest evaluations. On CH₃F harmonic, L-BFGS-B with analytical gradients achieves 529 cm⁻¹ RMSD in ~2 seconds. Basin-hopping, multi-start, and optax Adam all match or slightly beat this (~526–531), but none find a meaningfully better basin — the harmonic landscape simply doesn't have hidden minima worth searching for.

What doesn't work here:

  • Optax Adam (990–1001 RMSD) — adaptive learning rates can't exploit curvature the way L-BFGS-B can on a smooth surface
  • L2 regularization — can hurt well-conditioned problems (see L2 Regularization).
  • FD-only gradients (1048 RMSD) — finite-difference frequency gradients are too noisy to guide L-BFGS-B from the QFUERZA starting point

Workflow B: Small + Rugged

When: ≤ 10 parameters, MM3 or other non-harmonic functional form.

Recipe: Multi-start L-BFGS-B to find the best basin.

from q2mm.optimizers.multistart import MultiStartOptimizer
from q2mm.optimizers.scipy_opt import ScipyOptimizer

# Multi-start global search
inner = ScipyOptimizer(method="L-BFGS-B", maxiter=500, jac="auto")
multi = MultiStartOptimizer(
    optimizer=inner,
    n_starts=10,
    perturbation_pct=0.1,
    seed=42,
)
result = multi.optimize(objective)

Why this works: The MM3 landscape has many local minima. Single-start L-BFGS-B gets trapped at 579 cm⁻¹ RMSD on CH₃F — a poor basin. Running 10 starts from perturbed initial points finds basins that single-start misses:

Strategy CH₃F MM3 RMSD Notes
L-BFGS-B (single start) 579.0 Stuck in poor local minimum
L-BFGS-B + L2(λ=0.01) 133.5 Stabilized with L2 regularization
optax Adam 56.3 Momentum navigates rugged landscape
multi-start n=10 (OpenMM) 28.7–46.2 Best strategy — stochastic, varies by run

Multi-start is the recommended approach because it's simple and consistently finds the best basins on rugged landscapes.

Does Adam refinement help after multi-start? In composed benchmarks, running optax Adam from the multi-start winner barely changed the result: 46.2 → 46.1 on OpenMM (FD gradients limit Adam), 563.8 → 563.8 on JAX (same local minimum). Multi-start alone finds the basin; refinement adds diminishing returns.

Alternatives that also work:

  • Basin-hopping with T=0.5 improved on single-start L-BFGS-B (514 vs 579) but didn't match multi-start n=10 (28.7). Basin-hopping is more sensitive to temperature tuning — T=1.0 produced the worst result (1105 RMSD) by accepting too many uphill moves.
  • L2 regularization can stabilize rugged single-start fits when you cannot afford multi-start (see L2 Regularization).

What doesn't work here:

  • Multi-start on JAX (579–586 RMSD) — JAX's analytical gradients consistently converge to the same local minimum regardless of starting point. The OpenMM FD gradient noise actually helps escape this basin. This is a case where FD noise acts as implicit regularization.

Workflow C: Medium and Large Systems

When: 10+ parameters, any functional form. This is the production workflow.

Recipe: Grad-simp cycling — alternating L-BFGS-B full-space passes with Nelder-Mead on the most sensitive parameters.

from q2mm.optimizers.cycling import OptimizationLoop

loop = OptimizationLoop(
    objective,
    max_params=3,         # simplex on top 3 params per cycle
    max_cycles=10,        # up to 10 grad-simp cycles
    convergence=0.01,     # stop when <1% improvement per cycle
    full_method="L-BFGS-B",
    simp_method="Nelder-Mead",
    full_maxiter=200,
    simp_maxiter=200,
    full_jac="auto",      # analytical gradients where available
    verbose=True,
)
result = loop.run()
print(result.summary())

How it works: Grad-simp cycling alternates between gradient-based full-space optimization and Nelder-Mead simplex refinement of the most sensitive parameters. See Theory & Methods — Stage 4 for the full mechanics. In practice, this keeps fast global progress from L-BFGS-B while simplex focuses on the few parameters gradients handle poorly.

Why this works: L-BFGS-B quickly converges most parameters, then Nelder-Mead polishes the stubborn ones that gradients can't handle. On Rh-enamide (182 parameters, MM3), grad-simp achieves the best observed score (3.29 with OpenMM) — better than either L-BFGS-B (5.81) or Nelder-Mead (5.11) alone.

Why only 3 parameters per simplex pass?

Nelder-Mead creates an (N+1)-vertex simplex. With 3 parameters that's 4 vertices; with 20 it's 21 — and convergence slows significantly. max_params=3 keeps simplex passes fast while addressing the most problematic parameters each cycle.

You can use multi-start as the gradient phase inside grad-simp cycling via the full_method="multi:L-BFGS-B" parameter:

from q2mm.optimizers.cycling import OptimizationLoop

loop = OptimizationLoop(
    objective,
    max_params=3,
    max_cycles=5,
    full_method="multi:L-BFGS-B",  # multi-start each gradient phase
    simp_method="Nelder-Mead",
    full_jac="auto",
)
result = loop.run()

Evidence: this composition doesn't outperform plain cycling

In CH₃F benchmarks, grad-simp with multi-start inner achieved 592 RMSD on OpenMM and 527 RMSD on JAX — comparable to or worse than plain grad-simp (586 / 579). The random restarts within each cycle disrupt inter-cycle convergence. For rugged landscapes, multi-start alone (Workflow B) is more effective than embedding multi-start inside cycling.

Adding L2 regularization to cycling

For under-determined systems (more parameters than independent observations), add L2 to the objective:

objective = ObjectiveFunction(
    forcefield=ff,
    engine=engine,
    molecules=molecules,
    reference=reference,
    regularization=0.01,   # keeps params near QFUERZA values
)

loop = OptimizationLoop(objective, max_params=3, max_cycles=10)
result = loop.run()

L2 regularization can stabilize under-determined cycling runs (see L2 Regularization).


Workflow D: End-to-End Differentiable (JAX)

When: JAX engine with multi-molecule TS systems, eigenmatrix or geometry references — you want analytical gradients without finite-difference overhead.

Recipe: Use ScipyOptimizer with jac="auto" — it auto-detects JaxEngine and routes gradients through JaxLoss.

from q2mm.optimizers.scipy_opt import ScipyOptimizer

optimizer = ScipyOptimizer(method="L-BFGS-B", maxiter=200, jac="auto")
result = optimizer.optimize(objective)
print(result.summary())

How it works: When the engine is a JaxEngine and jac="auto", ScipyOptimizer builds a JaxLoss — a collection of per-molecule JIT-compiled loss+gradient functions. Each molecule's Hessian + eigenmatrix + energy computation is compiled into its own small XLA program. Scipy calls this from Python at each iteration (treating it as a black-box function returning (loss, grad)), so no single XLA program needs to contain all molecules.

Supported reference types:

Reference type Supported Notes
Energy Weighted residuals
Frequency Closed-form sensitivity via eigenvalue derivatives
Hessian elements Raw Cartesian Hessian entries
Eigenmatrix (Q^T H Q) Diagonal and off-diagonal projection terms
Geometry Bond lengths, angles, torsions via jaxopt.LBFGS(implicit_diff=True) inner minimization

When to use this vs SciPy FD:

  • JaxLoss provides analytical gradients for all reference types including geometry — no finite-difference overhead.
  • First evaluation is slow (~5 min) due to per-molecule JIT compilation. Subsequent evaluations are fast (~7 s for 9 molecules).
  • SciPy L-BFGS-B with FD gradients works with any engine but is O(n_params) evaluations per step.

Benchmark results (CH₃F, see Small Molecules):

Form Device Optimizer RMSD (cm⁻¹) Time eval/s
harmonic CPU jaxopt:lbfgsb 528.3 4.8s 45.4
harmonic GPU jaxopt:lbfgs 532.0 16.3s 9.9
harmonic GPU SciPy L-BFGS-B (A) 528.7 1.9s 41.1
mm3 GPU jaxopt:lbfgs 578.7 16.2s 18.3
mm3 GPU SciPy L-BFGS-B (A) 579.0 2.2s 31.4

CH₃F is a single-molecule system

On single-molecule systems, JaxOptOptimizer with monolithic JIT compilation still works well (no OOM risk). For multi-molecule TS systems, use ScipyOptimizer(jac="auto") which routes through the per-molecule JaxLoss dispatch.


Modifiers

These cross-cutting options layer onto any workflow.

L2 Regularization

Penalizes parameter drift from the starting values (QFUERZA estimates). The total loss becomes:

\[\text{loss}_\text{total} = \text{loss}_\text{data} + \lambda \cdot \| \mathbf{p} - \mathbf{p}_\text{ref} \|^2\]
objective = ObjectiveFunction(
    forcefield=ff, engine=engine,
    molecules=molecules, reference=reference,
    regularization=0.01,      # λ — penalty strength
    # reference_params=...    # defaults to initial FF params
)

When it helps: Rugged MM3 landscapes where single-start optimizers find poor local minima. On CH₃F MM3, L2 improved L-BFGS-B by 4× (579 → 134 RMSD). Also useful for under-determined systems (more parameters than observations).

When it hurts: Well-conditioned problems. On CH₃F harmonic, L2 doubled the RMSD (529 → 993) by preventing parameters from reaching the optimal basin.

Choosing λ

Start with 0.001–0.01. Increase until parameters stay close to QFUERZA values. Aim for the penalty term to be ~1–10% of data loss at the optimum. Too large = can't improve; too small = no effect.

L2 works with every optimizer — SciPy, optax, basin-hopping, multi-start, and grad-simp — because it modifies the ObjectiveFunction, not the optimizer.

Sensitivity Analysis

A diagnostic tool that ranks parameters by how the objective responds to perturbation, using the ratio simp_var = d2/d1². Low simp_var means the parameter strongly affects the objective relative to its curvature — these are parameters where simplex outperforms gradient methods.

from q2mm.optimizers.cycling import compute_sensitivity

sens = compute_sensitivity(objective, metric="simp_var")
labels = ff.get_param_type_labels()
for rank, idx in enumerate(sens.ranking):
    print(f"  {rank+1}. {labels[idx]:12s}  "
          f"d1={sens.d1[idx]:+.4f}  simp_var={sens.simp_var[idx]:.4f}")

Use this to understand your landscape before choosing a workflow, or to debug why optimization stalls.

Cost

Sensitivity analysis requires 2N + 1 objective evaluations in the worst case (one baseline plus two perturbations per parameter).


Optimizer Reference

For constructor parameters, return types, and full API details, see the API Reference.

Gradient modes

All gradient-using optimizers support three modes via the jac parameter:

Mode What it does When to use
jac=None SciPy finite-difference Default; works everywhere but noisy
jac="auto" Analytical where available, FD fallback Recommended — best quality gradients
jac="analytical" Forces analytical only Only if all evaluators support it

Analytical gradients produce the best results on harmonic problems. The top harmonic results on CH₃F all use analytical or auto mode.


Tips and Pitfalls

L-BFGS-B may not fully converge on rugged landscapes

On CH₃F MM3, L-BFGS-B gets trapped at 579 cm⁻¹ RMSD — a poor local minimum. Use multi-start or optax Adam on MM3 forms. On smooth harmonic forms, L-BFGS-B is the best choice.

QFUERZA initialization matters

Starting from QFUERZA-estimated parameters puts you close to the optimum. Run qfuerza_fresh() or qfuerza_into() before optimization when QM data is available.

Monitor convergence

Plot result.history (single-shot) or result.cycle_scores (cycling) to visualize convergence. If the score plateaus early, the optimizer may be stuck — try multi-start or switch the sensitivity metric to "abs_d1".

Backend speed

Per-evaluation cost on CH₃F (8 parameters), from derivative-free methods:

Backend Per-eval vs Tinker
JAX (GPU) ~2.5 ms 96× faster
OpenMM (GPU) ~10 ms 24× faster
Tinker (CPU) ~240 ms baseline

FD noise as implicit regularization

On CH₃F MM3, OpenMM multi-start n=10 (FD gradients) achieved 28.7 RMSD while JAX multi-start n=10 (analytical gradients) stayed at 586. The FD gradient noise helped OpenMM escape the local minimum that JAX's precise gradients consistently converge to. This is an unusual case where lower-quality gradients produced a better result.


GPU Optimizer Recommendations

For multi-molecule TS systems with the multi-target objective (eigenmatrix + geometry, frozen base-FF params):

Optimizer Use case Notes
ScipyOptimizer L-BFGS-B (jac="auto") Default for TS systems Auto-routes through JaxLoss analytical gradients. ~8 s/eval on 9-molecule Rh-enamide (GPU).
JaxOptOptimizer L-BFGS Single-molecule systems Monolithic JIT is fast for small systems; pathologically slow on multi-molecule due to internal line search compilation.
OptaxOptimizer Adam Exploration First-order; useful for rugged landscapes where L-BFGS-B gets stuck.

Start with ScipyOptimizer

Use ScipyOptimizer(method="L-BFGS-B", jac="auto") as the default for JAX-engine workflows. It auto-detects JaxEngine, builds per-molecule JaxLoss functions, and provides analytical gradients to scipy's battle-tested L-BFGS-B implementation.

See Optimizer Comparison for detailed results and timing data.


Further Reading