Optimization Guide¶

Choosing the right optimization strategy is the single biggest lever you have over parameter quality and wall-clock time. This page explains the three strategies Q2MM provides, when to use each, and how they compare.

The Problem: Why One Method Isn't Enough¶

Force field optimization is a balancing act between exploration (searching the parameter space broadly) and exploitation (following gradients to the nearest minimum). No single optimizer excels at both:

Challenge	Gradient methods (L-BFGS-B)	Simplex methods (Nelder-Mead)
Speed	✅ Fast convergence on smooth objectives	❌ Slow for many parameters
Robustness	❌ Can get stuck in shallow features	✅ Handles noise and flat regions
Scaling	✅ Cost per step scales as O(N)	❌ Cost per step scales as O(N²)
Derivative-free	❌ Requires finite-difference gradients	✅ No gradients needed

Q2MM solves this by combining both in a GRAD→SIMP cycling loop: gradient methods handle the bulk of convergence, then simplex polishes the parameters that gradients struggle with.

Strategy 1: Single-Shot Optimizer¶

The simplest approach — run one optimizer on all parameters at once.

from q2mm.optimizers.scipy_opt import ScipyOptimizer

optimizer = ScipyOptimizer(method="L-BFGS-B", maxiter=500, eps=1e-3)
result = optimizer.optimize(objective)
print(result.summary())

When to Use¶

≤ 10 parameters — small force fields where any method converges quickly
Quick iteration — you want a fast answer, even if not fully converged
Seminario-initialized — when the starting point is already close to optimal

Available Methods¶

Method	Type	Strengths	Weaknesses
`L-BFGS-B`	Quasi-Newton	Fast, bounded, good default	Finite-difference gradients can miss shallow features
`Nelder-Mead`	Simplex	Derivative-free, robust	Slow beyond ~10 params; no bounds support
`Powell`	Direction-set	Derivative-free, handles coupling	2-3× more evaluations than Nelder-Mead
`least_squares`	Levenberg-Marquardt	Exploits residual structure	Requires many observations relative to parameters
`trust-constr`	Trust-region	Handles constraints	Slower per iteration

Start with L-BFGS-B

For most problems, L-BFGS-B with eps=1e-3 is the best starting point. It uses bounded quasi-Newton optimization and converges in 50-200 iterations for well-behaved objectives. Only switch to derivative-free methods if you see convergence issues.

Strategy 2: GRAD→SIMP Cycling (Recommended for Large Systems)¶

The flagship optimization strategy, based on the approach described in Norrby & Liljefors (J. Comput. Chem. 1998, 19, 1146). Each cycle:

Full-space gradient pass — L-BFGS-B on all N parameters
Sensitivity analysis — rank every parameter by how much the objective responds to perturbation
Subspace simplex — Nelder-Mead on only the top 3 most sensitive parameters
Convergence check — stop when improvement drops below threshold

from q2mm.optimizers.cycling import OptimizationLoop

loop = OptimizationLoop(
    objective,
    max_params=3,         # simplex on top 3 params per cycle
    max_cycles=10,        # up to 10 GRAD→SIMP cycles
    convergence=0.01,     # stop when <1% improvement per cycle
    full_method="L-BFGS-B",
    simp_method="Nelder-Mead",
    full_maxiter=200,
    simp_maxiter=200,
    verbose=True,
)

result = loop.run()
print(result.summary())

When to Use¶

10–100+ parameters — the sweet spot where single-shot simplex fails but gradients alone leave residual error
Production optimizations — when you need the best possible parameters
Transition state force fields — complex coupling between parameter types
Stubborn parameters — when L-BFGS-B converges but leaves a non-trivial residual

How Sensitivity Selection Works¶

The key innovation is derivative-based parameter selection. Before each simplex pass, Q2MM perturbs every parameter by its type-specific step size and computes:

d1 — first derivative (how steeply the objective changes)
d2 — second derivative (curvature / how quickly the slope changes)
simp_var = d2 / d1² — the selection metric

Parameters with low simp_var have high sensitivity relative to curvature. These are exactly the parameters where simplex excels: the objective responds strongly to changes (high d1) but the landscape is flat-bottomed (low d2), making gradient methods inefficient.

flowchart LR
    A[Full-space<br/>L-BFGS-B] --> B[Sensitivity<br/>Analysis]
    B --> C[Select Top N<br/>Parameters]
    C --> D[Subspace<br/>Nelder-Mead]
    D --> E{Converged?}
    E -->|No| A
    E -->|Yes| F[Done]

Why only 3 parameters?

The Nelder-Mead simplex creates N+1 vertices in N dimensions. With 3 parameters, the simplex has 4 vertices — each requiring one objective evaluation per reflection/expansion/contraction step. With 20 parameters, you'd need 21 vertices and convergence slows dramatically. Quinn et al. (2022) confirmed that "the modified simplex method has shown to have faster convergence than Raphson-type methods up to ca. 40 parameters," but the default is just 3 per cycle.

Understanding the Output¶

result = loop.run()

# Per-cycle objective scores (including initial)
print(result.cycle_scores)
# [42.9, 12.3, 4.1, 1.8, 1.7]

# Which parameters were selected for simplex each cycle
for i, indices in enumerate(result.selected_indices):
    labels = objective.forcefield.get_param_type_labels()
    selected = [labels[j] for j in indices]
    print(f"Cycle {i+1}: {selected}")
# Cycle 1: ['bond_k', 'angle_eq', 'bond_eq']
# Cycle 2: ['angle_k', 'bond_k', 'vdw_epsilon']

# Sensitivity details
for sens in result.sensitivity_results:
    print(f"d1: {sens.d1}")
    print(f"simp_var: {sens.simp_var}")

Configuration Reference¶

Parameter	Default	Description
`max_params`	`3`	Parameters per simplex pass. Increase to 4–5 for larger systems.
`max_cycles`	`10`	Maximum GRAD→SIMP iterations. Most problems converge in 3–5 cycles.
`convergence`	`0.01`	Stop when fractional improvement < this value (1% default).
`full_method`	`"L-BFGS-B"`	Scipy method for the full-space pass.
`simp_method`	`"Nelder-Mead"`	Scipy method for the subspace pass.
`full_maxiter`	`200`	Iteration budget for the gradient pass.
`simp_maxiter`	`200`	Iteration budget for the simplex pass.
`sensitivity_metric`	`"simp_var"`	`"simp_var"` (d2/d1², default) or `"abs_d1"` (rank by absolute gradient).
`eps`	`1e-3`	Finite-difference step for the full-space optimizer.

Strategy 3: Manual Subspace Optimization¶

For advanced users who want direct control over which parameters to optimise. SubspaceObjective wraps your full objective and only exposes a subset of parameters, holding the rest fixed.

from q2mm.optimizers.cycling import SubspaceObjective
from q2mm.optimizers.scipy_opt import ScipyOptimizer

# Only optimise bond force constants (indices 0 and 2)
full_vec = ff.get_param_vector()
sub_obj = SubspaceObjective(objective, [0, 2], full_vec)

# Use any scipy method on the small subspace
import scipy.optimize
result = scipy.optimize.minimize(
    sub_obj,
    sub_obj.get_initial_vector(),
    method="Nelder-Mead",
    options={"maxiter": 500},
)

# Apply optimised subspace back to the full force field
best_full = sub_obj.build_full_vector(result.x)
ff.set_param_vector(best_full)

When to Use¶

Expert parameter tuning — you know exactly which parameters need attention
Debugging — isolate whether a specific parameter type is causing issues
Custom cycling strategies — build your own outer loop with domain knowledge

Useful ForceField Helpers¶

# Get indices grouped by parameter type
indices = ff.get_param_indices_by_type()
# {'bond_k': [0, 2], 'bond_eq': [1, 3], 'angle_k': [4], ...}

# Get human-readable labels for each position in the param vector
labels = ff.get_param_type_labels()
# ['bond_k', 'bond_eq', 'bond_k', 'bond_eq', 'angle_k', 'angle_eq', ...]

# Get per-type finite difference step sizes
steps = ff.get_step_sizes()
# array([0.1, 0.02, 0.1, 0.02, 0.1, 1.0, ...])

Standalone Sensitivity Analysis¶

You can run sensitivity analysis independently, without the full GRAD→SIMP loop. This is useful for diagnosing which parameters matter most in your problem.

from q2mm.optimizers.cycling import compute_sensitivity

sens = compute_sensitivity(objective, metric="simp_var")

# Rank parameters from most to least suitable for simplex
labels = ff.get_param_type_labels()
for rank, idx in enumerate(sens.ranking):
    print(f"  {rank+1}. {labels[idx]:12s}  d1={sens.d1[idx]:+.4f}  "
          f"d2={sens.d2[idx]:.4f}  simp_var={sens.simp_var[idx]:.4f}")

Expected output:

  1. bond_k        d1=+0.3421  d2=0.0012  simp_var=0.0102
  2. angle_eq      d1=-0.1893  d2=0.0089  simp_var=0.2483
  3. bond_eq       d1=+0.0542  d2=0.0031  simp_var=1.0541
  4. angle_k       d1=-0.0103  d2=0.0002  simp_var=1.8856

Cost

Sensitivity analysis requires 2N + 1 objective evaluations (one baseline plus two perturbations per parameter). For a 20-parameter force field with OpenMM, this takes about 0.2 seconds.

Choosing a Strategy: Decision Flowchart¶

flowchart TD
    A[How many parameters?] --> B{≤ 10?}
    B -->|Yes| C[Single-shot<br/>ScipyOptimizer]
    B -->|No| D{Need best<br/>possible result?}
    D -->|Yes| E[GRAD→SIMP Cycling<br/>OptimizationLoop]
    D -->|No| F[Single-shot L-BFGS-B<br/>for quick estimate]
    C --> G{Converged well?}
    G -->|Yes| H[Done ✅]
    G -->|No| I[Try OptimizationLoop]
    E --> H
    F --> J{Good enough?}
    J -->|Yes| H
    J -->|No| E

Quick Reference¶

Scenario	Strategy	Expected Time (OpenMM)
Small FF (≤ 10 params), quick iteration	`ScipyOptimizer("L-BFGS-B")`	1–10 s
Small FF, derivative-free	`ScipyOptimizer("Nelder-Mead")`	2–30 s
Medium FF (10–40 params), production	`OptimizationLoop(max_params=3)`	30 s – 5 min
Large FF (40–100 params), production	`OptimizationLoop(max_params=4)`	2–15 min
Expert: tune specific params	`SubspaceObjective` + manual	varies
Diagnostics: which params matter?	`compute_sensitivity()`	< 1 s

Tips and Pitfalls¶

L-BFGS-B may not fully converge

With finite-difference gradients (eps=1e-3), L-BFGS-B can miss shallow features in the objective landscape. On the water test case (4 params), L-BFGS-B converges to a score of 0.93 while Nelder-Mead reaches 0.000. This is exactly why the GRAD→SIMP loop exists — the simplex pass cleans up what the gradient pass leaves behind.

Seminario initialization matters

Starting from Seminario-estimated parameters (extracted from the QM Hessian) puts you much closer to the optimum. The optimizer then needs fewer evaluations to converge. Always use estimate_force_constants() before optimization when QM data is available.

Monitor convergence

Plot result.history (for single-shot) or result.cycle_scores (for cycling) to visualize convergence. If the score plateaus early, the optimizer may be stuck — try increasing max_params or switching the sensitivity metric to "abs_d1".

OpenMM is ~150× faster than Tinker

OpenMM evaluates energies in-process (~5 ms/eval vs ~160 ms/eval for Tinker). For optimization runs that require hundreds or thousands of evaluations, this difference is dramatic. Use OpenMM when available. See the Benchmarks page for detailed benchmarks.

Optimization Guide¶

The Problem: Why One Method Isn't Enough¶

Strategy 1: Single-Shot Optimizer¶

When to Use¶

Available Methods¶

Strategy 2: GRAD→SIMP Cycling (Recommended for Large Systems)¶

When to Use¶

How Sensitivity Selection Works¶

Understanding the Output¶

Configuration Reference¶

Strategy 3: Manual Subspace Optimization¶

When to Use¶

Useful ForceField Helpers¶

Standalone Sensitivity Analysis¶

Choosing a Strategy: Decision Flowchart¶

Quick Reference¶

Tips and Pitfalls¶

Further Reading¶