# Small Molecules
This page answers one question: how do the currently supported backend, form, and optimizer combinations compare on a small, fully tractable benchmark? The system is CH₃F (5 atoms, 8 fitted parameters), fitted against B3LYP/6-31+G(d) QM frequencies. Unlike the Rh-Enamide page, this one covers the full supported matrix, so it is the right place to compare combinations directly.
## Scope
- System: CH₃F (1 molecule, 5 atoms, 8 parameters)
- QM reference: B3LYP/6-31+G(d)
- Matrix size: 82 supported combos (77 single-shot + 5 composed)
- Backends/forms: JAX and OpenMM on harmonic + MM3, JAX-MD on harmonic, Tinker on MM3
- Optimizers: Powell, L-BFGS-B, Nelder-Mead, grad-simp, optax (Adam, AdaGrad, SGD), jaxopt (L-BFGS, L-BFGS-B), basin-hopping (T=1.0, T=0.5), multi-start (n=5, n=10), and L2-regularized variants. Coverage rules:
    - Each gradient-using optimizer is run twice: once with analytical frequency gradients, once with pure FD.
    - optax optimizers use analytical gradients only (JAX backend).
    - jaxopt optimizers use JIT-compiled analytical gradients (JAX backend only).
    - Global optimizers (basin-hopping, multi-start) and L2 variants run on fast GPU backends only (JAX, JAX-MD, OpenMM CUDA).
    - Composed workflows (multi-start → Adam, grad-simp with a multi-start inner loop) run on MM3 only.
- Starting point: QFUERZA initialization — JAX/JAX-MD begin at 192.0 cm⁻¹ RMSD, OpenMM at 191.9 cm⁻¹, Tinker at 192.1 cm⁻¹
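The RMSD and MAE columns in the tables below can be stated concretely. A minimal sketch (not q2mm code; the frequency values are made up for illustration) of both metrics computed over mode-matched MM vs QM vibrational frequencies:

```python
import numpy as np

def freq_errors(mm_freqs, qm_freqs):
    """Return (RMSD, MAE) in cm^-1 for mode-matched frequency arrays."""
    diff = np.asarray(mm_freqs, dtype=float) - np.asarray(qm_freqs, dtype=float)
    rmsd = float(np.sqrt(np.mean(diff ** 2)))
    mae = float(np.mean(np.abs(diff)))
    return rmsd, mae

# CH3F has 3N - 6 = 9 normal modes for N = 5 atoms.
# These numbers are illustrative placeholders, not the benchmark data.
qm = np.array([1049.0, 1182.0, 1182.0, 1476.0, 1476.0,
               1496.0, 2915.0, 2983.0, 2983.0])
mm = qm + np.array([10.0, -5.0, -5.0, 20.0, 20.0, -15.0, 30.0, -25.0, -25.0])
rmsd, mae = freq_errors(mm, qm)
```

Both metrics are computed over the same mode-matched vector, so a few large outlier modes inflate RMSD more than MAE.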
## Full CH₃F matrix
By default, rows are grouped by functional form and then sorted by final RMSD. Use the filters and sortable headers to narrow form/backend/device/optimizer combinations, and compare like-with-like inside each form: harmonic and MM3 rows share the same benchmark system, but they do not represent the same force-field model.
| Form | Backend | Device | Optimizer | F∇ | RMSD | MAE | Time | eval/s |
|---|---|---|---|---|---|---|---|---|
| harmonic | JAX-MD | GPU | multi:L-BFGS-B (n=5) | FD | 525.9 | 241.5 | 20.6 s | 19.8 |
| harmonic | JAX | CPU | jaxopt:lbfgsb | A | 528.3 | 235.4 | 4.8 s | 45.4 |
| harmonic | JAX | GPU | L-BFGS-B | A | 528.7 | 257.3 | 1.9 s | 41.1 |
| harmonic | JAX-MD | GPU | grad-simp | FD | 528.8 | 242.3 | 5.9 s | 142.5 |
| harmonic | JAX | GPU | grad-simp | A | 529.1 | 243.3 | 5.5 s | 243.1 |
| harmonic | JAX | GPU | multi:L-BFGS-B (n=10) | A | 529.5 | 246.2 | 7.9 s | 125.7 |
| harmonic | JAX | GPU | basinhopping (T=0.5) | A | 530.7 | 253.2 | 5.3 s | 117.5 |
| harmonic | JAX | GPU | basinhopping (T=1.0) | A | 530.9 | 253.3 | 6.4 s | 117.3 |
| harmonic | JAX-MD | GPU | L-BFGS-B | FD | 531.1 | 254.6 | 4.3 s | 20.2 |
| harmonic | JAX-MD | GPU | basinhopping (T=0.5) | FD | 531.2 | 254.1 | 31.0 s | 19.6 |
| harmonic | JAX-MD | GPU | basinhopping (T=1.0) | FD | 531.3 | 255.4 | 31.3 s | 19.5 |
| harmonic | JAX | GPU | multi:L-BFGS-B (n=5) | A | 531.7 | 254.2 | 4.3 s | 106.9 |
| harmonic | JAX | CPU | jaxopt:lbfgs | A | 531.9 | 254.6 | 6.2 s | 30.3 |
| harmonic | JAX | GPU | jaxopt:lbfgs | A | 532.0 | 254.8 | 16.3 s | 9.9 |
| harmonic | OpenMM | GPU | grad-simp | FD | 979.5 | 786.3 | 45.8 s | 91.5 |
| harmonic | JAX | GPU | grad-simp | FD | 981.4 | 790.1 | 13.1 s | 353.4 |
| harmonic | JAX-MD | GPU | grad-simp | FD | 981.4 | 790.1 | 13.8 s | 334.7 |
| harmonic | OpenMM | GPU | grad-simp | FD | 981.9 | 794.7 | 67.8 s | 31.0 |
| harmonic | OpenMM | GPU | multi:L-BFGS-B (n=5) | FD | 983.4 | 837.3 | 34.1 s | 6.3 |
| harmonic | OpenMM | GPU | multi:L-BFGS-B (n=10) | FD | 985.5 | 836.1 | 64.5 s | 6.4 |
| harmonic | JAX | GPU | Nelder-Mead | — | 987.4 | 795.0 | 34.2 s | 357.7 |
| harmonic | JAX-MD | GPU | Nelder-Mead | — | 987.5 | 795.0 | 34.2 s | 344.0 |
| harmonic | JAX | GPU | optax:sgd | A | 990.0 | 838.0 | 18.2 s | 109.9 |
| harmonic | JAX | GPU | L-BFGS-B + L2(λ=0.01) | A | 993.3 | 852.5 | 1.3 s | 20.3 |
| harmonic | JAX-MD | GPU | L-BFGS-B + L2(λ=0.01) | FD | 993.3 | 852.5 | 1.3 s | 20.6 |
| harmonic | JAX | GPU | optax:adam + L2(λ=0.01) | A | 993.4 | 852.5 | 6.0 s | 67.5 |
| harmonic | JAX-MD | GPU | multi:L-BFGS-B (n=10) | FD | 994.6 | 815.9 | 48.6 s | 19.6 |
| harmonic | OpenMM | GPU | L-BFGS-B + L2(λ=0.01) | FD | 995.8 | 857.6 | 19.5 s | 6.5 |
| harmonic | JAX | GPU | optax:adam+cosine | A | 999.4 | 831.7 | 29.5 s | 67.7 |
| harmonic | JAX | GPU | optax:adam | A | 1000.4 | 831.4 | 23.4 s | 85.4 |
| harmonic | JAX | GPU | optax:adagrad | A | 1000.9 | 868.0 | 19.7 s | 101.5 |
| harmonic | OpenMM | GPU | basinhopping (T=0.5) | FD | 1021.5 | 857.4 | 155.1 s | 6.1 |
| harmonic | OpenMM | GPU | Powell | — | 1036.7 | 891.7 | 62.5 s | 97.8 |
| harmonic | OpenMM | GPU | basinhopping (T=1.0) | FD | 1041.4 | 872.4 | 140.0 s | 6.1 |
| harmonic | JAX | GPU | Powell | — | 1041.5 | 899.0 | 10.1 s | 342.8 |
| harmonic | JAX-MD | GPU | Powell | — | 1041.5 | 899.0 | 10.4 s | 342.1 |
| harmonic | OpenMM | GPU | Nelder-Mead | — | 1043.6 | 868.8 | 9.2 s | 102.7 |
| harmonic | JAX | GPU | L-BFGS-B | FD | 1048.3 | 934.6 | 0.5 s | 336.0 |
| harmonic | JAX-MD | GPU | L-BFGS-B | FD | 1048.3 | 934.6 | 0.5 s | 334.8 |
| harmonic | OpenMM | GPU | L-BFGS-B | FD | 1048.3 | 934.7 | 3.5 s | 77.2 |
| harmonic | OpenMM | GPU | L-BFGS-B | FD | 1049.5 | 936.1 | 4.0 s | 5.5 |
| mm3 | OpenMM | GPU | multi:L-BFGS-B (n=10) | FD | 28.7 | 20.2 | 157.4 s | 6.4 |
| mm3 | JAX | GPU | optax:adam | A | 56.3 | 44.0 | 25.2 s | 79.4 |
| mm3 | OpenMM | GPU | L-BFGS-B | FD | 59.5 | 46.7 | 4.8 s | 104.4 |
| mm3 | JAX | GPU | optax:adam+cosine | A | 60.6 | 42.7 | 29.8 s | 67.2 |
| mm3 | OpenMM | GPU | L-BFGS-B | FD | 83.6 | 62.9 | 9.7 s | 5.4 |
| mm3 | Tinker | CPU | L-BFGS-B | FD | 83.8 | 63.4 | 152.5 s | 4.1 |
| mm3 | Tinker | CPU | L-BFGS-B | FD | 83.8 | 63.4 | 150.3 s | 4.2 |
| mm3 | JAX | GPU | L-BFGS-B | FD | 113.5 | 90.6 | 0.8 s | 347.2 |
| mm3 | OpenMM | GPU | L-BFGS-B + L2(λ=0.01) | FD | 133.3 | 108.8 | 12.7 s | 6.3 |
| mm3 | JAX | GPU | L-BFGS-B + L2(λ=0.01) | A | 133.5 | 109.5 | 1.8 s | 20.4 |
| mm3 | JAX | GPU | optax:adam + L2(λ=0.01) | A | 133.5 | 109.5 | 5.4 s | 56.8 |
| mm3 | JAX | GPU | optax:adagrad | A | 138.0 | 113.5 | 20.0 s | 100.0 |
| mm3 | JAX | GPU | optax:sgd | A | 192.0 | 177.5 | 1.7 s | 12.7 |
| mm3 | OpenMM | GPU | basinhopping (T=0.5) | FD | 513.8 | 263.9 | 179.3 s | 6.3 |
| mm3 | Tinker | CPU | Powell | — | 542.5 | 275.2 | 2768.6 s | 4.3 |
| mm3 | Tinker | CPU | grad-simp | FD | 564.4 | 314.5 | 1094.9 s | 4.3 |
| mm3 | Tinker | CPU | grad-simp | FD | 564.4 | 314.5 | 1097.7 s | 4.3 |
| mm3 | OpenMM | GPU | grad-simp | FD | 566.2 | 306.6 | 36.1 s | 24.7 |
| mm3 | OpenMM | GPU | grad-simp | FD | 573.1 | 311.6 | 29.5 s | 97.1 |
| mm3 | Tinker | CPU | Nelder-Mead | — | 576.3 | 311.5 | 152.5 s | 4.3 |
| mm3 | OpenMM | GPU | multi:L-BFGS-B (n=5) | FD | 578.1 | 341.1 | 59.8 s | 6.6 |
| mm3 | JAX | GPU | jaxopt:lbfgs | A | 578.7 | 312.6 | 16.2 s | 18.3 |
| mm3 | JAX | GPU | L-BFGS-B | A | 579.0 | 313.9 | 2.2 s | 31.4 |
| mm3 | JAX | GPU | grad-simp | A | 579.0 | 313.9 | 3.4 s | 139.1 |
| mm3 | JAX | GPU | multi:L-BFGS-B (n=5) | A | 579.0 | 313.9 | 4.6 s | 92.3 |
| mm3 | JAX | GPU | basinhopping (T=0.5) | A | 579.0 | 313.9 | 10.7 s | 125.4 |
| mm3 | JAX | CPU | jaxopt:lbfgs | A | 579.1 | 312.9 | 6.3 s | 45.6 |
| mm3 | JAX | CPU | jaxopt:lbfgsb | A | 579.5 | 313.3 | 6.8 s | 74.0 |
| mm3 | OpenMM | GPU | Nelder-Mead | — | 581.1 | 315.1 | 8.5 s | 97.0 |
| mm3 | JAX | GPU | multi:L-BFGS-B (n=10) | A | 586.3 | 319.6 | 7.5 s | 112.3 |
| mm3 | JAX | GPU | Nelder-Mead | — | 608.1 | 334.2 | 25.6 s | 344.9 |
| mm3 | OpenMM | GPU | basinhopping (T=1.0) | FD | 842.6 | 636.2 | 168.0 s | 6.3 |
| mm3 | JAX | GPU | grad-simp | FD | 1050.0 | 910.4 | 8.2 s | 343.0 |
| mm3 | JAX | GPU | Powell | — | 1080.7 | 937.3 | 15.1 s | 339.0 |
| mm3 | OpenMM | GPU | Powell | — | 1090.5 | 950.4 | 124.2 s | 95.3 |
| mm3 | JAX | GPU | basinhopping (T=1.0) | A | 1105.0 | 978.2 | 11.7 s | 122.1 |
## Composed workflows
Composed workflows chain two optimizers in sequence (or embed one inside another). They are run on MM3 only — the harmonic landscape is too smooth for staged refinement to add value.
| Form | Backend | Device | Optimizer | F∇ | RMSD | MAE | Time | eval/s |
|---|---|---|---|---|---|---|---|---|
| mm3 | OpenMM | GPU | multi:L-BFGS-B (n=10) → optax:adam | FD | 46.1 | — | 604.0 s | 1084 |
| mm3 | JAX | GPU | grad-simp (multi:L-BFGS-B inner) | A | 526.7 | 238.2 | 31.6 s | 8970 |
| mm3 | JAX | GPU | multi:L-BFGS-B (n=10) → optax:adam | A | 563.8 | — | 24.7 s | 968 |
| mm3 | OpenMM | GPU | grad-simp (multi:L-BFGS-B inner) | FD | 592.1 | 341.1 | 450.2 s | 23586 |
See Composed workflows analysis below.
## Interpretation
RMSD and MAE are in cm⁻¹ (frequency error vs QM reference). F∇ = frequency gradient mode: A = analytical (autodiff), FD = finite-difference, — = not applicable (derivative-free optimizer). The energy gradient column (E∇) is omitted because CH₃F benchmarks optimize on frequency data only.
### Harmonic form
- The best harmonic results cluster around 526–531 cm⁻¹ RMSD, achieved by JAX, JAX-MD, and OpenMM with L-BFGS-B, grad-simp, multi-start, or basin-hopping using analytical frequency gradients. These combos benefit from QFUERZA's physically motivated starting parameters.
- Multi-start and basin-hopping match plain L-BFGS-B on the harmonic form (526–531 RMSD range). The harmonic landscape has fewer local minima, so random restarts and stochastic perturbations do not discover better basins than the QFUERZA starting point provides.
- L2 regularization hurts harmonic performance (993 cm⁻¹ vs 529 for unregularized L-BFGS-B). The penalty prevents parameters from reaching the deep basin that L-BFGS-B normally finds. L2 is counterproductive when the landscape is well-conditioned.
- Optax optimizers (Adam, AdaGrad, SGD) perform poorly on the harmonic form (990–1001 cm⁻¹), comparable to derivative-free methods. The harmonic landscape from the QFUERZA starting point appears to favour quasi-Newton methods (L-BFGS-B) that use curvature information.
- Derivative-free optimizers (Powell, Nelder-Mead) perform poorly on the harmonic form from the QFUERZA starting point, landing in the 987–1049 range. Under the previous Seminario initialization these reached near-zero RMSD, but that was an initialization-sensitive local optimum — the result was not robust across starting points.
- FD-only gradient combos (L-BFGS-B with FD) also perform poorly (~1048), suggesting that finite-difference frequency gradients are too noisy to guide L-BFGS-B from the QFUERZA basin.
- JaxOpt L-BFGS matches the top harmonic cluster (528–532 cm⁻¹) using JIT-compiled analytical gradients. This confirms that end-to-end differentiation through the JAX engine produces gradients of the same quality as the optax analytical path, and that jaxopt's quasi-Newton L-BFGS method exploits them effectively. L-BFGS-B (bounded) runs on CPU only due to a jaxopt XLA compilation bug on GPU (an `argsort` shape mismatch); the unbounded L-BFGS variant works on both CPU and GPU.
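The analytical-vs-FD gap noted above can be illustrated outside the benchmark. The sketch below is illustrative only (SciPy's L-BFGS-B on a toy quadratic, not the q2mm frequency objective): a small amount of evaluation noise is amplified by 2-point finite differences, while an analytical gradient stays clean.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def loss(x):
    return float(np.sum((x - 1.0) ** 2))

def noisy_loss(x):
    # emulate small numerical noise in the objective evaluation
    return loss(x) + 1e-6 * rng.standard_normal()

def grad(x):
    return 2.0 * (x - 1.0)

x0 = np.zeros(8)  # 8 parameters, like the CH3F fit
res_a = minimize(loss, x0, method="L-BFGS-B", jac=grad)         # analytical
res_fd = minimize(noisy_loss, x0, method="L-BFGS-B", jac=None)  # 2-point FD
# res_a converges to x = 1 essentially exactly; res_fd typically stalls,
# because FD differences of the noise (noise / step size) swamp the true
# gradient signal.
```

With the default FD step near 1.5e-8, even 1e-6 of noise produces gradient errors of order 100, which is why noisy FD paths underperform analytical ones in the tables above.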
### MM3 form
- Multi-start n=10 on OpenMM achieves the best MM3 result at 28.7 cm⁻¹ RMSD — a 2× improvement over the previous best (optax Adam at 56.3) and a 20× improvement over JAX L-BFGS-B with analytical gradients (579). Running 10 independent L-BFGS-B optimizations from random starting points within the parameter bounds found a basin that no single-start optimizer reached.
- L2 regularization dramatically improves L-BFGS-B on MM3: 579 → 134 cm⁻¹ (4× better), consistent across JAX and OpenMM backends. The λ=0.01 penalty prevents parameters from drifting too far from the QFUERZA starting point, steering L-BFGS-B away from the poor local minimum it normally finds. L2 with optax Adam (133.5) is markedly worse than Adam alone (56.3), suggesting Adam already navigates the landscape well enough that the penalty only constrains it.
- Basin-hopping shows mixed results. OpenMM basin-hopping T=0.5 found a better basin (514 RMSD) than the default L-BFGS-B minimum (579), but T=1.0 on both JAX (1105) and OpenMM (843) accepted too many uphill moves and wandered into worse regions. Basin-hopping is sensitive to the temperature parameter and the noise level of finite-difference gradients.
- Adam with cosine annealing (60.6 cm⁻¹) also outperforms every scipy-based run on the JAX backend. AdaGrad (138.0 cm⁻¹) beats JAX L-BFGS-B with analytical gradients (579.0) but not the FD run (113.5).
- SGD fails to improve from the starting point (192 cm⁻¹), diverging early at the default learning rate — it needs careful LR tuning.
- OpenMM L-BFGS-B with FD gradients remains competitive at 59.5 cm⁻¹. The similarity between Adam (56.3) and OpenMM L-BFGS-B FD (59.5) suggests these are converging toward the same basin, but via very different paths.
- Tinker L-BFGS-B improved significantly from the prior run (114 → 84 cm⁻¹), showing that QFUERZA provides a better basin for gradient-based MM3 optimization through the Tinker backend.
- Powell and Nelder-Mead on MM3 remain mid-range (542–608) and are insensitive to the initialization change, as expected for derivative-free methods on a rugged landscape.
- JaxOpt L-BFGS matches SciPy L-BFGS-B on MM3 (579 cm⁻¹ for both). End-to-end differentiation does not help escape the poor local minimum that L-BFGS-B finds on the rugged MM3 landscape — gradient quality is not the bottleneck here. Multi-start or Adam remain the better strategies for MM3.
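The multi-start strategy behind the `multi:L-BFGS-B (n=…)` rows is simple enough to sketch. This is an illustrative reimplementation on a toy rugged 1-D landscape, not the benchmark code; `multi_start_lbfgsb` and the `rugged` objective are invented for the example.

```python
import numpy as np
from scipy.optimize import minimize

def multi_start_lbfgsb(loss, bounds, n_starts, seed=0):
    """Run n independent L-BFGS-B fits from random points inside the
    parameter bounds and keep the best result."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds).T
    best = None
    for _ in range(n_starts):
        x0 = rng.uniform(lo, hi)
        res = minimize(loss, x0, method="L-BFGS-B", bounds=bounds)
        if best is None or res.fun < best.fun:
            best = res
    return best

# Rugged 1-D toy landscape with many local minima; the global minimum
# is at x = 0 with f(0) = 0.
def rugged(x):
    return float(x[0] ** 2 + 10.0 * (1.0 - np.cos(3.0 * x[0])))

bounds = [(-5.0, 5.0)]
single = minimize(rugged, np.array([4.0]), method="L-BFGS-B", bounds=bounds)
multi = multi_start_lbfgsb(rugged, bounds, n_starts=10)
# With 10 restarts the multi-start run usually finds a much lower minimum
# than a single start trapped in the basin near x = 4.
```

The same mechanism explains the MM3 result: on a rugged landscape, where the single L-BFGS-B run terminates depends almost entirely on which basin the start falls into.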
### Composed workflows analysis
Two composed strategies were benchmarked on CH₃F MM3 — the only landscape where multi-start and global search methods show material differences:
- Multi-start → Adam refinement (Workflow B composition): On OpenMM CUDA, multi-start n=10 found a 46.2 RMSD basin. Running optax Adam from that result improved it to 46.1 — Adam added almost nothing. The FD gradient noise that helped multi-start find the basin limits Adam's ability to refine further. On JAX, multi-start found 563.8 and Adam left it unchanged — analytical gradients converge to the same local minimum regardless of starting approach. Verdict: Multi-start alone is sufficient; Adam refinement is not worth the extra cost.
- Grad-simp with multi-start inner (Workflow C composition): Using `full_method="multi:L-BFGS-B"` in the cycling loop gave 592 RMSD on OpenMM (worse than plain grad-simp at 586) and 527 on JAX (better than 579). The random restarts within each gradient phase disrupt the cycling algorithm's inter-cycle convergence. Verdict: For rugged landscapes, use standalone multi-start (Workflow B) rather than embedding it inside cycling.
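The two-stage composition can be sketched end to end. This is an illustrative toy (a hand-rolled Adam loop on a rugged 2-D landscape, not the optax/q2mm implementation); as in the benchmark, refinement from an already-converged L-BFGS-B point typically changes little.

```python
import numpy as np
from scipy.optimize import minimize

def loss(x):
    return float(np.sum(x ** 2 + 10.0 * (1.0 - np.cos(3.0 * x))))

def grad(x):
    return 2.0 * x + 30.0 * np.sin(3.0 * x)

def multi_start(n, bounds, seed=0):
    """Stage 1: best of n bounded L-BFGS-B runs from random starts."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds).T
    runs = [minimize(loss, rng.uniform(lo, hi), jac=grad,
                     method="L-BFGS-B", bounds=bounds) for _ in range(n)]
    return min(runs, key=lambda r: r.fun)

def adam_refine(x0, steps=200, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """Stage 2: a short plain-numpy Adam run from the stage-1 result."""
    x, m, v = x0.copy(), np.zeros_like(x0), np.zeros_like(x0)
    for t in range(1, steps + 1):
        g = grad(x)
        m = b1 * m + (1.0 - b1) * g
        v = b2 * v + (1.0 - b2) * g ** 2
        x -= lr * (m / (1.0 - b1 ** t)) / (np.sqrt(v / (1.0 - b2 ** t)) + eps)
    return x

bounds = [(-5.0, 5.0)] * 2
stage1 = multi_start(10, bounds)
x_refined = adam_refine(stage1.x)
# From a converged minimum the gradient is ~0, so Adam mostly dithers in
# place: the refined loss stays close to the stage-1 loss.
```

This mirrors the verdict above: once stage 1 has converged inside a basin, a first-order refinement stage has nothing left to exploit.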
### Cross-cutting observations
- Optimizer choice matters more than backend choice on MM3. The spread between the best (multi-start n=10, 28.7) and worst (basin-hopping T=1.0, 1105) optimizer on MM3 is 39×, while the spread between backends using the same optimizer is typically < 2×.
- Global optimization strategies are problem-dependent. Multi-start n=10 found the best-ever MM3 result (28.7), but basin-hopping T=1.0 found the worst (1105). On harmonic, neither strategy improves over plain L-BFGS-B. The value of global optimization depends on how rugged the landscape is.
- L2 regularization acts as a safety net, not a global optimizer. It dramatically helps L-BFGS-B on the rugged MM3 landscape (579 → 134) by preventing parameter drift, but actively hurts on the well-conditioned harmonic landscape (529 → 993). Use L2 when single-start optimizers are finding poor local minima.
- Analytical frequency gradients produce the best results on the harmonic problem. The top harmonic results all use analytical frequency gradients (A) or analytical-fallback (FD with JAX-MD).
- On identical parameters, JAX, JAX-MD, and OpenMM agree to machine precision when the functional form matches: energy deltas stay at or below 3 × 10⁻¹⁸ kcal/mol and frequency deltas stay below 0.001 cm⁻¹.
- The optimization loop dominates runtime; QFUERZA estimation is effectively free by comparison and serves mainly as a starting point, not as the expensive step.
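The L2 safety-net behaviour can be made concrete. Assuming the penalty has the form λ‖θ − θ₀‖² measured from the starting parameters (as the MM3 bullet above describes), a minimal SciPy sketch on a convex toy loss shows how λ = 0.01 pulls the solution back toward θ₀; the objective and names are invented for illustration.

```python
import numpy as np
from scipy.optimize import minimize

LAMBDA = 0.01  # same penalty weight as the benchmark rows

def make_l2_objective(loss, theta0, lam=LAMBDA):
    """Penalize drift from the starting parameters theta0, not from zero."""
    def wrapped(theta):
        return loss(theta) + lam * float(np.sum((theta - theta0) ** 2))
    return wrapped

# Convex toy loss whose unregularized minimum sits far from theta0 = 0;
# the regularized minimizer has the closed form 10 / (1 + lambda).
def loss(theta):
    return float(np.sum((theta - 10.0) ** 2))

theta0 = np.zeros(4)
plain = minimize(loss, theta0, method="L-BFGS-B")
reg = minimize(make_l2_objective(loss, theta0), theta0, method="L-BFGS-B")
drift_plain = float(np.linalg.norm(plain.x - theta0))
drift_reg = float(np.linalg.norm(reg.x - theta0))
# drift_reg < drift_plain: the penalty keeps the fit closer to theta0.
```

On the rugged MM3 landscape this drift limit is what keeps L-BFGS-B out of the distant poor minimum; on the well-conditioned harmonic landscape the same pull keeps it out of the good one.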
## Artifacts and provenance
- Inputs: QM reference data
- Outputs: Benchmark results (JSON) and optimized force fields
- Git SHA: `690f81d4` (q2mm 5.0.0a3 + optax integration), updated with advanced optimizers in PR #239
- GPU: NVIDIA GeForce RTX 5090
This page uses the current full-matrix artifact set in `benchmarks/ch3f/`.
## Reproducing
```bash
# Run full matrix (all backends, all optimizers)
q2mm-benchmark --system ch3f --output benchmarks/ch3f --platform CUDA

# Run optax optimizers only (JAX backend)
q2mm-benchmark --system ch3f --output benchmarks/ch3f --backend jax \
    --optimizer "optax:adam" "optax:adam+cosine" "optax:adagrad" "optax:sgd" \
    --learning-rate 0.01 --optax-max-steps 2000

# Run global optimizers only (fast backends)
q2mm-benchmark --system ch3f --output benchmarks/ch3f --backend jax \
    --optimizer "basinhopping (T=1.0)" "basinhopping (T=0.5)" \
    "multi:L-BFGS-B (n=5)" "multi:L-BFGS-B (n=10)" \
    --platform CUDA

# Run L2-regularized optimizers
q2mm-benchmark --system ch3f --output benchmarks/ch3f --backend jax \
    --optimizer "L-BFGS-B + L2(λ=0.01)" "optax:adam + L2(λ=0.01)" \
    --platform CUDA

# Run JaxOpt optimizers (JAX backend, end-to-end differentiable)
q2mm-benchmark --system ch3f --output benchmarks/ch3f --backend jax \
    --optimizer "jaxopt:lbfgs" "jaxopt:lbfgsb" --max-iter 500 \
    --platform CUDA

# Load and display results
q2mm-benchmark --load benchmarks/ch3f/results

# List available optimizers
q2mm-benchmark --list
```