Reference Data Types¶
This page documents all reference data types supported by Q2MM's
ReferenceData container. Each type represents a QM-computed observable that
the objective function compares against the corresponding MM prediction during
force field optimization.
Overview¶
| Data type | Kind string | Typical weight | Best for |
|---|---|---|---|
| Bond length | bond_length |
10.0 | Equilibrium geometry accuracy |
| Bond angle | bond_angle |
5.0 | Equilibrium geometry accuracy |
| Torsion angle | torsion_angle |
2.0 | Equilibrium dihedral geometry |
| Energy | energy |
1.0 | Relative energetics, conformer ranking |
| Frequency | frequency |
1.0 | Vibrational spectrum, force constant quality |
| Eigenvalue (diagonal) | eig_diagonal |
0.1 | Per-mode force constant accuracy |
| Eigenmatrix (off-diagonal) | eig_offdiagonal |
0.05 | Cross-coupling between modes |
All data types are stored in ReferenceData and can be combined freely.
The objective function computes weighted squared residuals:
Geometry data¶
Bonds, angles, and torsions have been standard training observables since the earliest molecular mechanics force fields (Allinger, J. Am. Chem. Soc. 1977, 99, 8127). Torsion profiles became especially important for conformational accuracy with the development of transferable force fields (Halgren, J. Comput. Chem. 1996, 17, 490).
Bond length¶
Definition: Interatomic distance between two bonded atoms.
Units: Ångströms (Å)
When to use: Always include bond lengths as baseline training data. They anchor equilibrium bond distances and are inexpensive to evaluate (no Hessian or frequency calculation needed from the MM engine).
Weight guidance: Default weight of 10.0 reflects that bond lengths are typically well-determined by QM and should be reproduced accurately. For transition states, consider lower weights for partially-formed/broken bonds where the force field may not be expected to reproduce the QM value exactly.
API:
ref.add_bond_length(
value=1.384, # QM bond length in Å
atom_indices=(0, 1), # 0-indexed atom pair
weight=10.0,
label="C-F bond",
)
Bond angle¶
Definition: Angle formed by three atoms A–B–C where B is the central (vertex) atom.
Units: Degrees (°)
When to use: Always include alongside bond lengths. Bond angles are particularly important for transition states where unusual geometries (e.g., near-linear angles at a forming bond) must be reproduced.
Weight guidance: Default weight of 5.0. Angles are less precisely determined than bond lengths and have a larger natural range of variation, so a lower weight avoids over-fitting angular parameters at the expense of other observables.
API:
ref.add_bond_angle(
value=104.52, # QM angle in degrees
atom_indices=(1, 0, 2), # vertex atom is index 1 (middle)
weight=5.0,
label="F-C-F angle",
)
Torsion angle¶
Definition: Dihedral angle formed by four atoms A–B–C–D, measuring the rotation about the B–C bond.
Units: Degrees (°)
When to use: Include when you need the force field to reproduce specific
dihedral values — for example, ensuring that a particular conformation
(anti, gauche, syn) is the equilibrium geometry. Note that torsion angles
constrain where the dihedral sits; torsion barriers (the energy cost of
rotating) are constrained by energy data from torsion scans. For
transition-state force fields, torsion terms are often zeroed
(zero_torsions=True in QFUERZA estimation) and torsion angle data may
not be needed.
Weight guidance: Default weight of 2.0. Torsion angles are soft degrees of freedom with shallow potential energy surfaces, so lower weights prevent them from dominating the fit.
API:
ref.add_torsion_angle(
value=180.0, # QM dihedral in degrees
atom_indices=(0, 1, 2, 3), # four atoms defining the dihedral
weight=2.0,
label="F-C-C-H torsion",
)
Energy data¶
Reproducing relative energies between conformers or along a reaction path tests whether the force field captures the overall shape of the potential energy surface — not just the curvature at a single point. Gundertofte et al. showed that force fields based on the MM2/MM3 functional form achieve average conformational energy errors around 0.5 kcal/mol, while simpler harmonic force fields typically exceed 1 kcal/mol (Gundertofte et al., J. Comput. Chem. 1996, 17, 429).
Energy¶
Definition: Electronic energy from a QM calculation. Typically used as relative energies between conformers or along a reaction path.
Units: kcal/mol. MMEngine.energy() returns energies in kcal/mol and
ObjectiveFunction compares QM and MM values directly with no unit conversion,
so you must convert QM energies (often reported in Hartree or kJ/mol) to
kcal/mol before adding them as reference data.
When to use: Include when you have multiple structures and want the force field to reproduce their relative energetics. Common scenarios:
- Conformer energies — a set of local minima that must be ranked correctly.
- Torsion scans — a series of constrained geometries along a dihedral angle. Each scan point is simply another energy data point; together they define the torsional energy profile the force field must reproduce. Hansen et al. used this approach to implicitly parameterize coupled anomeric effects in sulfamides (Hansen et al., J. Phys. Chem. A 2016, 120, 3677).
Weight guidance: Default weight of 1.0. Energy differences are often on a different scale than geometric observables, so the weight may need adjustment depending on the magnitude of the energy differences relative to geometric residuals.
API:
Vibrational data¶
Vibrational frequencies encode the curvature of the potential energy surface in every direction, making them one of the richest single observables for force field training. Lii and Allinger demonstrated their value in the original MM3 parameterization, fitting 213 experimental frequencies across eight hydrocarbons to an RMS error of 35 cm⁻¹ (Lii & Allinger, J. Am. Chem. Soc. 1989, 111, 8566). In Q2MM, frequency fitting was extended to transition states by comparing MM-computed frequencies against QM reference values (Norrby, J. Mol. Struct. 2000; Hansen et al., Acc. Chem. Res. 2016, 49, 996).
Frequency¶
Definition: Vibrational frequency from normal mode analysis. The MM engine computes its own frequencies from the MM Hessian, and each MM frequency is compared to the corresponding QM frequency.
Units: cm⁻¹
When to use: Frequencies are the primary training observable for transition-state force fields. They encode the curvature of the potential energy surface in every direction and directly reflect the quality of force constants. The Nelder-Mead optimizer is particularly effective for frequency-based objectives (Quinn et al., PLOS ONE 2022).
Weight guidance: Default weight of 1.0. For transition states, imaginary
frequencies (the reaction coordinate mode) are typically excluded via
skip_imaginary=True. Low-frequency modes (< 100 cm⁻¹) are often
poorly described by harmonic force fields and may warrant lower weights.
API:
# Bulk-add from an array (most common)
ts_freqs = np.loadtxt("qm-frequencies.txt")
n = ref.add_frequencies_from_array(ts_freqs, weight=1.0, skip_imaginary=True)
# Or add individually
ref.add_frequency(value=1648.5, data_idx=0, weight=1.0)
Hessian-derived data¶
The Hessian matrix (second derivatives of the energy with respect to nuclear coordinates) contains complete information about the curvature of the potential energy surface at a given geometry. QFUERZA estimation uses Seminario's FUERZA procedure to extract internal force constants directly from the Cartesian Hessian via eigenanalysis of atom-pair interaction submatrices (Seminario, Int. J. Quantum Chem. 1996, 60, 1271), with improved handling of hydrogen angle bends (Farrugia et al., J. Chem. Theory Comput. 2025, 22, 469). Limé and Norrby later extended this to transition states, introducing methods for handling the negative eigenvalue along the reaction coordinate (Limé & Norrby, J. Comput. Chem. 2015, 36, 244). Q2MM supports two ways to use Hessian information as training data, both derived from eigendecomposition of the Hessian.
Why not use raw Hessian elements directly?
The raw Hessian matrix has 3N×3N elements (most redundant due to symmetry), and individual Cartesian second derivatives are frame-dependent and difficult to interpret physically. The eigendecomposition rotates the Hessian into normal-mode space where each element has a clear physical meaning: a diagonal element is the force constant for that mode, and off-diagonal elements measure coupling between modes.
Eigenvalue (diagonal)¶
Definition: Diagonal elements of the Hessian after eigendecomposition — i.e., the eigenvalues. Each eigenvalue is the force constant for one normal mode in Hartree/Bohr².
Kind string: eig_diagonal
When to use: Eigenvalues provide more information than frequencies because they preserve the sign (negative eigenvalues indicate saddle-point directions) and magnitude without the mass-weighting that converts eigenvalues to frequencies. They are particularly useful for transition states where the negative eigenvalue (reaction coordinate) carries information about barrier curvature.
Weight guidance: The default weight scheme separates low-frequency and high-frequency modes:
eig_i = 0.0— the first (imaginary/reaction coordinate) mode gets zero weighteig_d_low = 0.1— eigenvalues below 0.1173 Hartree/Bohr² (≈ 1100 kJ/(mol·Å²))eig_d_high = 0.1— eigenvalues above that threshold
API:
ref.add_hessian_eigenvalue(
value=0.0543, # in Hartree/Bohr²
mode_idx=1, # which eigenvalue (0 = most negative)
weight=0.1,
)
Eigenmatrix (off-diagonal)¶
Definition: Off-diagonal elements of the Hessian projected into the
eigenvector basis: V^T · H · V, where V is the matrix of eigenvectors.
These elements measure coupling between normal modes.
Kind string: eig_offdiagonal
When to use: Off-diagonal eigenmatrix elements capture cross-coupling between modes that frequencies and eigenvalues alone miss. For example, if stretching one bond simultaneously affects the force constant of an adjacent angle, this coupling appears in the off-diagonal elements. Including them provides a tighter constraint on the force field and is recommended for transition states where modes are strongly coupled.
Weight guidance: Default weight of 0.05 (lower than diagonal elements because off-diagonal values are typically smaller in magnitude and less critical to reproduce exactly).
API:
# Add all eigenmatrix data from a Hessian (recommended)
n = ref.add_eigenmatrix_from_hessian(
hessian, # (3N, 3N) array in Hartree/Bohr²
diagonal_only=False, # include off-diagonal elements
skip_first=True, # zero-weight the reaction coordinate
)
# Or add individual off-diagonal elements
ref.add_hessian_offdiagonal(
value=0.00234,
row=1, col=3, # mode indices
weight=0.05,
)
Choosing data types for your problem¶
Transition-state force fields¶
For TSFF parameterization, the standard Q2MM combination is:
- Geometry (bond lengths, bond angles, and optionally torsion angles) — anchor the TS structure. Torsion terms are often zeroed in QFUERZA estimation, so torsion angle data may not be needed.
- Eigenmatrix (diagonal + off-diagonal) — capture per-mode force constants and cross-coupling between modes
This is the default in ReferenceData.from_molecule() and the approach
used in
Rosales et al., Chem. Commun. 2018
and subsequent Q2MM publications.
Why eigenmatrix instead of frequencies?
Frequencies are derived from eigenvalues — they contain the same per-mode force constant information, just mass-weighted. Eigenmatrix data is strictly richer: it also includes off-diagonal elements (mode coupling) that frequencies cannot capture. Using both simultaneously would double-count the diagonal information, so the standard workflow uses eigenmatrix alone.
What about the reaction coordinate?
The reaction coordinate (imaginary frequency mode) is included in
the eigenmatrix as mode 0. By default its weight is zero (eig_i =
0.0), effectively excluding it from the fit. When
invert_ts_curvature=True, the negative eigenvalue is replaced with
a large positive value, making the TS appear as a minimum in the MM
energy surface. The force field does encode the reaction
coordinate — it just isn't trained against the QM value. Setting
eig_i to a non-zero weight would train against the flipped
eigenvalue; see
Limé & Norrby, J. Comput. Chem. 2015, 36, 244
for the tradeoffs.
Ground-state force fields¶
For ground-state (minimum-energy) parameterization, the data combination depends on what you need the force field to reproduce:
- Geometry (bond lengths, bond angles, torsion angles) — equilibrium structure. Always include these as the baseline.
- Eigenmatrix or frequencies — force constant quality. Eigenmatrix
is preferred when a full Hessian is available (e.g., from a Gaussian
.fchkfile); frequencies are sufficient when only normal-mode output is available. Unlike TSFF, there is no imaginary mode to worry about. - Energies — relative conformer energies or torsion scan profiles. See the Energy section above for details and the distinction between conformer ranking and torsion scans.
The Q2MM ferrocene force field (Wahlers et al., J. Org. Chem. 2022, 87, 12334) is an example of a ground-state parameterization that uses geometry + eigenmatrix data.
Ground-state vs transition-state differences
Ground-state fitting is simpler in one key way: all eigenvalues are positive (no reaction coordinate to handle). On the other hand, ground-state fitting often requires multiple conformers, making energy data more important than in single-structure TSFF fits.
Loading reference data¶
From files (recommended)¶
from q2mm.optimizers.objective import ReferenceData
# Standard TSFF: geometry + eigenmatrix (the default)
ref = ReferenceData.from_molecule(mol)
# Explicitly include off-diagonal eigenmatrix elements (also the default)
ref = ReferenceData.from_molecule(
mol,
include_eigenmatrix=True,
eigenmatrix_diagonal_only=False,
)
# Ground-state with frequency data instead of eigenmatrix
ref = ReferenceData.from_molecule(
mol,
frequencies=gs_freqs,
include_eigenmatrix=False,
)
# From a Gaussian formatted checkpoint file
ref, mol = ReferenceData.from_fchk("calculation.fchk")
# From a Gaussian log file
ref, mol = ReferenceData.from_gaussian("calculation.log")
# Multi-conformer fit
ref = ReferenceData.from_molecules(
[mol1, mol2, mol3],
)
Manual construction¶
For fine-grained control, build ReferenceData manually:
ref = ReferenceData()
ref.add_bond_length(value=1.384, atom_indices=(0, 1), weight=10.0)
ref.add_bond_angle(value=104.5, atom_indices=(1, 0, 2), weight=5.0)
ref.add_frequency(value=1648.5, data_idx=0, weight=1.0)
ref.add_eigenmatrix_from_hessian(hessian, diagonal_only=False)
See the API Reference for the full method signatures.