Skip to content

Scoring Functions

scoring

Legacy scoring functions for Q2MM objective function evaluation.

Ported from q2mm.compare — the core scoring logic used by the legacy optimizer loop. The main function is :func:compare_data, which computes:

.. math:: \chi^2 = w^2 (x_r - x_c)^2

where :math:w is a weight, :math:x_r is the reference data point's value, and :math:x_c is the calculated (force field) value.

compare_data

compare_data(r_dict, c_dict, output=None, doprint=False) -> float

Compute the legacy chi-squared objective function score.

Scoring formula per data point
  • Energy types (e, eo, ea, eao): score = w² × diff² / total_num_energy
  • Hessian type (h): score = w² × diff² / N_hessian
  • Other types: score = w² × diff² / N_type

Parameters:

Name Type Description Default
r_dict dict[str, ndarray]

Reference data grouped by type.

required
c_dict dict[str, ndarray]

Calculated data grouped by type.

required
output str | None

Optional file path to write formatted output.

None
doprint bool

If True, print formatted output to stdout.

False

Returns:

Name Type Description
float float

Total objective function score.

Source code in q2mm/optimizers/scoring.py
def compare_data(r_dict, c_dict, output=None, doprint=False) -> float:
    """Compute the legacy chi-squared objective function score.

    Scoring formula per data point:
      - Energy types (e, eo, ea, eao): score = w² × diff² / total_num_energy
      - Hessian type (h): score = w² × diff² / N_hessian
      - Other types: score = w² × diff² / N_type

    Args:
        r_dict (dict[str, np.ndarray]): Reference data grouped by type.
        c_dict (dict[str, np.ndarray]): Calculated data grouped by type.
        output (str | None): Optional file path to write formatted output.
        doprint (bool): If ``True``, print formatted output to stdout.

    Returns:
        float: Total objective function score.
    """
    strings = []
    strings.append(
        "--"
        + " Label ".ljust(30, "-")
        + "--"
        + " Weight ".center(7, "-")
        + "--"
        + " R. Value ".center(11, "-")
        + "--"
        + " C. Value ".center(11, "-")
        + "--"
        + " Score ".center(11, "-")
        + "--"
        + " Row "
        + "--"
    )
    score_typ = defaultdict(float)
    num_typ = defaultdict(int)
    score_tot = 0.0
    total_num = 0
    data_types = sorted(r_dict.keys())
    total_num_energy = 0
    for typ in data_types:
        if typ in ["e", "eo", "ea", "eao"]:
            total_num_energy += len(r_dict[typ])
    for typ in data_types:
        total_num += int(len(r_dict[typ]))
        if typ in ["e", "eo", "ea", "eao"]:
            correlate_energies(r_dict[typ], c_dict[typ])
        import_weights(r_dict[typ])
        for r, c in zip(r_dict[typ], c_dict[typ]):
            if c.typ == "t":
                diff = abs(r.val - c.val)
                if diff > 180.0:
                    diff = 360.0 - diff
            else:
                diff = r.val - c.val
            if typ in ["e", "eo", "ea", "eao"]:
                score = (r.wht**2 * diff**2) / total_num_energy
            elif typ == "h":
                score = (c.wht**2 * diff**2) / len(c_dict[typ])
            else:
                score = (r.wht**2 * diff**2) / len(r_dict[typ])
            score_tot += score
            score_typ[c.typ] += score
            num_typ[c.typ] += 1
            if c.typ == "eig":
                if c.idx_1 == c.idx_2:
                    if r.val < 1100:
                        score_typ[c.typ + "-d-low"] += score
                        num_typ[c.typ + "-d-low"] += 1
                    else:
                        score_typ[c.typ + "-d-high"] += score
                        num_typ[c.typ + "-d-high"] += 1
                else:
                    score_typ[c.typ + "-o"] += score
                    num_typ[c.typ + "-o"] += 1
            if c.ff_row is None:
                strings.append(f"  {c.lbl:<30}  {r.wht:>7.2f}  {r.val:>11.4f}  {c.val:>11.4f}  {score:>11.4f}  ")
            else:
                strings.append(
                    f"  {c.lbl:<30}  {r.wht:>7.2f}  {r.val:>11.4f}  {c.val:>11.4f}  {score:>11.4f}  {c.ff_row:>5} "
                )
    strings.append("-" * 89)
    strings.append("{:<20} {:20.4f}".format("Total score:", score_tot))
    strings.append("{:<30} {:10d}".format("Total Num. data points:", total_num))
    for k, v in num_typ.items():
        strings.append("{:<30} {:10d}".format(k + ":", v))
    strings.append("-" * 89)
    for k, v in score_typ.items():
        strings.append("{:<20} {:20.4f}".format(k + ":", v))
    if output:
        with open(output, "w") as f:
            for line in strings:
                f.write(f"{line}\n")
    if doprint:
        for line in strings:
            print(line)
    return score_tot

correlate_energies

correlate_energies(r_data, c_data)

Align calculated energies to reference by setting the minimum to zero.

Finds the minimum in the reference dataset, then shifts all calculated energies so the corresponding calculated value is zero.

Both datasets must be aligned (same ordering).

Parameters:

Name Type Description Default
r_data ndarray

Reference energy data points.

required
c_data ndarray

Calculated energy data points (modified in place).

required
Source code in q2mm/optimizers/scoring.py
def correlate_energies(r_data, c_data):
    """Align calculated energies to reference by setting the minimum to zero.

    Finds the minimum in the reference dataset, then shifts all calculated
    energies so the corresponding calculated value is zero.

    Both datasets must be aligned (same ordering).

    Args:
        r_data (np.ndarray): Reference energy data points.
        c_data (np.ndarray): Calculated energy data points (modified in place).
    """
    for indices in select_group_of_energies(c_data):
        zero, zero_ind = min((x.val, i) for i, x in enumerate(r_data[indices]))
        zero_ind = indices[zero_ind]
        zero = c_data[zero_ind].val
        for ind in indices:
            c_data[ind].val -= zero

select_group_of_energies

select_group_of_energies(data) -> Iterator[ndarray]

Yield index arrays for each group of energies in the dataset.

Handles all energy types: relative (e, eo) and absolute (ea, eao). Previously only e/eo were iterated, so ea/eao silently passed through correlate_energies uncorrelated — a bug inherited from upstream.

Parameters:

Name Type Description Default
data ndarray

Array of Datum objects with energy types.

required

Yields:

Type Description
ndarray

np.ndarray: Index array for each energy group within the dataset.

Source code in q2mm/optimizers/scoring.py
def select_group_of_energies(data) -> Iterator[np.ndarray]:
    """Yield index arrays for each group of energies in the dataset.

    Handles all energy types: relative (``e``, ``eo``) and absolute
    (``ea``, ``eao``).  Previously only ``e``/``eo`` were iterated,
    so ``ea``/``eao`` silently passed through ``correlate_energies``
    uncorrelated — a bug inherited from upstream.

    Args:
        data (np.ndarray): Array of Datum objects with energy types.

    Yields:
        np.ndarray: Index array for each energy group within the dataset.
    """
    for energy_type in ["e", "eo", "ea", "eao"]:
        indices = np.where([x.typ == energy_type for x in data])[0]
        unique_group_nums = set([x.idx_1 for x in data[indices]])
        for unique_group_num in unique_group_nums:
            more_indices = np.where([x.typ == energy_type and x.idx_1 == unique_group_num for x in data])[0]
            yield more_indices

import_weights

import_weights(data)

Set weights on data points from the default WEIGHTS table.

Only sets weights on data points where datum.wht is None. Eigenvalue data gets special handling based on diagonal/off-diagonal and frequency value.

Parameters:

Name Type Description Default
data ndarray

Array of Datum objects whose weights may be updated in place.

required
Source code in q2mm/optimizers/scoring.py
def import_weights(data):
    """Set weights on data points from the default WEIGHTS table.

    Only sets weights on data points where ``datum.wht is None``.
    Eigenvalue data gets special handling based on diagonal/off-diagonal
    and frequency value.

    Args:
        data (np.ndarray): Array of Datum objects whose weights may be updated
            in place.
    """
    for datum in data:
        if datum.wht is None:
            if datum.typ == "eig":
                if datum.idx_1 == datum.idx_2 == 1:
                    datum.wht = WEIGHTS["eig_i"]
                elif datum.idx_1 == datum.idx_2:
                    if datum.val < 1100:
                        datum.wht = WEIGHTS["eig_d_low"]
                    else:
                        datum.wht = WEIGHTS["eig_d_high"]
                elif datum.idx_1 != datum.idx_2:
                    datum.wht = WEIGHTS["eig_o"]
            else:
                datum.wht = WEIGHTS[datum.typ]