
Evaluation Submodule

SRToolkit.evaluation

Classes and functions for evaluating symbolic regression approaches.

Modules:

Name Description
parameter_estimator

ParameterEstimator — fits free constants in expressions and ranks them by RMSE.

sr_evaluator

SR_evaluator and SR_results — expression evaluation and result management.

result_augmentation

ResultAugmenter implementations that post-process results with LaTeX, simplified forms, RMSE, BED, and R² scores.

callbacks

SRCallbacks and CallbackDispatcher — event-driven hooks for monitoring and early stopping during evaluation.

BestExpressionFound dataclass

BestExpressionFound(experiment_id: str, expression: str, error: float, evaluation_number: int)

Fired when a new best expression is found during evaluation.

Attributes:

Name Type Description
experiment_id str

Identifier of the current experiment.

expression str

String representation of the new best expression.

error float

Error value of the new best expression.

evaluation_number int

Total number of evaluate_expr calls made at the time this event is fired.

CallbackDispatcher

CallbackDispatcher(callbacks: Optional[List[SRCallbacks]] = None)

Manages multiple SRCallbacks instances and dispatches events to all of them.

Examples:

>>> dispatcher = CallbackDispatcher()
>>> dispatcher.add(EarlyStoppingCallback(threshold=1e-6))
>>> len(dispatcher._callbacks)
1
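
A minimal usage sketch of dispatching through the same mechanism, assuming the EarlyStoppingCallback documented below; a stop request from any registered callback propagates to the dispatcher's return value.

>>> dispatcher = CallbackDispatcher([EarlyStoppingCallback(threshold=1e-6)])
>>> dispatcher.on_best_expression(BestExpressionFound("demo", "X_0+C", 1e-7, 7))
False
>>> dispatcher.on_best_expression(BestExpressionFound("demo", "X_0+C", 1e-5, 8))
True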

Parameters:

Name Type Description Default
callbacks Optional[List[SRCallbacks]]

Initial list of callbacks. Defaults to an empty list.

None
Source code in SRToolkit/evaluation/callbacks.py
def __init__(self, callbacks: Optional[List[SRCallbacks]] = None):
    """
    Args:
        callbacks: Initial list of callbacks. Defaults to an empty list.
    """
    if callbacks is None:
        self._callbacks: List[SRCallbacks] = []
    else:
        self._callbacks = callbacks

get_callbacks

get_callbacks() -> List[SRCallbacks]

Returns the list of callbacks.

Returns:

Type Description
List[SRCallbacks]

A list of SRCallbacks instances in this dispatcher.

Source code in SRToolkit/evaluation/callbacks.py
def get_callbacks(self) -> List[SRCallbacks]:
    """
    Returns the list of callbacks.

    Returns:
        A list of [SRCallbacks][SRToolkit.evaluation.callbacks.SRCallbacks] instances in this dispatcher.
    """
    return self._callbacks

add

add(callback: SRCallbacks) -> None

Add a callback to the dispatcher.

Parameters:

Name Type Description Default
callback SRCallbacks

The SRCallbacks instance to add.

required
Source code in SRToolkit/evaluation/callbacks.py
def add(self, callback: SRCallbacks) -> None:
    """
    Add a callback to the dispatcher.

    Args:
        callback: The [SRCallbacks][SRToolkit.evaluation.callbacks.SRCallbacks] instance to add.
    """
    self._callbacks.append(callback)

remove

remove(callback: SRCallbacks) -> None

Remove a callback from the dispatcher.

Parameters:

Name Type Description Default
callback SRCallbacks

The SRCallbacks instance to remove.

required

Raises:

Type Description
ValueError

If callback is not currently registered.

Source code in SRToolkit/evaluation/callbacks.py
def remove(self, callback: SRCallbacks) -> None:
    """
    Remove a callback from the dispatcher.

    Args:
        callback: The [SRCallbacks][SRToolkit.evaluation.callbacks.SRCallbacks] instance to remove.

    Raises:
        ValueError: If ``callback`` is not currently registered.
    """
    self._callbacks.remove(callback)

on_expr_evaluated

on_expr_evaluated(event: ExprEvaluated) -> bool

Dispatch to all callbacks and aggregate the stop signal.

Parameters:

Name Type Description Default
event ExprEvaluated

Data about the evaluated expression.

required

Returns:

Type Description
bool

False if any callback returned False (requesting early stop), True otherwise.

Source code in SRToolkit/evaluation/callbacks.py
def on_expr_evaluated(self, event: ExprEvaluated) -> bool:
    """
    Dispatch to all callbacks and aggregate the stop signal.

    Args:
        event: Data about the evaluated expression.

    Returns:
        ``False`` if any callback returned ``False`` (requesting early stop), ``True`` otherwise.
    """
    should_continue = True
    for cb in self._callbacks:
        cont = cb.on_expr_evaluated(event)
        if isinstance(cont, bool) and not cont:
            should_continue = False
    return should_continue

on_best_expression

on_best_expression(event: BestExpressionFound) -> bool

Dispatch to all callbacks and aggregate the stop signal.

Parameters:

Name Type Description Default
event BestExpressionFound

Data about the new best expression.

required

Returns:

Type Description
bool

False if any callback returned False (requesting early stop), True otherwise.

Source code in SRToolkit/evaluation/callbacks.py
def on_best_expression(self, event: BestExpressionFound) -> bool:
    """
    Dispatch to all callbacks and aggregate the stop signal.

    Args:
        event: Data about the new best expression.

    Returns:
        ``False`` if any callback returned ``False`` (requesting early stop), ``True`` otherwise.
    """
    should_continue = True
    for cb in self._callbacks:
        cont = cb.on_best_expression(event)
        if isinstance(cont, bool) and not cont:
            should_continue = False
    return should_continue

on_experiment_start

on_experiment_start(event: ExperimentEvent) -> None

Dispatch to all callbacks.

Parameters:

Name Type Description Default
event ExperimentEvent

Data about the experiment that is about to begin.

required
Source code in SRToolkit/evaluation/callbacks.py
def on_experiment_start(self, event: ExperimentEvent) -> None:
    """
    Dispatch to all callbacks.

    Args:
        event: Data about the experiment that is about to begin.
    """
    for cb in self._callbacks:
        cb.on_experiment_start(event)

on_experiment_end

on_experiment_end(event: ExperimentEvent, results: EvalResult) -> None

Dispatch to all callbacks.

Parameters:

Name Type Description Default
event ExperimentEvent

Data about the experiment that just ended.

required
results EvalResult

Final EvalResult for this experiment.

required
Source code in SRToolkit/evaluation/callbacks.py
def on_experiment_end(self, event: ExperimentEvent, results: EvalResult) -> None:
    """
    Dispatch to all callbacks.

    Args:
        event: Data about the experiment that just ended.
        results: Final [EvalResult][SRToolkit.utils.types.EvalResult] for this experiment.
    """
    for cb in self._callbacks:
        cb.on_experiment_end(event, results)

EarlyStoppingCallback

EarlyStoppingCallback(threshold: Optional[float], max_evaluations: Optional[int] = None)

Bases: SRCallbacks

Stops the search when the best expression error falls below a threshold.

Examples:

>>> cb = EarlyStoppingCallback(threshold=1e-6)
>>> cb.on_best_expression(BestExpressionFound("", "X_0", 1e-7, 42))
False
>>> cb.on_best_expression(BestExpressionFound("", "X_0", 1e-5, 43))
True

Parameters:

Name Type Description Default
threshold Optional[float]

Error value below which the search is stopped.

required
max_evaluations Optional[int]

Maximum number of evaluations after which the search is stopped. Defaults to None (no limit).

None
Source code in SRToolkit/evaluation/callbacks.py
def __init__(self, threshold: Optional[float], max_evaluations: Optional[int] = None):
    """
    Args:
        threshold: Error value below which the search is stopped.
        max_evaluations: Maximum number of evaluations after which the search
            is stopped. Defaults to ``None`` (no limit).
    """
    self.threshold = threshold
    self.max_evaluations = max_evaluations
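
The stopping logic itself is not shown here; the following is only a sketch of an on_best_expression override consistent with the documented behaviour. The max_evaluations check is an assumption inferred from the constructor parameter, not the library's verified implementation.

def on_best_expression(self, event: BestExpressionFound) -> Optional[bool]:
    # Request an early stop once the best error drops below the threshold.
    if self.threshold is not None and event.error < self.threshold:
        return False
    # Assumed: also stop once the evaluation budget is exhausted.
    if self.max_evaluations is not None and event.evaluation_number >= self.max_evaluations:
        return False
    return True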

ExperimentEvent dataclass

ExperimentEvent(dataset_name: str, approach_name: str, max_evaluations: Optional[int], success_threshold: Optional[float], seed: Optional[int])

Fired at experiment start and end.

Attributes:

Name Type Description
dataset_name str

Name of the dataset being evaluated.

approach_name str

Name of the SR approach being run.

max_evaluations Optional[int]

Maximum number of evaluations allowed for this experiment.

success_threshold Optional[float]

Error threshold for success, or None if not set.

seed Optional[int]

Random seed used for this experiment, or None if not set.

ExprEvaluated dataclass

ExprEvaluated(expression: str, error: float, evaluation_number: int, experiment_id: str, is_new_best: bool)

Fired after each expression is evaluated by evaluate_expr.

Attributes:

Name Type Description
expression str

String representation of the evaluated expression.

error float

Error value returned by the ranking function (RMSE or BED).

evaluation_number int

Total number of evaluate_expr calls made so far, including cache hits.

experiment_id str

Identifier of the current experiment.

is_new_best bool

True if this expression achieved a lower error than all previous ones.

LoggingCallback

LoggingCallback(log_file: Optional[str] = None)

Bases: SRCallbacks

Logs each new best expression to stdout or a file.

log_file may contain placeholders that are resolved at experiment start using fields from ExperimentEvent. Available placeholders: {dataset_name}, {approach_name}, {seed}. Using per-experiment placeholders (e.g. {seed}) gives each job its own file, which is the recommended approach for parallel execution.

When multiple jobs share the same resolved file path, writes are protected by fcntl.flock (POSIX advisory locking) so that concurrent processes on Linux / macOS do not corrupt each other's output. On Windows, or on network filesystems where flock is unavailable, the lock is silently skipped.

Examples:

>>> cb = LoggingCallback()
>>> cb.on_best_expression(BestExpressionFound("Nguyen-1_ProGED_42", "X_0+C", 0.001, 10))
[Experiment Nguyen-1_ProGED_42] New best: X_0+C (error=1.000000e-03)
>>> cb = LoggingCallback(log_file="logs/{dataset_name}_{seed}.log")
>>> cb.on_experiment_start(ExperimentEvent(dataset_name="test", max_evaluations=10, seed=1,
...                                        success_threshold=0, approach_name="ta"))
>>> cb._resolved_log_file
'logs/test_1.log'

Parameters:

Name Type Description Default
log_file Optional[str]

Destination for log messages. If None, messages are printed to stdout. May be a plain path or a template string with placeholders {dataset_name}, {approach_name}, {seed} that are resolved when the experiment starts.

None
Source code in SRToolkit/evaluation/callbacks.py
def __init__(self, log_file: Optional[str] = None):
    """
    Args:
        log_file: Destination for log messages.  If ``None``, messages are
            printed to stdout.  May be a plain path or a template string with
            placeholders ``{dataset_name}``, ``{approach_name}``, ``{seed}``
            that are resolved when the experiment starts.
    """
    self.log_file = log_file
    self._resolved_log_file: Optional[str] = log_file
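
Placeholder resolution happens in on_experiment_start, which is not shown above. A minimal sketch consistent with the documented example, assuming str.format-style substitution of the ExperimentEvent fields:

def on_experiment_start(self, event: ExperimentEvent) -> None:
    # Fill {dataset_name}, {approach_name} and {seed} from the event.
    if self.log_file is not None:
        self._resolved_log_file = self.log_file.format(
            dataset_name=event.dataset_name,
            approach_name=event.approach_name,
            seed=event.seed,
        )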

ProgressBarCallback

ProgressBarCallback(desc: Optional[str] = None)

Bases: SRCallbacks

Displays a tqdm progress bar that updates after each expression evaluation.

Examples:

>>> cb = ProgressBarCallback(desc="My search")
>>> cb.desc
'My search'

Parameters:

Name Type Description Default
desc Optional[str]

Description label shown on the progress bar. If None, the label is auto-generated as "<approach> on <dataset>" when the experiment starts.

None
Source code in SRToolkit/evaluation/callbacks.py
def __init__(self, desc: Optional[str] = None):
    """
    Args:
        desc: Description label shown on the progress bar. If ``None``, the label
            is auto-generated as ``"<approach> on <dataset>"`` when the experiment starts.
    """
    self.pbar = None
    self.desc = desc
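
The bar itself is created lazily at experiment start. A sketch of how the auto-generated label could be produced when desc is None, assuming a plain tqdm bar; the library's actual construction may differ.

from tqdm import tqdm

def on_experiment_start(self, event: ExperimentEvent) -> None:
    # Use the user-supplied label or fall back to "<approach> on <dataset>".
    label = self.desc if self.desc is not None else f"{event.approach_name} on {event.dataset_name}"
    # max_evaluations may be None, in which case tqdm shows an open-ended bar.
    self.pbar = tqdm(total=event.max_evaluations, desc=label)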

SRCallbacks

Bases: ABC

Abstract base class for SR evaluation callbacks.

Implement only the methods you need. Return False from on_expr_evaluated or on_best_expression to request early stopping; return True or None to continue.

Examples:

>>> class PrintBestCallback(SRCallbacks):
...     def on_best_expression(self, event):
...         print(f"New best: {event.expression} (error={event.error:.4g})")
>>> cb = PrintBestCallback()
>>> cb.on_best_expression(BestExpressionFound("", "X_0+C", 0.01, 5))
New best: X_0+C (error=0.01)
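
Because the hooks may also return False, the same base class supports custom stopping rules. A sketch of a budget-based callback (the class name and its budget attribute are illustrative, not part of the library):

>>> class BudgetCallback(SRCallbacks):
...     def __init__(self, budget):
...         self.budget = budget
...     def on_expr_evaluated(self, event):
...         # Continue only while the evaluation counter is below the budget.
...         return event.evaluation_number < self.budget
>>> cb = BudgetCallback(budget=100)
>>> cb.on_expr_evaluated(ExprEvaluated("X_0", 0.5, 100, "demo", False))
False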

on_expr_evaluated

on_expr_evaluated(event: ExprEvaluated) -> Optional[bool]

Called after each expression is evaluated.

Parameters:

Name Type Description Default
event ExprEvaluated

Data about the evaluated expression.

required

Returns:

Type Description
Optional[bool]

False to stop the search early, True or None to continue.

Source code in SRToolkit/evaluation/callbacks.py
def on_expr_evaluated(self, event: ExprEvaluated) -> Optional[bool]:
    """
    Called after each expression is evaluated.

    Args:
        event: Data about the evaluated expression.

    Returns:
        ``False`` to stop the search early, ``True`` or ``None`` to continue.
    """
    return None

on_best_expression

on_best_expression(event: BestExpressionFound) -> Optional[bool]

Called when a new best expression is found.

Parameters:

Name Type Description Default
event BestExpressionFound

Data about the new best expression.

required

Returns:

Type Description
Optional[bool]

False to stop the search early, True or None to continue.

Source code in SRToolkit/evaluation/callbacks.py
def on_best_expression(self, event: BestExpressionFound) -> Optional[bool]:
    """
    Called when a new best expression is found.

    Args:
        event: Data about the new best expression.

    Returns:
        ``False`` to stop the search early, ``True`` or ``None`` to continue.
    """
    return None

on_experiment_start

on_experiment_start(event: ExperimentEvent) -> None

Called before an experiment starts.

Parameters:

Name Type Description Default
event ExperimentEvent

Data about the experiment that is about to begin.

required
Source code in SRToolkit/evaluation/callbacks.py
def on_experiment_start(self, event: ExperimentEvent) -> None:
    """
    Called before an experiment starts.

    Args:
        event: Data about the experiment that is about to begin.
    """
    pass

on_experiment_end

on_experiment_end(event: ExperimentEvent, results: EvalResult) -> None

Called after an experiment completes.

Parameters:

Name Type Description Default
event ExperimentEvent

Data about the experiment that just ended.

required
results EvalResult

Final EvalResult for this experiment.

required
Source code in SRToolkit/evaluation/callbacks.py
def on_experiment_end(self, event: ExperimentEvent, results: EvalResult) -> None:
    """
    Called after an experiment completes.

    Args:
        event: Data about the experiment that just ended.
        results: Final [EvalResult][SRToolkit.utils.types.EvalResult] for this experiment.
    """
    pass

to_dict

to_dict() -> dict

Serialise this callback to a JSON-safe dictionary.

The default implementation stores only the fully-qualified class path. Override in subclasses to include constructor parameters so that from_dict can reconstruct a functionally identical instance.

Returns:

Type Description
dict

A JSON-safe dict with at least a "callback_class" key.

Source code in SRToolkit/evaluation/callbacks.py
def to_dict(self) -> dict:
    """
    Serialise this callback to a JSON-safe dictionary.

    The default implementation stores only the fully-qualified class path.
    Override in subclasses to include constructor parameters so that
    [from_dict][SRToolkit.evaluation.callbacks.SRCallbacks.from_dict] can
    reconstruct a functionally identical instance.

    Returns:
        A JSON-safe dict with at least a ``"callback_class"`` key.
    """
    return {"callback_class": f"{self.__class__.__module__}.{self.__class__.__qualname__}"}

from_dict classmethod

from_dict(d: dict) -> SRCallbacks

Reconstruct a callback from a serialised dictionary.

The default implementation calls cls() with no arguments. Override in subclasses that require constructor parameters.

Parameters:

Name Type Description Default
d dict

Dictionary produced by to_dict.

required

Returns:

Type Description
SRCallbacks

A new instance of this callback class.

Source code in SRToolkit/evaluation/callbacks.py
@classmethod
def from_dict(cls, d: dict) -> "SRCallbacks":
    """
    Reconstruct a callback from a serialised dictionary.

    The default implementation calls ``cls()`` with no arguments. Override in
    subclasses that require constructor parameters.

    Args:
        d: Dictionary produced by
            [to_dict][SRToolkit.evaluation.callbacks.SRCallbacks.to_dict].

    Returns:
        A new instance of this callback class.
    """
    return cls()
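
A sketch of how a subclass with constructor parameters could override both methods so that serialisation round-trips; the ThresholdCallback class and its threshold field are purely illustrative.

class ThresholdCallback(SRCallbacks):
    def __init__(self, threshold: float):
        self.threshold = threshold

    def to_dict(self) -> dict:
        d = super().to_dict()  # adds the "callback_class" key
        d["threshold"] = self.threshold
        return d

    @classmethod
    def from_dict(cls, d: dict) -> "ThresholdCallback":
        return cls(threshold=d["threshold"])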

ParameterEstimator

ParameterEstimator(X: ndarray, y: ndarray, symbol_library: SymbolLibrary = SymbolLibrary.default_symbols(), seed: Optional[int] = None, **kwargs: Unpack[EstimationSettings])

Fits free constants in symbolic expressions by minimizing RMSE against target values.

Examples:

>>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9]])
>>> y = np.array([3, 0, 3, 11])
>>> pe = ParameterEstimator(X, y)
>>> rmse, constants = pe.estimate_parameters(["C", "*", "X_1", "-", "X_0"])
>>> print(rmse < 1e-6)
True
>>> print(1.99 < constants[0] < 2.01)
True

Parameters:

Name Type Description Default
X ndarray

Input data of shape (n_samples, n_features) used to evaluate expressions.

required
y ndarray

Target values of shape (n_samples,).

required
symbol_library SymbolLibrary

Symbol library defining the token vocabulary. Defaults to SymbolLibrary.default_symbols.

default_symbols()
seed Optional[int]

Random seed for reproducible constant initialization. Default None.

None
**kwargs Unpack[EstimationSettings]

Optional estimation settings from EstimationSettings. Supported keys: method, tol, gtol, max_iter, constant_bounds, initialization, max_constants, max_expr_length.

{}

Attributes:

Name Type Description
symbol_library

The symbol library used.

X

Input data.

y

Target values.

seed

Random seed.

estimation_settings

Active settings dict, merged from defaults and **kwargs.

Source code in SRToolkit/evaluation/parameter_estimator.py
def __init__(
    self,
    X: np.ndarray,
    y: np.ndarray,
    symbol_library: SymbolLibrary = SymbolLibrary.default_symbols(),
    seed: Optional[int] = None,
    **kwargs: Unpack[EstimationSettings],
) -> None:
    """
    Fits free constants in symbolic expressions by minimizing RMSE against target values.

    Examples:
        >>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9]])
        >>> y = np.array([3, 0, 3, 11])
        >>> pe = ParameterEstimator(X, y)
        >>> rmse, constants = pe.estimate_parameters(["C", "*", "X_1", "-", "X_0"])
        >>> print(rmse < 1e-6)
        True
        >>> print(1.99 < constants[0] < 2.01)
        True

    Args:
        X: Input data of shape ``(n_samples, n_features)`` used to evaluate expressions.
        y: Target values of shape ``(n_samples,)``.
        symbol_library: Symbol library defining the token vocabulary.
            Defaults to [SymbolLibrary.default_symbols][SRToolkit.utils.symbol_library.SymbolLibrary.default_symbols].
        seed: Random seed for reproducible constant initialization. Default ``None``.
        **kwargs: Optional estimation settings from
            [EstimationSettings][SRToolkit.utils.types.EstimationSettings].
            Supported keys: ``method``, ``tol``, ``gtol``, ``max_iter``,
            ``constant_bounds``, ``initialization``, ``max_constants``,
            ``max_expr_length``.

    Attributes:
        symbol_library: The symbol library used.
        X: Input data.
        y: Target values.
        seed: Random seed.
        estimation_settings: Active settings dict, merged from defaults and ``**kwargs``.
    """
    self.symbol_library = symbol_library
    self.X = X
    self.y = y
    self.seed = seed

    self.estimation_settings = {
        "method": "L-BFGS-B",
        "tol": 1e-6,
        "gtol": 1e-3,
        "max_iter": 100,
        "constant_bounds": (-5, 5),
        "initialization": "random",  # random, mean
        "max_constants": 8,
        "max_expr_length": -1,
    }

    if kwargs:
        for k in self.estimation_settings.keys():
            if k in kwargs:
                self.estimation_settings[k] = kwargs[k]  # type: ignore[literal-required]

    self._rng = np.random.default_rng(self.seed)
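
A short usage sketch showing how the documented EstimationSettings keys can be overridden through **kwargs, reusing X and y from the example above; the chosen values are arbitrary.

>>> pe = ParameterEstimator(X, y, seed=0, max_constants=3, constant_bounds=(-10, 10), initialization="mean")
>>> pe.estimation_settings["max_constants"]
3
>>> pe.estimation_settings["constant_bounds"]
(-10, 10)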

estimate_parameters

estimate_parameters(expr: Union[List[str], Node]) -> Tuple[float, np.ndarray]

Fit free constants in expr by minimizing RMSE against the target values.

Expressions that exceed max_constants or max_expr_length immediately return (NaN, []). Expressions with no free constants are evaluated directly without running the optimizer.

Examples:

>>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9]])
>>> y = np.array([3, 0, 3, 11])
>>> pe = ParameterEstimator(X, y)
>>> rmse, constants = pe.estimate_parameters(["C", "*", "X_1", "-", "X_0"])
>>> print(rmse < 1e-6)
True
>>> print(1.99 < constants[0] < 2.01)
True
>>> # Constant-free expressions are evaluated directly
>>> rmse, constants = pe.estimate_parameters(["X_1", "-", "X_0"])
>>> constants.size
0

Parameters:

Name Type Description Default
expr Union[List[str], Node]

Expression as a token list in infix notation or a Node tree.

required

Returns:

Type Description
Tuple[float, ndarray]

A 2-tuple (rmse, parameters) where rmse is the root-mean-square error of the fitted expression and parameters is a 1-D array of optimized constant values. Returns (NaN, []) if the expression violates max_constants or max_expr_length.

Source code in SRToolkit/evaluation/parameter_estimator.py
def estimate_parameters(self, expr: Union[List[str], Node]) -> Tuple[float, np.ndarray]:
    """
    Fit free constants in *expr* by minimizing RMSE against the target values.

    Expressions that exceed ``max_constants`` or ``max_expr_length`` immediately
    return ``(NaN, [])``. Expressions with no free constants are evaluated directly
    without running the optimizer.

    Examples:
        >>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9]])
        >>> y = np.array([3, 0, 3, 11])
        >>> pe = ParameterEstimator(X, y)
        >>> rmse, constants = pe.estimate_parameters(["C", "*", "X_1", "-", "X_0"])
        >>> print(rmse < 1e-6)
        True
        >>> print(1.99 < constants[0] < 2.01)
        True
        >>> # Constant-free expressions are evaluated directly
        >>> rmse, constants = pe.estimate_parameters(["X_1", "-", "X_0"])
        >>> constants.size
        0

    Args:
        expr: Expression as a token list in infix notation or a
            [Node][SRToolkit.utils.expression_tree.Node] tree.

    Returns:
        A 2-tuple ``(rmse, parameters)`` where ``rmse`` is the root-mean-square error of the fitted expression and ``parameters`` is a 1-D array of optimized constant values. Returns ``(NaN, [])`` if the expression violates ``max_constants`` or ``max_expr_length``.
    """
    if isinstance(expr, Node):
        expr_str = expr.to_list(notation="prefix")
        num_constants = sum([1 for t in expr_str if self.symbol_library.get_type(t) == "const"])
    else:
        num_constants = sum([1 for t in expr if self.symbol_library.get_type(t) == "const"])
    if (
        isinstance(self.estimation_settings["max_constants"], int)
        and 0 <= self.estimation_settings["max_constants"] < num_constants
    ):
        return np.nan, np.array([])

    if isinstance(self.estimation_settings["max_expr_length"], int) and 0 <= self.estimation_settings[
        "max_expr_length"
    ] < len(expr):
        return np.nan, np.array([])

    executable_error_fn = expr_to_error_function(expr, self.symbol_library)

    if num_constants == 0:
        rmse = executable_error_fn(self.X, np.array([]), self.y)
        return rmse, np.array([])
    else:
        return self._optimize_parameters(executable_error_fn, num_constants)
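
The guard clauses above reject over-parameterised expressions without running the optimizer. A brief illustration, reusing X and y from the examples above and allowing zero free constants:

>>> pe = ParameterEstimator(X, y, max_constants=0)
>>> rmse, constants = pe.estimate_parameters(["C", "*", "X_1", "-", "X_0"])
>>> print(np.isnan(rmse))
True
>>> constants.size
0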

BED

BED(evaluator: SR_evaluator, scope: str = 'top', name: str = 'BED')

Bases: ResultAugmenter

Computes BED for the top models using a separate evaluator (e.g. a held-out test set).

Parameters:

Name Type Description Default
evaluator SR_evaluator

SR_evaluator used to score the models. Must be initialized with ranking_function="bed".

required
scope str

Which expressions to score.

  • "best": only the best expression.
  • "top": the best expression and all top-k models.
  • "all": everything in "top" plus all evaluated expressions.
'top'
name str

Key used in augmentations dict of EvalResult and ModelResult. Default "BED".

'BED'

Raises:

Type Description
Exception

If evaluator.ranking_function != "bed".

Source code in SRToolkit/evaluation/result_augmentation.py
def __init__(self, evaluator: SR_evaluator, scope: str = "top", name: str = "BED") -> None:  # noqa: F821
    """
    Computes BED for the top models using a separate evaluator (e.g. a held-out test set).

    Args:
        evaluator: [SR_evaluator][SRToolkit.evaluation.sr_evaluator.SR_evaluator] used to
            score the models. Must be initialized with ``ranking_function="bed"``.
        scope: Which expressions to score.

            - ``"best"``: only the best expression.
            - ``"top"``: the best expression and all top-k models.
            - ``"all"``: everything in ``"top"`` plus all evaluated expressions.
        name: Key used in
            ``augmentations`` dict of [EvalResult][SRToolkit.utils.types.EvalResult] and
            [ModelResult][SRToolkit.utils.types.ModelResult].
            Default ``"BED"``.

    Raises:
        Exception: If ``evaluator.ranking_function != "bed"``.
    """
    super().__init__(name)
    self.evaluator = evaluator

    if scope not in ["best", "top", "all"]:
        raise Exception(f"[BED augmenter] Invalid scope: {scope}. Must be one of 'best', 'top', 'all'.")
    self.scope = scope

    if self.evaluator.ranking_function != "bed":
        raise Exception("[BED augmenter] Ranking function of the evaluator must be set to 'bed' to compute BED.")

write_results

write_results(results: EvalResult) -> None

Write BED scores into results and its models.

Stores {"best_expr_bed": ...} in EvalResult augmentations and {"bed": ...} in each model's augmentations when scope is "top" or "all".

Parameters:

Name Type Description Default
results EvalResult

The EvalResult to augment.

required
Source code in SRToolkit/evaluation/result_augmentation.py
def write_results(
    self,
    results: EvalResult,
) -> None:
    """
    Write BED scores into *results* and its models.

    Stores ``{"best_expr_bed": ...}`` in
    [EvalResult][SRToolkit.utils.types.EvalResult] ``augmentations`` and
    ``{"bed": ...}`` in each model's augmentations when ``scope`` is ``"top"`` or ``"all"``.

    Args:
        results: The [EvalResult][SRToolkit.utils.types.EvalResult] to augment.
    """
    eval_data: Dict[str, Any] = {"best_expr_bed": self.evaluator.evaluate_expr(results.top_models[0].expr)}
    results.add_augmentation(self.name, eval_data, self._type)

    if self.scope == "top" or self.scope == "all":
        for model in results.top_models:
            top_model_data: Dict[str, Any] = {"bed": self.evaluator.evaluate_expr(model.expr)}
            model.add_augmentation(self.name, top_model_data, self._type)

    if self.scope == "all":
        for model in results.all_models:
            all_model_data: Dict[str, Any] = {"bed": self.evaluator.evaluate_expr(model.expr)}
            model.add_augmentation(self.name, all_model_data, self._type)

format_eval_result classmethod

format_eval_result(data: Dict[str, Any]) -> str

Format experiment-level BED data for display.

Parameters:

Name Type Description Default
data Dict[str, Any]

Augmentation dict containing "best_expr_bed".

required

Returns:

Type Description
str

A human-readable string, or empty string if no data is present.

Source code in SRToolkit/evaluation/result_augmentation.py
@classmethod
def format_eval_result(cls, data: Dict[str, Any]) -> str:
    """
    Format experiment-level BED data for display.

    Args:
        data: Augmentation dict containing ``"best_expr_bed"``.

    Returns:
        A human-readable string, or empty string if no data is present.
    """
    val = data.get("best_expr_bed", "")
    return f"Test BED: {val}" if val != "" else ""

format_model_result classmethod

format_model_result(data: Dict[str, Any]) -> str

Format per-model BED data for display.

Parameters:

Name Type Description Default
data Dict[str, Any]

Augmentation dict containing "bed".

required

Returns:

Type Description
str

A human-readable string, or empty string if no data is present.

Source code in SRToolkit/evaluation/result_augmentation.py
@classmethod
def format_model_result(cls, data: Dict[str, Any]) -> str:
    """
    Format per-model BED data for display.

    Args:
        data: Augmentation dict containing ``"bed"``.

    Returns:
        A human-readable string, or empty string if no data is present.
    """
    val = data.get("bed", "")
    return f"BED={val}" if val != "" else ""

to_dict

to_dict(base_path: str, name: str) -> dict

Creates a dictionary representation of the BED augmenter.

Parameters:

Name Type Description Default
base_path str

Used to save the data of the evaluator to disk.

required
name str

Used to save the data of the evaluator to disk.

required

Returns:

Type Description
dict

A dictionary containing the necessary information to recreate the augmenter.

Source code in SRToolkit/evaluation/result_augmentation.py
def to_dict(self, base_path: str, name: str) -> dict:
    """
    Creates a dictionary representation of the BED augmenter.

    Args:
        base_path: Used to save the data of the evaluator to disk.
        name: Used to save the data of the evaluator to disk.

    Returns:
        A dictionary containing the necessary information to recreate the augmenter.
    """
    return {
        "format_version": 1,
        "name": self.name,
        "type": "BED",
        "scope": self.scope,
        "evaluator": self.evaluator.to_dict(base_path, name + "_BED_augmenter"),
    }

from_dict staticmethod

from_dict(data: dict) -> BED

Creates an instance of the BED augmenter from a dictionary.

Parameters:

Name Type Description Default
data dict

A dictionary containing the necessary information to recreate the augmenter.

required

Returns:

Type Description
BED

An instance of the BED augmenter.

Source code in SRToolkit/evaluation/result_augmentation.py
@staticmethod
def from_dict(data: dict) -> "BED":
    """
    Creates an instance of the BED augmenter from a dictionary.

    Args:
        data: A dictionary containing the necessary information to recreate the augmenter.

    Returns:
        An instance of the BED augmenter.
    """
    if data.get("format_version", 1) != 1:
        raise ValueError(f"[BED.from_dict] Unsupported format_version: {data.get('format_version')!r}. Expected 1.")
    evaluator = SR_evaluator.from_dict(data["evaluator"])
    return BED(evaluator, scope=data["scope"], name=data["name"])

R2

R2(evaluator: SR_evaluator, scope: str = 'top', name: str = 'R2')

Bases: ResultAugmenter

Computes R² for the top models using a separate evaluator (e.g. a held-out test set).

The same evaluator instance can be shared with RMSE to avoid loading test data twice.

Parameters:

Name Type Description Default
evaluator SR_evaluator

SR_evaluator used to score the models. Must be initialized with ranking_function="rmse" and a non-None y.

required
scope str

Which expressions to score.

  • "best": only the best expression.
  • "top": the best expression and all top-k models.
  • "all": everything in "top" plus all evaluated expressions.
'top'
name str

Key used in augmentations dict of EvalResult and ModelResult. Default "R2".

'R2'

Raises:

Type Description
Exception

If evaluator.ranking_function != "rmse" or evaluator.y is None.

Source code in SRToolkit/evaluation/result_augmentation.py
def __init__(self, evaluator: SR_evaluator, scope: str = "top", name: str = "R2") -> None:  # noqa: F821
    """
    Computes R² for the top models using a separate evaluator (e.g. a held-out test set).

    The same evaluator instance can be shared with
    [RMSE][SRToolkit.evaluation.result_augmentation.RMSE] to avoid loading test data twice.

    Args:
        evaluator: [SR_evaluator][SRToolkit.evaluation.sr_evaluator.SR_evaluator] used to
            score the models. Must be initialized with ``ranking_function="rmse"`` and a
            non-``None`` ``y``.
        scope: Which expressions to score.

            - ``"best"``: only the best expression.
            - ``"top"``: the best expression and all top-k models.
            - ``"all"``: everything in ``"top"`` plus all evaluated expressions.
        name: Key used in
            ``augmentations`` dict of [EvalResult][SRToolkit.utils.types.EvalResult] and
            [ModelResult][SRToolkit.utils.types.ModelResult].
            Default ``"R2"``.

    Raises:
        Exception: If ``evaluator.ranking_function != "rmse"`` or ``evaluator.y is None``.
    """
    super().__init__(name)

    if scope not in ["best", "top", "all"]:
        raise Exception(f"[R2 augmenter] Invalid scope: {scope}. Must be one of 'best', 'top', 'all'.")
    self.scope = scope

    self.evaluator = evaluator
    if self.evaluator.ranking_function != "rmse":
        raise Exception("[R2 augmenter] Ranking function of the evaluator must be set to 'rmse' to compute R^2.")
    if self.evaluator.y is None:
        raise Exception("[R2 augmenter] y in the evaluator must not be None to compute R^2.")
    self.ss_tot = np.sum((self.evaluator.y - np.mean(self.evaluator.y)) ** 2)
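
_compute_r2 itself is not shown in this excerpt; the following is only an assumed reconstruction of what it computes, based on the evaluator returning an RMSE and on the ss_tot precomputed above.

def _compute_r2(self, model) -> float:
    # Assumed: RMSE of the model on the augmenter's own (e.g. test) data.
    rmse = self.evaluator.evaluate_expr(model.expr)
    n = len(self.evaluator.y)
    ss_res = n * rmse ** 2  # sum of squared residuals recovered from the RMSE
    return 1.0 - ss_res / self.ss_tot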

write_results

write_results(results: EvalResult) -> None

Write R² scores into results and its models.

Stores {"best_expr_r^2": ...} in EvalResult augmentations and {"r^2": ..., "parameters_r^2": ...} in each model's augmentations when scope is "top" or "all".

Parameters:

Name Type Description Default
results EvalResult

The EvalResult to augment.

required
Source code in SRToolkit/evaluation/result_augmentation.py
def write_results(self, results: EvalResult) -> None:
    """
    Write R² scores into *results* and its models.

    Stores ``{"best_expr_r^2": ...}`` in
    [EvalResult][SRToolkit.utils.types.EvalResult] ``augmentations`` and
    ``{"r^2": ..., "parameters_r^2": ...}`` in each model's augmentations when ``scope``
    is ``"top"`` or ``"all"``.

    Args:
        results: The [EvalResult][SRToolkit.utils.types.EvalResult] to augment.
    """
    eval_data: Dict[str, Any] = {"best_expr_r^2": self._compute_r2(results.top_models[0])}
    results.add_augmentation(self.name, eval_data, self._type)

    if self.scope == "top" or self.scope == "all":
        for model in results.top_models:
            key = "".join(model.expr)
            top_model_data: Dict[str, Any] = {
                "r^2": self._compute_r2(model),
                "parameters_r^2": self.evaluator.models[key].parameters,
            }
            model.add_augmentation(self.name, top_model_data, self._type)

    if self.scope == "all":
        for model in results.all_models:
            key = "".join(model.expr)
            all_model_data: Dict[str, Any] = {
                "r^2": self._compute_r2(model),
                "parameters_r^2": self.evaluator.models[key].parameters,
            }
            model.add_augmentation(self.name, all_model_data, self._type)

format_eval_result classmethod

format_eval_result(data: Dict[str, Any]) -> str

Format experiment-level R² data for display.

Parameters:

Name Type Description Default
data Dict[str, Any]

Augmentation dict containing "best_expr_r^2".

required

Returns:

Type Description
str

A human-readable string, or empty string if no data is present.

Source code in SRToolkit/evaluation/result_augmentation.py
@classmethod
def format_eval_result(cls, data: Dict[str, Any]) -> str:
    """
    Format experiment-level R² data for display.

    Args:
        data: Augmentation dict containing ``"best_expr_r^2"``.

    Returns:
        A human-readable string, or empty string if no data is present.
    """
    val = data.get("best_expr_r^2", "")
    return f"Test R²: {val}" if val != "" else ""

format_model_result classmethod

format_model_result(data: Dict[str, Any]) -> str

Format per-model R² data for display.

Parameters:

Name Type Description Default
data Dict[str, Any]

Augmentation dict containing "r^2" and optionally "parameters_r^2".

required

Returns:

Type Description
str

A human-readable string with R² and fitted parameters.

Source code in SRToolkit/evaluation/result_augmentation.py
@classmethod
def format_model_result(cls, data: Dict[str, Any]) -> str:
    """
    Format per-model R² data for display.

    Args:
        data: Augmentation dict containing ``"r^2"`` and optionally ``"parameters_r^2"``.

    Returns:
        A human-readable string with R² and fitted parameters.
    """
    parts = [f"R²={data['r^2']:.4g}"]
    if "parameters_r^2" in data and data["parameters_r^2"] is not None:
        parts.append(f"params={np.round(data['parameters_r^2'], 4).tolist()}")
    return ", ".join(parts)

to_dict

to_dict(base_path: str, name: str) -> dict

Creates a dictionary representation of the R2 augmenter.

Parameters:

Name Type Description Default
base_path str

Used to save the data of the evaluator to disk.

required
name str

Used to save the data of the evaluator to disk.

required

Returns:

Type Description
dict

A dictionary containing the necessary information to recreate the augmenter.

Source code in SRToolkit/evaluation/result_augmentation.py
def to_dict(self, base_path: str, name: str) -> dict:
    """
    Creates a dictionary representation of the R2 augmenter.

    Args:
        base_path: Used to save the data of the evaluator to disk.
        name: Used to save the data of the evaluator to disk.

    Returns:
        A dictionary containing the necessary information to recreate the augmenter.
    """
    return {
        "format_version": 1,
        "name": self.name,
        "type": "R2",
        "scope": self.scope,
        "evaluator": self.evaluator.to_dict(base_path, name + "_R2_augmenter"),
    }

from_dict staticmethod

from_dict(data: dict) -> R2

Creates an instance of the R2 augmenter from a dictionary.

Parameters:

Name Type Description Default
data dict

A dictionary containing the necessary information to recreate the augmenter.

required

Returns:

Type Description
R2

An instance of the R2 augmenter.

Source code in SRToolkit/evaluation/result_augmentation.py
@staticmethod
def from_dict(data: dict) -> "R2":
    """
    Creates an instance of the R2 augmenter from a dictionary.

    Args:
        data: A dictionary containing the necessary information to recreate the augmenter.

    Returns:
        An instance of the R2 augmenter.
    """
    if data.get("format_version", 1) != 1:
        raise ValueError(f"[R2.from_dict] Unsupported format_version: {data.get('format_version')!r}. Expected 1.")
    evaluator = SR_evaluator.from_dict(data["evaluator"])
    return R2(evaluator, scope=data["scope"], name=data["name"])

RMSE

RMSE(evaluator: SR_evaluator, scope: str = 'top', name: str = 'RMSE')

Bases: ResultAugmenter

Computes RMSE for the top models using a separate evaluator (e.g. a held-out test set).

Parameters:

Name Type Description Default
evaluator SR_evaluator

SR_evaluator used to score the models. Must be initialized with ranking_function="rmse" and a non-None y.

required
scope str

Which expressions to score.

  • "best": only the best expression.
  • "top": the best expression and all top-k models.
  • "all": everything in "top" plus all evaluated expressions.
'top'
name str

Key used in augmentations dict of EvalResult and ModelResult. Default "RMSE".

'RMSE'

Raises:

Type Description
Exception

If evaluator.ranking_function != "rmse" or evaluator.y is None.

Source code in SRToolkit/evaluation/result_augmentation.py
def __init__(self, evaluator: SR_evaluator, scope: str = "top", name: str = "RMSE") -> None:  # noqa: F821
    """
    Computes RMSE for the top models using a separate evaluator (e.g. a held-out test set).

    Args:
        evaluator: [SR_evaluator][SRToolkit.evaluation.sr_evaluator.SR_evaluator] used to
            score the models. Must be initialized with ``ranking_function="rmse"`` and a
            non-``None`` ``y``.
        scope: Which expressions to score.

            - ``"best"``: only the best expression.
            - ``"top"``: the best expression and all top-k models.
            - ``"all"``: everything in ``"top"`` plus all evaluated expressions.
        name: Key used in
            ``augmentations`` dict of [EvalResult][SRToolkit.utils.types.EvalResult] and
            [ModelResult][SRToolkit.utils.types.ModelResult].
            Default ``"RMSE"``.

    Raises:
        Exception: If ``evaluator.ranking_function != "rmse"`` or ``evaluator.y is None``.
    """
    super().__init__(name)
    self.evaluator = evaluator

    if scope not in ["best", "top", "all"]:
        raise Exception(f"[RMSE augmenter] Invalid scope: {scope}. Must be one of 'best', 'top', 'all'.")
    self.scope = scope

    if self.evaluator.ranking_function != "rmse":
        raise Exception("[RMSE augmenter] Ranking function of the evaluator must be set to 'rmse' to compute RMSE.")
    if self.evaluator.y is None:
        raise Exception("[RMSE augmenter] y in the evaluator must not be None to compute RMSE.")

write_results

write_results(results: EvalResult) -> None

Write RMSE scores into results and its models.

Stores {"min_error": ...} in EvalResult augmentations and {"error": ..., "parameters": ...} in each model's augmentations when scope is "top" or "all".

Parameters:

Name Type Description Default
results EvalResult

The EvalResult to augment.

required
Source code in SRToolkit/evaluation/result_augmentation.py
def write_results(self, results: EvalResult) -> None:
    """
    Write RMSE scores into *results* and its models.

    Stores ``{"min_error": ...}`` in
    [EvalResult][SRToolkit.utils.types.EvalResult] ``augmentations`` and
    ``{"error": ..., "parameters": ...}`` in each model's augmentations when ``scope``
    is ``"top"`` or ``"all"``.

    Args:
        results: The [EvalResult][SRToolkit.utils.types.EvalResult] to augment.
    """
    eval_data: Dict[str, Any] = {"min_error": self.evaluator.evaluate_expr(results.top_models[0].expr)}
    results.add_augmentation(self.name, eval_data, self._type)

    if self.scope == "top" or self.scope == "all":
        for model in results.top_models:
            key = "".join(model.expr)
            top_model_data: Dict[str, Any] = {
                "error": self.evaluator.evaluate_expr(model.expr),
                "parameters": self.evaluator.models[key].parameters,
            }
            model.add_augmentation(self.name, top_model_data, self._type)

    if self.scope == "all":
        for model in results.all_models:
            key = "".join(model.expr)
            all_model_data: Dict[str, Any] = {
                "error": self.evaluator.evaluate_expr(model.expr),
                "parameters": self.evaluator.models[key].parameters,
            }
            model.add_augmentation(self.name, all_model_data, self._type)

format_eval_result classmethod

format_eval_result(data: Dict[str, Any]) -> str

Format experiment-level RMSE data for display.

Parameters:

Name Type Description Default
data Dict[str, Any]

Augmentation dict containing "min_error".

required

Returns:

Type Description
str

A human-readable string, or empty string if no data is present.

Source code in SRToolkit/evaluation/result_augmentation.py
@classmethod
def format_eval_result(cls, data: Dict[str, Any]) -> str:
    """
    Format experiment-level RMSE data for display.

    Args:
        data: Augmentation dict containing ``"min_error"``.

    Returns:
        A human-readable string, or empty string if no data is present.
    """
    val = data.get("min_error", "")
    return f"Test RMSE: {val}" if val != "" else ""

format_model_result classmethod

format_model_result(data: Dict[str, Any]) -> str

Format per-model RMSE data for display.

Parameters:

Name Type Description Default
data Dict[str, Any]

Augmentation dict containing "error" and optionally "parameters".

required

Returns:

Type Description
str

A human-readable string with RMSE and fitted parameters.

Source code in SRToolkit/evaluation/result_augmentation.py
@classmethod
def format_model_result(cls, data: Dict[str, Any]) -> str:
    """
    Format per-model RMSE data for display.

    Args:
        data: Augmentation dict containing ``"error"`` and optionally ``"parameters"``.

    Returns:
        A human-readable string with RMSE and fitted parameters.
    """
    parts = [f"RMSE={data['error']:.6g}"]
    if "parameters" in data and data["parameters"] is not None:
        parts.append(f"params={np.round(data['parameters'], 4).tolist()}")
    return ", ".join(parts)

to_dict

to_dict(base_path: str, name: str) -> dict

Creates a dictionary representation of the RMSE augmenter.

Parameters:

Name Type Description Default
base_path str

Used to save the data of the evaluator to disk.

required
name str

Used to save the data of the evaluator to disk.

required

Returns:

Type Description
dict

A dictionary containing the necessary information to recreate the augmenter.

Source code in SRToolkit/evaluation/result_augmentation.py
def to_dict(self, base_path: str, name: str) -> dict:
    """
    Creates a dictionary representation of the RMSE augmenter.

    Args:
        base_path: Used to save the data of the evaluator to disk.
        name: Used to save the data of the evaluator to disk.

    Returns:
        A dictionary containing the necessary information to recreate the augmenter.
    """
    return {
        "format_version": 1,
        "name": self.name,
        "type": "RMSE",
        "scope": self.scope,
        "evaluator": self.evaluator.to_dict(base_path, name + "_RMSE_augmenter"),
    }

from_dict staticmethod

from_dict(data: dict) -> RMSE

Creates an instance of the RMSE augmenter from a dictionary.

Parameters:

Name Type Description Default
data dict

A dictionary containing the necessary information to recreate the augmenter.

required

Returns:

Type Description
RMSE

An instance of the RMSE augmenter.

Source code in SRToolkit/evaluation/result_augmentation.py
@staticmethod
def from_dict(data: dict) -> "RMSE":
    """
    Creates an instance of the RMSE augmenter from a dictionary.

    Args:
        data: A dictionary containing the necessary information to recreate the augmenter.

    Returns:
        An instance of the RMSE augmenter.
    """
    if data.get("format_version", 1) != 1:
        raise ValueError(
            f"[RMSE.from_dict] Unsupported format_version: {data.get('format_version')!r}. Expected 1."
        )
    evaluator = SR_evaluator.from_dict(data["evaluator"])
    return RMSE(evaluator, scope=data["scope"], name=data["name"])

EvalResult dataclass

EvalResult(min_error: float, best_expr: str, num_evaluated: int, evaluation_calls: int, top_models: List[ModelResult], all_models: List[ModelResult], approach_name: str, success: bool, dataset_name: Optional[str] = None, metadata: Optional[dict] = None, augmentations: Dict[str, Dict[str, Any]] = dict())

Result for a single SR experiment, as returned by SR_results[i].

Examples:

>>> model = ModelResult(expr=["X_0"], error=0.05)
>>> result = EvalResult(
...     min_error=0.05,
...     best_expr="X_0",
...     num_evaluated=500,
...     evaluation_calls=612,
...     top_models=[model],
...     all_models=[model],
...     approach_name="MyApproach",
...     success=True,
... )
>>> result.min_error
0.05
>>> result.success
True
>>> result.dataset_name is None
True

Attributes:

Name Type Description
min_error float

Lowest error achieved across all evaluated expressions.

best_expr str

String representation of the best expression found.

num_evaluated int

Number of unique expressions evaluated.

evaluation_calls int

Number of times evaluate_expr was called (includes cache hits).

top_models List[ModelResult]

Top-k models sorted by error.

all_models List[ModelResult]

All evaluated models sorted by error.

approach_name str

Name of the SR approach, or empty string if not provided.

success bool

Whether min_error is below the configured success_threshold.

dataset_name Optional[str]

Name of the dataset, extracted from metadata. None if not provided.

metadata Optional[dict]

Remaining metadata dict after dataset_name is popped. None if empty.

augmentations Dict[str, Dict[str, Any]]

Per-augmenter data keyed by augmenter name. Populated by ResultAugmenter subclasses via add_augmentation.

add_augmentation

add_augmentation(name: str, data: Dict[str, Any], aug_type: str) -> None

Attach augmentation data produced by a ResultAugmenter to this result.

If name is already present in augmentations, a numeric suffix is appended (name_1, name_2, …) to avoid overwriting existing data.

Examples:

>>> model = ModelResult(expr=["X_0"], error=0.05)
>>> result = EvalResult(
...     min_error=0.05, best_expr="X_0", num_evaluated=10,
...     evaluation_calls=10, top_models=[model], all_models=[model],
...     approach_name="MyApproach", success=True,
... )
>>> result.add_augmentation("complexity", {"value": 3}, "ComplexityAugmenter")
>>> result.augmentations["complexity"]["value"]
3
>>> result.add_augmentation("complexity", {"value": 5}, "ComplexityAugmenter")
>>> "complexity_1" in result.augmentations
True

Parameters:

Name Type Description Default
name str

Key under which the augmentation is stored in augmentations. A suffix is added automatically if the key already exists.

required
data Dict[str, Any]

Arbitrary dict of augmentation data. A "_type" key is injected automatically and should not be included.

required
aug_type str

Augmenter class name, stored as data["_type"].

required
Source code in SRToolkit/utils/types.py
def add_augmentation(self, name: str, data: Dict[str, Any], aug_type: str) -> None:
    """
    Attach augmentation data produced by a :class:`ResultAugmenter` to this result.

    If ``name`` is already present in :attr:`augmentations`, a numeric suffix is
    appended (``name_1``, ``name_2``, …) to avoid overwriting existing data.

    Examples:
        >>> model = ModelResult(expr=["X_0"], error=0.05)
        >>> result = EvalResult(
        ...     min_error=0.05, best_expr="X_0", num_evaluated=10,
        ...     evaluation_calls=10, top_models=[model], all_models=[model],
        ...     approach_name="MyApproach", success=True,
        ... )
        >>> result.add_augmentation("complexity", {"value": 3}, "ComplexityAugmenter")
        >>> result.augmentations["complexity"]["value"]
        3
        >>> result.add_augmentation("complexity", {"value": 5}, "ComplexityAugmenter")
        >>> "complexity_1" in result.augmentations
        True

    Args:
        name: Key under which the augmentation is stored in :attr:`augmentations`.
            A suffix is added automatically if the key already exists.
        data: Arbitrary dict of augmentation data. A ``"_type"`` key is injected
            automatically and should not be included.
        aug_type: Augmenter class name, stored as ``data["_type"]``.
    """
    resolved = name
    counter = 1
    while resolved in self.augmentations:
        resolved = f"{name}_{counter}"
        counter += 1
    data["_type"] = aug_type
    self.augmentations[resolved] = data

to_dict

to_dict() -> dict

Serialize this evaluation result to a JSON-safe dictionary.

NumPy arrays and scalars within nested ModelResult entries are converted to native Python types so the result can be passed directly to json.dump.

Examples:

>>> model = ModelResult(expr=["X_0"], error=0.05)
>>> result = EvalResult(
...     min_error=0.05, best_expr="X_0", num_evaluated=10,
...     evaluation_calls=10, top_models=[model], all_models=[model],
...     approach_name="MyApproach", success=True,
... )
>>> d = result.to_dict()
>>> d["min_error"]
0.05
>>> d["approach_name"]
'MyApproach'
>>> len(d["top_models"])
1

Returns:

Type Description
dict

A JSON-safe dictionary suitable for passing to from_dict.

Source code in SRToolkit/utils/types.py
def to_dict(self) -> dict:
    """
    Serialize this evaluation result to a JSON-safe dictionary.

    NumPy arrays and scalars within nested :class:`ModelResult` entries are
    converted to native Python types so the result can be passed directly
    to ``json.dump``.

    Examples:
        >>> model = ModelResult(expr=["X_0"], error=0.05)
        >>> result = EvalResult(
        ...     min_error=0.05, best_expr="X_0", num_evaluated=10,
        ...     evaluation_calls=10, top_models=[model], all_models=[model],
        ...     approach_name="MyApproach", success=True,
        ... )
        >>> d = result.to_dict()
        >>> d["min_error"]
        0.05
        >>> d["approach_name"]
        'MyApproach'
        >>> len(d["top_models"])
        1

    Returns:
        A JSON-safe dictionary suitable for passing to :meth:`from_dict`.
    """
    return {
        "min_error": float(self.min_error),
        "best_expr": self.best_expr,
        "num_evaluated": int(self.num_evaluated),
        "evaluation_calls": int(self.evaluation_calls),
        "top_models": [m.to_dict() for m in self.top_models],
        "all_models": [m.to_dict() for m in self.all_models],
        "approach_name": self.approach_name,
        "success": bool(self.success),
        "dataset_name": self.dataset_name,
        "metadata": self.metadata,
        "augmentations": _to_json_safe(self.augmentations),
    }

from_dict staticmethod

from_dict(data: dict) -> EvalResult

Reconstruct an EvalResult from a dictionary produced by to_dict.

Examples:

>>> model = ModelResult(expr=["X_0"], error=0.05)
>>> result = EvalResult(
...     min_error=0.05, best_expr="X_0", num_evaluated=10,
...     evaluation_calls=10, top_models=[model], all_models=[model],
...     approach_name="MyApproach", success=True,
... )
>>> result2 = EvalResult.from_dict(result.to_dict())
>>> result2.min_error
0.05
>>> result2.best_expr
'X_0'
>>> len(result2.top_models)
1

Parameters:

Name Type Description Default
data dict

Dictionary representation of an EvalResult, as produced by to_dict.

required

Returns:

Type Description
EvalResult

The reconstructed EvalResult.

Source code in SRToolkit/utils/types.py
@staticmethod
def from_dict(data: dict) -> "EvalResult":
    """
    Reconstruct an :class:`EvalResult` from a dictionary produced by :meth:`to_dict`.

    Examples:
        >>> model = ModelResult(expr=["X_0"], error=0.05)
        >>> result = EvalResult(
        ...     min_error=0.05, best_expr="X_0", num_evaluated=10,
        ...     evaluation_calls=10, top_models=[model], all_models=[model],
        ...     approach_name="MyApproach", success=True,
        ... )
        >>> result2 = EvalResult.from_dict(result.to_dict())
        >>> result2.min_error
        0.05
        >>> result2.best_expr
        'X_0'
        >>> len(result2.top_models)
        1

    Args:
        data: Dictionary representation of an :class:`EvalResult`, as produced
            by :meth:`to_dict`.

    Returns:
        The reconstructed :class:`EvalResult`.
    """
    return EvalResult(
        min_error=data["min_error"],
        best_expr=data["best_expr"],
        num_evaluated=data["num_evaluated"],
        evaluation_calls=data["evaluation_calls"],
        top_models=[ModelResult.from_dict(m) for m in data["top_models"]],
        all_models=[ModelResult.from_dict(m) for m in data["all_models"]],
        approach_name=data["approach_name"],
        success=data["success"],
        dataset_name=data.get("dataset_name"),
        metadata=data.get("metadata"),
        augmentations=_from_json_safe(data["augmentations"]),
    )
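
Because to_dict produces JSON-safe output, a result can be persisted and restored with the standard json module; a small sketch, assuming result is the EvalResult constructed in the examples above:

>>> import json
>>> payload = json.dumps(result.to_dict())
>>> restored = EvalResult.from_dict(json.loads(payload))
>>> restored.best_expr
'X_0'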

ExpressionSimplifier

ExpressionSimplifier(symbol_library: SymbolLibrary, scope: str = 'top', verbose: bool = False, name: str = 'ExpressionSimplifier')

Bases: ResultAugmenter

Algebraically simplifies expressions inside the results using SymPy.

Parameters:

Name Type Description Default
symbol_library SymbolLibrary

Symbol library used by the simplifier to resolve token types.

required
scope str

Which expressions to simplify.

  • "best": only the best expression.
  • "top": the best expression and all top-k models.
  • "all": everything in "top" plus all evaluated expressions.
'top'
verbose bool

If True, emits a warning when simplification fails for an expression. Default False.

False
name str

Key used in augmentations dict of EvalResult and ModelResult. Default "ExpressionSimplifier".

'ExpressionSimplifier'
Source code in SRToolkit/evaluation/result_augmentation.py
def __init__(
    self,
    symbol_library: SymbolLibrary,
    scope: str = "top",
    verbose: bool = False,
    name: str = "ExpressionSimplifier",
) -> None:
    """
    Algebraically simplifies expressions inside the results using SymPy.

    Args:
        symbol_library: Symbol library used by the simplifier to resolve token types.
        scope: Which expressions to simplify.

            - ``"best"``: only the best expression.
            - ``"top"``: the best expression and all top-k models.
            - ``"all"``: everything in ``"top"`` plus all evaluated expressions.
        verbose: If ``True``, emits a warning when simplification fails for an expression.
            Default ``False``.
        name: Key used in
            ``augmentations`` dict of [EvalResult][SRToolkit.utils.types.EvalResult] and
            [ModelResult][SRToolkit.utils.types.ModelResult].
            Default ``"ExpressionSimplifier"``.
    """
    super().__init__(name)
    self.symbol_library = symbol_library

    if scope not in ["best", "top", "all"]:
        raise Exception(f"[ExpressionSimplifier] Invalid scope: {scope}. Must be one of 'best', 'top', 'all'.")
    self.scope = scope

    self.verbose = verbose
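
A minimal usage sketch, assuming ModelResult and EvalResult from SRToolkit.utils.types and the default symbol library. Whether a simplified form is stored depends on SymPy succeeding for the given tokens, so only the presence of the augmentation keys is checked here:

>>> from SRToolkit.utils.symbol_library import SymbolLibrary
>>> model = ModelResult(expr=["X_0", "+", "X_0"], error=0.1)
>>> result = EvalResult(
...     min_error=0.1, best_expr="X_0+X_0", num_evaluated=1,
...     evaluation_calls=1, top_models=[model], all_models=[model],
...     approach_name="demo", success=True,
... )
>>> simplifier = ExpressionSimplifier(SymbolLibrary.default_symbols())
>>> simplifier.write_results(result)
>>> "ExpressionSimplifier" in result.augmentations
True
>>> "ExpressionSimplifier" in result.top_models[0].augmentations
True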

write_results

write_results(results: EvalResult) -> None

Write simplified expressions into results and its models.

Stores {"simplified_best_expr": ...} in EvalResult augmentations if simplification succeeds. Also stores {"simplified_expr": ...} in each model's augmentations when scope is "top" or "all".

Parameters:

Name Type Description Default
results EvalResult

The EvalResult to augment.

required
Source code in SRToolkit/evaluation/result_augmentation.py
def write_results(self, results: EvalResult) -> None:
    """
    Write simplified expressions into *results* and its models.

    Stores ``{"simplified_best_expr": ...}`` in
    [EvalResult][SRToolkit.utils.types.EvalResult] ``augmentations`` if
    simplification succeeds. Also stores ``{"simplified_expr": ...}`` in each model's
    augmentations when ``scope`` is ``"top"`` or ``"all"``.

    Args:
        results: The [EvalResult][SRToolkit.utils.types.EvalResult] to augment.
    """
    eval_data: Dict[str, Any] = {}
    try:
        simplified_expr = simplify(results.top_models[0].expr, self.symbol_library)
        if isinstance(simplified_expr, list):
            eval_data["simplified_best_expr"] = "".join(simplified_expr)
        elif isinstance(simplified_expr, Node):
            eval_data["simplified_best_expr"] = "".join(simplified_expr.to_list(self.symbol_library))
        else:
            raise Exception(f"Simplified expression is not a list or Node: {simplified_expr}")
    except Exception as e:
        if self.verbose:
            warnings.warn(f"Unable to simplify {results.best_expr}: {e}")
    results.add_augmentation(self.name, eval_data, self._type)

    if self.scope == "top" or self.scope == "all":
        for model in results.top_models:
            top_model_data: Dict[str, Any] = {}
            try:
                simplified_expr = simplify(model.expr, self.symbol_library)
                if isinstance(simplified_expr, list):
                    top_model_data["simplified_expr"] = "".join(simplified_expr)
                elif isinstance(simplified_expr, Node):
                    top_model_data["simplified_expr"] = "".join(simplified_expr.to_list(self.symbol_library))
                else:
                    raise Exception(f"Simplified expression is not a list or Node: {simplified_expr}")
            except Exception as e:
                if self.verbose:
                    warnings.warn(f"Unable to simplify {''.join(model.expr)}: {e}")
            model.add_augmentation(self.name, top_model_data, self._type)

    if self.scope == "all":
        for model in results.all_models:
            all_model_data: Dict[str, Any] = {}
            try:
                simplified_expr = simplify(model.expr, self.symbol_library)
                if isinstance(simplified_expr, list):
                    all_model_data["simplified_expr"] = "".join(simplified_expr)
                elif isinstance(simplified_expr, Node):
                    all_model_data["simplified_expr"] = "".join(simplified_expr.to_list(self.symbol_library))
                else:
                    raise Exception(f"Simplified expression is not a list or Node: {simplified_expr}")
            except Exception as e:
                if self.verbose:
                    warnings.warn(f"Unable to simplify {''.join(model.expr)}: {e}")
            model.add_augmentation(self.name, all_model_data, self._type)

format_eval_result classmethod

format_eval_result(data: Dict[str, Any]) -> str

Format experiment-level simplification data for display.

Parameters:

Name Type Description Default
data Dict[str, Any]

Augmentation dict containing "simplified_best_expr".

required

Returns:

Type Description
str

A human-readable string, or empty string if no data is present.

Source code in SRToolkit/evaluation/result_augmentation.py
@classmethod
def format_eval_result(cls, data: Dict[str, Any]) -> str:
    """
    Format experiment-level simplification data for display.

    Args:
        data: Augmentation dict containing ``"simplified_best_expr"``.

    Returns:
        A human-readable string, or empty string if no data is present.
    """
    simplified = data.get("simplified_best_expr", "")
    return f"Simplified: {simplified}" if simplified else ""
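
For example, with augmentation data shaped like the output of write_results (hypothetical values):

>>> ExpressionSimplifier.format_eval_result({"simplified_best_expr": "2*X_0", "_type": "ExpressionSimplifier"})
'Simplified: 2*X_0'
>>> ExpressionSimplifier.format_eval_result({"_type": "ExpressionSimplifier"})
''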

format_model_result classmethod

format_model_result(data: Dict[str, Any]) -> str

Format per-model simplification data for display.

Parameters:

Name Type Description Default
data Dict[str, Any]

Augmentation dict containing "simplified_expr".

required

Returns:

Type Description
str

A human-readable string, or empty string if no data is present.

Source code in SRToolkit/evaluation/result_augmentation.py
@classmethod
def format_model_result(cls, data: Dict[str, Any]) -> str:
    """
    Format per-model simplification data for display.

    Args:
        data: Augmentation dict containing ``"simplified_expr"``.

    Returns:
        A human-readable string, or empty string if no data is present.
    """
    simplified = data.get("simplified_expr", "")
    return f"Simplified: {simplified}" if simplified else ""

to_dict

to_dict(base_path: str, name: str) -> dict

Creates a dictionary representation of the ExpressionSimplifier augmenter.

Parameters:

Name Type Description Default
base_path str

Unused and ignored.

required
name str

Unused and ignored.

required

Returns:

Type Description
dict

A dictionary containing the necessary information to recreate the augmenter.

Source code in SRToolkit/evaluation/result_augmentation.py
def to_dict(self, base_path: str, name: str) -> dict:
    """
    Creates a dictionary representation of the ExpressionSimplifier augmenter.

    Args:
        base_path: Unused and ignored
        name: Unused and ignored

    Returns:
        A dictionary containing the necessary information to recreate the augmenter.
    """
    return {
        "format_version": 1,
        "type": "ExpressionSimplifier",
        "name": self.name,
        "symbol_library": self.symbol_library.to_dict(),
        "scope": self.scope,
        "verbose": self.verbose,
    }

from_dict staticmethod

from_dict(data: dict) -> ExpressionSimplifier

Creates an instance of the ExpressionSimplifier augmenter from a dictionary.

Parameters:

Name Type Description Default
data dict

A dictionary containing the necessary information to recreate the augmenter.

required

Returns:

Type Description
ExpressionSimplifier

An instance of the ExpressionSimplifier augmenter.

Source code in SRToolkit/evaluation/result_augmentation.py
@staticmethod
def from_dict(data: dict) -> "ExpressionSimplifier":
    """
    Creates an instance of the ExpressionSimplifier augmenter from a dictionary.

    Args:
        data: A dictionary containing the necessary information to recreate the augmenter.

    Returns:
        An instance of the ExpressionSimplifier augmenter.
    """
    if data.get("format_version", 1) != 1:
        raise ValueError(
            f"[ExpressionSimplifier.from_dict] Unsupported format_version: {data.get('format_version')!r}. Expected 1."
        )
    return ExpressionSimplifier(
        symbol_library=data["symbol_library"],
        scope=data["scope"],
        verbose=data["verbose"],
        name=data["name"],
    )

ExpressionToLatex

ExpressionToLatex(symbol_library: SymbolLibrary, scope: str = 'top', verbose: bool = False, name: str = 'ExpressionToLatex')

Bases: ResultAugmenter

Converts expressions inside the results to LaTeX strings.

Parameters:

Name Type Description Default
symbol_library SymbolLibrary

Symbol library used to produce LaTeX templates for each token.

required
scope str

Which expressions to convert.

  • "best": only the best expression.
  • "top": the best expression and all top-k models.
  • "all": everything in "top" plus all evaluated expressions.
'top'
verbose bool

If True, emits a warning when LaTeX conversion fails for an expression. Default False.

False
name str

Key used in augmentations dict of EvalResult and ModelResult. Default "ExpressionToLatex".

'ExpressionToLatex'
Source code in SRToolkit/evaluation/result_augmentation.py
def __init__(
    self,
    symbol_library: SymbolLibrary,
    scope: str = "top",
    verbose: bool = False,
    name: str = "ExpressionToLatex",
) -> None:
    """
    Converts expressions inside the results to LaTeX strings.

    Args:
        symbol_library: Symbol library used to produce LaTeX templates for each token.
        scope: Which expressions to convert.

            - ``"best"``: only the best expression.
            - ``"top"``: the best expression and all top-k models.
            - ``"all"``: everything in ``"top"`` plus all evaluated expressions.
        verbose: If ``True``, emits a warning when LaTeX conversion fails for an expression.
            Default ``False``.
        name: Key used in
            ``augmentations`` dict of [EvalResult][SRToolkit.utils.types.EvalResult] and
            [ModelResult][SRToolkit.utils.types.ModelResult].
            Default ``"ExpressionToLatex"``.
    """
    super().__init__(name)
    self.symbol_library = symbol_library

    if scope not in ["best", "top", "all"]:
        raise Exception(f"[ExpressionToLatex] Invalid scope: {scope}. Must be one of 'best', 'top', 'all'.")
    self.scope = scope

    self.verbose = verbose
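
A minimal usage sketch, assuming ModelResult and EvalResult from SRToolkit.utils.types and the default symbol library. The experiment-level augmentation key is always present; the "best_expr_latex" field is only filled when conversion succeeds:

>>> from SRToolkit.utils.symbol_library import SymbolLibrary
>>> model = ModelResult(expr=["X_0", "*", "X_1"], error=0.1)
>>> result = EvalResult(
...     min_error=0.1, best_expr="X_0*X_1", num_evaluated=1,
...     evaluation_calls=1, top_models=[model], all_models=[model],
...     approach_name="demo", success=True,
... )
>>> latexer = ExpressionToLatex(SymbolLibrary.default_symbols())
>>> latexer.write_results(result)
>>> "ExpressionToLatex" in result.augmentations
True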

write_results

write_results(results: EvalResult) -> None

Write LaTeX representations into results and its models.

Stores {"best_expr_latex": ...} in EvalResult augmentations. Also stores {"expr_latex": ...} in each model's augmentations when scope is "top" or "all".

Parameters:

Name Type Description Default
results EvalResult

The EvalResult to augment.

required
Source code in SRToolkit/evaluation/result_augmentation.py
def write_results(self, results: EvalResult) -> None:
    """
    Write LaTeX representations into *results* and its models.

    Stores ``{"best_expr_latex": ...}`` in
    [EvalResult][SRToolkit.utils.types.EvalResult] ``augmentations``.
    Also stores ``{"expr_latex": ...}`` in each model's augmentations when
    ``scope`` is ``"top"`` or ``"all"``.

    Args:
        results: The [EvalResult][SRToolkit.utils.types.EvalResult] to augment.
    """
    eval_data: Dict[str, Any] = {}
    try:
        eval_data["best_expr_latex"] = tokens_to_tree(results.top_models[0].expr, self.symbol_library).to_latex(
            self.symbol_library
        )
    except Exception as e:
        if self.verbose:
            warnings.warn(f"Unable to convert best expression to LaTeX: {e}")
    results.add_augmentation(self.name, eval_data, self._type)

    if self.scope == "top" or self.scope == "all":
        for model in results.top_models:
            try:
                model.add_augmentation(
                    self.name,
                    {"expr_latex": tokens_to_tree(model.expr, self.symbol_library).to_latex(self.symbol_library)},
                    self._type,
                )
            except Exception as e:
                if self.verbose:
                    warnings.warn(f"Unable to convert expression {''.join(model.expr)} to LaTeX: {e}")

    if self.scope == "all":
        for model in results.all_models:
            try:
                model.add_augmentation(
                    self.name,
                    {"expr_latex": tokens_to_tree(model.expr, self.symbol_library).to_latex(self.symbol_library)},
                    self._type,
                )
            except Exception as e:
                if self.verbose:
                    warnings.warn(f"Unable to convert expression {''.join(model.expr)} to LaTeX: {e}")

format_eval_result classmethod

format_eval_result(data: Dict[str, Any]) -> str

Format experiment-level LaTeX augmentation data for display.

Parameters:

Name Type Description Default
data Dict[str, Any]

Augmentation dict containing "best_expr_latex".

required

Returns:

Type Description
str

A human-readable string, or empty string if no data is present.

Source code in SRToolkit/evaluation/result_augmentation.py
@classmethod
def format_eval_result(cls, data: Dict[str, Any]) -> str:
    """
    Format experiment-level LaTeX augmentation data for display.

    Args:
        data: Augmentation dict containing ``"best_expr_latex"``.

    Returns:
        A human-readable string, or empty string if no data is present.
    """
    latex = data.get("best_expr_latex", "")
    return f"LaTeX of the best expression: {latex}" if latex else ""

format_model_result classmethod

format_model_result(data: Dict[str, Any]) -> str

Format per-model LaTeX augmentation data for display.

Parameters:

Name Type Description Default
data Dict[str, Any]

Augmentation dict containing "expr_latex".

required

Returns:

Type Description
str

A human-readable string, or empty string if no data is present.

Source code in SRToolkit/evaluation/result_augmentation.py
@classmethod
def format_model_result(cls, data: Dict[str, Any]) -> str:
    """
    Format per-model LaTeX augmentation data for display.

    Args:
        data: Augmentation dict containing ``"expr_latex"``.

    Returns:
        A human-readable string, or empty string if no data is present.
    """
    latex = data.get("expr_latex", "")
    return f"LaTeX: {latex}" if latex else ""

to_dict

to_dict(base_path: str, name: str) -> dict

Creates a dictionary representation of the ExpressionToLatex augmenter.

Parameters:

Name Type Description Default
base_path str

Unused and ignored.

required
name str

Unused and ignored.

required

Returns:

Type Description
dict

A dictionary containing the necessary information to recreate the augmenter.

Source code in SRToolkit/evaluation/result_augmentation.py
def to_dict(self, base_path: str, name: str) -> dict:
    """
    Creates a dictionary representation of the ExpressionToLatex augmenter.

    Args:
        base_path: Unused and ignored
        name: Unused and ignored

    Returns:
        A dictionary containing the necessary information to recreate the augmenter.
    """
    return {
        "format_version": 1,
        "type": "ExpressionToLatex",
        "name": self.name,
        "symbol_library": self.symbol_library.to_dict(),
        "scope": self.scope,
        "verbose": self.verbose,
    }

from_dict staticmethod

from_dict(data: dict) -> ExpressionToLatex

Creates an instance of the ExpressionToLatex augmenter from a dictionary.

Parameters:

Name Type Description Default
data dict

A dictionary containing the necessary information to recreate the augmenter.

required

Returns:

Type Description
ExpressionToLatex

An instance of the ExpressionToLatex augmenter.

Source code in SRToolkit/evaluation/result_augmentation.py
@staticmethod
def from_dict(data: dict) -> "ExpressionToLatex":
    """
    Creates an instance of the ExpressionToLatex augmenter from a dictionary.

    Args:
        data: A dictionary containing the necessary information to recreate the augmenter.

    Returns:
        An instance of the ExpressionToLatex augmenter.
    """
    if data.get("format_version", 1) != 1:
        raise ValueError(
            f"[ExpressionToLatex.from_dict] Unsupported format_version: {data.get('format_version')!r}. Expected 1."
        )
    return ExpressionToLatex(
        symbol_library=data["symbol_library"],
        scope=data["scope"],
        verbose=data["verbose"],
        name=data["name"],
    )

ModelResult dataclass

ModelResult(expr: List[str], error: float, parameters: Optional[ndarray] = None, augmentations: Dict[str, Dict[str, Any]] = dict())

A single model entry in EvalResult.top_models and EvalResult.all_models.

Examples:

>>> result = ModelResult(expr=["C", "*", "X_0"], error=0.42)
>>> result.expr
['C', '*', 'X_0']
>>> result.error
0.42
>>> result.parameters is None
True

Attributes:

Name Type Description
expr List[str]

Token list representing the expression, e.g. ["C", "*", "X_0"].

error float

Numeric error under the ranking function (RMSE or BED).

parameters Optional[ndarray]

Fitted constant values. Present for RMSE ranking only, None otherwise.

augmentations Dict[str, Dict[str, Any]]

Per-augmenter data keyed by augmenter name. Populated by ResultAugmenter subclasses via add_augmentation.

add_augmentation

add_augmentation(name: str, data: Dict[str, Any], aug_type: str) -> None

Attach augmentation data produced by a ResultAugmenter to this result.

If name is already present in augmentations, a numeric suffix is appended (name_1, name_2, …) to avoid overwriting existing data.

Examples:

>>> result = ModelResult(expr=["X_0"], error=0.1)
>>> result.add_augmentation("latex", {"value": "$X_0$"}, "LaTeXAugmenter")
>>> result.augmentations["latex"]["value"]
'$X_0$'
>>> result.add_augmentation("latex", {"value": "$X_0$"}, "LaTeXAugmenter")
>>> "latex_1" in result.augmentations
True

Parameters:

Name Type Description Default
name str

Key under which the augmentation is stored in augmentations. A suffix is added automatically if the key already exists.

required
data Dict[str, Any]

Arbitrary dict of augmentation data. A "_type" key is injected automatically and should not be included.

required
aug_type str

Augmenter class name, stored as data["_type"].

required
Source code in SRToolkit/utils/types.py
def add_augmentation(self, name: str, data: Dict[str, Any], aug_type: str) -> None:
    """
    Attach augmentation data produced by a :class:`ResultAugmenter` to this result.

    If ``name`` is already present in :attr:`augmentations`, a numeric suffix is
    appended (``name_1``, ``name_2``, …) to avoid overwriting existing data.

    Examples:
        >>> result = ModelResult(expr=["X_0"], error=0.1)
        >>> result.add_augmentation("latex", {"value": "$X_0$"}, "LaTeXAugmenter")
        >>> result.augmentations["latex"]["value"]
        '$X_0$'
        >>> result.add_augmentation("latex", {"value": "$X_0$"}, "LaTeXAugmenter")
        >>> "latex_1" in result.augmentations
        True

    Args:
        name: Key under which the augmentation is stored in :attr:`augmentations`.
            A suffix is added automatically if the key already exists.
        data: Arbitrary dict of augmentation data. A ``"_type"`` key is injected
            automatically and should not be included.
        aug_type: Augmenter class name, stored as ``data["_type"]``.
    """
    resolved = name
    counter = 1
    while resolved in self.augmentations:
        resolved = f"{name}_{counter}"
        counter += 1
    data["_type"] = aug_type
    self.augmentations[resolved] = data

to_dict

to_dict() -> dict

Serialize this model result to a JSON-safe dictionary.

NumPy arrays and scalars are converted to native Python types so the result can be passed directly to json.dump.

Examples:

>>> result = ModelResult(expr=["X_0", "+", "C"], error=0.25)
>>> d = result.to_dict()
>>> d["expr"]
['X_0', '+', 'C']
>>> d["error"]
0.25
>>> d["parameters"] is None
True

Returns:

Type Description
dict

A JSON-safe dictionary suitable for passing to from_dict.

Source code in SRToolkit/utils/types.py
def to_dict(self) -> dict:
    """
    Serialize this model result to a JSON-safe dictionary.

    NumPy arrays and scalars are converted to native Python types so the
    result can be passed directly to ``json.dump``.

    Examples:
        >>> result = ModelResult(expr=["X_0", "+", "C"], error=0.25)
        >>> d = result.to_dict()
        >>> d["expr"]
        ['X_0', '+', 'C']
        >>> d["error"]
        0.25
        >>> d["parameters"] is None
        True

    Returns:
        A JSON-safe dictionary suitable for passing to :meth:`from_dict`.
    """
    return {
        "expr": self.expr,
        "error": float(self.error),
        "parameters": _to_json_safe(self.parameters),
        "augmentations": _to_json_safe(self.augmentations),
    }

from_dict staticmethod

from_dict(data: dict) -> ModelResult

Reconstruct a ModelResult from a dictionary produced by to_dict.

Examples:

>>> result = ModelResult(expr=["X_0", "+", "C"], error=0.25)
>>> result2 = ModelResult.from_dict(result.to_dict())
>>> result2.expr
['X_0', '+', 'C']
>>> result2.error
0.25

Parameters:

Name Type Description Default
data dict

Dictionary representation of a ModelResult, as produced by to_dict.

required

Returns:

Type Description
ModelResult

The reconstructed ModelResult.

Source code in SRToolkit/utils/types.py
@staticmethod
def from_dict(data: dict) -> "ModelResult":
    """
    Reconstruct a :class:`ModelResult` from a dictionary produced by :meth:`to_dict`.

    Examples:
        >>> result = ModelResult(expr=["X_0", "+", "C"], error=0.25)
        >>> result2 = ModelResult.from_dict(result.to_dict())
        >>> result2.expr
        ['X_0', '+', 'C']
        >>> result2.error
        0.25

    Args:
        data: Dictionary representation of a :class:`ModelResult`, as produced
            by :meth:`to_dict`.

    Returns:
        The reconstructed :class:`ModelResult`.
    """
    return ModelResult(
        expr=data["expr"],
        error=data["error"],
        parameters=_from_json_safe(data["parameters"]),
        augmentations=_from_json_safe(data["augmentations"]),
    )

ResultAugmenter

ResultAugmenter(name: str)

Bases: ABC

Base class for result augmenters. Subclasses implement write_results to compute and store additional data in an EvalResult via add_augmentation.

For concrete implementations, see result_augmentation.

Parameters:

Name Type Description Default
name str

Identifier used as the key in augmentations dict of EvalResult and ModelResult. If two augmenters share the same name, add_augmentation appends a numeric suffix automatically.

required
Source code in SRToolkit/evaluation/sr_evaluator.py
def __init__(self, name: str):
    """
    Base class for result augmenters. Subclasses implement
    [write_results][SRToolkit.evaluation.sr_evaluator.ResultAugmenter.write_results] to compute
    and store additional data in an [EvalResult][SRToolkit.utils.types.EvalResult] via
    [add_augmentation][SRToolkit.utils.types.EvalResult.add_augmentation].

    For concrete implementations, see
    [result_augmentation][SRToolkit.evaluation.result_augmentation].

    Args:
        name: Identifier used as the key in
            ``augmentations`` dict of [EvalResult][SRToolkit.utils.types.EvalResult] and
            [ModelResult][SRToolkit.utils.types.ModelResult].
            If two augmenters share the same name,
            [add_augmentation][SRToolkit.utils.types.EvalResult.add_augmentation] appends a
            numeric suffix automatically.
    """
    self.name = name
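
A minimal sketch of a custom augmenter built only on the documented interface (write_results, to_dict, and add_augmentation). The class name, the stored "length" field, and the getattr fallback for the _type attribute are illustrative assumptions rather than part of the library:

class ExpressionLengthAugmenter(ResultAugmenter):
    """Stores the token count of the best expression (illustrative only)."""

    def __init__(self, name: str = "ExpressionLength"):
        super().__init__(name)

    def write_results(self, results: "EvalResult") -> None:
        if results.top_models:
            # _type is documented as the augmenter class name; fall back to it if the base class does not set it.
            aug_type = getattr(self, "_type", type(self).__name__)
            results.add_augmentation(self.name, {"length": len(results.top_models[0].expr)}, aug_type)

    def to_dict(self, base_path: str, name: str) -> dict:
        return {"format_version": 1, "type": "ExpressionLengthAugmenter", "name": self.name}

    @staticmethod
    def from_dict(data: dict) -> "ExpressionLengthAugmenter":
        return ExpressionLengthAugmenter(name=data["name"])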

write_results abstractmethod

write_results(results: EvalResult) -> None

Compute and write augmentation data into results and its models.

Call results.add_augmentation(self.name, data, self._type) for experiment-level data and model.add_augmentation(self.name, data, self._type) for per-model data.

Parameters:

Name Type Description Default
results EvalResult

The EvalResult to augment.

required
Source code in SRToolkit/evaluation/sr_evaluator.py
@abstractmethod
def write_results(
    self,
    results: "EvalResult",
) -> None:
    """
    Compute and write augmentation data into *results* and its models.

    Call ``results.add_augmentation(self.name, data, self._type)`` for experiment-level
    data and ``model.add_augmentation(self.name, data, self._type)`` for per-model data.

    Args:
        results: The [EvalResult][SRToolkit.utils.types.EvalResult] to augment.
    """

to_dict abstractmethod

to_dict(base_path: str, name: str) -> dict

Transforms the augmenter into a dictionary. This is used for saving the augmenter to disk.

Parameters:

Name Type Description Default
base_path str

The base path used for saving the data inside the augmenter, if needed.

required
name str

The name/identifier used by the augmenter for saving.

required

Returns:

Type Description
dict

A dictionary containing the necessary information to recreate the augmenter.

Source code in SRToolkit/evaluation/sr_evaluator.py
@abstractmethod
def to_dict(self, base_path: str, name: str) -> dict:
    """
    Transforms the augmenter into a dictionary. This is used for saving the augmenter to disk.

    Args:
        base_path: The base path used for saving the data inside the augmenter, if needed.
        name: The name/identifier used by the augmenter for saving.

    Returns:
        A dictionary containing the necessary information to recreate the augmenter.
    """

format_eval_result classmethod

format_eval_result(data: Dict[str, Any]) -> str

Returns a formatted string for experiment-level augmentation data.

Subclasses override this for custom formatting. The data dict is the inner augmentation dictionary (includes _type).

Parameters:

Name Type Description Default
data Dict[str, Any]

The augmentation data dictionary.

required

Returns:

Type Description
str

A formatted string, or empty string if no relevant data exists.

Source code in SRToolkit/evaluation/sr_evaluator.py
@classmethod
def format_eval_result(cls, data: Dict[str, Any]) -> str:
    """
    Returns a formatted string for experiment-level augmentation data.

    Subclasses override this for custom formatting. The *data* dict is the inner
    augmentation dictionary (includes ``_type``).

    Args:
        data: The augmentation data dictionary.

    Returns:
        A formatted string, or empty string if no relevant data exists.
    """
    return "\n".join(f"  {k}: {v}" for k, v in data.items() if k != "_type")
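
For example, the default experiment-level formatting turns each key/value pair into an indented line (hypothetical data):

>>> ResultAugmenter.format_eval_result({"rmse": 0.01, "_type": "RMSE"})
'  rmse: 0.01'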

format_model_result classmethod

format_model_result(data: Dict[str, Any]) -> str

Returns a formatted string for a single model's augmentation data.

Subclasses override this for custom formatting. The data dict is the inner augmentation dictionary (includes _type).

Parameters:

Name Type Description Default
data Dict[str, Any]

The augmentation data dictionary.

required

Returns:

Type Description
str

A formatted string, or empty string if no relevant data exists.

Source code in SRToolkit/evaluation/sr_evaluator.py
@classmethod
def format_model_result(cls, data: Dict[str, Any]) -> str:
    """
    Returns a formatted string for a single model's augmentation data.

    Subclasses override this for custom formatting. The *data* dict is the inner
    augmentation dictionary (includes ``_type``).

    Args:
        data: The augmentation data dictionary.

    Returns:
        A formatted string, or empty string if no relevant data exists.
    """
    parts = [f"{k}={v}" for k, v in data.items() if k != "_type"]
    return ", ".join(parts)

from_dict staticmethod

from_dict(data: dict) -> ResultAugmenter

Creates an instance of the ResultAugmenter class from the dictionary with the relevant data.

Subclasses should override this method if they support serialization. The default implementation raises NotImplementedError, allowing custom augmenters to skip serialization if not needed.

Parameters:

Name Type Description Default
data dict

The dictionary containing the data needed to recreate the augmenter.

required

Returns:

Type Description
ResultAugmenter

An instance of the ResultAugmenter class with the same configuration as in the data dictionary.

Raises:

Type Description
NotImplementedError

If the subclass does not implement this method.

Source code in SRToolkit/evaluation/sr_evaluator.py
@staticmethod
def from_dict(data: dict) -> "ResultAugmenter":
    """
    Creates an instance of the ResultAugmenter class from the dictionary with the relevant data.

    Subclasses should override this method if they support serialization. The default
    implementation raises ``NotImplementedError``, allowing custom augmenters to skip
    serialization if not needed.

    Args:
        data: the dictionary containing the data needed to recreate the augmenter.

    Returns:
        An instance of the ResultAugmenter class with the same configuration as in the data dictionary.

    Raises:
        NotImplementedError: If the subclass does not implement this method.
    """
    raise NotImplementedError(
        "from_dict is not implemented for this augmenter. "
        "Override this method if your augmenter supports serialization."
    )

SR_evaluator

SR_evaluator(X: ndarray, y: Optional[ndarray] = None, symbol_library: SymbolLibrary = SymbolLibrary.default_symbols(), max_evaluations: int = -1, success_threshold: Optional[float] = None, ranking_function: str = 'rmse', ground_truth: Optional[Union[List[str], Node, ndarray]] = None, seed: Optional[int] = None, metadata: Optional[dict] = None, **kwargs: Unpack[EstimationSettings])

Evaluates symbolic regression expressions and ranks them by RMSE or Behavioral Expression Distance (BED).

Previously evaluated expressions are cached so repeated calls with the same expression are free. Results are collected via get_results.

Note

Determining whether two expressions are semantically equivalent is undecidable. Random sampling, parameter fitting, and numerical errors all make the success_threshold only a proxy for success — we recommend inspecting the best expression manually.

Examples:

>>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9]])
>>> y = np.array([3, 0, 3, 11])
>>> se = SR_evaluator(X, y)
>>> rmse = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
>>> print(rmse < 1e-6)
True

Parameters:

Name Type Description Default
X ndarray

Input data of shape (n_samples, n_features).

required
y Optional[ndarray]

Target values of shape (n_samples,). Required when ranking_function="rmse".

None
symbol_library SymbolLibrary

Symbol library defining the token vocabulary. Defaults to SymbolLibrary.default_symbols.

default_symbols()
max_evaluations int

Maximum number of expressions to evaluate. -1 means no limit. Default -1.

-1
success_threshold Optional[float]

Error value below which an expression is considered successful. If None, defaults to 1e-7 for RMSE and is auto-calculated for BED by evaluating the ground truth against itself 100 times and taking max(distances) * 1.1. If less than 0, no threshold is used.

None
ranking_function str

"rmse" or "bed". Default "rmse".

'rmse'
ground_truth Optional[Union[List[str], Node, ndarray]]

Required when ranking_function="bed". The target expression as a token list, a Node tree, or a pre-computed behavior matrix (see create_behavior_matrix).

None
seed Optional[int]

Random seed for reproducible sampling. Default None.

None
metadata Optional[dict]

Optional dict with information about this evaluation (e.g. dataset name, seed). If a "dataset_name" key is present it is extracted into EvalResult dataset_name.

None
**kwargs Unpack[EstimationSettings]

Optional settings from EstimationSettings. Supported keys: method, tol, gtol, max_iter, constant_bounds, initialization, max_constants, max_expr_length, num_points_sampled, bed_X, num_consts_sampled, domain_bounds.

{}

Attributes:

Name Type Description
models

Cached ModelResult for every evaluated expression, keyed by the concatenated token string.

invalid

Token strings of expressions that raised an exception during evaluation.

ground_truth

The target expression passed at construction (BED mode).

gt_behavior

Pre-computed behavior matrix for the ground truth (BED mode).

max_evaluations

Maximum number of expressions to evaluate.

bed_evaluation_parameters

Active BED evaluation settings dict.

metadata

Metadata dict passed at construction.

symbol_library

The symbol library used.

total_evaluations

Number of times evaluate_expr has been called, including cache hits.

seed

Random seed.

parameter_estimator

ParameterEstimator instance used in RMSE mode.

ranking_function

Active ranking function ("rmse" or "bed").

success_threshold

Error threshold for determining success.

Source code in SRToolkit/evaluation/sr_evaluator.py
def __init__(
    self,
    X: np.ndarray,
    y: Optional[np.ndarray] = None,
    symbol_library: SymbolLibrary = SymbolLibrary.default_symbols(),
    max_evaluations: int = -1,
    success_threshold: Optional[float] = None,
    ranking_function: str = "rmse",
    ground_truth: Optional[Union[List[str], Node, np.ndarray]] = None,
    seed: Optional[int] = None,
    metadata: Optional[dict] = None,
    **kwargs: Unpack[EstimationSettings],
):
    """
    Evaluates symbolic regression expressions and ranks them by RMSE or Behavioral Expression Distance (BED).

    Previously evaluated expressions are cached so repeated calls with the same expression
    are free. Results are collected via
    [get_results][SRToolkit.evaluation.sr_evaluator.SR_evaluator.get_results].

    Note:
        Determining whether two expressions are semantically equivalent is undecidable.
        Random sampling, parameter fitting, and numerical errors all make the
        ``success_threshold`` only a proxy for success — we recommend inspecting the best
        expression manually.

    Examples:
        >>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9]])
        >>> y = np.array([3, 0, 3, 11])
        >>> se = SR_evaluator(X, y)
        >>> rmse = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
        >>> print(rmse < 1e-6)
        True

    Args:
        X: Input data of shape ``(n_samples, n_features)``.
        y: Target values of shape ``(n_samples,)``. Required when ``ranking_function="rmse"``.
        symbol_library: Symbol library defining the token vocabulary.
            Defaults to [SymbolLibrary.default_symbols][SRToolkit.utils.symbol_library.SymbolLibrary.default_symbols].
        max_evaluations: Maximum number of expressions to evaluate. ``-1`` means no limit.
            Default ``-1``.
        success_threshold: Error value below which an expression is considered successful.
            If ``None``, defaults to ``1e-7`` for RMSE and is auto-calculated for BED by
            evaluating the ground truth against itself 100 times and taking
            ``max(distances) * 1.1``. If less than 0, no threshold is used.
        ranking_function: ``"rmse"`` or ``"bed"``. Default ``"rmse"``.
        ground_truth: Required when ``ranking_function="bed"``. The target expression as a
            token list, a [Node][SRToolkit.utils.expression_tree.Node] tree, or a pre-computed
            behavior matrix (see
            [create_behavior_matrix][SRToolkit.utils.measures.create_behavior_matrix]).
        seed: Random seed for reproducible sampling. Default ``None``.
        metadata: Optional dict with information about this evaluation (e.g. dataset name,
            seed). If a ``"dataset_name"`` key is present it is extracted into
            [EvalResult][SRToolkit.utils.types.EvalResult] ``dataset_name``.
        **kwargs: Optional settings from
            [EstimationSettings][SRToolkit.utils.types.EstimationSettings].
            Supported keys: ``method``, ``tol``, ``gtol``, ``max_iter``,
            ``constant_bounds``, ``initialization``, ``max_constants``,
            ``max_expr_length``, ``num_points_sampled``, ``bed_X``,
            ``num_consts_sampled``, ``domain_bounds``.

    Attributes:
        models: Cached [ModelResult][SRToolkit.utils.types.ModelResult] for every evaluated expression,
            keyed by the concatenated token string.
        invalid: Token strings of expressions that raised an exception during evaluation.
        ground_truth: The target expression passed at construction (BED mode).
        gt_behavior: Pre-computed behavior matrix for the ground truth (BED mode).
        max_evaluations: Maximum number of expressions to evaluate.
        bed_evaluation_parameters: Active BED evaluation settings dict.
        metadata: Metadata dict passed at construction.
        symbol_library: The symbol library used.
        total_evaluations: Number of times
            [evaluate_expr][SRToolkit.evaluation.sr_evaluator.SR_evaluator.evaluate_expr]
            has been called, including cache hits.
        seed: Random seed.
        parameter_estimator: [ParameterEstimator][SRToolkit.evaluation.parameter_estimator.ParameterEstimator]
            instance used in RMSE mode.
        ranking_function: Active ranking function (``"rmse"`` or ``"bed"``).
        success_threshold: Error threshold for determining success.
    """
    self.kwargs = kwargs
    self.models: Dict[str, ModelResult] = dict()
    self.invalid: List[str] = list()
    self.success_threshold = success_threshold
    self.metadata = metadata
    self.ground_truth = ground_truth
    self.gt_behavior = None
    self._callbacks: Optional[Union[CallbackDispatcher, SRCallbacks]] = None
    self._experiment_id: str = ""
    self.should_stop = False
    self._current_best_error = float("inf")
    self.bed_evaluation_parameters: Dict[str, Any] = {
        "bed_X": None,
        "num_consts_sampled": 32,
        "num_points_sampled": 64,
        "domain_bounds": None,
        "constant_bounds": (-5, 5),
    }
    if kwargs:
        for k in self.bed_evaluation_parameters.keys():
            if k in kwargs:
                self.bed_evaluation_parameters[k] = kwargs[k]  # type: ignore[literal-required]
    if self.bed_evaluation_parameters["num_points_sampled"] == -1:
        self.bed_evaluation_parameters["num_points_sampled"] = X.shape[0]

    self.symbol_library = symbol_library
    self.max_evaluations = max_evaluations
    self.total_evaluations = 0
    self.seed = seed
    if seed is not None:
        np.random.seed(seed)

    if ranking_function not in ["rmse", "bed"]:
        warnings.warn(f"ranking_function '{ranking_function}' not supported. Using rmse instead.")
        ranking_function = "rmse"
    self.ranking_function = ranking_function

    if ranking_function == "rmse":
        if y is None:
            raise ValueError("Target values must be provided for RMSE ranking function.")
        self.parameter_estimator = ParameterEstimator(X, y, symbol_library=symbol_library, seed=seed, **kwargs)

        if self.success_threshold is None:
            self.success_threshold = 1e-7

    elif ranking_function == "bed":
        if ground_truth is None:
            raise ValueError(
                "Ground truth must be provided for bed ranking function. The ground truth must be "
                "provided as a list of tokens, a Node object, or a numpy array representing behavior. "
                "The behavior matrix is a matrix representing the distribution of outputs of an "
                "expression with free parameters at different points in the domain. This matrix "
                "should be of size (num_points_sampled, num_consts_sampled). See "
                "SRToolkit.utils.create_behavior_matrix for more details."
            )
        else:
            if self.bed_evaluation_parameters["bed_X"] is None:
                if self.bed_evaluation_parameters["domain_bounds"] is not None:
                    db = self.bed_evaluation_parameters["domain_bounds"]
                    assert isinstance(db, List), "Domain bounds should be a list of tuples."
                    interval_length = np.array([ub - lb for (lb, ub) in db])
                    lower_bound = np.array([lb for (lb, ub) in db])
                    lho = LatinHypercube(len(db), optimization="random-cd", seed=seed)
                    self.bed_evaluation_parameters["bed_X"] = (
                        lho.random(self.bed_evaluation_parameters["num_points_sampled"]) * interval_length
                        + lower_bound
                    )
                else:
                    indices = np.random.choice(
                        X.shape[0],
                        size=self.bed_evaluation_parameters["num_points_sampled"],
                    )
                    self.bed_evaluation_parameters["bed_X"] = X[indices, :]

        if isinstance(ground_truth, (list, Node)):
            self.gt_behavior = create_behavior_matrix(
                ground_truth,
                self.bed_evaluation_parameters["bed_X"],
                num_consts_sampled=self.bed_evaluation_parameters["num_consts_sampled"],
                consts_bounds=self.bed_evaluation_parameters["constant_bounds"],
                symbol_library=self.symbol_library,
                seed=self.seed,
            )
        elif isinstance(ground_truth, np.ndarray):
            self.gt_behavior = ground_truth
        else:
            raise ValueError(
                "Ground truth must be provided as a list of tokens, a Node object, or a numpy array representing behavior."
            )

        if self.success_threshold is None:
            assert self.ground_truth is not None, "Ground truth must be provided for BED ranking function."
            distances = [
                bed(
                    self.ground_truth,
                    self.gt_behavior,
                    self.bed_evaluation_parameters["bed_X"],
                    num_consts_sampled=self.bed_evaluation_parameters["num_consts_sampled"],
                    num_points_sampled=self.bed_evaluation_parameters["num_points_sampled"],
                    domain_bounds=self.bed_evaluation_parameters["domain_bounds"],
                    consts_bounds=self.bed_evaluation_parameters["constant_bounds"],
                    symbol_library=self.symbol_library,
                )
                for i in range(100)
            ]
            self.success_threshold = np.max(distances) * 1.1

    self._callbacks = CallbackDispatcher(
        callbacks=[EarlyStoppingCallback(threshold=self.success_threshold, max_evaluations=max_evaluations)]
    )
    self.X = X
    self.y = y
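
Estimation settings listed above can be passed directly as keyword arguments. A small sketch using constant_bounds and max_constants (the values are chosen purely for illustration):

>>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9]])
>>> y = np.array([3, 0, 3, 11])
>>> se = SR_evaluator(X, y, seed=0, constant_bounds=(-3, 3), max_constants=4)
>>> se.ranking_function
'rmse'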

set_callbacks

set_callbacks(callbacks: Optional[Union[SRCallbacks, CallbackDispatcher]] = None) -> None

Register callbacks for monitoring and early stopping.

A single SRCallbacks instance is automatically wrapped in a CallbackDispatcher.

Examples:

>>> from SRToolkit.evaluation.callbacks import EarlyStoppingCallback
>>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9]])
>>> y = np.array([3, 0, 3, 11])
>>> se = SR_evaluator(X, y)
>>> se.set_callbacks(EarlyStoppingCallback(threshold=1e-6))
>>> se._callbacks is not None
True

Parameters:

Name Type Description Default
callbacks Optional[Union[SRCallbacks, CallbackDispatcher]]

A CallbackDispatcher or a single SRCallbacks instance.

None
Source code in SRToolkit/evaluation/sr_evaluator.py
def set_callbacks(self, callbacks: Optional[Union[SRCallbacks, CallbackDispatcher]] = None) -> None:
    """
    Register callbacks for monitoring and early stopping.

    A single [SRCallbacks][SRToolkit.evaluation.callbacks.SRCallbacks] instance is
    automatically wrapped in a
    [CallbackDispatcher][SRToolkit.evaluation.callbacks.CallbackDispatcher].

    Examples:
        >>> from SRToolkit.evaluation.callbacks import EarlyStoppingCallback
        >>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9]])
        >>> y = np.array([3, 0, 3, 11])
        >>> se = SR_evaluator(X, y)
        >>> se.set_callbacks(EarlyStoppingCallback(threshold=1e-6))
        >>> se._callbacks is not None
        True

    Args:
        callbacks: A [CallbackDispatcher][SRToolkit.evaluation.callbacks.CallbackDispatcher]
            or a single [SRCallbacks][SRToolkit.evaluation.callbacks.SRCallbacks] instance.
    """
    if isinstance(callbacks, CallbackDispatcher):
        # Preserve previously registered callbacks when swapping in a new dispatcher.
        if isinstance(self._callbacks, SRCallbacks):
            old_callbacks = [self._callbacks]
        elif isinstance(self._callbacks, CallbackDispatcher):
            old_callbacks = self._callbacks.get_callbacks()
        else:
            old_callbacks = []

        self._callbacks = callbacks
        for cb in old_callbacks:
            self._callbacks.add(cb)
    elif isinstance(callbacks, SRCallbacks):
        if isinstance(self._callbacks, CallbackDispatcher):
            self._callbacks.add(callbacks)
        elif isinstance(self._callbacks, SRCallbacks):
            self._callbacks = CallbackDispatcher(callbacks=[self._callbacks, callbacks])
        else:
            self._callbacks = CallbackDispatcher(callbacks=[callbacks])

evaluate_expr

evaluate_expr(expr: Union[List[str], Node], simplify_expr: bool = False, verbose: int = 0) -> float

Evaluates an expression in infix notation and stores the result in memory to prevent re-evaluation.

Examples:

>>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9], ])
>>> y = np.array([3, 0, 3, 11])
>>> se = SR_evaluator(X, y, seed=42)
>>> rmse = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
>>> print(rmse < 1e-6)
True
>>> X = np.array([[0, 1], [0, 2], [0, 3]])
>>> y = np.array([2, 3, 4])
>>> se = SR_evaluator(X, y, seed=42, success_threshold=-1)
>>> rmse = se.evaluate_expr(["C", "+", "C", "*", "C", "+", "X_0", "*", "X_1", "/", "X_0"], simplify_expr=True)
>>> print(rmse < 1e-6)
True
>>> list(se.models.keys())[0]
'C+X_1'
>>> print(0.99 < se.models["C+X_1"].parameters[0] < 1.01)
True
>>> # Evaluating invalid expression returns nan and adds it to invalid list
>>> print(se.evaluate_expr(["C", "*", "X_1", "X_0"]))
nan
>>> se.invalid
['C*X_1X_0']
>>> X = np.random.rand(10, 2) - 0.5
>>> gt = ["X_0", "+", "C"]
>>> se = SR_evaluator(X, ground_truth=gt, ranking_function="bed", seed=42)
>>> print(se.evaluate_expr(["C", "+", "X_1"]) < 1)
True
>>> # When evaluating using BED as the ranking function, the error depends on the scale of output of the
>>> # ground truth. Because of stochasticity of BED, error might be high even when expressions match exactly.
>>> print(se.evaluate_expr(["C", "+", "X_0"]) < 0.2)
True
>>> # X can also be sampled from a domain by providing domain_bounds
>>> se = SR_evaluator(X, ground_truth=gt, ranking_function="bed", domain_bounds=[(-1, 1), (-1, 1)], seed=42)
>>> print(se.evaluate_expr(["C", "+", "X_0"]) < 0.2)
True

Parameters:

Name Type Description Default
expr Union[List[str], Node]

Expression as a token list in infix notation or a Node tree.

required
simplify_expr bool

If True, simplifies the expression with SymPy before evaluating. Slows down evaluation; recommended only for post-hoc inspection of top results. Default False.

False
verbose int

0 — silent; 1 — logs expression, error, and fitted parameters; 2 — also surfaces NumPy warnings during evaluation. Default 0.

0

Returns:

Type Description
float

The error of the expression under the active ranking function: RMSE when ranking_function="rmse", BED when ranking_function="bed". Returns NaN if the expression is invalid or max_evaluations has been reached (a warning is emitted in the latter case). If the expression was already evaluated, the cached value is returned immediately.

Source code in SRToolkit/evaluation/sr_evaluator.py
def evaluate_expr(
    self,
    expr: Union[List[str], Node],
    simplify_expr: bool = False,
    verbose: int = 0,
) -> float:
    """
    Evaluates an expression in infix notation and stores the result in
    memory to prevent re-evaluation.

    Examples:
        >>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9], ])
        >>> y = np.array([3, 0, 3, 11])
        >>> se = SR_evaluator(X, y, seed=42)
        >>> rmse = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
        >>> print(rmse < 1e-6)
        True
        >>> X = np.array([[0, 1], [0, 2], [0, 3]])
        >>> y = np.array([2, 3, 4])
        >>> se = SR_evaluator(X, y, seed=42, success_threshold=-1)
        >>> rmse = se.evaluate_expr(["C", "+", "C", "*", "C", "+", "X_0", "*", "X_1", "/", "X_0"], simplify_expr=True)
        >>> print(rmse < 1e-6)
        True
        >>> list(se.models.keys())[0]
        'C+X_1'
        >>> print(0.99 < se.models["C+X_1"].parameters[0] < 1.01)
        True
        >>> # Evaluating invalid expression returns nan and adds it to invalid list
        >>> print(se.evaluate_expr(["C", "*", "X_1", "X_0"]))
        nan
        >>> se.invalid
        ['C*X_1X_0']
        >>> X = np.random.rand(10, 2) - 0.5
        >>> gt = ["X_0", "+", "C"]
        >>> se = SR_evaluator(X, ground_truth=gt, ranking_function="bed", seed=42)
        >>> print(se.evaluate_expr(["C", "+", "X_1"]) < 1)
        True
        >>> # When evaluating using BED as the ranking function, the error depends on the scale of output of the
        >>> # ground truth. Because of stochasticity of BED, error might be high even when expressions match exactly.
        >>> print(se.evaluate_expr(["C", "+", "X_0"]) < 0.2)
        True
        >>> # X can also be sampled from a domain by providing domain_bounds
        >>> se = SR_evaluator(X, ground_truth=gt, ranking_function="bed", domain_bounds=[(-1, 1), (-1, 1)], seed=42)
        >>> print(se.evaluate_expr(["C", "+", "X_0"]) < 0.2)
        True

    Args:
        expr: Expression as a token list in infix notation or a
            [Node][SRToolkit.utils.expression_tree.Node] tree.
        simplify_expr: If ``True``, simplifies the expression with SymPy before evaluating.
            Slows down evaluation; recommended only for post-hoc inspection of top results.
            Default ``False``.
        verbose: ``0`` — silent; ``1`` — logs expression, error, and fitted parameters;
            ``2`` — also surfaces NumPy warnings during evaluation. Default ``0``.

    Returns:
        The error of the expression under the active ranking function: RMSE when ``ranking_function="rmse"``, BED when ``ranking_function="bed"``. Returns ``NaN`` if the expression is invalid or ``max_evaluations`` has been reached (a warning is emitted in the latter case). If the expression was already evaluated, the cached value is returned immediately.
    """
    self.total_evaluations += 1

    if self.should_stop:
        warnings.warn(
            f"Evaluation stopped because max_evaluations ({self.max_evaluations}) reached or an expression with error lower than success_threshold ({self.success_threshold}) was found. "
        )
        return np.nan
    else:
        if simplify_expr:
            try:
                expr = simplify(expr, self.symbol_library)
            except Exception as e:
                if isinstance(expr, Node):
                    expr_list = expr.to_list(symbol_library=self.symbol_library)
                else:
                    expr_list = expr
                warnings.warn(f"Unable to simplify: {''.join(expr_list)}, problems with subexpression {e}")

        if isinstance(expr, Node):
            expr_list = expr.to_list(symbol_library=self.symbol_library)
        else:
            expr_list = expr

        expr_str = "".join(expr_list)
        if expr_str in self.models:
            if verbose > 0:
                logger.debug("Already evaluated %s", expr_str)
            if self._callbacks is not None:
                event = ExprEvaluated(
                    expression=expr_str,
                    error=self.models[expr_str].error,
                    evaluation_number=self.total_evaluations,
                    experiment_id=self._experiment_id,
                    is_new_best=False,
                )
                if not self._callbacks.on_expr_evaluated(event):
                    self.should_stop = True
            return self.models[expr_str].error

        else:
            if self.ranking_function == "rmse":
                try:
                    with (
                        np.errstate(
                            divide="ignore",
                            invalid="ignore",
                            over="ignore",
                            under="ignore",
                        )
                        if verbose < 2
                        else nullcontext()
                    ):
                        error, parameters = self.parameter_estimator.estimate_parameters(expr)

                    if verbose > 0:
                        if parameters.size > 0:
                            parameter_string = (
                                f" Best parameters found are [{', '.join([str(round(p, 3)) for p in parameters])}]"
                            )
                        else:
                            parameter_string = ""
                        logger.debug("Evaluated expression %s with RMSE: %s.%s", expr_str, error, parameter_string)

                except Exception as e:
                    if verbose > 0:
                        logger.debug("Error evaluating expression %s: %s", expr_str, e)

                    self.invalid.append(expr_str)
                    error, parameters = np.nan, np.array([])

                self.models[expr_str] = ModelResult(
                    expr=expr_list,
                    error=error,
                    parameters=parameters,
                )

                if self._callbacks is not None:
                    is_new_best = error < self._current_best_error
                    if is_new_best:
                        self._current_best_error = error
                    event = ExprEvaluated(
                        expression=expr_str,
                        error=error,
                        evaluation_number=self.total_evaluations,
                        experiment_id=self._experiment_id,
                        is_new_best=is_new_best,
                    )
                    if not self._callbacks.on_expr_evaluated(event):
                        self.should_stop = True
                    if is_new_best:
                        best_event = BestExpressionFound(
                            experiment_id=self._experiment_id,
                            expression=expr_str,
                            error=error,
                            evaluation_number=self.total_evaluations,
                        )
                        if not self._callbacks.on_best_expression(best_event):
                            self.should_stop = True

            elif self.ranking_function == "bed":
                try:
                    with (
                        np.errstate(
                            divide="ignore",
                            invalid="ignore",
                            over="ignore",
                            under="ignore",
                        )
                        if verbose < 2
                        else nullcontext()
                    ):
                        assert self.gt_behavior is not None, (
                            "Ground truth must be provided for BED ranking function."
                        )
                        error = bed(
                            expr,
                            self.gt_behavior,
                            self.bed_evaluation_parameters["bed_X"],
                            num_consts_sampled=self.bed_evaluation_parameters["num_consts_sampled"],
                            num_points_sampled=self.bed_evaluation_parameters["num_points_sampled"],
                            domain_bounds=self.bed_evaluation_parameters["domain_bounds"],
                            consts_bounds=self.bed_evaluation_parameters["constant_bounds"],
                            symbol_library=self.symbol_library,
                            seed=self.seed,
                        )

                        if verbose > 0:
                            logger.debug("Evaluated expression %s with BED: %s.", expr_str, error)

                except Exception as e:
                    if verbose > 0:
                        logger.debug("Error evaluating expression %s: %s", expr_str, e)

                    self.invalid.append(expr_str)
                    error = np.nan

                self.models[expr_str] = ModelResult(
                    expr=expr_list,
                    error=error,
                )

                if self._callbacks is not None:
                    is_new_best = error < self._current_best_error
                    if is_new_best:
                        self._current_best_error = error
                    event = ExprEvaluated(
                        expression=expr_str,
                        error=error,
                        evaluation_number=self.total_evaluations,
                        experiment_id=self._experiment_id,
                        is_new_best=is_new_best,
                    )
                    if not self._callbacks.on_expr_evaluated(event):
                        self.should_stop = True
                    if is_new_best:
                        best_event = BestExpressionFound(
                            experiment_id=self._experiment_id,
                            expression=expr_str,
                            error=error,
                            evaluation_number=self.total_evaluations,
                        )
                        if not self._callbacks.on_best_expression(best_event):
                            self.should_stop = True

            else:
                raise ValueError(f"Ranking function {self.ranking_function} not supported.")

            return error

get_results

get_results(approach_name: str = '', top_k: int = 20, results: Optional[SR_results] = None) -> SR_results

Returns the results of the equation discovery / symbolic regression evaluation.

Examples:

>>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9], ])
>>> y = np.array([3, 0, 3, 11])
>>> se = SR_evaluator(X, y)
>>> rmse = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
>>> results = se.get_results(top_k=1)
>>> print(results[0].num_evaluated)
1
>>> print(results[0].evaluation_calls)
1
>>> print(results[0].best_expr)
C*X_1-X_0
>>> print(results[0].min_error < 1e-6)
True
>>> print(1.99 < results[0].top_models[0].parameters[0] < 2.01)
True

Parameters:

Name Type Description Default
approach_name str

The name of the approach used to discover the equations.

''
top_k int

The number of top results to include in the output. If top_k is greater than the number of evaluated expressions, all evaluated expressions are included. If top_k is less than 0, all evaluated expressions are included.

20
results Optional[SR_results]

An SR_results object containing the results of the previous evaluation. If provided, the results of the current evaluation are appended to the existing results. Otherwise, a new SR_results object is created.

None

Returns:

Type Description
SR_results

An instance of the SR_results object with the results of the evaluation.

Source code in SRToolkit/evaluation/sr_evaluator.py
def get_results(
    self, approach_name: str = "", top_k: int = 20, results: Optional["SR_results"] = None
) -> "SR_results":
    """
    Returns the results of the equation discovery / symbolic regression evaluation.

    Examples:
        >>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9], ])
        >>> y = np.array([3, 0, 3, 11])
        >>> se = SR_evaluator(X, y)
        >>> rmse = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
        >>> results = se.get_results(top_k=1)
        >>> print(results[0].num_evaluated)
        1
        >>> print(results[0].evaluation_calls)
        1
        >>> print(results[0].best_expr)
        C*X_1-X_0
        >>> print(results[0].min_error < 1e-6)
        True
        >>> print(1.99 < results[0].top_models[0].parameters[0] < 2.01)
        True

    Args:
        approach_name: The name of the approach used to discover the equations.
        top_k: The number of top results to include in the output. If `top_k`
            is greater than the number of evaluated expressions, all
            evaluated expressions are included. If `top_k` is less than 0,
            all evaluated expressions are included.
        results: An SR_results object containing the results of the previous evaluation. If provided,
            the results of the current evaluation are appended to the existing results. Otherwise, a new SR_results
            object is created.

    Returns:
        An instance of the SR_results object with the results of the evaluation.
    """
    if top_k > len(self.models) or top_k < 0:
        top_k = len(self.models)

    if results is None:
        results = SR_results()

    results.add_results(
        self.models,
        top_k,
        self.total_evaluations,
        self.success_threshold,
        approach_name,
        self.metadata,
    )

    return results

to_dict

to_dict(base_path: str, name: str) -> dict

Creates a dictionary representation of the SR_evaluator.

Parameters:

Name Type Description Default
base_path str

Directory in which the evaluator's data arrays (X, y, and optionally the ground truth) are saved as .npy files.

required
name str

File-name prefix for the saved array files, e.g. {name}_X.npy.

required

Returns:

Type Description
dict

A dictionary containing the necessary information to recreate the evaluator from disk.

Source code in SRToolkit/evaluation/sr_evaluator.py
def to_dict(self, base_path: str, name: str) -> dict:
    """
    Creates a dictionary representation of the SR_evaluator.

    Args:
        base_path: Directory in which the evaluator's data arrays (X, y, and
            optionally the ground truth) are saved as .npy files.
        name: File-name prefix for the saved array files, e.g. ``{name}_X.npy``.

    Returns:
        A dictionary containing the necessary information to recreate the evaluator from disk.
    """
    output = {
        "format_version": 1,
        "type": "SR_evaluator",
        "metadata": self.metadata,
        "symbol_library": self.symbol_library.to_dict(),
        "max_evaluations": self.max_evaluations,
        "success_threshold": self.success_threshold,
        "ranking_function": self.ranking_function,
        "seed": self.seed,
        "kwargs": self.kwargs,
    }

    if not os.path.isdir(base_path):
        os.makedirs(base_path)

    X_path = f"{base_path}/{name}_X.npy"
    np.save(X_path, self.X)
    output["X"] = X_path

    if self.y is not None:
        y_path = f"{base_path}/{name}_y.npy"
        np.save(y_path, self.y)
        output["y"] = y_path
    else:
        output["y"] = None

    if self.ground_truth is None:
        output["ground_truth"] = None
    else:
        if isinstance(self.ground_truth, list):
            output["ground_truth"] = self.ground_truth
        elif isinstance(self.ground_truth, Node):
            output["ground_truth"] = self.ground_truth.to_list(self.symbol_library)
        else:
            gt_path = f"{base_path}/{name}_gt.npy"
            np.save(gt_path, self.ground_truth)
            output["ground_truth"] = gt_path

    return output
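
For example, a minimal serialization sketch; the import path SRToolkit.evaluation.sr_evaluator, the eval_state directory, and the JSON-serializability of the returned dictionary are assumptions made for illustration:

import json

import numpy as np

from SRToolkit.evaluation.sr_evaluator import SR_evaluator  # assumed import path

X = np.array([[1, 2], [8, 4], [5, 4], [7, 9]])
y = np.array([3, 0, 3, 11])
se = SR_evaluator(X, y, seed=42)

# to_dict writes the data arrays to disk (eval_state/demo_X.npy and
# eval_state/demo_y.npy) and returns a dictionary describing the evaluator.
state = se.to_dict(base_path="eval_state", name="demo")

# The dictionary can then be persisted alongside the arrays, assuming all of
# its values (metadata, symbol library, extra keyword arguments) serialize to JSON.
with open("eval_state/demo_evaluator.json", "w") as f:
    json.dump(state, f, indent=2)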

from_dict staticmethod

from_dict(data: dict) -> SR_evaluator

Reconstruct an SR_evaluator from a dictionary produced by to_dict.

Parameters:

Name Type Description Default
data dict

Dictionary representation of the evaluator, as produced by to_dict.

required

Returns:

Type Description
SR_evaluator

The reconstructed SR_evaluator.

Raises:

Type Description
ValueError

If data["format_version"] is not 1 or if the numpy arrays for X, y, or ground_truth cannot be loaded from disk.

Source code in SRToolkit/evaluation/sr_evaluator.py
@staticmethod
def from_dict(data: dict) -> "SR_evaluator":
    """
    Reconstruct an [SR_evaluator][SRToolkit.evaluation.sr_evaluator.SR_evaluator] from a
    dictionary produced by [to_dict][SRToolkit.evaluation.sr_evaluator.SR_evaluator.to_dict].

    Args:
        data: Dictionary representation of the evaluator, as produced by
            [to_dict][SRToolkit.evaluation.sr_evaluator.SR_evaluator.to_dict].

    Returns:
        The reconstructed [SR_evaluator][SRToolkit.evaluation.sr_evaluator.SR_evaluator].

    Raises:
        ValueError: If ``data["format_version"]`` is not ``1`` or if the numpy arrays
            for ``X``, ``y``, or ``ground_truth`` cannot be loaded from disk.
    """
    if data.get("format_version", 1) != 1:
        raise ValueError(
            f"[SR_evaluator.from_dict] Unsupported format_version: {data.get('format_version')!r}. Expected 1."
        )

    try:
        X = np.load(data["X"])

        if data["y"] is not None:
            y = np.load(data["y"])
        else:
            y = None

        if data["ground_truth"] is None:
            gt = None
        else:
            if isinstance(data["ground_truth"], list):
                gt = data["ground_truth"]
            else:
                gt = np.load(data["ground_truth"])
    except Exception as e:
        raise ValueError(f"[SR_evaluator.from_dict] Unable to load data for X/y/ground truth due to {e}")

    symbol_library = SymbolLibrary.from_dict(data["symbol_library"])
    return SR_evaluator(
        X,
        y=y,
        ground_truth=gt,
        symbol_library=symbol_library,
        max_evaluations=data["max_evaluations"],
        success_threshold=data["success_threshold"],
        ranking_function=data["ranking_function"],
        seed=data["seed"],
        metadata=data["metadata"],
        **data["kwargs"],
    )
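
A matching reconstruction sketch, under the same assumptions about the import path and file locations:

import numpy as np

from SRToolkit.evaluation.sr_evaluator import SR_evaluator  # assumed import path

X = np.array([[1, 2], [8, 4], [5, 4], [7, 9]])
y = np.array([3, 0, 3, 11])

original = SR_evaluator(X, y, seed=42)
state = original.to_dict(base_path="eval_state", name="demo")

# from_dict reloads the .npy arrays referenced in the dictionary and rebuilds
# an evaluator with the same symbol library, ranking function, and seed.
restored = SR_evaluator.from_dict(state)
rmse = restored.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
print(rmse < 1e-6)  # expected True, as in the get_results example above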

SR_results

SR_results()

Container for SR experiment results, typically obtained via SR_evaluator.get_results.

Examples:

>>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9]])
>>> y = np.array([3, 0, 3, 11])
>>> se = SR_evaluator(X, y, seed=42)
>>> _ = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
>>> results = se.get_results(top_k=1)
>>> print(results[0].best_expr)
C*X_1-X_0
>>> print(results[0].min_error < 1e-6)
True
>>> len(results)
1

Attributes:

Name Type Description
results

List of EvalResult instances, one per experiment.

Source code in SRToolkit/evaluation/sr_evaluator.py
def __init__(self):
    """
    Container for SR experiment results, typically obtained via
    [SR_evaluator.get_results][SRToolkit.evaluation.sr_evaluator.SR_evaluator.get_results].

    Examples:
        >>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9]])
        >>> y = np.array([3, 0, 3, 11])
        >>> se = SR_evaluator(X, y, seed=42)
        >>> _ = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
        >>> results = se.get_results(top_k=1)
        >>> print(results[0].best_expr)
        C*X_1-X_0
        >>> print(results[0].min_error < 1e-6)
        True
        >>> len(results)
        1

    Attributes:
        results: List of [EvalResult][SRToolkit.utils.types.EvalResult] instances,
            one per experiment.
    """
    self.results = list()

add_results

add_results(models: Dict[str, ModelResult], top_k: int, total_evaluations: int, success_threshold: Optional[float], approach_name: str, metadata: Optional[dict] = None) -> None

Adds the results of an evaluation to the results object.

Parameters:

Name Type Description Default
models Dict[str, ModelResult]

A dictionary mapping expressions to their evaluation results.

required
top_k int

The number of top results to include in the output.

required
total_evaluations int

The total number of evaluations performed during the evaluation.

required
success_threshold Optional[float]

The success threshold used to determine whether the evaluation was successful.

required
approach_name str

The name of the approach used to discover the equations.

required
metadata Optional[dict]

A dictionary containing additional metadata about the evaluation.

None
Source code in SRToolkit/evaluation/sr_evaluator.py
def add_results(
    self,
    models: Dict[str, ModelResult],
    top_k: int,
    total_evaluations: int,
    success_threshold: Optional[float],
    approach_name: str,
    metadata: Optional[dict] = None,
) -> None:
    """
    Adds the results of an evaluation to the results object.

    Args:
        models: A dictionary mapping expressions to their evaluation results.
        top_k: The number of top results to include in the output.
        total_evaluations: The total number of evaluations performed during the evaluation.
        success_threshold: The success threshold used to determine whether the evaluation was successful.
        approach_name: The name of the approach used to discover the equations.
        metadata: A dictionary containing additional metadata about the evaluation.
    """
    models_list = list(models.values())
    sorted_indices = np.argsort([v.error for v in models_list])
    sorted_models = [models_list[i] for i in sorted_indices]

    dataset_name = None
    remaining_metadata = None
    if metadata is not None and "dataset_name" in metadata:
        dataset_name = metadata["dataset_name"]
        remaining_metadata = {key: value for key, value in metadata.items() if key != "dataset_name"}
        if len(remaining_metadata) == 0:
            remaining_metadata = None
    elif metadata is not None:
        remaining_metadata = metadata

    success = success_threshold is not None and sorted_models[0].error < success_threshold

    results_obj = EvalResult(
        min_error=sorted_models[0].error,
        best_expr="".join(sorted_models[0].expr),
        num_evaluated=len(models_list),
        evaluation_calls=total_evaluations,
        top_models=sorted_models[:top_k],
        all_models=models_list,
        approach_name=approach_name,
        success=success,
        dataset_name=dataset_name,
        metadata=remaining_metadata,
    )

    self.results.append(results_obj)
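
add_results is normally invoked for you by get_results. A sketch of the usual accumulation pattern, where the container returned by one experiment is passed back into get_results for the next (the import path SRToolkit.evaluation.sr_evaluator is assumed):

import numpy as np

from SRToolkit.evaluation.sr_evaluator import SR_evaluator  # assumed import path

X = np.array([[1, 2], [8, 4], [5, 4], [7, 9]])
y = np.array([3, 0, 3, 11])

# First experiment.
se1 = SR_evaluator(X, y, seed=42)
_ = se1.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
results = se1.get_results(approach_name="run-1", top_k=1)

# Second experiment: passing the existing container makes get_results call
# add_results on it, so both experiments end up in the same SR_results.
se2 = SR_evaluator(X, y, seed=7)
_ = se2.evaluate_expr(["X_1", "+", "X_0"])
results = se2.get_results(approach_name="run-2", top_k=1, results=results)

print(len(results))  # 2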

print_results

print_results(experiment_number: Optional[int] = None, detailed: bool = False, model_scope: Literal['best', 'top', 'all'] = 'top', augmentations: Optional[List[str]] = None)

Prints the results of the SR_evaluator.

Displays the minimum error, best expression, evaluation counts, success status, metadata, and approach name. When detailed is True, also prints per-model information. Augmentation data is formatted by the corresponding ResultAugmenter subclass, looked up from the global registry via the _type field stored in each augmentation entry.

Examples:

>>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9], ])
>>> y = np.array([3, 0, 3, 11])
>>> se = SR_evaluator(X, y, seed=42)
>>> rmse = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
>>> results = se.get_results(top_k=1)
>>> results.print_results()
=== Experiment 1/1 ===
Best expression: C*X_1-X_0
Error: ...
Evaluated: 1 expressions | Calls: 1 | Success: ...

>>> results.print_results(detailed=True, experiment_number=0)
Best expression: C*X_1-X_0
Error: ...
Evaluated: 1 expressions | Calls: 1 | Success: ...

Models:
  C*X_1-X_0  (error=..., params=...)

Parameters:

Name Type Description Default
experiment_number Optional[int]

Number of the experiment to print. If None, prints all.

None
detailed bool

If True, prints per-model information.

False
model_scope Literal['best', 'top', 'all']

Which models to show when detailed is True. "best" shows only the top model, "top" shows the top-k, "all" shows all evaluated models.

'top'
augmentations Optional[List[str]]

Filter which augmenters to display by name. If None, all augmentations present in the data are shown.

None
Source code in SRToolkit/evaluation/sr_evaluator.py
def print_results(
    self,
    experiment_number: Optional[int] = None,
    detailed: bool = False,
    model_scope: Literal["best", "top", "all"] = "top",
    augmentations: Optional[List[str]] = None,
):
    r"""
    Prints the results of the SR_evaluator.

    Displays the minimum error, best expression, evaluation counts, success status,
    metadata, and approach name. When *detailed* is ``True``, also prints per-model
    information. Augmentation data is formatted by the corresponding
    [ResultAugmenter][SRToolkit.evaluation.sr_evaluator.ResultAugmenter] subclass,
    looked up from the global registry via the ``_type`` field stored in each
    augmentation entry.

    Examples:
        >>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9], ])
        >>> y = np.array([3, 0, 3, 11])
        >>> se = SR_evaluator(X, y, seed=42)
        >>> rmse = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
        >>> results = se.get_results(top_k=1)
        >>> results.print_results()  # doctest: +ELLIPSIS
        === Experiment 1/1 ===
        Best expression: C*X_1-X_0
        Error: ...
        Evaluated: 1 expressions | Calls: 1 | Success: ...
        <BLANKLINE>
        >>> results.print_results(detailed=True, experiment_number=0)  # doctest: +ELLIPSIS
        Best expression: C*X_1-X_0
        Error: ...
        Evaluated: 1 expressions | Calls: 1 | Success: ...
        <BLANKLINE>
        Models:
          C*X_1-X_0  (error=..., params=...)
        <BLANKLINE>

    Args:
        experiment_number: Number of the experiment to print. If None, prints all.
        detailed: If True, prints per-model information.
        model_scope: Which models to show when *detailed* is True.
            ``"best"`` shows only the top model, ``"top"`` shows the top-k,
            ``"all"`` shows all evaluated models.
        augmentations: Filter which augmenters to display by name.
            If None, all augmentations present in the data are shown.
    """
    if experiment_number is None:
        for i, result in enumerate(self.results):
            print(f"=== Experiment {i + 1}/{len(self.results)} ===")
            SR_results._print_result_(result, detailed, model_scope, augmentations)
            print()
    else:
        assert experiment_number < len(self.results), "[SR_results.print_results] experiment number out of bounds"
        SR_results._print_result_(self.results[experiment_number], detailed, model_scope, augmentations)

augment

augment(augmenters: Union[List[ResultAugmenter], ResultAugmenter], experiment_number: Optional[int] = None) -> None

Applies the given ResultAugmenter instances to the stored results. Augmenters add post-hoc information such as LaTeX representations, simplified expressions, or R² scores.

Examples:

>>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9], ])
>>> y = np.array([3, 0, 3, 11])
>>> se = SR_evaluator(X, y, seed=42)
>>> rmse = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
>>> results = se.get_results(top_k=1)
>>> from SRToolkit.evaluation.result_augmentation import ExpressionToLatex
>>> results.augment([ExpressionToLatex(SymbolLibrary.default_symbols(2))])
>>> results[0].augmentations["ExpressionToLatex"]["best_expr_latex"]
'$C_{0} \\cdot X_{1} - X_{0}$'

Parameters:

Name Type Description Default
augmenters Union[List[ResultAugmenter], ResultAugmenter]

A ResultAugmenter or a list of ResultAugmenter objects to apply to the results.

required
experiment_number Optional[int]

If provided, apply augmenters only to this experiment's result. If None, apply to all results.

None
Source code in SRToolkit/evaluation/sr_evaluator.py
def augment(
    self, augmenters: Union[List[ResultAugmenter], ResultAugmenter], experiment_number: Optional[int] = None
) -> None:
    r"""
    Applies the given [ResultAugmenter][SRToolkit.evaluation.sr_evaluator.ResultAugmenter]
    instances to the stored results. Augmenters add post-hoc information such as LaTeX
    representations, simplified expressions, or R² scores.

    Examples:
        >>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9], ])
        >>> y = np.array([3, 0, 3, 11])
        >>> se = SR_evaluator(X, y, seed=42)
        >>> rmse = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
        >>> results = se.get_results(top_k=1)
        >>> from SRToolkit.evaluation.result_augmentation import ExpressionToLatex
        >>> results.augment([ExpressionToLatex(SymbolLibrary.default_symbols(2))])
        >>> results[0].augmentations["ExpressionToLatex"]["best_expr_latex"]  # doctest: +ELLIPSIS
        '$C_{0} \\cdot X_{1} - X_{0}$'

    Args:
        augmenters: A [ResultAugmenter][SRToolkit.evaluation.sr_evaluator.ResultAugmenter] or a list of [ResultAugmenter][SRToolkit.evaluation.sr_evaluator.ResultAugmenter] objects to apply to the results.
        experiment_number: If provided, apply augmenters only to this experiment's result.
            If None, apply to all results.
    """
    if isinstance(augmenters, ResultAugmenter):
        augmenters = [augmenters]

    if experiment_number is not None:
        assert experiment_number < len(self.results), "[SR_results.augment] experiment number out of bounds"
        for augmenter in augmenters:
            try:
                augmenter.write_results(self.results[experiment_number])
            except Exception as e:
                warnings.warn(f"Error augmenting results with {augmenter.name}, skipping: {e}")
    else:
        for result in self.results:
            for augmenter in augmenters:
                try:
                    augmenter.write_results(result)
                except Exception as e:
                    warnings.warn(f"Error augmenting results with {augmenter.name}, skipping: {e}")

__add__

__add__(other: SR_results) -> SR_results

Returns a new SR_results object that is the concatenation of the current SR_results object with the other SR_results object.

Parameters:

Name Type Description Default
other SR_results

SR_results object to concatenate with the current SR_results object.

required

Returns:

Type Description
SR_results

A new SR_results object containing the concatenated results.

Source code in SRToolkit/evaluation/sr_evaluator.py
def __add__(self, other: "SR_results") -> "SR_results":
    """
    Returns a new SR_results object that is the concatenation of the current SR_results object with the other SR_results object.

    Args:
        other: SR_results object to concatenate with the current SR_results object.

    Returns:
        A new SR_results object containing the concatenated results.
    """
    new = SR_results()
    new.results = self.results + other.results
    return new

__iadd__

__iadd__(other: SR_results) -> SR_results

In-place concatenation of SR_results objects.

Parameters:

Name Type Description Default
other SR_results

SR_results object to concatenate with the current SR_results object.

required

Returns:

Type Description
SR_results

self

Source code in SRToolkit/evaluation/sr_evaluator.py
def __iadd__(self, other: "SR_results") -> "SR_results":
    """
    In-place concatenation of SR_results objects.

    Args:
        other: SR_results object to concatenate with the current SR_results object.

    Returns:
        self
    """
    self.results += other.results
    return self
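
A short sketch of merging two containers with + and += (the import path SRToolkit.evaluation.sr_evaluator is assumed):

import numpy as np

from SRToolkit.evaluation.sr_evaluator import SR_evaluator  # assumed import path

X = np.array([[1, 2], [8, 4], [5, 4], [7, 9]])
y = np.array([3, 0, 3, 11])

se_a = SR_evaluator(X, y, seed=42)
_ = se_a.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
se_b = SR_evaluator(X, y, seed=7)
_ = se_b.evaluate_expr(["X_1", "+", "X_0"])

results_a = se_a.get_results(top_k=1)
results_b = se_b.get_results(top_k=1)

merged = results_a + results_b  # new container; results_a and results_b are unchanged
print(len(merged))  # 2

results_a += results_b  # in-place; results_a now holds both experiments
print(len(results_a))  # 2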

__getitem__

__getitem__(item: int) -> EvalResult

Returns the results of the experiment with the given index.

Examples:

>>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9], ])
>>> y = np.array([3, 0, 3, 11])
>>> se = SR_evaluator(X, y)
>>> rmse = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
>>> results = se.get_results(top_k=1)
>>> result_of_first_experiment = results[0]

Parameters:

Name Type Description Default
item int

the index of the experiment.

required

Returns:

Type Description
EvalResult

The results of the experiment with the given index.

Source code in SRToolkit/evaluation/sr_evaluator.py
def __getitem__(self, item: int) -> EvalResult:
    """
    Returns the results of the experiment with the given index.

    Examples:
        >>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9], ])
        >>> y = np.array([3, 0, 3, 11])
        >>> se = SR_evaluator(X, y)
        >>> rmse = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
        >>> results = se.get_results(top_k=1)
        >>> result_of_first_experiment = results[0]

    Args:
        item: the index of the experiment.

    Returns:
        The results of the experiment with the given index.

    """
    assert isinstance(item, int), "[SR_Results.__getitem__] Item must be an integer."
    assert 0 <= item < len(self.results), "[SR_Results.__getitem__] Item out of bounds."
    return self.results[item]

__len__

__len__() -> int

Returns the number of results stored in the results object. Usually, each result corresponds to a single experiment.

Examples:

>>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9], ])
>>> y = np.array([3, 0, 3, 11])
>>> se = SR_evaluator(X, y)
>>> rmse = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
>>> results = se.get_results(top_k=1)
>>> len(results)
1

Returns:

Type Description
int

The number of results stored in the results object.

Source code in SRToolkit/evaluation/sr_evaluator.py
def __len__(self) -> int:
    """
    Returns the number of results stored in the results object. Usually, each result corresponds to a single experiment.

    Examples:
        >>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9], ])
        >>> y = np.array([3, 0, 3, 11])
        >>> se = SR_evaluator(X, y)
        >>> rmse = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
        >>> results = se.get_results(top_k=1)
        >>> len(results)
        1

    Returns:
        The number of results stored in the results object.
    """
    return len(self.results)

save

save(path: str) -> None

Saves the results to a specific file or directory as JSON.

If path is an existing directory or has no file extension, results.json is written inside that directory (which is created if needed). Otherwise, path is treated as a file and must end with the .json extension.

Examples:

>>> import tempfile
>>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9], ])
>>> y = np.array([3, 0, 3, 11])
>>> se = SR_evaluator(X, y, seed=42)
>>> _ = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
>>> results = se.get_results(top_k=1)
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     results.save(tmpdir + "/my_results/results.json")
...     loaded = SR_results.load(tmpdir + "/my_results/results.json")
...     print(loaded[0].best_expr)
C*X_1-X_0

Parameters:

Name Type Description Default
path str

Directory path or specific .json file path.

required

Raises:

Type Description
ValueError

If the path is a file with an extension other than .json.

OSError

If the directory cannot be created.

Source code in SRToolkit/evaluation/sr_evaluator.py
def save(self, path: str) -> None:
    """
    Saves the results to a specific file or directory as JSON.

    If *path* is an existing directory or has no file extension, ``results.json``
    is written inside that directory (which is created if needed). If *path* is a
    file path, it must end with the ``.json`` extension.

    Examples:
        >>> import tempfile
        >>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9], ])
        >>> y = np.array([3, 0, 3, 11])
        >>> se = SR_evaluator(X, y, seed=42)
        >>> _ = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
        >>> results = se.get_results(top_k=1)
        >>> with tempfile.TemporaryDirectory() as tmpdir:
        ...     results.save(tmpdir + "/my_results/results.json")
        ...     loaded = SR_results.load(tmpdir + "/my_results/results.json")
        ...     print(loaded[0].best_expr)
        C*X_1-X_0

    Args:
        path: Directory path or specific .json file path.

    Raises:
        ValueError: If the path is a file with an extension other than .json.
        OSError: If the directory cannot be created.
    """
    if os.path.isdir(path):
        target_file = os.path.join(path, "results.json")
    else:
        _, extension = os.path.splitext(path)

        if extension == "":
            target_file = os.path.join(path, "results.json")
        elif extension.lower() == ".json":
            target_file = path
        else:
            raise ValueError(f"Invalid file extension '{extension}'. Results must be saved to a '.json' file.")

    target_dir = os.path.dirname(target_file)
    if target_dir and not os.path.isdir(target_dir):
        os.makedirs(target_dir)

    # Prepare the data to be serialized as JSON
    output = {
        "format_version": 1,
        "type": "SR_results",
        "results": [r.to_dict() for r in self.results],
    }

    with open(target_file, "w") as f:
        json.dump(output, f, indent=2)

load staticmethod

load(path: str) -> SR_results

Load results previously saved with save.

If path is a directory, it looks for results.json inside it. If path is a file, it must end with the .json extension.

Parameters:

Name Type Description Default
path str

Directory path containing results.json or path to a specific .json file.

required

Returns:

Type Description
SR_results

A new SR_results instance with the loaded data.

Raises:

Type Description
FileNotFoundError

If the specified file or directory does not exist.

ValueError

If the file extension is not .json or if format_version is not 1.

Source code in SRToolkit/evaluation/sr_evaluator.py
@staticmethod
def load(path: str) -> "SR_results":
    """
    Load results previously saved with [save][SRToolkit.evaluation.sr_evaluator.SR_results.save].

    If *path* is a directory, it looks for ``results.json`` inside it.
    If *path* is a file, it must end with the ``.json`` extension.

    Args:
        path: Directory path containing ``results.json`` or path to a specific .json file.

    Returns:
        A new [SR_results][SRToolkit.evaluation.sr_evaluator.SR_results] instance with the loaded data.

    Raises:
        FileNotFoundError: If the specified file or directory does not exist.
        ValueError: If the file extension is not .json or if ``format_version`` is not ``1``.
    """
    if os.path.isdir(path):
        results_path = os.path.join(path, "results.json")
    else:
        _, extension = os.path.splitext(path)
        if extension.lower() != ".json":
            raise ValueError(
                f"Invalid file extension '{extension}'. SR_results can only be loaded from '.json' files."
            )
        results_path = path

    if not os.path.exists(results_path):
        raise FileNotFoundError(f"Could not find results file at: {results_path}")

    with open(results_path, "r") as f:
        data = json.load(f)

    if data.get("format_version", 1) != 1:
        raise ValueError(
            f"[SR_results.load] Unsupported format_version: {data.get('format_version')!r}. Expected 1."
        )

    sr_results = SR_results()
    sr_results.results = [EvalResult.from_dict(r) for r in data["results"]]
    return sr_results
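
For example, saving to and loading from a directory rather than an explicit .json path (the import path SRToolkit.evaluation.sr_evaluator is assumed):

import tempfile

import numpy as np

from SRToolkit.evaluation.sr_evaluator import SR_evaluator, SR_results  # assumed import path

X = np.array([[1, 2], [8, 4], [5, 4], [7, 9]])
y = np.array([3, 0, 3, 11])
se = SR_evaluator(X, y, seed=42)
_ = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
results = se.get_results(top_k=1)

with tempfile.TemporaryDirectory() as tmpdir:
    # Saving to an existing directory writes results.json inside it ...
    results.save(tmpdir)
    # ... and load accepts the same directory, looking for results.json there.
    loaded = SR_results.load(tmpdir)
    print(loaded[0].best_expr)  # C*X_1-X_0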