SR Evaluator

SRToolkit.evaluation.sr_evaluator

Expression evaluation and results management for symbolic regression.

Contains ResultAugmenter — the base class for post-processing results, SR_evaluator — the core evaluator that ranks expressions by RMSE or BED, and SR_results — a container for experiment results.

Note

ResultAugmenter is defined here rather than in result_augmentation to avoid circular imports.

ResultAugmenter

ResultAugmenter(name: str)

Bases: ABC

Base class for result augmenters. Subclasses implement write_results to compute and store additional data in an EvalResult via add_augmentation.

For concrete implementations, see result_augmentation.

Parameters:

name (str, required): Identifier used as the key in the augmentations dict of EvalResult and ModelResult. If two augmenters share the same name, add_augmentation appends a numeric suffix automatically.
Source code in SRToolkit/evaluation/sr_evaluator.py
def __init__(self, name: str):
    """
    Base class for result augmenters. Subclasses implement
    [write_results][SRToolkit.evaluation.sr_evaluator.ResultAugmenter.write_results] to compute
    and store additional data in an [EvalResult][SRToolkit.utils.types.EvalResult] via
    [add_augmentation][SRToolkit.utils.types.EvalResult.add_augmentation].

    For concrete implementations, see
    [result_augmentation][SRToolkit.evaluation.result_augmentation].

    Args:
        name: Identifier used as the key in
            ``augmentations`` dict of [EvalResult][SRToolkit.utils.types.EvalResult] and
            [ModelResult][SRToolkit.utils.types.ModelResult].
            If two augmenters share the same name,
            [add_augmentation][SRToolkit.utils.types.EvalResult.add_augmentation] appends a
            numeric suffix automatically.
    """
    self.name = name

write_results abstractmethod

write_results(results: EvalResult) -> None

Compute and write augmentation data into results and its models.

Call results.add_augmentation(self.name, data, self._type) for experiment-level data and model.add_augmentation(self.name, data, self._type) for per-model data.

Parameters:

results (EvalResult, required): The EvalResult to augment.
Source code in SRToolkit/evaluation/sr_evaluator.py
@abstractmethod
def write_results(
    self,
    results: "EvalResult",
) -> None:
    """
    Compute and write augmentation data into *results* and its models.

    Call ``results.add_augmentation(self.name, data, self._type)`` for experiment-level
    data and ``model.add_augmentation(self.name, data, self._type)`` for per-model data.

    Args:
        results: The [EvalResult][SRToolkit.utils.types.EvalResult] to augment.
    """

to_dict abstractmethod

to_dict(base_path: str, name: str) -> dict

Transforms the augmenter into a dictionary. This is used for saving the augmenter to disk.

Parameters:

base_path (str, required): The base path used for saving the data inside the augmenter, if needed.
name (str, required): The name/identifier used by the augmenter for saving.

Returns:

dict: A dictionary containing the necessary information to recreate the augmenter.

Source code in SRToolkit/evaluation/sr_evaluator.py
@abstractmethod
def to_dict(self, base_path: str, name: str) -> dict:
    """
    Transforms the augmenter into a dictionary. This is used for saving the augmenter to disk.

    Args:
        base_path: The base path used for saving the data inside the augmenter, if needed.
        name: The name/identifier used by the augmenter for saving.

    Returns:
        A dictionary containing the necessary information to recreate the augmenter.
    """

format_eval_result classmethod

format_eval_result(data: Dict[str, Any]) -> str

Returns a formatted string for experiment-level augmentation data.

Subclasses override this for custom formatting. The data dict is the inner augmentation dictionary (includes _type).

Parameters:

data (Dict[str, Any], required): The augmentation data dictionary.

Returns:

str: A formatted string, or empty string if no relevant data exists.

Source code in SRToolkit/evaluation/sr_evaluator.py
@classmethod
def format_eval_result(cls, data: Dict[str, Any]) -> str:
    """
    Returns a formatted string for experiment-level augmentation data.

    Subclasses override this for custom formatting. The *data* dict is the inner
    augmentation dictionary (includes ``_type``).

    Args:
        data: The augmentation data dictionary.

    Returns:
        A formatted string, or empty string if no relevant data exists.
    """
    return "\n".join(f"  {k}: {v}" for k, v in data.items() if k != "_type")

format_model_result classmethod

format_model_result(data: Dict[str, Any]) -> str

Returns a formatted string for a single model's augmentation data.

Subclasses override this for custom formatting. The data dict is the inner augmentation dictionary (includes _type).

Parameters:

data (Dict[str, Any], required): The augmentation data dictionary.

Returns:

str: A formatted string, or empty string if no relevant data exists.

Source code in SRToolkit/evaluation/sr_evaluator.py
@classmethod
def format_model_result(cls, data: Dict[str, Any]) -> str:
    """
    Returns a formatted string for a single model's augmentation data.

    Subclasses override this for custom formatting. The *data* dict is the inner
    augmentation dictionary (includes ``_type``).

    Args:
        data: The augmentation data dictionary.

    Returns:
        A formatted string, or empty string if no relevant data exists.
    """
    parts = [f"{k}={v}" for k, v in data.items() if k != "_type"]
    return ", ".join(parts)

from_dict staticmethod

from_dict(data: dict) -> ResultAugmenter

Creates an instance of the ResultAugmenter class from the dictionary with the relevant data.

Subclasses should override this method if they support serialization. The default implementation raises NotImplementedError, allowing custom augmenters to skip serialization if not needed.

Parameters:

data (dict, required): The dictionary containing the data needed to recreate the augmenter.

Returns:

ResultAugmenter: An instance of the ResultAugmenter class with the same configuration as in the data dictionary.

Raises:

NotImplementedError: If the subclass does not implement this method.

Source code in SRToolkit/evaluation/sr_evaluator.py
@staticmethod
def from_dict(data: dict) -> "ResultAugmenter":
    """
    Creates an instance of the ResultAugmenter class from the dictionary with the relevant data.

    Subclasses should override this method if they support serialization. The default
    implementation raises ``NotImplementedError``, allowing custom augmenters to skip
    serialization if not needed.

    Args:
        data: the dictionary containing the data needed to recreate the augmenter.

    Returns:
        An instance of the ResultAugmenter class with the same configuration as in the data dictionary.

    Raises:
        NotImplementedError: If the subclass does not implement this method.
    """
    raise NotImplementedError(
        "from_dict is not implemented for this augmenter. "
        "Override this method if your augmenter supports serialization."
    )

SR_evaluator

SR_evaluator(X: ndarray, y: Optional[ndarray] = None, symbol_library: SymbolLibrary = SymbolLibrary.default_symbols(), max_evaluations: int = -1, success_threshold: Optional[float] = None, ranking_function: str = 'rmse', ground_truth: Optional[Union[List[str], Node, ndarray]] = None, seed: Optional[int] = None, metadata: Optional[dict] = None, **kwargs: Unpack[EstimationSettings])

Evaluates symbolic regression expressions and ranks them by RMSE or Behavioral Expression Distance (BED).

Previously evaluated expressions are cached so repeated calls with the same expression are free. Results are collected via get_results.

Note

Determining whether two expressions are semantically equivalent is undecidable. Random sampling, parameter fitting, and numerical errors all make the success_threshold only a proxy for success — we recommend inspecting the best expression manually.

Examples:

>>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9]])
>>> y = np.array([3, 0, 3, 11])
>>> se = SR_evaluator(X, y)
>>> rmse = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
>>> print(rmse < 1e-6)
True

Parameters:

X (ndarray, required): Input data of shape (n_samples, n_features).
y (Optional[ndarray], default None): Target values of shape (n_samples,). Required when ranking_function="rmse".
symbol_library (SymbolLibrary, default SymbolLibrary.default_symbols()): Symbol library defining the token vocabulary.
max_evaluations (int, default -1): Maximum number of expressions to evaluate. -1 means no limit.
success_threshold (Optional[float], default None): Error value below which an expression is considered successful. If None, defaults to 1e-7 for RMSE and is auto-calculated for BED by evaluating the ground truth against itself 100 times and taking max(distances) * 1.1. If less than 0, no threshold is used.
ranking_function (str, default "rmse"): "rmse" or "bed".
ground_truth (Optional[Union[List[str], Node, ndarray]], default None): Required when ranking_function="bed". The target expression as a token list, a Node tree, or a pre-computed behavior matrix (see create_behavior_matrix).
seed (Optional[int], default None): Random seed for reproducible sampling.
metadata (Optional[dict], default None): Optional dict with information about this evaluation (e.g. dataset name, seed). If a "dataset_name" key is present it is extracted into EvalResult dataset_name.
**kwargs (Unpack[EstimationSettings]): Optional settings from EstimationSettings. Supported keys: method, tol, gtol, max_iter, constant_bounds, initialization, max_constants, max_expr_length, num_points_sampled, bed_X, num_consts_sampled, domain_bounds. A short usage sketch follows this list.
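
As a hedged illustration of passing EstimationSettings keys through **kwargs (the values below are arbitrary, not recommended defaults):

import numpy as np
from SRToolkit.evaluation.sr_evaluator import SR_evaluator

X = np.array([[1, 2], [8, 4], [5, 4], [7, 9]])
y = np.array([3, 0, 3, 11])

# Tighter optimizer budget and wider constant bounds, passed as keyword arguments.
se = SR_evaluator(X, y, seed=42, max_iter=200, constant_bounds=(-10, 10), max_constants=4)
rmse = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])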

Attributes:

models: Cached ModelResult for every evaluated expression, keyed by the concatenated token string.
invalid: Token strings of expressions that raised an exception during evaluation.
ground_truth: The target expression passed at construction (BED mode).
gt_behavior: Pre-computed behavior matrix for the ground truth (BED mode).
max_evaluations: Maximum number of expressions to evaluate.
bed_evaluation_parameters: Active BED evaluation settings dict.
metadata: Metadata dict passed at construction.
symbol_library: The symbol library used.
total_evaluations: Number of times evaluate_expr has been called, including cache hits.
seed: Random seed.
parameter_estimator: ParameterEstimator instance used in RMSE mode.
ranking_function: Active ranking function ("rmse" or "bed").
success_threshold: Error threshold for determining success.

Source code in SRToolkit/evaluation/sr_evaluator.py
def __init__(
    self,
    X: np.ndarray,
    y: Optional[np.ndarray] = None,
    symbol_library: SymbolLibrary = SymbolLibrary.default_symbols(),
    max_evaluations: int = -1,
    success_threshold: Optional[float] = None,
    ranking_function: str = "rmse",
    ground_truth: Optional[Union[List[str], Node, np.ndarray]] = None,
    seed: Optional[int] = None,
    metadata: Optional[dict] = None,
    **kwargs: Unpack[EstimationSettings],
):
    """
    Evaluates symbolic regression expressions and ranks them by RMSE or Behavioral Expression Distance (BED).

    Previously evaluated expressions are cached so repeated calls with the same expression
    are free. Results are collected via
    [get_results][SRToolkit.evaluation.sr_evaluator.SR_evaluator.get_results].

    Note:
        Determining whether two expressions are semantically equivalent is undecidable.
        Random sampling, parameter fitting, and numerical errors all make the
        ``success_threshold`` only a proxy for success — we recommend inspecting the best
        expression manually.

    Examples:
        >>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9]])
        >>> y = np.array([3, 0, 3, 11])
        >>> se = SR_evaluator(X, y)
        >>> rmse = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
        >>> print(rmse < 1e-6)
        True

    Args:
        X: Input data of shape ``(n_samples, n_features)``.
        y: Target values of shape ``(n_samples,)``. Required when ``ranking_function="rmse"``.
        symbol_library: Symbol library defining the token vocabulary.
            Defaults to [SymbolLibrary.default_symbols][SRToolkit.utils.symbol_library.SymbolLibrary.default_symbols].
        max_evaluations: Maximum number of expressions to evaluate. ``-1`` means no limit.
            Default ``-1``.
        success_threshold: Error value below which an expression is considered successful.
            If ``None``, defaults to ``1e-7`` for RMSE and is auto-calculated for BED by
            evaluating the ground truth against itself 100 times and taking
            ``max(distances) * 1.1``. If less than 0, no threshold is used.
        ranking_function: ``"rmse"`` or ``"bed"``. Default ``"rmse"``.
        ground_truth: Required when ``ranking_function="bed"``. The target expression as a
            token list, a [Node][SRToolkit.utils.expression_tree.Node] tree, or a pre-computed
            behavior matrix (see
            [create_behavior_matrix][SRToolkit.utils.measures.create_behavior_matrix]).
        seed: Random seed for reproducible sampling. Default ``None``.
        metadata: Optional dict with information about this evaluation (e.g. dataset name,
            seed). If a ``"dataset_name"`` key is present it is extracted into
            [EvalResult][SRToolkit.utils.types.EvalResult] ``dataset_name``.
        **kwargs: Optional settings from
            [EstimationSettings][SRToolkit.utils.types.EstimationSettings].
            Supported keys: ``method``, ``tol``, ``gtol``, ``max_iter``,
            ``constant_bounds``, ``initialization``, ``max_constants``,
            ``max_expr_length``, ``num_points_sampled``, ``bed_X``,
            ``num_consts_sampled``, ``domain_bounds``.

    Attributes:
        models: Cached [ModelResult][SRToolkit.utils.types.ModelResult] for every evaluated expression,
            keyed by the concatenated token string.
        invalid: Token strings of expressions that raised an exception during evaluation.
        ground_truth: The target expression passed at construction (BED mode).
        gt_behavior: Pre-computed behavior matrix for the ground truth (BED mode).
        max_evaluations: Maximum number of expressions to evaluate.
        bed_evaluation_parameters: Active BED evaluation settings dict.
        metadata: Metadata dict passed at construction.
        symbol_library: The symbol library used.
        total_evaluations: Number of times
            [evaluate_expr][SRToolkit.evaluation.sr_evaluator.SR_evaluator.evaluate_expr]
            has been called, including cache hits.
        seed: Random seed.
        parameter_estimator: [ParameterEstimator][SRToolkit.evaluation.parameter_estimator.ParameterEstimator]
            instance used in RMSE mode.
        ranking_function: Active ranking function (``"rmse"`` or ``"bed"``).
        success_threshold: Error threshold for determining success.
    """
    self.kwargs = kwargs
    self.models: Dict[str, ModelResult] = dict()
    self.invalid: List[str] = list()
    self.success_threshold = success_threshold
    self.metadata = metadata
    self.ground_truth = ground_truth
    self.gt_behavior = None
    self._callbacks: Optional[Union[CallbackDispatcher, SRCallbacks]] = None
    self._experiment_id: str = ""
    self.should_stop = False
    self._current_best_error = float("inf")
    self.bed_evaluation_parameters: Dict[str, Any] = {
        "bed_X": None,
        "num_consts_sampled": 32,
        "num_points_sampled": 64,
        "domain_bounds": None,
        "constant_bounds": (-5, 5),
    }
    if kwargs:
        for k in self.bed_evaluation_parameters.keys():
            if k in kwargs:
                self.bed_evaluation_parameters[k] = kwargs[k]  # type: ignore[literal-required]
    if self.bed_evaluation_parameters["num_points_sampled"] == -1:
        self.bed_evaluation_parameters["num_points_sampled"] = X.shape[0]

    self.symbol_library = symbol_library
    self.max_evaluations = max_evaluations
    self.total_evaluations = 0
    self.seed = seed
    if seed is not None:
        np.random.seed(seed)

    if ranking_function not in ["rmse", "bed"]:
        warnings.warn(f"ranking_function '{ranking_function}' not supported. Using rmse instead.")
        ranking_function = "rmse"
    self.ranking_function = ranking_function

    if ranking_function == "rmse":
        if y is None:
            raise ValueError("Target values must be provided for RMSE ranking function.")
        self.parameter_estimator = ParameterEstimator(X, y, symbol_library=symbol_library, seed=seed, **kwargs)

        if self.success_threshold is None:
            self.success_threshold = 1e-7

    elif ranking_function == "bed":
        if ground_truth is None:
            raise ValueError(
                "Ground truth must be provided for bed ranking function. The ground truth must be "
                "provided as a list of tokens, a Node object, or a numpy array representing behavior. "
                "The behavior matrix is a matrix representing the distribution of outputs of an "
                "expression with free parameters at different points in the domain. This matrix "
                "should be of size (num_points_sampled, num_consts_sampled). See "
                "SRToolkit.utils.create_behavior_matrix for more details."
            )
        else:
            if self.bed_evaluation_parameters["bed_X"] is None:
                if self.bed_evaluation_parameters["domain_bounds"] is not None:
                    db = self.bed_evaluation_parameters["domain_bounds"]
                    assert isinstance(db, List), "Domain bounds should be a list of tuples."
                    interval_length = np.array([ub - lb for (lb, ub) in db])
                    lower_bound = np.array([lb for (lb, ub) in db])
                    lho = LatinHypercube(len(db), optimization="random-cd", seed=seed)
                    self.bed_evaluation_parameters["bed_X"] = (
                        lho.random(self.bed_evaluation_parameters["num_points_sampled"]) * interval_length
                        + lower_bound
                    )
                else:
                    indices = np.random.choice(
                        X.shape[0],
                        size=self.bed_evaluation_parameters["num_points_sampled"],
                    )
                    self.bed_evaluation_parameters["bed_X"] = X[indices, :]

        if isinstance(ground_truth, (list, Node)):
            self.gt_behavior = create_behavior_matrix(
                ground_truth,
                self.bed_evaluation_parameters["bed_X"],
                num_consts_sampled=self.bed_evaluation_parameters["num_consts_sampled"],
                consts_bounds=self.bed_evaluation_parameters["constant_bounds"],
                symbol_library=self.symbol_library,
                seed=self.seed,
            )
        elif isinstance(ground_truth, np.ndarray):
            self.gt_behavior = ground_truth
        else:
            raise ValueError(
                "Ground truth must be provided as a list of tokens, a Node object, or a numpy array representing behavior."
            )

        if self.success_threshold is None:
            assert self.ground_truth is not None, "Ground truth must be provided for BED ranking function."
            distances = [
                bed(
                    self.ground_truth,
                    self.gt_behavior,
                    self.bed_evaluation_parameters["bed_X"],
                    num_consts_sampled=self.bed_evaluation_parameters["num_consts_sampled"],
                    num_points_sampled=self.bed_evaluation_parameters["num_points_sampled"],
                    domain_bounds=self.bed_evaluation_parameters["domain_bounds"],
                    consts_bounds=self.bed_evaluation_parameters["constant_bounds"],
                    symbol_library=self.symbol_library,
                )
                for i in range(100)
            ]
            self.success_threshold = np.max(distances) * 1.1

    self._callbacks = CallbackDispatcher(
        callbacks=[EarlyStoppingCallback(threshold=self.success_threshold, max_evaluations=max_evaluations)]
    )
    self.X = X
    self.y = y

set_callbacks

set_callbacks(callbacks: Optional[Union[SRCallbacks, CallbackDispatcher]] = None) -> None

Register callbacks for monitoring and early stopping.

A single SRCallbacks instance is automatically wrapped in a CallbackDispatcher.

Examples:

>>> from SRToolkit.evaluation.callbacks import EarlyStoppingCallback
>>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9]])
>>> y = np.array([3, 0, 3, 11])
>>> se = SR_evaluator(X, y)
>>> se.set_callbacks(EarlyStoppingCallback(threshold=1e-6))
>>> se._callbacks is not None
True

Parameters:

callbacks (Optional[Union[SRCallbacks, CallbackDispatcher]], default None): A CallbackDispatcher or a single SRCallbacks instance.
Source code in SRToolkit/evaluation/sr_evaluator.py
def set_callbacks(self, callbacks: Optional[Union[SRCallbacks, CallbackDispatcher]] = None) -> None:
    """
    Register callbacks for monitoring and early stopping.

    A single [SRCallbacks][SRToolkit.evaluation.callbacks.SRCallbacks] instance is
    automatically wrapped in a
    [CallbackDispatcher][SRToolkit.evaluation.callbacks.CallbackDispatcher].

    Examples:
        >>> from SRToolkit.evaluation.callbacks import EarlyStoppingCallback
        >>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9]])
        >>> y = np.array([3, 0, 3, 11])
        >>> se = SR_evaluator(X, y)
        >>> se.set_callbacks(EarlyStoppingCallback(threshold=1e-6))
        >>> se._callbacks is not None
        True

    Args:
        callbacks: A [CallbackDispatcher][SRToolkit.evaluation.callbacks.CallbackDispatcher]
            or a single [SRCallbacks][SRToolkit.evaluation.callbacks.SRCallbacks] instance.
    """
    if isinstance(callbacks, CallbackDispatcher):
        old_callbacks = []
        if isinstance(self._callbacks, CallbackDispatcher):
            old_callbacks = self._callbacks.get_callbacks()
        elif isinstance(self._callbacks, SRCallbacks):
            old_callbacks = [self._callbacks]

        self._callbacks = callbacks
        for cb in old_callbacks:
            self._callbacks.add(cb)
    elif isinstance(callbacks, SRCallbacks):
        if isinstance(self._callbacks, CallbackDispatcher):
            self._callbacks.add(callbacks)
        elif isinstance(self._callbacks, SRCallbacks):
            self._callbacks = CallbackDispatcher(callbacks=[self._callbacks, callbacks])
        else:
            self._callbacks = CallbackDispatcher(callbacks=[callbacks])

evaluate_expr

evaluate_expr(expr: Union[List[str], Node], simplify_expr: bool = False, verbose: int = 0) -> float

Evaluates an expression in infix notation and stores the result in memory to prevent re-evaluation.

Examples:

>>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9], ])
>>> y = np.array([3, 0, 3, 11])
>>> se = SR_evaluator(X, y, seed=42)
>>> rmse = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
>>> print(rmse < 1e-6)
True
>>> X = np.array([[0, 1], [0, 2], [0, 3]])
>>> y = np.array([2, 3, 4])
>>> se = SR_evaluator(X, y, seed=42, success_threshold=-1)
>>> rmse = se.evaluate_expr(["C", "+", "C", "*", "C", "+", "X_0", "*", "X_1", "/", "X_0"], simplify_expr=True)
>>> print(rmse < 1e-6)
True
>>> list(se.models.keys())[0]
'C+X_1'
>>> print(0.99 < se.models["C+X_1"].parameters[0] < 1.01)
True
>>> # Evaluating invalid expression returns nan and adds it to invalid list
>>> print(se.evaluate_expr(["C", "*", "X_1", "X_0"]))
nan
>>> se.invalid
['C*X_1X_0']
>>> X = np.random.rand(10, 2) - 0.5
>>> gt = ["X_0", "+", "C"]
>>> se = SR_evaluator(X, ground_truth=gt, ranking_function="bed", seed=42)
>>> print(se.evaluate_expr(["C", "+", "X_1"]) < 1)
True
>>> # When evaluating using BED as the ranking function, the error depends on the scale of output of the
>>> # ground truth. Because of stochasticity of BED, error might be high even when expressions match exactly.
>>> print(se.evaluate_expr(["C", "+", "X_0"]) < 0.2)
True
>>> # X can also be sampled from a domain by providing domain_bounds
>>> se = SR_evaluator(X, ground_truth=gt, ranking_function="bed", domain_bounds=[(-1, 1), (-1, 1)], seed=42)
>>> print(se.evaluate_expr(["C", "+", "X_0"]) < 0.2)
True

Parameters:

expr (Union[List[str], Node], required): Expression as a token list in infix notation or a Node tree.
simplify_expr (bool, default False): If True, simplifies the expression with SymPy before evaluating. Slows down evaluation; recommended only for post-hoc inspection of top results.
verbose (int, default 0): 0 is silent; 1 logs expression, error, and fitted parameters; 2 also surfaces NumPy warnings during evaluation.

Returns:

float: The error of the expression under the active ranking function: RMSE when ranking_function="rmse", BED when ranking_function="bed". Returns NaN if the expression is invalid or max_evaluations has been reached (a warning is emitted in the latter case). If the expression was already evaluated, the cached value is returned immediately.

Source code in SRToolkit/evaluation/sr_evaluator.py
def evaluate_expr(
    self,
    expr: Union[List[str], Node],
    simplify_expr: bool = False,
    verbose: int = 0,
) -> float:
    """
    Evaluates an expression in infix notation and stores the result in
    memory to prevent re-evaluation.

    Examples:
        >>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9], ])
        >>> y = np.array([3, 0, 3, 11])
        >>> se = SR_evaluator(X, y, seed=42)
        >>> rmse = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
        >>> print(rmse < 1e-6)
        True
        >>> X = np.array([[0, 1], [0, 2], [0, 3]])
        >>> y = np.array([2, 3, 4])
        >>> se = SR_evaluator(X, y, seed=42, success_threshold=-1)
        >>> rmse = se.evaluate_expr(["C", "+", "C", "*", "C", "+", "X_0", "*", "X_1", "/", "X_0"], simplify_expr=True)
        >>> print(rmse < 1e-6)
        True
        >>> list(se.models.keys())[0]
        'C+X_1'
        >>> print(0.99 < se.models["C+X_1"].parameters[0] < 1.01)
        True
        >>> # Evaluating invalid expression returns nan and adds it to invalid list
        >>> print(se.evaluate_expr(["C", "*", "X_1", "X_0"]))
        nan
        >>> se.invalid
        ['C*X_1X_0']
        >>> X = np.random.rand(10, 2) - 0.5
        >>> gt = ["X_0", "+", "C"]
        >>> se = SR_evaluator(X, ground_truth=gt, ranking_function="bed", seed=42)
        >>> print(se.evaluate_expr(["C", "+", "X_1"]) < 1)
        True
        >>> # When evaluating using BED as the ranking function, the error depends on the scale of output of the
        >>> # ground truth. Because of stochasticity of BED, error might be high even when expressions match exactly.
        >>> print(se.evaluate_expr(["C", "+", "X_0"]) < 0.2)
        True
        >>> # X can also be sampled from a domain by providing domain_bounds
        >>> se = SR_evaluator(X, ground_truth=gt, ranking_function="bed", domain_bounds=[(-1, 1), (-1, 1)], seed=42)
        >>> print(se.evaluate_expr(["C", "+", "X_0"]) < 0.2)
        True

    Args:
        expr: Expression as a token list in infix notation or a
            [Node][SRToolkit.utils.expression_tree.Node] tree.
        simplify_expr: If ``True``, simplifies the expression with SymPy before evaluating.
            Slows down evaluation; recommended only for post-hoc inspection of top results.
            Default ``False``.
        verbose: ``0`` — silent; ``1`` — logs expression, error, and fitted parameters;
            ``2`` — also surfaces NumPy warnings during evaluation. Default ``0``.

    Returns:
        The error of the expression under the active ranking function: RMSE when ``ranking_function="rmse"``, BED when ``ranking_function="bed"``. Returns ``NaN`` if the expression is invalid or ``max_evaluations`` has been reached (a warning is emitted in the latter case). If the expression was already evaluated, the cached value is returned immediately.
    """
    self.total_evaluations += 1

    if self.should_stop:
        warnings.warn(
            f"Evaluation stopped because max_evaluations ({self.max_evaluations}) reached or an expression with error lower than success_threshold ({self.success_threshold}) was found. "
        )
        return np.nan
    else:
        if simplify_expr:
            try:
                expr = simplify(expr, self.symbol_library)
            except Exception as e:
                if isinstance(expr, Node):
                    expr_list = expr.to_list(symbol_library=self.symbol_library)
                else:
                    expr_list = expr
                warnings.warn(f"Unable to simplify: {''.join(expr_list)}, problems with subexpression {e}")

        if isinstance(expr, Node):
            expr_list = expr.to_list(symbol_library=self.symbol_library)
        else:
            expr_list = expr

        expr_str = "".join(expr_list)
        if expr_str in self.models:
            if verbose > 0:
                logger.debug("Already evaluated %s", expr_str)
            if self._callbacks is not None:
                event = ExprEvaluated(
                    expression=expr_str,
                    error=self.models[expr_str].error,
                    evaluation_number=self.total_evaluations,
                    experiment_id=self._experiment_id,
                    is_new_best=False,
                )
                if not self._callbacks.on_expr_evaluated(event):
                    self.should_stop = True
            return self.models[expr_str].error

        else:
            if self.ranking_function == "rmse":
                try:
                    with (
                        np.errstate(
                            divide="ignore",
                            invalid="ignore",
                            over="ignore",
                            under="ignore",
                        )
                        if verbose < 2
                        else nullcontext()
                    ):
                        error, parameters = self.parameter_estimator.estimate_parameters(expr)

                    if verbose > 0:
                        if parameters.size > 0:
                            parameter_string = (
                                f" Best parameters found are [{', '.join([str(round(p, 3)) for p in parameters])}]"
                            )
                        else:
                            parameter_string = ""
                        logger.debug("Evaluated expression %s with RMSE: %s.%s", expr_str, error, parameter_string)

                except Exception as e:
                    if verbose > 0:
                        logger.debug("Error evaluating expression %s: %s", expr_str, e)

                    self.invalid.append(expr_str)
                    error, parameters = np.nan, np.array([])

                self.models[expr_str] = ModelResult(
                    expr=expr_list,
                    error=error,
                    parameters=parameters,
                )

                if self._callbacks is not None:
                    is_new_best = error < self._current_best_error
                    if is_new_best:
                        self._current_best_error = error
                    event = ExprEvaluated(
                        expression=expr_str,
                        error=error,
                        evaluation_number=self.total_evaluations,
                        experiment_id=self._experiment_id,
                        is_new_best=is_new_best,
                    )
                    if not self._callbacks.on_expr_evaluated(event):
                        self.should_stop = True
                    if is_new_best:
                        best_event = BestExpressionFound(
                            experiment_id=self._experiment_id,
                            expression=expr_str,
                            error=error,
                            evaluation_number=self.total_evaluations,
                        )
                        if not self._callbacks.on_best_expression(best_event):
                            self.should_stop = True

            elif self.ranking_function == "bed":
                try:
                    with (
                        np.errstate(
                            divide="ignore",
                            invalid="ignore",
                            over="ignore",
                            under="ignore",
                        )
                        if verbose < 2
                        else nullcontext()
                    ):
                        assert self.gt_behavior is not None, (
                            "Ground truth must be provided for BED ranking function."
                        )
                        error = bed(
                            expr,
                            self.gt_behavior,
                            self.bed_evaluation_parameters["bed_X"],
                            num_consts_sampled=self.bed_evaluation_parameters["num_consts_sampled"],
                            num_points_sampled=self.bed_evaluation_parameters["num_points_sampled"],
                            domain_bounds=self.bed_evaluation_parameters["domain_bounds"],
                            consts_bounds=self.bed_evaluation_parameters["constant_bounds"],
                            symbol_library=self.symbol_library,
                            seed=self.seed,
                        )

                        if verbose > 0:
                            logger.debug("Evaluated expression %s with BED: %s.", expr_str, error)

                except Exception as e:
                    if verbose > 0:
                        logger.debug("Error evaluating expression %s: %s", expr_str, e)

                    self.invalid.append(expr_str)
                    error = np.nan

                self.models[expr_str] = ModelResult(
                    expr=expr_list,
                    error=error,
                )

                if self._callbacks is not None:
                    is_new_best = error < self._current_best_error
                    if is_new_best:
                        self._current_best_error = error
                    event = ExprEvaluated(
                        expression=expr_str,
                        error=error,
                        evaluation_number=self.total_evaluations,
                        experiment_id=self._experiment_id,
                        is_new_best=is_new_best,
                    )
                    if not self._callbacks.on_expr_evaluated(event):
                        self.should_stop = True
                    if is_new_best:
                        best_event = BestExpressionFound(
                            experiment_id=self._experiment_id,
                            expression=expr_str,
                            error=error,
                            evaluation_number=self.total_evaluations,
                        )
                        if not self._callbacks.on_best_expression(best_event):
                            self.should_stop = True

            else:
                raise ValueError(f"Ranking function {self.ranking_function} not supported.")

            return error

get_results

get_results(approach_name: str = '', top_k: int = 20, results: Optional[SR_results] = None) -> SR_results

Returns the results of the equation discovery/symbolic regression process/evaluation.

Examples:

>>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9], ])
>>> y = np.array([3, 0, 3, 11])
>>> se = SR_evaluator(X, y)
>>> rmse = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
>>> results = se.get_results(top_k=1)
>>> print(results[0].num_evaluated)
1
>>> print(results[0].evaluation_calls)
1
>>> print(results[0].best_expr)
C*X_1-X_0
>>> print(results[0].min_error < 1e-6)
True
>>> print(1.99 < results[0].top_models[0].parameters[0] < 2.01)
True

Parameters:

approach_name (str, default ''): The name of the approach used to discover the equations.
top_k (int, default 20): The number of top results to include in the output. If top_k is greater than the number of evaluated expressions, all evaluated expressions are included. If top_k is less than 0, all evaluated expressions are included.
results (Optional[SR_results], default None): An SR_results object containing the results of a previous evaluation. If provided, the results of the current evaluation are appended to the existing results. Otherwise, a new SR_results object is created.

Returns:

SR_results: An instance of the SR_results object with the results of the evaluation.

Source code in SRToolkit/evaluation/sr_evaluator.py
def get_results(
    self, approach_name: str = "", top_k: int = 20, results: Optional["SR_results"] = None
) -> "SR_results":
    """
    Returns the results of the equation discovery/symbolic regression process/evaluation.

    Examples:
        >>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9], ])
        >>> y = np.array([3, 0, 3, 11])
        >>> se = SR_evaluator(X, y)
        >>> rmse = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
        >>> results = se.get_results(top_k=1)
        >>> print(results[0].num_evaluated)
        1
        >>> print(results[0].evaluation_calls)
        1
        >>> print(results[0].best_expr)
        C*X_1-X_0
        >>> print(results[0].min_error < 1e-6)
        True
        >>> print(1.99 < results[0].top_models[0].parameters[0] < 2.01)
        True

    Args:
        approach_name: The name of the approach used to discover the equations.
        top_k: The number of top results to include in the output. If `top_k`
            is greater than the number of evaluated expressions, all
            evaluated expressions are included. If `top_k` is less than 0,
            all evaluated expressions are included.
        results: An SR_results object containing the results of the previous evaluation. If provided,
            the results of the current evaluation are appended to the existing results. Otherwise, a new SR_results
            object is created.

    Returns:
        An instance of the SR_results object with the results of the evaluation.
    """
    if top_k > len(self.models) or top_k < 0:
        top_k = len(self.models)

    if results is None:
        results = SR_results()

    results.add_results(
        self.models,
        top_k,
        self.total_evaluations,
        self.success_threshold,
        approach_name,
        self.metadata,
    )

    return results

to_dict

to_dict(base_path: str, name: str) -> dict

Creates a dictionary representation of the SR_evaluator.

Parameters:

base_path (str, required): Directory in which the evaluator's data (the X, y, and ground-truth arrays, if present) is saved.
name (str, required): Filename prefix used for the files saved under base_path.

Returns:

dict: A dictionary containing the necessary information to recreate the evaluator from disk.

Source code in SRToolkit/evaluation/sr_evaluator.py
def to_dict(self, base_path: str, name: str) -> dict:
    """
    Creates a dictionary representation of the SR_evaluator.

    Args:
        base_path: Used to save the data of the evaluator to disk.
        name: Used to save the data of the evaluator to disk.

    Returns:
        A dictionary containing the necessary information to recreate the evaluator from disk.
    """
    output = {
        "format_version": 1,
        "type": "SR_evaluator",
        "metadata": self.metadata,
        "symbol_library": self.symbol_library.to_dict(),
        "max_evaluations": self.max_evaluations,
        "success_threshold": self.success_threshold,
        "ranking_function": self.ranking_function,
        "seed": self.seed,
        "kwargs": self.kwargs,
    }

    if not os.path.isdir(base_path):
        os.makedirs(base_path)

    X_path = f"{base_path}/{name}_X.npy"
    np.save(X_path, self.X)
    output["X"] = X_path

    if self.y is not None:
        y_path = f"{base_path}/{name}_y.npy"
        np.save(y_path, self.y)
        output["y"] = y_path
    else:
        output["y"] = None

    if self.ground_truth is None:
        output["ground_truth"] = None
    else:
        if isinstance(self.ground_truth, list):
            output["ground_truth"] = self.ground_truth
        elif isinstance(self.ground_truth, Node):
            output["ground_truth"] = self.ground_truth.to_list(self.symbol_library)
        else:
            gt_path = f"{base_path}/{name}_gt.npy"
            np.save(gt_path, self.ground_truth)
            output["ground_truth"] = gt_path

    return output
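
A hedged round-trip sketch (directory and file names below are illustrative; only the to_dict and from_dict methods shown here are assumed):

import json
import numpy as np
from SRToolkit.evaluation.sr_evaluator import SR_evaluator

X = np.array([[1, 2], [8, 4], [5, 4], [7, 9]])
y = np.array([3, 0, 3, 11])
se = SR_evaluator(X, y, seed=42)

# Serialize the evaluator's configuration; the numpy arrays are written next to the JSON file.
spec = se.to_dict(base_path="evaluator_state", name="demo")
with open("evaluator_state/demo.json", "w") as f:
    json.dump(spec, f)

# Later, rebuild an equivalent evaluator from the stored dictionary.
with open("evaluator_state/demo.json") as f:
    restored = SR_evaluator.from_dict(json.load(f))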

from_dict staticmethod

from_dict(data: dict) -> SR_evaluator

Reconstruct an SR_evaluator from a dictionary produced by to_dict.

Parameters:

data (dict, required): Dictionary representation of the evaluator, as produced by to_dict.

Returns:

SR_evaluator: The reconstructed SR_evaluator.

Raises:

ValueError: If data["format_version"] is not 1 or if the numpy arrays for X, y, or ground_truth cannot be loaded from disk.

Source code in SRToolkit/evaluation/sr_evaluator.py
@staticmethod
def from_dict(data: dict) -> "SR_evaluator":
    """
    Reconstruct an [SR_evaluator][SRToolkit.evaluation.sr_evaluator.SR_evaluator] from a
    dictionary produced by [to_dict][SRToolkit.evaluation.sr_evaluator.SR_evaluator.to_dict].

    Args:
        data: Dictionary representation of the evaluator, as produced by
            [to_dict][SRToolkit.evaluation.sr_evaluator.SR_evaluator.to_dict].

    Returns:
        The reconstructed [SR_evaluator][SRToolkit.evaluation.sr_evaluator.SR_evaluator].

    Raises:
        ValueError: If ``data["format_version"]`` is not ``1`` or if the numpy arrays
            for ``X``, ``y``, or ``ground_truth`` cannot be loaded from disk.
    """
    if data.get("format_version", 1) != 1:
        raise ValueError(
            f"[SR_evaluator.from_dict] Unsupported format_version: {data.get('format_version')!r}. Expected 1."
        )

    try:
        X = np.load(data["X"])

        if data["y"] is not None:
            y = np.load(data["y"])
        else:
            y = None

        if data["ground_truth"] is None:
            gt = None
        else:
            if isinstance(data["ground_truth"], list):
                gt = data["ground_truth"]
            else:
                gt = np.load(data["ground_truth"])
    except Exception as e:
        raise ValueError(f"[SR_evaluator.from_dict] Unable to load data for X/y/ground truth due to {e}")

    symbol_library = SymbolLibrary.from_dict(data["symbol_library"])
    return SR_evaluator(
        X,
        y=y,
        ground_truth=gt,
        symbol_library=symbol_library,
        max_evaluations=data["max_evaluations"],
        success_threshold=data["success_threshold"],
        ranking_function=data["ranking_function"],
        seed=data["seed"],
        metadata=data["metadata"],
        **data["kwargs"],
    )

SR_results

SR_results()

Container for SR experiment results, typically obtained via SR_evaluator.get_results.

Examples:

>>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9]])
>>> y = np.array([3, 0, 3, 11])
>>> se = SR_evaluator(X, y, seed=42)
>>> _ = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
>>> results = se.get_results(top_k=1)
>>> print(results[0].best_expr)
C*X_1-X_0
>>> print(results[0].min_error < 1e-6)
True
>>> len(results)
1

Attributes:

results: List of EvalResult instances, one per experiment.

Source code in SRToolkit/evaluation/sr_evaluator.py
def __init__(self):
    """
    Container for SR experiment results, typically obtained via
    [SR_evaluator.get_results][SRToolkit.evaluation.sr_evaluator.SR_evaluator.get_results].

    Examples:
        >>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9]])
        >>> y = np.array([3, 0, 3, 11])
        >>> se = SR_evaluator(X, y, seed=42)
        >>> _ = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
        >>> results = se.get_results(top_k=1)
        >>> print(results[0].best_expr)
        C*X_1-X_0
        >>> print(results[0].min_error < 1e-6)
        True
        >>> len(results)
        1

    Attributes:
        results: List of [EvalResult][SRToolkit.utils.types.EvalResult] instances,
            one per experiment.
    """
    self.results = list()

add_results

add_results(models: Dict[str, ModelResult], top_k: int, total_evaluations: int, success_threshold: Optional[float], approach_name: str, metadata: Optional[dict] = None) -> None

Adds the results of an evaluation to the results object.

Parameters:

models (Dict[str, ModelResult], required): A dictionary mapping expressions to their evaluation results.
top_k (int, required): The number of top results to include in the output.
total_evaluations (int, required): The total number of evaluations performed during the evaluation.
success_threshold (Optional[float], required): The success threshold used to determine whether the evaluation was successful.
approach_name (str, required): The name of the approach used to discover the equations.
metadata (Optional[dict], default None): A dictionary containing additional metadata about the evaluation.
Source code in SRToolkit/evaluation/sr_evaluator.py
def add_results(
    self,
    models: Dict[str, ModelResult],
    top_k: int,
    total_evaluations: int,
    success_threshold: Optional[float],
    approach_name: str,
    metadata: Optional[dict] = None,
) -> None:
    """
    Adds the results of an evaluation to the results object.

    Args:
        models: A dictionary mapping expressions to their evaluation results.
        top_k: The number of top results to include in the output.
        total_evaluations: The total number of evaluations performed during the evaluation.
        success_threshold: The success threshold used to determine whether the evaluation was successful.
        approach_name: The name of the approach used to discover the equations.
        metadata: A dictionary containing additional metadata about the evaluation.
    """
    models_list = list(models.values())
    sorted_indices = np.argsort([v.error for v in models_list])
    sorted_models = [models_list[i] for i in sorted_indices]

    dataset_name = None
    remaining_metadata = None
    if metadata is not None and "dataset_name" in metadata:
        dataset_name = metadata["dataset_name"]
        remaining_metadata = {key: value for key, value in metadata.items() if key != "dataset_name"}
        if len(remaining_metadata) == 0:
            remaining_metadata = None
    elif metadata is not None:
        remaining_metadata = metadata

    success = success_threshold is not None and sorted_models[0].error < success_threshold

    results_obj = EvalResult(
        min_error=sorted_models[0].error,
        best_expr="".join(sorted_models[0].expr),
        num_evaluated=len(models_list),
        evaluation_calls=total_evaluations,
        top_models=sorted_models[:top_k],
        all_models=models_list,
        approach_name=approach_name,
        success=success,
        dataset_name=dataset_name,
        metadata=remaining_metadata,
    )

    self.results.append(results_obj)

print_results

print_results(experiment_number: Optional[int] = None, detailed: bool = False, model_scope: Literal['best', 'top', 'all'] = 'top', augmentations: Optional[List[str]] = None)

Prints the results of the SR_evaluator.

Displays the minimum error, best expression, evaluation counts, success status, metadata, and approach name. When detailed is True, also prints per-model information. Augmentation data is formatted by the corresponding ResultAugmenter subclass, looked up from the global registry via the _type field stored in each augmentation entry.

Examples:

>>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9], ])
>>> y = np.array([3, 0, 3, 11])
>>> se = SR_evaluator(X, y, seed=42)
>>> rmse = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
>>> results = se.get_results(top_k=1)
>>> results.print_results()
=== Experiment 1/1 ===
Best expression: C*X_1-X_0
Error: ...
Evaluated: 1 expressions | Calls: 1 | Success: ...

>>> results.print_results(detailed=True, experiment_number=0)
Best expression: C*X_1-X_0
Error: ...
Evaluated: 1 expressions | Calls: 1 | Success: ...

Models:
  C*X_1-X_0  (error=..., params=...)

Parameters:

experiment_number (Optional[int], default None): Number of the experiment to print. If None, prints all.
detailed (bool, default False): If True, prints per-model information.
model_scope (Literal['best', 'top', 'all'], default 'top'): Which models to show when detailed is True. "best" shows only the top model, "top" shows the top-k, "all" shows all evaluated models.
augmentations (Optional[List[str]], default None): Filter which augmenters to display by name. If None, all augmentations present in the data are shown.
Source code in SRToolkit/evaluation/sr_evaluator.py
def print_results(
    self,
    experiment_number: Optional[int] = None,
    detailed: bool = False,
    model_scope: Literal["best", "top", "all"] = "top",
    augmentations: Optional[List[str]] = None,
):
    r"""
    Prints the results of the SR_evaluator.

    Displays the minimum error, best expression, evaluation counts, success status,
    metadata, and approach name. When *detailed* is ``True``, also prints per-model
    information. Augmentation data is formatted by the corresponding
    [ResultAugmenter][SRToolkit.evaluation.sr_evaluator.ResultAugmenter] subclass,
    looked up from the global registry via the ``_type`` field stored in each
    augmentation entry.

    Examples:
        >>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9], ])
        >>> y = np.array([3, 0, 3, 11])
        >>> se = SR_evaluator(X, y, seed=42)
        >>> rmse = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
        >>> results = se.get_results(top_k=1)
        >>> results.print_results()  # doctest: +ELLIPSIS
        === Experiment 1/1 ===
        Best expression: C*X_1-X_0
        Error: ...
        Evaluated: 1 expressions | Calls: 1 | Success: ...
        <BLANKLINE>
        >>> results.print_results(detailed=True, experiment_number=0)  # doctest: +ELLIPSIS
        Best expression: C*X_1-X_0
        Error: ...
        Evaluated: 1 expressions | Calls: 1 | Success: ...
        <BLANKLINE>
        Models:
          C*X_1-X_0  (error=..., params=...)
        <BLANKLINE>

    Args:
        experiment_number: Number of the experiment to print. If None, prints all.
        detailed: If True, prints per-model information.
        model_scope: Which models to show when *detailed* is True.
            ``"best"`` shows only the top model, ``"top"`` shows the top-k,
            ``"all"`` shows all evaluated models.
        augmentations: Filter which augmenters to display by name.
            If None, all augmentations present in the data are shown.
    """
    if experiment_number is None:
        for i, result in enumerate(self.results):
            print(f"=== Experiment {i + 1}/{len(self.results)} ===")
            SR_results._print_result_(result, detailed, model_scope, augmentations)
            print()
    else:
        assert experiment_number < len(self.results), "[SR_results.print_results] experiment number out of bounds"
        SR_results._print_result_(self.results[experiment_number], detailed, model_scope, augmentations)

augment

augment(augmenters: Union[List[ResultAugmenter], ResultAugmenter], experiment_number: Optional[int] = None) -> None

Applies the given ResultAugmenter instances to the stored results. Augmenters add post-hoc information such as LaTeX representations, simplified expressions, or R² scores.

Examples:

>>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9], ])
>>> y = np.array([3, 0, 3, 11])
>>> se = SR_evaluator(X, y, seed=42)
>>> rmse = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
>>> results = se.get_results(top_k=1)
>>> from SRToolkit.evaluation.result_augmentation import ExpressionToLatex
>>> results.augment([ExpressionToLatex(SymbolLibrary.default_symbols(2))])
>>> results[0].augmentations["ExpressionToLatex"]["best_expr_latex"]
'$C_{0} \\cdot X_{1} - X_{0}$'

Parameters:

augmenters (Union[List[ResultAugmenter], ResultAugmenter], required): A ResultAugmenter or a list of ResultAugmenter objects to apply to the results.
experiment_number (Optional[int], default None): If provided, apply augmenters only to this experiment's result. If None, apply to all results.
Source code in SRToolkit/evaluation/sr_evaluator.py
def augment(
    self, augmenters: Union[List[ResultAugmenter], ResultAugmenter], experiment_number: Optional[int] = None
) -> None:
    r"""
    Applies the given [ResultAugmenter][SRToolkit.evaluation.sr_evaluator.ResultAugmenter]
    instances to the stored results. Augmenters add post-hoc information such as LaTeX
    representations, simplified expressions, or R² scores.

    Examples:
        >>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9], ])
        >>> y = np.array([3, 0, 3, 11])
        >>> se = SR_evaluator(X, y, seed=42)
        >>> rmse = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
        >>> results = se.get_results(top_k=1)
        >>> from SRToolkit.evaluation.result_augmentation import ExpressionToLatex
        >>> results.augment([ExpressionToLatex(SymbolLibrary.default_symbols(2))])
        >>> results[0].augmentations["ExpressionToLatex"]["best_expr_latex"]  # doctest: +ELLIPSIS
        '$C_{0} \\cdot X_{1} - X_{0}$'

    Args:
        augmenters: A [ResultAugmenter][SRToolkit.evaluation.sr_evaluator.ResultAugmenter] or a list of [ResultAugmenter][SRToolkit.evaluation.sr_evaluator.ResultAugmenter] objects to apply to the results.
        experiment_number: If provided, apply augmenters only to this experiment's result.
            If None, apply to all results.
    """
    if isinstance(augmenters, ResultAugmenter):
        augmenters = [augmenters]

    if experiment_number is not None:
        assert experiment_number < len(self.results), "[SR_results.augment] experiment number out of bounds"
        for augmenter in augmenters:
            try:
                augmenter.write_results(self.results[experiment_number])
            except Exception as e:
                warnings.warn(f"Error augmenting results with {augmenter.name}, skipping: {e}")
    else:
        for result in self.results:
            for augmenter in augmenters:
                try:
                    augmenter.write_results(result)
                except Exception as e:
                    warnings.warn(f"Error augmenting results with {augmenter.name}, skipping: {e}")

__add__

__add__(other: SR_results) -> SR_results

Returns a new SR_results object that is the concatenation of the current SR_results object with the other SR_results object.

Parameters:

other (SR_results, required): SR_results object to concatenate with the current SR_results object.

Returns:

SR_results: A new SR_results object containing the concatenated results.

Source code in SRToolkit/evaluation/sr_evaluator.py
def __add__(self, other: "SR_results") -> "SR_results":
    """
    Returns a new SR_results object that is the concatenation of the current SR_results object with the other SR_results object.

    Args:
        other: SR_results object to concatenate with the current SR_results object.

    Returns:
        A new SR_results object containing the concatenated results.
    """
    new = SR_results()
    new.results = self.results + other.results
    return new
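
For instance, results from two independent runs can be combined into one container (a small sketch using only methods shown above):

import numpy as np
from SRToolkit.evaluation.sr_evaluator import SR_evaluator

X = np.array([[1, 2], [8, 4], [5, 4], [7, 9]])
y = np.array([3, 0, 3, 11])

run_a = SR_evaluator(X, y, seed=0)
_ = run_a.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
run_b = SR_evaluator(X, y, seed=1)
_ = run_b.evaluate_expr(["X_1", "-", "X_0"])

combined = run_a.get_results(top_k=1) + run_b.get_results(top_k=1)
assert len(combined) == 2  # one EvalResult per run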

__iadd__

__iadd__(other: SR_results) -> SR_results

In-place concatenation of SR_results objects.

Parameters:

other (SR_results, required): SR_results object to concatenate with the current SR_results object.

Returns:

SR_results: self

Source code in SRToolkit/evaluation/sr_evaluator.py
def __iadd__(self, other: "SR_results") -> "SR_results":
    """
    In-place concatenation of SR_results objects.

    Args:
        other: SR_results object to concatenate with the current SR_results object.

    Returns:
        self
    """
    self.results += other.results
    return self
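
In-place concatenation is convenient when accumulating results over repeated runs. A minimal sketch, where run_single_experiment is a hypothetical user-defined helper that returns an SR_results object:

all_results = SR_results()
for seed in range(5):
    all_results += run_single_experiment(seed)   # hypothetical helper; its results are appended in place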

__getitem__

__getitem__(item: int) -> EvalResult

Returns the results of the experiment with the given index.

Examples:

>>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9], ])
>>> y = np.array([3, 0, 3, 11])
>>> se = SR_evaluator(X, y)
>>> rmse = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
>>> results = se.get_results(top_k=1)
>>> result_of_first_experiment = results[0]

Parameters:

Name Type Description Default
item int

the index of the experiment.

required

Returns:

Type Description
EvalResult

The results of the experiment with the given index.

Source code in SRToolkit/evaluation/sr_evaluator.py
def __getitem__(self, item: int) -> EvalResult:
    """
    Returns the results of the experiment with the given index.

    Examples:
        >>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9], ])
        >>> y = np.array([3, 0, 3, 11])
        >>> se = SR_evaluator(X, y)
        >>> rmse = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
        >>> results = se.get_results(top_k=1)
        >>> result_of_first_experiment = results[0]

    Args:
        item: the index of the experiment.

    Returns:
        The results of the experiment with the given index.

    """
    assert isinstance(item, int), "[SR_results.__getitem__] Item must be an integer."
    assert 0 <= item < len(self.results), "[SR_results.__getitem__] Item out of bounds."
    return self.results[item]

__len__

__len__() -> int

Returns the number of results stored in the results object. Usually, each result corresponds to a single experiment.

Examples:

>>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9], ])
>>> y = np.array([3, 0, 3, 11])
>>> se = SR_evaluator(X, y)
>>> rmse = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
>>> results = se.get_results(top_k=1)
>>> len(results)
1

Returns:

Type Description
int

The number of results stored in the results object.

Source code in SRToolkit/evaluation/sr_evaluator.py
def __len__(self) -> int:
    """
    Returns the number of results stored in the results object. Usually, each result corresponds to a single experiment.

    Examples:
        >>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9], ])
        >>> y = np.array([3, 0, 3, 11])
        >>> se = SR_evaluator(X, y)
        >>> rmse = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
        >>> results = se.get_results(top_k=1)
        >>> len(results)
        1

    Returns:
        The number of results stored in the results object.
    """
    return len(self.results)

save

save(path: str) -> None

Saves the results to a specific file or directory as JSON.

If path is an existing directory or has no file extension, it writes results.json inside it (creating the directory if needed). Otherwise, path must end with the .json extension.

Examples:

>>> import tempfile
>>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9], ])
>>> y = np.array([3, 0, 3, 11])
>>> se = SR_evaluator(X, y, seed=42)
>>> _ = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
>>> results = se.get_results(top_k=1)
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     results.save(tmpdir + "/my_results/results.json")
...     loaded = SR_results.load(tmpdir + "/my_results/results.json")
...     print(loaded[0].best_expr)
C*X_1-X_0

Parameters:

Name Type Description Default
path str

Directory path or specific .json file path.

required

Raises:

Type Description
ValueError

If the path is a file with an extension other than .json.

OSError

If the directory cannot be created.

Source code in SRToolkit/evaluation/sr_evaluator.py
def save(self, path: str) -> None:
    """
    Saves the results to a specific file or directory as JSON.

    If *path* is an existing directory or has no file extension, ``results.json`` is written
    inside it (creating the directory if needed). Otherwise, *path* must end with the ``.json`` extension.

    Examples:
        >>> import tempfile
        >>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9], ])
        >>> y = np.array([3, 0, 3, 11])
        >>> se = SR_evaluator(X, y, seed=42)
        >>> _ = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
        >>> results = se.get_results(top_k=1)
        >>> with tempfile.TemporaryDirectory() as tmpdir:
        ...     results.save(tmpdir + "/my_results/results.json")
        ...     loaded = SR_results.load(tmpdir + "/my_results/results.json")
        ...     print(loaded[0].best_expr)
        C*X_1-X_0

    Args:
        path: Directory path or specific .json file path.

    Raises:
        ValueError: If the path is a file with an extension other than .json.
        OSError: If the directory cannot be created.
    """
    if os.path.isdir(path):
        target_file = os.path.join(path, "results.json")
    else:
        _, extension = os.path.splitext(path)

        if extension == "":
            target_file = os.path.join(path, "results.json")
        elif extension.lower() == ".json":
            target_file = path
        else:
            raise ValueError(f"Invalid file extension '{extension}'. Results must be saved to a '.json' file.")

    target_dir = os.path.dirname(target_file)
    if target_dir and not os.path.isdir(target_dir):
        os.makedirs(target_dir)

    # Prepare data for serialization
    output = {
        "format_version": 1,
        "type": "SR_results",
        "results": [r.to_dict() for r in self.results],
    }

    with open(target_file, "w") as f:
        json.dump(output, f, indent=2)
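
Besides an explicit .json file path, an existing directory can be passed, in which case results.json is written inside it. A minimal sketch, continuing from the doctest above:

with tempfile.TemporaryDirectory() as tmpdir:
    results.save(tmpdir)              # path is an existing directory, so tmpdir/results.json is written
    loaded = SR_results.load(tmpdir)  # load accepts the same directory path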

load staticmethod

load(path: str) -> SR_results

Load results previously saved with save.

If path is a directory, it looks for results.json inside it. If path is a file, it must end with the .json extension.

Parameters:

Name Type Description Default
path str

Directory path containing results.json or path to a specific .json file.

required

Returns:

Type Description
SR_results

A new SR_results instance with the loaded data.

Raises:

Type Description
FileNotFoundError

If the specified file or directory does not exist.

ValueError

If the file extension is not .json or if format_version is not 1.

Source code in SRToolkit/evaluation/sr_evaluator.py
@staticmethod
def load(path: str) -> "SR_results":
    """
    Load results previously saved with [save][SRToolkit.evaluation.sr_evaluator.SR_results.save].

    If *path* is a directory, it looks for ``results.json`` inside it.
    If *path* is a file, it must end with the ``.json`` extension.

    Args:
        path: Directory path containing ``results.json`` or path to a specific .json file.

    Returns:
        A new [SR_results][SRToolkit.evaluation.sr_evaluator.SR_results] instance with the loaded data.

    Raises:
        FileNotFoundError: If the specified file or directory does not exist.
        ValueError: If the file extension is not .json or if ``format_version`` is not ``1``.
    """
    if os.path.isdir(path):
        results_path = os.path.join(path, "results.json")
    else:
        _, extension = os.path.splitext(path)
        if extension.lower() != ".json":
            raise ValueError(
                f"Invalid file extension '{extension}'. SR_results can only be loaded from '.json' files."
            )
        results_path = path

    if not os.path.exists(results_path):
        raise FileNotFoundError(f"Could not find results file at: {results_path}")

    with open(results_path, "r") as f:
        data = json.load(f)

    if data.get("format_version", 1) != 1:
        raise ValueError(
            f"[SR_results.load] Unsupported format_version: {data.get('format_version')!r}. Expected 1."
        )

    sr_results = SR_results()
    sr_results.results = [EvalResult.from_dict(r) for r in data["results"]]
    return sr_results
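
Since load raises FileNotFoundError when nothing exists at the given location, a common pattern is to fall back to an empty container. A minimal sketch with a hypothetical path:

try:
    results = SR_results.load("experiments/run_01/results.json")  # hypothetical location
except FileNotFoundError:
    results = SR_results()                                        # start with an empty container instead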