Evaluation Module
SRToolkit.evaluation
This module contains classes and functions for evaluating symbolic regression approaches. Its main classes estimate the parameters of mathematical expressions and evaluate those expressions on a dataset.
Modules:
Name | Description |
---|---|
parameter_estimator | The module containing classes and functions for parameter estimation. |
sr_evaluator | The module containing classes and functions for evaluating expressions on a dataset. |
ParameterEstimator
Source code in SRToolkit/evaluation/parameter_estimator.py
__init__(X, y, symbol_library=SymbolLibrary.default_symbols(), **kwargs)
Initializes an instance of the ParameterEstimator class.
Examples:
>>> import numpy as np
>>> from SRToolkit.evaluation.parameter_estimator import ParameterEstimator
>>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9]])
>>> y = np.array([3, 0, 3, 11])
>>> pe = ParameterEstimator(X, y)
>>> rmse, constants = pe.estimate_parameters(["C", "*", "X_1", "-", "X_0"])
>>> print(rmse < 1e-6)
True
>>> print(1.99 < constants[0] < 2.01)
True
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | ndarray | The input data to be used in parameter estimation for variables. We assume that X is a 2D array with shape (n_samples, n_features). | required |
y | ndarray | The target values to be used in parameter estimation. | required |
symbol_library | SymbolLibrary | The symbol library to use. Defaults to SymbolLibrary.default_symbols(). | default_symbols() |
Other Parameters:
Name | Type | Description |
---|---|---|
method | str | The method to be used for minimization. Currently, only "L-BFGS-B" is supported/tested. Default is "L-BFGS-B". |
tol | float | The tolerance for termination. Default is 1e-6. |
gtol | float | The tolerance for the gradient norm. Default is 1e-3. |
max_iter | int | The maximum number of iterations. Default is 100. |
bounds | List[float] | A list of two elements, specifying the lower and upper bounds for the constant values. Default is [-5, 5]. |
initialization | str | The method to use for initializing the constant values. Currently, only "random" and "mean" are supported. "random" creates a vector with random values sampled within the bounds. "mean" creates a vector where all values are calculated as (lower_bound + upper_bound)/2. Default is "random". |
max_constants | int | The maximum number of constants allowed in the expression. Default is 8. |
max_expr_length | int | The maximum length of the expression. Default is -1 (no limit). |
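The two initialization strategies described above can be sketched in plain Python. This is only an illustration of the documented behaviour, not the library's implementation; the helper name `initial_constants` and its `seed` argument are hypothetical.

```python
import random


def initial_constants(n_constants, bounds=(-5.0, 5.0), initialization="random", seed=None):
    """Return a starting vector for n_constants constants (hypothetical helper)."""
    lower, upper = bounds
    if initialization == "random":
        # Sample every constant uniformly within the bounds.
        rng = random.Random(seed)
        return [rng.uniform(lower, upper) for _ in range(n_constants)]
    if initialization == "mean":
        # Every constant starts at the midpoint of the bounds.
        return [(lower + upper) / 2] * n_constants
    raise ValueError(f"Unsupported initialization: {initialization!r}")


print(initial_constants(3, initialization="mean"))  # [0.0, 0.0, 0.0]
```

With the default bounds of [-5, 5], "mean" always starts from zero, so its starting point is deterministic, while "random" gives a different starting point on each call unless a seed is fixed.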
Functions:
Name | Description |
---|---|
estimate_parameters | Estimates the parameters of an expression by minimizing the error between the predicted and actual values. |
estimate_parameters(expr)
Estimates the parameters of an expression by minimizing the error between the predicted and actual values.
Examples:
>>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9], ])
>>> y = np.array([3, 0, 3, 11])
>>> pe = ParameterEstimator(X, y)
>>> rmse, constants = pe.estimate_parameters(["C", "*", "X_1", "-", "X_0"])
>>> print(rmse < 1e-6)
True
>>> print(1.99 < constants[0] < 2.01)
True
Parameters:
Name | Type | Description | Default |
---|---|---|---|
expr | List[str] | A list of strings representing the expression to be evaluated. The expression should include the symbol 'C' for constants whose values need to be estimated. | required |
Returns:
Type | Description |
---|---|
float | The root mean square error (RMSE) of the optimized expression. |
ndarray | An array containing the optimized constant values. |
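To make the returned RMSE concrete, the sketch below computes it by hand for the expression from the examples, `C * X_1 - X_0`, on the same data. It uses only the standard library and does not call SRToolkit; the function name `rmse_for_constant` is hypothetical.

```python
import math

# Same data as in the examples; each row of X is (X_0, X_1).
X = [[1, 2], [8, 4], [5, 4], [7, 9]]
y = [3, 0, 3, 11]


def rmse_for_constant(c):
    """RMSE of the expression C * X_1 - X_0 for a given value of C."""
    squared_errors = [(c * x1 - x0 - target) ** 2 for (x0, x1), target in zip(X, y)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


print(rmse_for_constant(2.0))  # 0.0
print(rmse_for_constant(1.0) > rmse_for_constant(2.0))  # True
```

The error vanishes at C = 2, which is why the examples check that the estimated constant lies between 1.99 and 2.01.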
Notes
If the length of the expression or the number of constants in it exceeds the maximum allowed, NaN and an empty array are returned. If the expression contains no constants, the RMSE is calculated directly, without optimization.
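The special cases in these notes amount to a check made before any optimization. A minimal sketch of that decision logic follows; the helper `precheck` and its return labels are hypothetical, and it assumes that any non-positive `max_expr_length` means "no limit".

```python
def precheck(expr, max_constants=8, max_expr_length=-1):
    """Decide how an expression would be handled (hypothetical helper).

    Returns "rejected" (NaN and an empty array would be returned),
    "direct" (RMSE computed without optimization) or "optimize".
    """
    # Assumption: a non-positive max_expr_length disables the length check.
    if max_expr_length > 0 and len(expr) > max_expr_length:
        return "rejected"
    n_constants = expr.count("C")
    if n_constants > max_constants:
        return "rejected"
    if n_constants == 0:
        return "direct"
    return "optimize"


print(precheck(["C", "*", "X_1", "-", "X_0"]))  # optimize
print(precheck(["X_1", "-", "X_0"]))  # direct
print(precheck(["C", "+", "C"], max_constants=1))  # rejected
```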
SR_evaluator
Source code in SRToolkit/evaluation/sr_evaluator.py
__init__(X, y, max_evaluations=-1, metadata=None, symbol_library=SymbolLibrary.default_symbols(), **kwargs)
Initializes an instance of the SR_evaluator class. This class is used for evaluating symbolic regression approaches.
Examples:
>>> import numpy as np
>>> from SRToolkit.evaluation.sr_evaluator import SR_evaluator
>>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9]])
>>> y = np.array([3, 0, 3, 11])
>>> se = SR_evaluator(X, y)
>>> rmse = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
>>> print(rmse < 1e-6)
True
Attributes:
Name | Description |
---|---|
models | A dictionary containing the results of previously evaluated expressions. |
max_evaluations | The maximum number of expressions to evaluate. |
metadata | An optional dictionary containing metadata about this evaluation. This could include information such as the dataset used, the model used, seed, etc. |
symbol_library | The symbol library to use. |
total_expressions | The total number of expressions considered. |
parameter_estimator | An instance of the ParameterEstimator class used for parameter estimation. |
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | ndarray | The input data to be used in parameter estimation for variables. We assume that X is a 2D array with shape (n_samples, n_features). | required |
y | ndarray | The target values to be used in parameter estimation. | required |
max_evaluations | int | The maximum number of expressions to evaluate. Default is -1, which means no limit. | -1 |
metadata | Optional[dict] | An optional dictionary containing metadata about this evaluation. This could include information such as the dataset used, the model used, seed, etc. | None |
symbol_library | SymbolLibrary | The symbol library to use. | default_symbols() |
Other Parameters:
Name | Type | Description |
---|---|---|
method | str | The method to be used for minimization. Currently, only "L-BFGS-B" is supported/tested. Default is "L-BFGS-B". |
tol | float | The tolerance for termination. Default is 1e-6. |
gtol | float | The tolerance for the gradient norm. Default is 1e-3. |
max_iter | int | The maximum number of iterations. Default is 100. |
bounds | List[float] | A list of two elements, specifying the lower and upper bounds for the constant values. Default is [-5, 5]. |
initialization | str | The method to use for initializing the constant values. Currently, only "random" and "mean" are supported. "random" creates a vector with random values sampled within the bounds. "mean" creates a vector where all values are calculated as (lower_bound + upper_bound)/2. Default is "random". |
max_constants | int | The maximum number of constants allowed in the expression. Default is 8. |
max_expr_length | int | The maximum length of the expression. Default is -1 (no limit). |
Functions:
Name | Description |
---|---|
evaluate_expr | Evaluates an expression in infix notation and stores the result in memory to prevent re-evaluation. |
get_results | Returns the results of the evaluation. |
evaluate_expr(expr)
Evaluates an expression in infix notation and stores the result in memory to prevent re-evaluation.
Examples:
>>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9], ])
>>> y = np.array([3, 0, 3, 11])
>>> se = SR_evaluator(X, y)
>>> rmse = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
>>> print(rmse < 1e-6)
True
Parameters:
Name | Type | Description | Default |
---|---|---|---|
expr | List[str] | A list of strings representing the expression in infix notation. | required |
Returns:
Type | Description |
---|---|
float | The root mean square error of the expression. |
Warns:
Type | Description |
---|---|
Maximum number of evaluations reached | If the maximum number of evaluations has been reached, a warning is printed and np.nan is returned. |
Notes
If the expression has already been evaluated, its stored value is returned instead of re-evaluating the expression. When the maximum number of evaluations has been reached, a warning is printed and np.nan is returned.
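The caching and evaluation-limit behaviour described in these notes can be sketched with a small stand-in class. This is not the SR_evaluator implementation: the class `CachingEvaluator`, the `score_fn` callback, the joined-string cache key, checking the cache before the limit, and counting cache hits in `total_expressions` are all assumptions made for illustration.

```python
import math
import warnings


class CachingEvaluator:
    """Hypothetical stand-in that memoizes scores per expression."""

    def __init__(self, score_fn, max_evaluations=-1):
        self.score_fn = score_fn  # assumed callback: list of tokens -> float
        self.max_evaluations = max_evaluations
        self.models = {}
        self.total_expressions = 0

    def evaluate_expr(self, expr):
        self.total_expressions += 1
        key = "".join(expr)  # assumed cache key
        if key in self.models:
            # Already evaluated: return the stored value.
            return self.models[key]
        if 0 <= self.max_evaluations <= len(self.models):
            # Budget exhausted: warn and return NaN instead of evaluating.
            warnings.warn("Maximum number of evaluations reached")
            return math.nan
        self.models[key] = self.score_fn(expr)
        return self.models[key]


calls = []


def fake_score(expr):
    calls.append(expr)
    return float(len(expr))


evaluator = CachingEvaluator(fake_score)
evaluator.evaluate_expr(["X_0", "+", "X_1"])
evaluator.evaluate_expr(["X_0", "+", "X_1"])
print(len(calls))  # 1
```

The second call is answered from the cache, so the scoring function runs only once.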
get_results(top_k=20, success_threshold=1e-07)
Returns the results of the equation discovery (symbolic regression) process.
Examples:
>>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9], ])
>>> y = np.array([3, 0, 3, 11])
>>> se = SR_evaluator(X, y)
>>> rmse = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
>>> results = se.get_results(top_k=1)
>>> print(results["num_evaluated"])
1
>>> print(results["total_expressions"])
1
>>> print(results["best_expr"])
C*X_1-X_0
>>> print(results["min_rmse"] < 1e-6)
True
>>> print(1.99 < results["results"][0]["parameters"][0] < 2.01)
True
Parameters:
Name | Type | Description | Default |
---|---|---|---|
top_k | int | The number of top results to include in the output. If | 20 |
success_threshold | float | The threshold below which the evaluation is considered successful. Default is 1e-7. | 1e-07 |
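How `top_k` and `success_threshold` could shape the output is sketched below in plain Python. The keys `num_evaluated`, `total_expressions`, `best_expr`, `min_rmse`, `results` and `parameters` are taken from the example above; the function `summarize`, the layout of `models`, and the keys `expr`, `rmse` and `success` are assumptions, and the sketch expects at least one evaluated expression.

```python
def summarize(models, total_expressions, top_k=20, success_threshold=1e-7):
    """Rank evaluated expressions by RMSE (hypothetical helper)."""
    # Sort ascending so the best (lowest RMSE) expression comes first.
    ranked = sorted(models.items(), key=lambda item: item[1]["rmse"])
    best_expr, best_model = ranked[0]
    return {
        "num_evaluated": len(models),
        "total_expressions": total_expressions,
        "best_expr": best_expr,
        "min_rmse": best_model["rmse"],
        # Assumed flag: the run succeeds if the best RMSE is below the threshold.
        "success": best_model["rmse"] < success_threshold,
        "results": [
            {"expr": expr, "rmse": model["rmse"], "parameters": model["parameters"]}
            for expr, model in ranked[:top_k]
        ],
    }


models = {
    "X_0+X_1": {"rmse": 4.2, "parameters": []},
    "C*X_1-X_0": {"rmse": 0.0, "parameters": [2.0]},
}
summary = summarize(models, total_expressions=3, top_k=1)
print(summary["best_expr"])  # C*X_1-X_0
print(summary["success"])  # True
print(len(summary["results"]))  # 1
```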
Returns:
Type | Description |
---|---|
dict | A dictionary containing the results of the equation discovery/symbolic regression process. The keys are: |