Evaluation Module
SRToolkit.evaluation
This module contains classes and functions for evaluating symbolic regression approaches, chiefly classes for parameter estimation and for evaluating mathematical expressions on a dataset.
Modules:
Name | Description |
---|---|
parameter_estimator | The module containing classes and functions for parameter estimation. |
sr_evaluator | The module containing classes and functions for evaluating expressions on some dataset. |
ParameterEstimator
Source code in SRToolkit/evaluation/parameter_estimator.py
__init__(X, y, symbol_library=SymbolLibrary.default_symbols(), **kwargs)
Initializes an instance of the ParameterEstimator class.
Examples:
>>> import numpy as np
>>> from SRToolkit.evaluation import ParameterEstimator
>>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9]])
>>> y = np.array([3, 0, 3, 11])
>>> pe = ParameterEstimator(X, y)
>>> rmse, constants = pe.estimate_parameters(["C", "*", "X_1", "-", "X_0"])
>>> print(rmse < 1e-6)
True
>>> print(1.99 < constants[0] < 2.01)
True
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | ndarray | The input data for the variables, used in parameter estimation. X is assumed to be a 2D array with shape (n_samples, n_features). | required |
y | ndarray | The target values to be used in parameter estimation. | required |
symbol_library | SymbolLibrary | The symbol library to use. Defaults to SymbolLibrary.default_symbols(). | default_symbols() |
Other Parameters:
Name | Type | Description |
---|---|---|
method | str | The method to be used for minimization. Currently, only "L-BFGS-B" is supported/tested. Default is "L-BFGS-B". |
tol | float | The tolerance for termination. Default is 1e-6. |
gtol | float | The tolerance for the gradient norm. Default is 1e-3. |
max_iter | int | The maximum number of iterations. Default is 100. |
bounds | List[float] | A list of two elements, specifying the lower and upper bounds for the constant values. Default is [-5, 5]. |
initialization | str | The method to use for initializing the constant values. Currently, only "random" and "mean" are supported. "random" creates a vector with random values sampled within the bounds. "mean" creates a vector where all values are calculated as (lower_bound + upper_bound)/2. Default is "random". |
max_constants | int | The maximum number of constants allowed in the expression. Default is 8. |
max_expr_length | int | The maximum length of the expression. Default is -1 (no limit). |
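The two initialization strategies described in the table can be sketched in plain Python. This is only an illustration of the described behaviour, not the library's code: the function name `init_constants` and its `rng` argument are hypothetical.

```python
import random
from typing import List, Optional


def init_constants(
    n_constants: int,
    bounds: List[float],
    initialization: str = "random",
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Hypothetical helper: build a starting vector for the constants."""
    lower, upper = bounds
    if initialization == "mean":
        # Every constant starts at the midpoint of the bounds.
        return [(lower + upper) / 2] * n_constants
    if initialization == "random":
        # Every constant is sampled uniformly within the bounds.
        rng = rng or random.Random()
        return [rng.uniform(lower, upper) for _ in range(n_constants)]
    raise ValueError(f"Unsupported initialization: {initialization!r}")


mean_start = init_constants(3, [-5, 5], "mean")
random_start = init_constants(3, [-5, 5], "random", rng=random.Random(0))
```

With the default bounds [-5, 5], "mean" starts every constant at 0, while "random" gives a different starting point on each run unless the generator is seeded.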
Functions:
Name | Description |
---|---|
estimate_parameters | Estimates the parameters of an expression by minimizing the error between the predicted and actual values. |
estimate_parameters(expr)
Estimates the parameters of an expression by minimizing the error between the predicted and actual values.
Examples:
>>> import numpy as np
>>> from SRToolkit.evaluation import ParameterEstimator
>>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9]])
>>> y = np.array([3, 0, 3, 11])
>>> pe = ParameterEstimator(X, y)
>>> rmse, constants = pe.estimate_parameters(["C", "*", "X_1", "-", "X_0"])
>>> print(rmse < 1e-6)
True
>>> print(1.99 < constants[0] < 2.01)
True
Parameters:
Name | Type | Description | Default |
---|---|---|---|
expr | List[str] | A list of strings representing the expression to be evaluated. The expression should include the symbol 'C' for constants whose values need to be estimated. | required |
Returns:
Type | Description |
---|---|
float | The root mean square error (RMSE) of the optimized expression. |
ndarray | An array containing the optimized constant values. |
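To make the returned RMSE concrete, here is a minimal plain-Python sketch that scores the example expression C*X_1-X_0 on the example data for a fixed constant. The helper names `rmse` and `predict` are illustrative and not part of SRToolkit.

```python
import math
from typing import List


def rmse(y_true: List[float], y_pred: List[float]) -> float:
    """Root mean square error between targets and predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Same data as in the examples: columns are X_0 and X_1.
X = [[1, 2], [8, 4], [5, 4], [7, 9]]
y = [3, 0, 3, 11]


def predict(c: float) -> List[float]:
    """Evaluate C*X_1 - X_0 for a given value of the constant C."""
    return [c * x1 - x0 for x0, x1 in X]


error_at_2 = rmse(y, predict(2.0))  # C = 2 reproduces y exactly
error_at_1 = rmse(y, predict(1.0))  # any other value leaves a residual
```

With C = 2 every prediction matches its target, which is why the examples check that the estimated constant lies between 1.99 and 2.01.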
Notes
If the length of the expression or the number of constants in it exceeds the maximum allowed, NaN and an empty array are returned. If there are no constants in the expression, the RMSE is calculated directly without optimization.
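The cases in the note can be summarized as a small decision function. This is a sketch under assumptions: the name `plan_estimation`, the string labels, the order of the checks, and treating any negative `max_expr_length` as "no limit" are all illustrative choices, not taken from the library.

```python
from typing import List


def plan_estimation(
    expr: List[str],
    max_constants: int = 8,
    max_expr_length: int = -1,
) -> str:
    """Hypothetical helper: decide how an expression would be handled."""
    # Too long: the estimator would return NaN and an empty array.
    if max_expr_length >= 0 and len(expr) > max_expr_length:
        return "reject"
    n_constants = expr.count("C")
    # Too many constants: NaN and an empty array as well.
    if n_constants > max_constants:
        return "reject"
    # Nothing to fit: the RMSE is computed directly.
    if n_constants == 0:
        return "evaluate_directly"
    # Otherwise the constants are fitted by minimization.
    return "optimize"


decision = plan_estimation(["C", "*", "X_1", "-", "X_0"])
```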
SR_evaluator
Source code in SRToolkit/evaluation/sr_evaluator.py
__init__(X, y, max_evaluations=-1, metadata=None, symbol_library=SymbolLibrary.default_symbols(), **kwargs)
Initializes an instance of the SR_evaluator class. This class is used for evaluating symbolic regression approaches.
Examples:
>>> import numpy as np
>>> from SRToolkit.evaluation import SR_evaluator
>>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9]])
>>> y = np.array([3, 0, 3, 11])
>>> se = SR_evaluator(X, y)
>>> rmse = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
>>> print(rmse < 1e-6)
True
Attributes:
Name | Type | Description |
---|---|---|
models | | A dictionary containing the results of previously evaluated expressions. |
max_evaluations | | The maximum number of expressions to evaluate. |
metadata | | An optional dictionary containing metadata about this evaluation. This could include information such as the dataset used, the model used, seed, etc. |
symbol_library | | The symbol library to use. |
total_expressions | | The total number of expressions considered. |
parameter_estimator | | An instance of the ParameterEstimator class used for parameter estimation. |
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | ndarray | The input data for the variables, used in parameter estimation. X is assumed to be a 2D array with shape (n_samples, n_features). | required |
y | ndarray | The target values to be used in parameter estimation. | required |
max_evaluations | int | The maximum number of expressions to evaluate. Default is -1, which means no limit. | -1 |
metadata | Optional[dict] | An optional dictionary containing metadata about this evaluation. This could include information such as the dataset used, the model used, seed, etc. | None |
symbol_library | SymbolLibrary | The symbol library to use. | default_symbols() |
Other Parameters:
Name | Type | Description |
---|---|---|
method | str | The method to be used for minimization. Currently, only "L-BFGS-B" is supported/tested. Default is "L-BFGS-B". |
tol | float | The tolerance for termination. Default is 1e-6. |
gtol | float | The tolerance for the gradient norm. Default is 1e-3. |
max_iter | int | The maximum number of iterations. Default is 100. |
bounds | List[float] | A list of two elements, specifying the lower and upper bounds for the constant values. Default is [-5, 5]. |
initialization | str | The method to use for initializing the constant values. Currently, only "random" and "mean" are supported. "random" creates a vector with random values sampled within the bounds. "mean" creates a vector where all values are calculated as (lower_bound + upper_bound)/2. Default is "random". |
max_constants | int | The maximum number of constants allowed in the expression. Default is 8. |
max_expr_length | int | The maximum length of the expression. Default is -1 (no limit). |
Functions:
Name | Description |
---|---|
evaluate_expr | Evaluates an expression in infix notation and stores the result in memory to prevent re-evaluation. |
get_results | Returns the results of the evaluation. |
evaluate_expr(expr)
Evaluates an expression in infix notation and stores the result in memory to prevent re-evaluation.
Examples:
>>> import numpy as np
>>> from SRToolkit.evaluation import SR_evaluator
>>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9]])
>>> y = np.array([3, 0, 3, 11])
>>> se = SR_evaluator(X, y)
>>> rmse = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
>>> print(rmse < 1e-6)
True
Parameters:
Name | Type | Description | Default |
---|---|---|---|
expr | List[str] | A list of strings representing the expression in infix notation. | required |
Returns:
Type | Description |
---|---|
float | The root mean square error of the expression. |
Warns:
Type | Description |
---|---|
Maximum number of evaluations reached | If the maximum number of evaluations has been reached, a warning is printed and np.nan is returned. |
Notes
If the expression has already been evaluated, its stored value is returned rather than recomputed. Once the maximum number of evaluations has been reached, a warning is printed and np.nan is returned.
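The caching and evaluation-limit behaviour described here can be sketched with a small stand-in class. Everything in it is an assumption made for illustration: the class name `CachingEvaluator`, the `score` callback, the use of `warnings.warn`, the joined-token cache key, and checking the cache before the limit are not confirmed by the source.

```python
import math
import warnings
from typing import Callable, Dict, List


class CachingEvaluator:
    """Hypothetical stand-in that memoizes scores per expression."""

    def __init__(self, score: Callable[[List[str]], float], max_evaluations: int = -1):
        self.score = score                    # function that computes the error
        self.max_evaluations = max_evaluations
        self.models: Dict[str, float] = {}    # cache of evaluated expressions
        self.total_expressions = 0            # every call counts, cached or not

    def evaluate_expr(self, expr: List[str]) -> float:
        self.total_expressions += 1
        key = "".join(expr)
        if key in self.models:
            # Already evaluated: return the stored value.
            return self.models[key]
        if 0 <= self.max_evaluations <= len(self.models):
            # Budget exhausted: warn and return NaN.
            warnings.warn("Maximum number of evaluations reached")
            return math.nan
        value = self.score(expr)
        self.models[key] = value
        return value


# Dummy score (expression length) just to exercise the cache.
evaluator = CachingEvaluator(score=lambda e: float(len(e)), max_evaluations=1)
first = evaluator.evaluate_expr(["X_0"])
again = evaluator.evaluate_expr(["X_0"])  # served from the cache
```

In this sketch a repeated expression never consumes evaluation budget, which is the main benefit of storing results when a search method proposes the same candidate many times.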
get_results(top_k=20, success_threshold=1e-07)
Returns the results of the equation discovery (symbolic regression) process.
Examples:
>>> import numpy as np
>>> from SRToolkit.evaluation import SR_evaluator
>>> X = np.array([[1, 2], [8, 4], [5, 4], [7, 9]])
>>> y = np.array([3, 0, 3, 11])
>>> se = SR_evaluator(X, y)
>>> rmse = se.evaluate_expr(["C", "*", "X_1", "-", "X_0"])
>>> results = se.get_results(top_k=1)
>>> print(results["num_evaluated"])
1
>>> print(results["total_expressions"])
1
>>> print(results["best_expr"])
C*X_1-X_0
>>> print(results["min_rmse"] < 1e-6)
True
>>> print(1.99 < results["results"][0]["parameters"][0] < 2.01)
True
Parameters:
Name | Type | Description | Default |
---|---|---|---|
top_k | int | The number of top results to include in the output. If | 20 |
success_threshold | float | The threshold below which the evaluation is considered successful. Default is 1e-7. | 1e-07 |
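How top_k and success_threshold might shape the output can be sketched as follows. This is a guess at the general idea, not the library's implementation: the function `summarize`, the per-model "expr" and "rmse" fields, and the "success" key are assumptions, while the remaining keys are the ones used in the example above.

```python
from typing import Any, Dict


def summarize(
    models: Dict[str, Dict[str, Any]],
    total_expressions: int,
    top_k: int = 20,
    success_threshold: float = 1e-7,
) -> Dict[str, Any]:
    """Hypothetical helper: rank stored models and report the best one."""
    # Lowest RMSE first.
    ranked = sorted(models.values(), key=lambda m: m["rmse"])
    best = ranked[0] if ranked else None
    return {
        "num_evaluated": len(models),
        "total_expressions": total_expressions,
        "best_expr": best["expr"] if best else None,
        "min_rmse": best["rmse"] if best else None,
        # Assumed key: success means the best RMSE is below the threshold.
        "success": best is not None and best["rmse"] < success_threshold,
        "results": ranked[:top_k],
    }


stored = {
    "C*X_1-X_0": {"expr": "C*X_1-X_0", "rmse": 1e-9, "parameters": [2.0]},
    "X_0": {"expr": "X_0", "rmse": 4.2, "parameters": []},
}
summary = summarize(stored, total_expressions=3, top_k=1)
```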
Returns:
Type | Description |
---|---|
dict | A dictionary containing the results of the equation discovery/symbolic regression process. The keys are: |