Dataset Module
SRToolkit.dataset
This module contains dataset classes.
SRDataset
Source code in SRToolkit/dataset/srdataset.py
__init__(X, y, ground_truth, original_equation, symbols, max_evaluations=-1, max_expression_length=-1, max_constants=8, success_threshold=1e-07, constant_range=None, dataset_metadata=None)
Initializes an instance of the SRDataset class.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | ndarray | The input data to be used in parameter estimation for variables. We assume that X is a 2D array with shape (n_samples, n_features). | required |
y | ndarray | The target values to be used in parameter estimation. | required |
ground_truth | List[str] | The ground truth expression, represented as a list of tokens (strings) in infix notation. | required |
original_equation | str | The original equation from which the ground truth expression was generated. | required |
symbols | SymbolLibrary | The symbol library to use. | required |
max_evaluations | int | The maximum number of expressions to evaluate. Less than 0 means no limit. | -1 |
max_expression_length | int | The maximum length of the expression. Less than 0 means no limit. | -1 |
max_constants | int | The maximum number of constants allowed in the expression. Less than 0 means no limit. | 8 |
success_threshold | float | The RMSE threshold below which the experiment is considered successful. | 1e-07 |
constant_range | List[float] | A list of two floats specifying the lower and upper bounds for the constant values. If constant_range is None and the symbol library contains a symbol for constants, it is automatically set to [-5.0, 5.0]. | None |
dataset_metadata | dict | An optional dictionary containing metadata about this evaluation. This could include information such as the name of the dataset, a citation for the dataset, number of variables, etc. | None |
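The limit and default semantics described in the parameter table can be sketched in plain Python. This is only an illustration of the documented behavior, not the SRDataset source: the helper names `within_limit` and `resolve_constant_range` are hypothetical, and treating a limit as inclusive is an assumption.

```python
from typing import List, Optional


def within_limit(value: int, limit: int) -> bool:
    """Return True if `value` respects `limit`.

    A negative limit means "no limit", as documented for max_evaluations,
    max_expression_length and max_constants. Assumption: the limit itself
    is still allowed (inclusive bound).
    """
    return limit < 0 or value <= limit


def resolve_constant_range(
    constant_range: Optional[List[float]], has_constant_symbol: bool
) -> Optional[List[float]]:
    """Mimic the documented default for constant_range.

    None becomes [-5.0, 5.0] only when the symbol library contains a
    symbol for constants; an explicit range is returned unchanged.
    """
    if constant_range is None and has_constant_symbol:
        return [-5.0, 5.0]
    return constant_range


print(within_limit(12, -1))                 # True: negative limit disables the check
print(within_limit(9, 8))                   # False: 9 constants exceed max_constants=8
print(resolve_constant_range(None, True))   # [-5.0, 5.0]
print(resolve_constant_range(None, False))  # None
```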
__str__()
Returns a string describing this dataset.
The string describes the target expression, the symbols that should be used, and the success threshold. It also lists any constraints that apply when evaluating a model on this dataset: the maximum number of expressions to evaluate, the maximum length of the expression, and the maximum number of constants allowed in the expression. If the symbol library contains a symbol for constants, the string also includes the range of constants.
For other metadata, please refer to the attribute self.dataset_metadata.
Returns:
Type | Description |
---|---|
str | A string describing this dataset. |
create_evaluator(metadata=None)
Creates an instance of the SR_evaluator class from this dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
metadata | dict | An optional dictionary containing metadata about this evaluation. This could include information such as the dataset used, the model used, seed, etc. | None |
Returns:
Type | Description |
---|---|
SR_evaluator | An instance of the SR_evaluator class. |
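The dataset's success_threshold is documented as an RMSE threshold below which an experiment counts as successful. As a rough illustration of that criterion, the sketch below computes an RMSE and compares it to the default threshold of 1e-07. It is not the SR_evaluator implementation; `rmse` and `is_successful` are hypothetical helpers written with the standard library only.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between target values and predictions."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    squared_errors = ((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sum(squared_errors) / len(y_true))


def is_successful(error: float, success_threshold: float = 1e-07) -> bool:
    """Success means the error lies strictly below the threshold."""
    return error < success_threshold


y = [1.0, 2.0, 3.0]
exact_prediction = [1.0, 2.0, 3.0]
rough_prediction = [1.0, 2.0, 3.1]

print(is_successful(rmse(y, exact_prediction)))  # True: RMSE is 0.0
print(is_successful(rmse(y, rough_prediction)))  # False: RMSE is about 0.058
```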