SR Dataset
            SRToolkit.dataset.sr_dataset
    
            SR_dataset
SR_dataset(X: ndarray, symbol_library: SymbolLibrary, ranking_function: str = 'rmse', y: Optional[ndarray] = None, max_evaluations: int = -1, ground_truth: Optional[Union[List[str], Node, ndarray]] = None, original_equation: Optional[str] = None, success_threshold: Optional[float] = None, result_augmenters: Optional[List[ResultAugmenter]] = None, seed: Optional[int] = None, dataset_metadata: Optional[dict] = None, **kwargs)
Initializes an instance of the SR_dataset class.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| X | ndarray | The input data used to calculate the error/ranking function. X is assumed to be a 2D array with the shape (n_samples, n_features). | required |
| symbol_library | SymbolLibrary | The symbol library to use. | required |
| ranking_function | str | The ranking function to use. Currently, "rmse" and "bed" are supported. RMSE is the standard ranking function in symbolic regression; it calculates the error between the ground truth values and the outputs of expressions with fitted free parameters. BED is a stochastic measure that calculates the behavioral distance between two expressions that can contain free parameters. Its advantage is that expressions with many parameters are less likely to overfit, so the measure focuses more on structure identification. | 'rmse' |
| y | Optional[ndarray] | The target values used in parameter estimation if the ranking function is "rmse". | None |
| max_evaluations | int | The maximum number of expressions to evaluate. A value less than 0 means no limit. | -1 |
| ground_truth | Optional[Union[List[str], Node, ndarray]] | The ground truth expression, represented as a list of tokens (strings) in infix notation, a SRToolkit.utils.Node object, or a numpy array representing behavior (see SRToolkit.utils.create_behavior_matrix for more details). | None |
| original_equation | Optional[str] | The original equation from which the ground truth expression was generated. | None |
| result_augmenters | Optional[List[ResultAugmenter]] | Optional list of objects that augment the results returned by the "get_results" function. | None |
| seed | Optional[int] | The seed to use for random number generation/reproducibility. Default is None, which means no seed is used. | None |
| dataset_metadata | Optional[dict] | An optional dictionary containing metadata about this evaluation. This could include information such as the name of the dataset, a citation for the dataset, number of variables, etc. | None |
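To make the "rmse" ranking concrete, the following is a minimal sketch of the quantity it is described as measuring: the error between target values and the outputs of a candidate expression for a given value of its free parameter. It is not the library's implementation. The `rmse` and `candidate` helpers, the toy data, and the ground truth `2*x0 + x1` are all assumptions made for illustration, and plain Python lists stand in for the ndarray inputs.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# X follows the documented layout (n_samples, n_features): 4 samples, 2 features.
X = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]

# Targets from a hypothetical ground truth y = 2*x0 + x1.
y = [2.0 * x0 + x1 for x0, x1 in X]


def candidate(row, c):
    """Hypothetical candidate expression C*x0 + x1 with one free parameter C."""
    return c * row[0] + row[1]


# With the parameter fitted to 2.0 the candidate reproduces y exactly;
# with C = 1.0 the residuals equal x0, so the error is sqrt(14 / 4).
exact = rmse(y, [candidate(row, 2.0) for row in X])
off = rmse(y, [candidate(row, 1.0) for row in X])
```

Under this reading, a lower value means a better-ranked expression, and `y` is needed only because the free parameter has to be fitted against target values.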
Other Parameters:
| Name | Type | Description | 
|---|---|---|
| method | str | The method to be used for minimization. Currently, only "L-BFGS-B" is supported/tested. Default is "L-BFGS-B". |
| tol | float | The tolerance for termination. Default is 1e-6. |
| gtol | float | The tolerance for the gradient norm. Default is 1e-3. |
| max_iter | int | The maximum number of iterations. Default is 100. |
| constant_bounds | Tuple[float, float] | A tuple of two elements, specifying the lower and upper bounds for the constant values. Default is (-5, 5). |
| initialization | str | The method to use for initializing the constant values. Currently, only "random" and "mean" are supported. "random" creates a vector with random values sampled within the bounds. "mean" creates a vector where all values are calculated as (lower_bound + upper_bound)/2. Default is "random". |
| max_constants | int | The maximum number of constants allowed in the expression. Default is 8. |
| max_expr_length | int | The maximum length of the expression. Default is -1 (no limit). |
| num_points_sampled | int | The number of points to sample when estimating the behavior of an expression. Default is 64. If num_points_sampled==-1, the number of points sampled equals the number of points in the dataset. |
| bed_X | Optional[ndarray] | Points used for BED evaluation. If None and domain_bounds are given, points are sampled from the domain. If None and domain_bounds are not given, points are randomly selected from X. Default is None. |
| num_consts_sampled | int | Number of constants sampled for BED evaluation. Default is 32. |
| domain_bounds | Optional[List[Tuple[float, float]]] | Bounds for the domain to be used if bed_X is None to sample random points. Default is None. |
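The rules for `initialization` and for choosing BED evaluation points can be sketched as below. This is an illustrative re-implementation of the documented behavior, not the library's code. The function names `init_constants` and `choose_bed_points` are hypothetical, uniform sampling is an assumption (the documentation only says values are "sampled within the bounds"), and capping the selection at `len(X)` is likewise an assumption.

```python
import random


def init_constants(n, bounds=(-5.0, 5.0), initialization="random", rng=None):
    """Return n starting values for constants, following the documented rules."""
    lower, upper = bounds
    if initialization == "mean":
        # Every value is the midpoint (lower_bound + upper_bound) / 2.
        return [(lower + upper) / 2.0] * n
    if initialization == "random":
        if rng is None:
            rng = random.Random()
        # Assumption: uniform sampling within the bounds.
        return [rng.uniform(lower, upper) for _ in range(n)]
    raise ValueError('initialization must be "random" or "mean"')


def choose_bed_points(X, bed_X=None, domain_bounds=None,
                      num_points_sampled=64, rng=None):
    """Pick the points on which expression behavior is estimated."""
    if bed_X is not None:
        return bed_X  # Explicit points always take precedence.
    if rng is None:
        rng = random.Random()
    # -1 means "as many points as there are in the dataset".
    n = len(X) if num_points_sampled == -1 else num_points_sampled
    if domain_bounds is not None:
        # One (lower, upper) pair per feature; sample each coordinate.
        return [[rng.uniform(lo, hi) for lo, hi in domain_bounds]
                for _ in range(n)]
    # No bounds given: randomly select rows of X (capped at len(X)).
    return rng.sample(X, min(n, len(X)))


X = [[0.0, 10.0], [0.5, 15.0], [1.0, 20.0]]
midpoints = init_constants(3, initialization="mean")
sampled = choose_bed_points(X, domain_bounds=[(0.0, 1.0), (10.0, 20.0)],
                            num_points_sampled=5, rng=random.Random(0))
selected = choose_bed_points(X, num_points_sampled=-1, rng=random.Random(0))
```

With the default bounds (-5, 5), "mean" starts every constant at 0.0, which makes runs deterministic, while "random" depends on the seed.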
Source code in SRToolkit/dataset/sr_dataset.py
                    
            create_evaluator
    Creates an instance of the SR_evaluator class from this dataset.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| metadata | dict | An optional dictionary containing metadata about this evaluation. This could include information such as the dataset used, the model used, seed, etc. | None |
Returns:
| Type | Description | 
|---|---|
| SR_evaluator | An instance of the SR_evaluator class. |
Raises:
| Type | Description | 
|---|---|
| Exception | If an error occurs when creating the evaluator. |
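Because `create_evaluator` is documented as raising a plain `Exception` on failure, callers may want to guard the call. The sketch below shows one way to do that. `StubDataset` is a hypothetical stand-in that mimics only the documented `create_evaluator(metadata=...)` call so the example runs without SRToolkit installed, and `try_create_evaluator` is an illustrative helper, not part of the library.

```python
class StubDataset:
    """Hypothetical stand-in for SR_dataset, exposing only create_evaluator."""

    def __init__(self, fail=False):
        self.fail = fail

    def create_evaluator(self, metadata=None):
        if self.fail:
            raise Exception("could not create evaluator")
        # A real dataset would return an SR_evaluator; a dict stands in here.
        return {"metadata": metadata}


def try_create_evaluator(dataset, metadata=None):
    """Return (evaluator, None) on success or (None, error message) on failure."""
    try:
        return dataset.create_evaluator(metadata=metadata), None
    except Exception as exc:  # The docs name no narrower exception type.
        return None, str(exc)


evaluator, error = try_create_evaluator(StubDataset(), metadata={"seed": 0})
failed, failure = try_create_evaluator(StubDataset(fail=True))
```

Returning the error message rather than re-raising keeps a batch of evaluations running when a single dataset cannot produce an evaluator; whether that trade-off is appropriate depends on the experiment.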
            __str__
    Returns a string describing this dataset.
The string describes the target expression, symbols that should be used, and the success threshold. It also includes any constraints that should be followed when evaluating a model on this dataset. These constraints include the maximum number of expressions to evaluate, the maximum length of the expression, and the maximum number of constants allowed in the expression. If the symbol library contains a symbol for constants, the string also includes the range of constants.
For other metadata, please refer to the attribute self.dataset_metadata.
Returns:
| Type | Description | 
|---|---|
| str | A string describing this dataset. |