Nguyen

SRToolkit.dataset.nguyen

Nguyen symbolic regression benchmark.

Nguyen

Nguyen(dataset_directory: str = os.path.join(user_data_dir('SRToolkit'), 'nguyen'), n_samples: int = 10000, seed: Optional[int] = 42, force_generate: bool = False)

Bases: SR_benchmark

The Nguyen symbolic regression benchmark.

Contains 10 expressions without constant parameters: the first 4 are polynomials, the first 8 use one variable, and the last 2 use two variables. The benchmark ships with pre-generated data, which is downloaded to dataset_directory. If the download fails or force_generate=True, data is instead generated from the stored per-variable samplers using n_samples points and the given seed.
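To make the composition of the benchmark concrete, the sketch below lists the ten Nguyen expressions as they are commonly given in the literature (Uy et al., 2011) and checks the one-variable/two-variable split described above. It is a standalone illustration that does not use SRToolkit; the NGUYEN dictionary and the "Nguyen-1" ... "Nguyen-10" labels are assumptions of this sketch, and the names SRToolkit uses internally may differ.

```python
import math

# Illustrative table only: expressions as commonly listed for the Nguyen
# benchmark. Assumption: this is NOT read from SRToolkit, and the keys are
# hypothetical labels chosen for this sketch.
# Each entry maps a label to (number of variables, callable).
NGUYEN = {
    "Nguyen-1": (1, lambda x: x**3 + x**2 + x),
    "Nguyen-2": (1, lambda x: x**4 + x**3 + x**2 + x),
    "Nguyen-3": (1, lambda x: x**5 + x**4 + x**3 + x**2 + x),
    "Nguyen-4": (1, lambda x: x**6 + x**5 + x**4 + x**3 + x**2 + x),
    "Nguyen-5": (1, lambda x: math.sin(x**2) * math.cos(x) - 1),
    "Nguyen-6": (1, lambda x: math.sin(x) + math.sin(x + x**2)),
    "Nguyen-7": (1, lambda x: math.log(x + 1) + math.log(x**2 + 1)),
    "Nguyen-8": (1, lambda x: math.sqrt(x)),
    "Nguyen-9": (2, lambda x, y: math.sin(x) + math.sin(y**2)),
    "Nguyen-10": (2, lambda x, y: 2 * math.sin(x) * math.cos(y)),
}

# Split the labels by the number of input variables.
one_variable = [name for name, (arity, _) in NGUYEN.items() if arity == 1]
two_variable = [name for name, (arity, _) in NGUYEN.items() if arity == 2]

print(len(one_variable), len(two_variable))
```

None of the expressions contains a free constant that would have to be fitted, which is what "without constant parameters" refers to.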

References

Uy et al. (2011)

Examples:

>>> benchmark = Nguyen()
>>> len(benchmark.list_datasets(verbose=False))
10

Parameters:

Name Type Description Default
dataset_directory str

Directory where dataset files are stored or will be downloaded to. Defaults to the platform-appropriate user data directory (e.g. ~/.local/share/SRToolkit/nguyen on Linux).

os.path.join(user_data_dir('SRToolkit'), 'nguyen')
n_samples int

Number of samples to generate per dataset when falling back to sampler-based data generation (i.e. when the download fails or force_generate=True). Defaults to 10000.

10000
seed Optional[int]

Random seed used for sampler-based data generation. Defaults to 42.

42
force_generate bool

If True, skip downloading/loading pre-generated data and always generate fresh data from samplers. Defaults to False.

False
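The roles of n_samples and seed in the sampler-based fallback can be illustrated with a minimal standalone sketch. It does not reproduce SRToolkit's samplers: generate_samples is a hypothetical helper, the uniform range [-1, 1] is an assumed sampling interval, and the target x^3 + x^2 + x is used only as an example expression.

```python
import random
from typing import List, Optional, Tuple


def generate_samples(
    n_samples: int = 10000,
    seed: Optional[int] = 42,
    low: float = -1.0,   # assumed lower bound of the sampler
    high: float = 1.0,   # assumed upper bound of the sampler
) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: draw inputs uniformly and evaluate a target."""
    # A dedicated generator keeps the result independent of global state.
    # With seed=None the generator is seeded from system entropy, so
    # repeated calls produce different data.
    rng = random.Random(seed)
    xs = [rng.uniform(low, high) for _ in range(n_samples)]
    ys = [x**3 + x**2 + x for x in xs]
    return xs, ys


# The same seed reproduces the same dataset; a different seed does not.
first = generate_samples(n_samples=100, seed=42)
second = generate_samples(n_samples=100, seed=42)
other = generate_samples(n_samples=100, seed=7)
print(first == second, first == other)
```

Fixing the seed is what makes generated benchmark data comparable across runs, which is presumably why the constructor defaults to seed=42 rather than None.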
Source code in SRToolkit/dataset/nguyen.py
def __init__(
    self,
    dataset_directory: str = os.path.join(user_data_dir("SRToolkit"), "nguyen"),
    n_samples: int = 10000,
    seed: Optional[int] = 42,
    force_generate: bool = False,
):
    super().__init__("Nguyen", dataset_directory)
    self._n_samples = n_samples
    self._seed = seed
    self._force_generate = force_generate
    self._populate()