Feynman

SRToolkit.dataset.feynman

Feynman symbolic regression benchmark.

Feynman

Feynman(dataset_directory: str = os.path.join(user_data_dir('SRToolkit'), 'feynman'))

Bases: SR_benchmark

The Feynman symbolic regression benchmark.

Contains 100 physics equations with up to 9 variables. Data is downloaded on first use from the SymbolicRegressionToolkit repository (10,000 samples per dataset instead of the original 1,000,000 from the paper).

For more information about the Feynman benchmark, see: https://doi.org/10.1126/sciadv.aay2631

Examples:

>>> benchmark = Feynman()
>>> len(benchmark.list_datasets(verbose=False))
100
>>> X, y = benchmark.resample('I.16.6', n=500, seed=0)
>>> X.shape
(500, 3)
>>> y.shape
(500,)

Parameters:

    dataset_directory (str): Directory where dataset files are stored or will be downloaded to. Defaults to the platform-appropriate user data directory (e.g. ~/.local/share/SRToolkit/feynman on Linux). Default: os.path.join(user_data_dir('SRToolkit'), 'feynman')
Source code in SRToolkit/dataset/feynman.py
def __init__(self, dataset_directory: str = os.path.join(user_data_dir("SRToolkit"), "feynman")):
    super().__init__("feynman", dataset_directory)
    self._populate()

resample

resample(dataset_name: str, n: int, seed: Optional[int] = None) -> Tuple[np.ndarray, np.ndarray]

Generate fresh data for a dataset by sampling new inputs and evaluating the ground truth.

Variable bounds are taken from _BOUNDS.

Examples:

>>> benchmark = Feynman()
>>> X, y = benchmark.resample('I.16.6', n=200, seed=42)
>>> X.shape
(200, 3)

Parameters:

    dataset_name (str): Name of the dataset to resample. Required.
    n (int): Number of new samples to generate. Required.
    seed (Optional[int]): Random seed for reproducibility. Default: None.

Returns:

    Tuple[ndarray, ndarray]: A tuple (X, y) of numpy arrays with shapes (n, n_vars) and (n,).

Raises:

    ValueError: If the dataset has no ground truth expression.

Source code in SRToolkit/dataset/feynman.py
def resample(self, dataset_name: str, n: int, seed: Optional[int] = None) -> Tuple[np.ndarray, np.ndarray]:
    """
    Generate fresh data for a dataset by sampling new inputs and evaluating the ground truth.

    Variable bounds are taken from ``_BOUNDS``.

    Examples:
        >>> benchmark = Feynman()
        >>> X, y = benchmark.resample('I.16.6', n=200, seed=42)
        >>> X.shape
        (200, 3)

    Args:
        dataset_name: Name of the dataset to resample.
        n: Number of new samples to generate.
        seed: Random seed for reproducibility.

    Returns:
        A tuple ``(X, y)`` of numpy arrays with shapes ``(n, n_vars)`` and ``(n,)``.

    Raises:
        ValueError: If the dataset has no ground truth expression.
    """
    info = self.datasets[dataset_name]
    if info.get("ground_truth") is None:
        raise ValueError(f"Dataset '{dataset_name}' has no ground truth expression — cannot compute y.")
    bounds = _BOUNDS[dataset_name]
    lb = np.array([b[0] for b in bounds], dtype=float)
    ub = np.array([b[1] for b in bounds], dtype=float)
    rng = np.random.default_rng(seed)
    X_new = rng.uniform(lb, ub, size=(n, len(bounds)))
    sl = SymbolLibrary.from_dict(info["symbol_library"])
    f = expr_to_executable_function(info["ground_truth"], sl)
    y_new = f(X_new, np.array([]))
    return X_new, y_new
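Internally, resample draws each input uniformly within per-variable bounds and then evaluates the compiled ground-truth expression on the samples. A minimal numpy-only sketch of that sampling step, using hypothetical bounds and an illustrative expression standing in for _BOUNDS and expr_to_executable_function:

```python
import numpy as np

# Hypothetical per-variable bounds for a 3-variable dataset: (low, high) pairs.
bounds = [(1.0, 5.0), (1.0, 5.0), (1.0, 5.0)]
lb = np.array([b[0] for b in bounds], dtype=float)
ub = np.array([b[1] for b in bounds], dtype=float)

# Seeded generator for reproducibility, mirroring the `seed` parameter.
rng = np.random.default_rng(42)
X = rng.uniform(lb, ub, size=(200, len(bounds)))

# Illustrative ground truth (not necessarily the real I.16.6 expression):
# y = (x0 + x1) / (1 + x0*x1 / x2**2), evaluated columnwise on the samples.
y = (X[:, 0] + X[:, 1]) / (1 + X[:, 0] * X[:, 1] / X[:, 2] ** 2)

print(X.shape, y.shape)  # (200, 3) (200,)
```

Note that rng.uniform broadcasts the length-n_vars lb and ub arrays against the (n, n_vars) output shape, so each column is drawn from its own interval without an explicit loop.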