Sharing Custom Implementations
SRToolkit's serialization model is built around a single contract: every custom class is identified by its fully-qualified Python module path (e.g. meznar_pcfg_grammar.MyGrammarConstraint). When you call to_dict, this path is embedded in the JSON. When another machine calls from_dict, it uses importlib to import that module and reconstruct the object. Sharing works as long as the recipient can import the class at the same path.
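The contract can be illustrated with a minimal standard-library sketch. The helper names class_path and load_class are hypothetical and not part of SRToolkit; the sketch only assumes that the stored path has the form module.ClassName and that the class is defined at module top level.

```python
import importlib
from fractions import Fraction


def class_path(cls):
    """Return the 'module.ClassName' string that identifies a class."""
    return f"{cls.__module__}.{cls.__qualname__}"


def load_class(path):
    """Import the module part of 'module.ClassName' and return the class."""
    module_name, _, class_name = path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Round trip, with a standard-library class standing in for a custom one.
path = class_path(Fraction)
print(path)  # fractions.Fraction
assert load_class(path) is Fraction
```

The round trip succeeds only because the fractions module is importable here; a custom class needs its module file to be importable in the same way on the recipient's machine.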
This guide covers the best practices for structuring and naming shareable files, then walks through everything that can be shared.
Best practices
Name files to avoid collisions
Use the convention {author}_{descriptor}.py, for example meznar_pcfg_grammar.py or smith_gp_approach.py.
The descriptor should say what the file contains, not just "custom". This ensures that when multiple contributors' files land in the same working directory they do not overwrite each other.
Use the working directory as the sharing unit
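Python puts the directory of the script being run at the front of sys.path (or the current directory for an interactive session, python -c, and python -m), so any .py file placed next to your entry script in the working directory is importable by name. The simplest sharing model is: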
experiment/
├── meznar_pcfg_grammar.py # custom constraint
├── smith_gp_approach.py # custom approach
├── grammar.json # Grammar.to_dict() output
├── dataset.json # SR_dataset.to_dict() output
├── approach_config.json # ApproachConfig.to_dict() output
└── requirements.txt # SRToolkit version pin
Zip the directory and share it. The recipient unpacks and runs from inside it.
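Whether the working directory is actually importable depends on how Python was started. A quick check, using a hypothetical helper name, might look like this:

```python
import os
import sys


def working_dir_on_path():
    """Return True if the current working directory is an entry of sys.path."""
    cwd = os.path.realpath(os.getcwd())
    # An empty string in sys.path also means "the current directory".
    return any(os.path.realpath(entry or cwd) == cwd for entry in sys.path)


print(working_dir_on_path())
```

If this prints False, the shared module files will not be found by name even though they sit in the working directory.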
Never define shared classes in a script's __main__ scope
When a script is run directly, Python sets __module__ = "__main__" on every class defined in it. The path "__main__.MyApproach" is meaningless on any other machine. Always define classes in a named module file and import from there:
# good: class lives in smith_gp_approach.py, importable by that name
from smith_gp_approach import GeneticProgramming
# bad: class defined in the script being run — serializes as __main__.GeneticProgramming
class GeneticProgramming(SR_approach): ...
Avoid defining classes in notebooks
Classes defined in Jupyter cells also get __module__ = "__main__". Define the class in a .py file and import it into the notebook instead.
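One way to see the difference, sketched with the standard library only. The module name demo_author_approach and the class DemoApproach are made up for illustration, and a temporary directory stands in for the working directory.

```python
import importlib
import sys
import tempfile
from pathlib import Path


# A class defined inline reports the name of the running module,
# which is "__main__" in a notebook cell or a directly executed script.
class InlineApproach:
    pass


# The same kind of class written to a module file gets a stable path.
module_dir = Path(tempfile.mkdtemp())
(module_dir / "demo_author_approach.py").write_text(
    "class DemoApproach:\n    pass\n"
)
sys.path.insert(0, str(module_dir))  # stands in for the working directory
importlib.invalidate_caches()        # the file was created after startup

module = importlib.import_module("demo_author_approach")
print(InlineApproach.__module__)
print(module.DemoApproach.__module__)  # demo_author_approach
```

Only the second path stays meaningful on another machine, provided the module file travels with the JSON.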
Verify importability before serializing
After writing your module file, confirm the class is importable under the exact path that will be embedded in the JSON.
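The simplest check is a plain import of the class from a fresh interpreter started in the working directory. A slightly stricter sketch is shown here; verify_shareable is a hypothetical helper, not an SRToolkit function, and it assumes top-level classes.

```python
import importlib
from argparse import ArgumentParser


def verify_shareable(cls):
    """Return the class path, or raise if cls cannot be re-imported under it."""
    module_name, class_name = cls.__module__, cls.__qualname__
    if module_name == "__main__":
        raise ValueError(
            f"{class_name} is defined in __main__; move it to a module file"
        )
    module = importlib.import_module(module_name)
    if getattr(module, class_name, None) is not cls:
        raise ValueError(f"{module_name}.{class_name} does not resolve to {cls!r}")
    return f"{module_name}.{class_name}"


# A standard-library class stands in for a custom one.
print(verify_shareable(ArgumentParser))  # argparse.ArgumentParser
```

Running the check in a fresh interpreter, rather than the session where the class was written, is closer to what the recipient will experience.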
If this fails, from_dict will fail on any other machine too.
Pin the SRToolkit version
The serialization format and internal class paths are tied to library internals. Include the version in a requirements.txt.
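The pin line can be generated from the installed environment instead of being typed by hand. This is a sketch: pin_line is a hypothetical helper, and "SRToolkit" as the distribution name is an assumption, so check pip list for the exact name.

```python
from importlib.metadata import PackageNotFoundError, version


def pin_line(distribution):
    """Return 'name==version' for an installed distribution, or None."""
    try:
        return f"{distribution}=={version(distribution)}"
    except PackageNotFoundError:
        return None


# Assumed distribution name; adjust it to match your environment.
line = pin_line("SRToolkit")
print(line or "SRToolkit is not installed in this environment")
```

The printed line can be pasted into requirements.txt so the recipient installs the same release that produced the JSON files.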
What can be shared
Symbol library
SymbolLibrary is fully self-contained and has no importlib dependency — all information is stored as plain data.
import json
from SRToolkit.utils import SymbolLibrary
sl = SymbolLibrary.from_symbol_list(["+", "-", "*", "sin", "^2"], num_variables=2)
with open("symbol_library.json", "w") as f:
    json.dump(sl.to_dict(), f)

with open("symbol_library.json") as f:
    sl2 = SymbolLibrary.from_dict(json.load(f))
Grammar
Grammar serializes its rules and constraints. Built-in constraints are fully supported. Custom constraints require a .py file on the path (see Custom constraints below).
import json
from SRToolkit.utils.grammar import Grammar

# g is an existing Grammar instance
with open("grammar.json", "w") as f:
    json.dump(g.to_dict(), f)

with open("grammar.json") as f:
    g2 = Grammar.from_dict(json.load(f))
Dataset
SR_dataset serializes its symbol library, sampling configuration, metadata, and a reference to the data files. The data itself is saved separately as .npz files, so to_dict takes a base_path that controls where those files are written.
import json
from pathlib import Path

# dataset is an existing SR_dataset instance
base = Path("my_dataset")
base.mkdir(exist_ok=True)
with open(base / "dataset.json", "w") as f:
    json.dump(dataset.to_dict(base_path=str(base)), f)
Distribute the entire my_dataset/ directory. The recipient loads it with:
import json
from SRToolkit.dataset import SR_dataset

with open("my_dataset/dataset.json") as f:
    ds = SR_dataset.from_dict(json.load(f))
Custom samplers embedded in the dataset require a .py file on the path (see Custom samplers below).
Approach configuration
ApproachConfig stores all constructor parameters plus the fully-qualified class path of the approach. The recipient needs your approach .py file on their path.
Approach configs are normally consumed by ExperimentGrid rather than loaded directly — see Sharing a complete experiment below.
Callbacks
SRCallbacks subclasses serialize their constructor parameters alongside a callback_class entry that identifies the class. Built-in callbacks (ProgressBarCallback, EarlyStoppingCallback, LoggingCallback) require no extra files.
import json
from SRToolkit.evaluation.callbacks import EarlyStoppingCallback
cb = EarlyStoppingCallback(threshold=1e-6)
with open("callback.json", "w") as f:
    json.dump(cb.to_dict(), f)
Callbacks are normally passed to ExperimentGrid and reconstructed automatically — see Sharing a complete experiment below.
Custom constraints
Provide the constraint class in a module file following the naming convention, then share the grammar JSON alongside it:
# meznar_constraints.py
from SRToolkit.utils.grammar import Constraint


class PhysicsConstraint(Constraint):
    def __init__(self, forbidden_terminals):
        self.forbidden = frozenset(forbidden_terminals)

    def allows(self, slot, rule, global_):
        return self.forbidden.isdisjoint(rule.rhs)

    def to_dict(self):
        return {**super().to_dict(), "forbidden_terminals": sorted(self.forbidden)}

    @classmethod
    def from_dict(cls, d):
        return cls(d["forbidden_terminals"])
The recipient places meznar_constraints.py in their working directory; Grammar.from_dict resolves the class automatically.
Custom samplers
Same pattern as constraints — provide the sampler class in a module file:
# meznar_samplers.py
import numpy as np
from SRToolkit.dataset.sampling import Sampler


class GaussianSampler(Sampler):
    def __init__(self, mean, std):
        self.mean = mean
        self.std = std

    def sample(self, n, rng=None):
        rng = np.random.default_rng(rng)
        return rng.normal(self.mean, self.std, n)

    def to_dict(self):
        return {
            "sampler_class": f"{self.__class__.__module__}.{self.__class__.__qualname__}",
            "mean": self.mean,
            "std": self.std,
        }

    @classmethod
    def from_dict(cls, d):
        return cls(d["mean"], d["std"])
sampling_from_dict from SRToolkit.dataset.sampling dispatches via the sampler_class key, so the recipient only needs the module file on their path.
Custom callbacks
Same pattern — module file alongside the JSON:
# meznar_callbacks.py
from SRToolkit.evaluation.callbacks import SRCallbacks


class BestExprCallback(SRCallbacks):
    def __init__(self, output_file):
        self.output_file = output_file

    def on_evaluation(self, result, evaluator):
        with open(self.output_file, "w") as f:
            f.write(str(result.best_expr))

    def to_dict(self):
        return {**super().to_dict(), "output_file": self.output_file}

    @classmethod
    def from_dict(cls, d):
        return cls(d["output_file"])
Sharing a complete experiment
The natural unit for sharing a full experiment is an ExperimentGrid. Calling save() (or save_commands(), which calls it automatically) writes a self-contained directory:
results/
├── grid.json                     # grid specification
├── _datasets/
│   └── velocity/
│       ├── velocity.json         # SR_dataset.to_dict() output
│       └── velocity.npz          # data
├── _approaches/
│   └── meznar_gp_config.json     # ApproachConfig.to_dict() output
└── _callbacks.json               # list of callback dicts (if any)
Share the entire results/ directory alongside any custom .py files it depends on. The recipient loads with:
Datasets and approaches are reconstructed lazily when jobs run, so the load is fast. To execute a single job directly:
For HPC or parallel runs, generate a commands file and dispatch from there:
python -m SRToolkit.experiments commands \
    --grid results/grid.json \
    --out results/commands.txt
# then dispatch each line, e.g.:
bash results/commands.txt
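Running the file with bash executes the jobs one after another. On a single multi-core machine the lines can also be dispatched concurrently. The sketch below assumes each non-empty line of the commands file is an independent shell command; run_commands is a hypothetical helper and not part of SRToolkit.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_commands(path, max_workers=4):
    """Run each command line of `path` in a shell; return exit codes in file order."""
    with open(path) as f:
        commands = [
            line.strip()
            for line in f
            if line.strip() and not line.lstrip().startswith("#")
        ]

    def run(command):
        # Each job is a separate process, so threads only wait on them.
        return subprocess.run(command, shell=True).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run, commands))
```

Calling run_commands("results/commands.txt") returns one exit code per job, which makes failed jobs easy to spot. On a cluster, the scheduler's own array or batch mechanism is usually a better fit than a local thread pool.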
Individual jobs can also be run via the CLI without loading the full grid:
python -m SRToolkit.experiments run_job \
    --dataset results/_datasets/velocity/velocity.json \
    --approach results/_approaches/meznar_gp_config.json \
    --info results/velocity/meznar_gp/exp_0/info.json \
    --callbacks results/_callbacks.json
Note
Only include custom .py files that are actually needed. A standard experiment using only built-in approaches, constraints, and samplers needs no .py files alongside the results/ directory — the JSON files are fully self-contained.
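To find out which module files a given JSON file depends on, the serialized data can be scanned for class paths. The sketch below rests on an assumption: that class paths are stored under keys ending in _class, as with the sampler_class and callback_class keys mentioned in this guide. The helper name referenced_modules and the layout of the example dictionary are hypothetical.

```python
def referenced_modules(node, builtin_prefix="SRToolkit"):
    """Collect non-built-in module names from '*_class' keys in nested JSON data."""
    found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key.endswith("_class") and isinstance(value, str) and "." in value:
                module_name = value.rpartition(".")[0]
                # Skip classes that ship with the library itself.
                if module_name.split(".")[0] != builtin_prefix:
                    found.add(module_name)
            else:
                found |= referenced_modules(value, builtin_prefix)
    elif isinstance(node, list):
        for item in node:
            found |= referenced_modules(item, builtin_prefix)
    return found


# Illustrative data only; real files are produced by the to_dict methods.
example = {
    "sampler": {
        "sampler_class": "meznar_samplers.GaussianSampler",
        "mean": 0.0,
        "std": 1.0,
    },
    "callbacks": [
        {"callback_class": "SRToolkit.evaluation.callbacks.EarlyStoppingCallback"}
    ],
}
print(referenced_modules(example))  # {'meznar_samplers'}
```

Each name in the result corresponds to a .py file that has to travel with the JSON; an empty set suggests the file is self-contained.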