Utils Submodule
SRToolkit.utils
Utilities for expression representation, compilation, generation, and evaluation.
Modules:
| Name | Description |
|---|---|
| symbol_library | The SymbolLibrary class, which manages the token vocabulary and token properties. |
| expression_tree | The Node binary-tree representation and conversion utilities for expressions. |
| expression_compiler | Compiles token-list or tree expressions into executable Python callables. |
| expression_simplifier | SymPy-backed algebraic simplification, including constant folding. |
| expression_generator | PCFG construction from a SymbolLibrary and Monte-Carlo expression sampling. |
| measures | Distance and similarity measures: edit distance, tree edit distance, and Behavior-aware Expression Distance (BED). |
| serialization | Internal JSON serialization utilities for numpy types. |
Node
A node in a binary expression tree.
- Binary operators ("op") set both left and right.
- Unary functions ("fn") set only left; right is None.
- Leaves (variables, constants, literals, numeric values) have both children as None.
Examples:
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| symbol | str | Token string stored at this node. | required |
| right | Optional[Node] | Right operand (binary operators only). | None |
| left | Optional[Node] | Left operand (operators and unary functions). | None |
Source code in SRToolkit/utils/expression_tree.py
to_list
Transforms the tree rooted at this node into a list of tokens.
Examples:
>>> node = Node("+", Node("X_0"), Node("1"))
>>> node.to_list(symbol_library=SymbolLibrary.default_symbols())
['1', '+', 'X_0']
>>> node.to_list(notation="postfix")
['1', 'X_0', '+']
>>> node.to_list(notation="prefix")
['+', '1', 'X_0']
>>> node = Node("+", Node("*", Node("X_0"), Node("X_1")), Node("1"))
>>> node.to_list(symbol_library=SymbolLibrary.default_symbols())
['1', '+', 'X_1', '*', 'X_0']
>>> node.to_list(notation="infix")
['1', '+', '(', 'X_1', '*', 'X_0', ')']
>>> node = Node("sin", None, Node("X_0"))
>>> node.to_list(symbol_library=SymbolLibrary.default_symbols())
['sin', '(', 'X_0', ')']
>>> node = Node("^2", None, Node("X_0"))
>>> node.to_list(symbol_library=SymbolLibrary.default_symbols())
['X_0', '^2']
>>> node.to_list()
['(', 'X_0', ')', '^2']
>>> node = Node("*", Node("*", Node("X_0"), Node("X_0")), Node("X_0"))
>>> node.to_list(symbol_library=SymbolLibrary.default_symbols(),notation="infix")
['X_0', '*', '(', 'X_0', '*', 'X_0', ')']
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| symbol_library | Optional[SymbolLibrary] | Symbol library used to determine token types and precedences during infix reconstruction. If | None |
| notation | str | Output notation: | 'infix' |
Returns:
| Type | Description |
|---|---|
| List[str] | Token list representing the subtree rooted at this node. |
Raises:
| Type | Description |
|---|---|
| Exception | If |
Source code in SRToolkit/utils/expression_tree.py
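The prefix and postfix orders shown in the examples can be reproduced with a plain recursive traversal. The block below is a minimal sketch, not the library's implementation: SketchNode and sketch_to_list are hypothetical stand-ins, and the only thing assumed from the documentation is the argument order (symbol, right, left) and the left-before-right output order visible in the doctests. Infix reconstruction with parentheses is deliberately left out.

```python
from typing import List, Optional


class SketchNode:
    """Hypothetical stand-in for Node; arguments are symbol, right, left."""

    def __init__(
        self,
        symbol: str,
        right: Optional["SketchNode"] = None,
        left: Optional["SketchNode"] = None,
    ) -> None:
        self.symbol = symbol
        self.right = right
        self.left = left


def sketch_to_list(node: Optional[SketchNode], notation: str = "postfix") -> List[str]:
    # An absent child contributes no tokens.
    if node is None:
        return []
    left = sketch_to_list(node.left, notation)
    right = sketch_to_list(node.right, notation)
    if notation == "prefix":
        # Operator first, then the left and right operands.
        return [node.symbol] + left + right
    if notation == "postfix":
        # Operands first, operator last.
        return left + right + [node.symbol]
    raise ValueError(f"unsupported notation in this sketch: {notation}")


# Same shape as the first doctest: right child is X_0, left child is 1.
node = SketchNode("+", SketchNode("X_0"), SketchNode("1"))
postfix_tokens = sketch_to_list(node, "postfix")
prefix_tokens = sketch_to_list(node, "prefix")
```

With this tree, postfix_tokens is ['1', 'X_0', '+'] and prefix_tokens is ['+', '1', 'X_0'], matching the documented outputs for those two notations.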
to_latex
Transforms the tree rooted at this node into a LaTeX expression.
Examples:
>>> node = Node("+", Node("X_0"), Node("1"))
>>> node.to_latex(symbol_library=SymbolLibrary.default_symbols())
'$1 + X_{0}$'
>>> node = Node("+", Node("*", Node("X_0"), Node("X_1")), Node("1"))
>>> print(node.to_latex(symbol_library=SymbolLibrary.default_symbols()))
$1 + X_{1} \cdot X_{0}$
>>> node = Node("sin", None, Node("X_0"))
>>> print(node.to_latex(symbol_library=SymbolLibrary.default_symbols()))
$\sin X_{0}$
>>> node = Node("+", Node("*", Node("X_0"), Node("C")), Node("C"))
>>> print(node.to_latex(symbol_library=SymbolLibrary.default_symbols()))
$C_{0} + C_{1} \cdot X_{0}$
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| symbol_library | SymbolLibrary | Symbol library providing the LaTeX template for each token. | required |
Returns:
| Type | Description |
|---|---|
| str | A LaTeX string of the form |
Raises:
| Type | Description |
|---|---|
| Exception | If the tree contains a token whose type cannot be resolved in |
Source code in SRToolkit/utils/expression_tree.py
height
Return the height of the subtree rooted at this node.
A single-node tree has height 1.
Examples:
Returns:
| Type | Description |
|---|---|
| int | Height of the subtree. |
Source code in SRToolkit/utils/expression_tree.py
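Height and node count (the latter is what __len__ reports) are both simple recursions over the two children. The following is an illustrative sketch only: SketchNode, sketch_height, and sketch_size are hypothetical names, and the code assumes nothing beyond the stated convention that a single-node tree has height 1.

```python
from typing import Optional


class SketchNode:
    """Hypothetical stand-in for Node; arguments are symbol, right, left."""

    def __init__(
        self,
        symbol: str,
        right: Optional["SketchNode"] = None,
        left: Optional["SketchNode"] = None,
    ) -> None:
        self.symbol = symbol
        self.right = right
        self.left = left


def sketch_height(node: Optional[SketchNode]) -> int:
    # An empty subtree contributes 0, so a single node has height 1.
    if node is None:
        return 0
    return 1 + max(sketch_height(node.left), sketch_height(node.right))


def sketch_size(node: Optional[SketchNode]) -> int:
    # Count this node plus every node in both subtrees.
    if node is None:
        return 0
    return 1 + sketch_size(node.left) + sketch_size(node.right)


leaf = SketchNode("X_0")
# 1 + X_1 * X_0, built like the to_list doctest.
tree = SketchNode("+", SketchNode("*", SketchNode("X_0"), SketchNode("X_1")), SketchNode("1"))
leaf_height = sketch_height(leaf)
tree_height = sketch_height(tree)
tree_size = sketch_size(tree)
```

Here leaf_height is 1, tree_height is 3, and tree_size is 5.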
__len__
Return the number of nodes in the subtree rooted at this node.
Examples:
Returns:
| Type | Description |
|---|---|
| int | Total node count of the subtree. |
Source code in SRToolkit/utils/expression_tree.py
__str__
Return the expression as a concatenated string in the default infix notation; the result may contain redundant parentheses.
Examples:
Returns:
| Type | Description |
|---|---|
| str | Concatenated token string with no spaces. |
Source code in SRToolkit/utils/expression_tree.py
__copy__
Return a deep copy of the subtree rooted at this node.
Examples:
>>> node = Node("+", Node("X_0"), Node("1"))
>>> new_node = copy(node)
>>> node.to_list(symbol_library=SymbolLibrary.default_symbols())
['1', '+', 'X_0']
>>> new_node.to_list(symbol_library=SymbolLibrary.default_symbols())
['1', '+', 'X_0']
>>> node == node
True
>>> node == new_node
False
Returns:
| Type | Description |
|---|---|
| Node | An independent copy of the subtree. |
Source code in SRToolkit/utils/expression_tree.py
SymbolLibrary
SymbolLibrary(
symbols: Optional[List[str]] = None,
num_variables: int = 0,
preamble: Optional[List[str]] = None,
)
A registry of tokens and their properties, used throughout the toolkit to parse, compile, and generate symbolic expressions.
By default, the library uses NumPy for operator and function evaluation. To use a
different backend, pass the required import statements via preamble.
Examples:
>>> library = SymbolLibrary()
>>> library.add_symbol("x", "var", 0, "x", "x")
>>> library.get_type("x")
'var'
>>> library.get_precedence("x")
0
>>> library.get_np_fn("x")
'x'
>>> library.remove_symbol("x")
>>> library = SymbolLibrary.default_symbols()
>>> # You can also initialize the library with a list of symbols (listed in SymbolLibrary.default_symbols)
>>> # and the number of variables.
>>> library2 = SymbolLibrary(["+", "*", "sin"], num_variables=2)
>>> len(library2)
5
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| symbols | Optional[List[str]] | Symbols to pre-populate from the default set. | None |
| num_variables | int | Number of variable tokens to add, labeled | 0 |
| preamble | Optional[List[str]] | Import statements prepended to compiled expression functions. Defaults to | None |
Attributes:
| Name | Type | Description |
|---|---|---|
| symbols | | Mapping from token string to its property dict (type, precedence, NumPy function string, LaTeX template). |
Source code in SRToolkit/utils/symbol_library.py
add_symbol
add_symbol(
symbol: str,
symbol_type: str,
precedence: int,
np_fn: str,
latex_str: Optional[str] = None,
)
Add a token to the library with its associated type, precedence, NumPy function string, and LaTeX template.
Symbol types:
- "op": binary operator (e.g. +, *).
- "fn": unary function (e.g. sin, sqrt).
- "lit": literal with a fixed value (e.g. pi, e).
- "const": free constant whose value is optimised during parameter estimation (e.g. C). Using a single "const" token is recommended; multiple tokens increase complexity and reduce readability.
- "var": input variable whose values are read from the data array X.
If latex_str is omitted, a default template is generated: "{} \text{symb} {}"
for operators, "\text{symb} {}" for functions, and "\text{symb}" otherwise.
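The default-template rule can be pictured as a three-way branch on the symbol type. The sketch below is an assumption-laden illustration: sketch_default_latex is a hypothetical helper, and it assumes that "symb" in the templates above is a placeholder for the token string itself.

```python
def sketch_default_latex(symbol: str, symbol_type: str) -> str:
    # Assumption: "symb" in the documented templates stands for the token itself.
    text = r"\text{" + symbol + "}"
    if symbol_type == "op":
        # Binary operator: one slot on each side.
        return "{} " + text + " {}"
    if symbol_type == "fn":
        # Unary function: a single slot after the name.
        return text + " {}"
    # Literals, constants, and variables take no operands.
    return text


op_template = sketch_default_latex("max", "op")
fn_template = sketch_default_latex("relu", "fn")
var_template = sketch_default_latex("x", "var")
```

The tokens max, relu, and x are made-up inputs chosen only to show the three shapes of template.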
Examples:
>>> library = SymbolLibrary()
>>> library.add_symbol("x", "var", 0, "x")
>>> library.add_symbol("sin", "fn", 5, "np.sin({})", r"\sin {}")
>>> library.add_symbol("C", "const", 5, "C[{}]", r"c_{}")
>>> library.add_symbol("X_0", "var", 5, "X[:, 0]", r"X_0")
>>> library.add_symbol("pi", "lit", 5, "np.pi", r"\pi")
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| symbol | str | Token string to register. | required |
| symbol_type | str | One of | required |
| precedence | int | Operator precedence, used for infix reconstruction and PCFG generation. | required |
| np_fn | str | Python/NumPy expression string used in compiled callables (e.g. | required |
| latex_str | Optional[str] | LaTeX template string with | None |
Raises:
| Type | Description |
|---|---|
| ValueError | If |
Source code in SRToolkit/utils/symbol_library.py
remove_symbol
Remove a token from the library.
Examples:
>>> library = SymbolLibrary()
>>> library.add_symbol("x", "var", 0, "x")
>>> len(library.symbols)
1
>>> library.remove_symbol("x")
>>> len(library.symbols)
0
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| symbol | str | Token string to remove. | required |
Raises:
| Type | Description |
|---|---|
| KeyError | If |
Source code in SRToolkit/utils/symbol_library.py
get_type
Return the type of a symbol.
Examples:
>>> library = SymbolLibrary()
>>> library.add_symbol("x", "var", 0, "x")
>>> library.get_type("x")
'var'
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| symbol | str | Token to look up. | required |
Returns:
| Type | Description |
|---|---|
| str | The type string ( |
Source code in SRToolkit/utils/symbol_library.py
get_precedence
Return the precedence of a symbol.
Examples:
>>> library = SymbolLibrary()
>>> library.add_symbol("x", "var", 0, "x")
>>> library.get_precedence("x")
0
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| symbol | str | Token to look up. | required |
Returns:
| Type | Description |
|---|---|
| int | The precedence value if the symbol is in the library, otherwise |
Source code in SRToolkit/utils/symbol_library.py
get_np_fn
Return the NumPy function string for a symbol.
Examples:
>>> library = SymbolLibrary()
>>> library.add_symbol("x", "var", 0, "x")
>>> library.get_np_fn("x")
'x'
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| symbol | str | Token to look up. | required |
Returns:
| Type | Description |
|---|---|
| str | The NumPy function string if the symbol is in the library, otherwise an empty string. |
Source code in SRToolkit/utils/symbol_library.py
get_latex_str
Return the LaTeX template string for a symbol.
Examples:
>>> library = SymbolLibrary()
>>> library.add_symbol("x", "var", 0, "x", "test")
>>> library.get_latex_str("x")
'test'
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| symbol | str | Token to look up. | required |
Returns:
| Type | Description |
|---|---|
| str | The LaTeX template string if the symbol is in the library, otherwise an empty string. |
Source code in SRToolkit/utils/symbol_library.py
get_symbols_of_type
Return all symbols of a given type.
Examples:
>>> library = SymbolLibrary()
>>> library.add_symbol("x", "var", 0, "x")
>>> library.add_symbol("y", "var", 0, "y")
>>> library.get_symbols_of_type("var")
['x', 'y']
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| symbol_type | str | Type to filter by. One of | required |
Returns:
| Type | Description |
|---|---|
| List[str] | List of token strings matching the requested type. |
Source code in SRToolkit/utils/symbol_library.py
symbols2index
Return a mapping from each token to its index in insertion order.
Examples:
>>> library = SymbolLibrary()
>>> library.add_symbol("x", "var", 0, "x")
>>> library.add_symbol("y", "var", 0, "y")
>>> print(library.symbols2index())
{'x': 0, 'y': 1}
>>> library.remove_symbol("x")
>>> print(library.symbols2index())
{'y': 0}
Returns:
| Type | Description |
|---|---|
| Dict[str, int] | Dict mapping each token string to its zero-based position in the library. |
Source code in SRToolkit/utils/symbol_library.py
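The renumbering after a removal, as seen in the example above, follows naturally if the index is recomputed from an insertion-ordered mapping on every call. This is a sketch of that idea with a plain dict; sketch_symbols2index is a hypothetical function and says nothing about how the library stores its symbols internally.

```python
from typing import Dict


def sketch_symbols2index(symbol_table: Dict[str, dict]) -> Dict[str, int]:
    # Python dicts preserve insertion order, so enumerate gives stable positions.
    return {token: index for index, token in enumerate(symbol_table)}


table: Dict[str, dict] = {}
table["x"] = {"type": "var"}
table["y"] = {"type": "var"}
before_removal = sketch_symbols2index(table)

# Removing a token shifts every later token down by one position.
del table["x"]
after_removal = sketch_symbols2index(table)
```

before_removal is {'x': 0, 'y': 1} and after_removal is {'y': 0}, mirroring the documented behaviour.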
from_symbol_list
staticmethod
Create a SymbolLibrary containing only the specified subset of default symbols.
The supported token names are those defined in default_symbols.
Examples:
>>> library = SymbolLibrary().from_symbol_list(["+", "*", "C"], num_variables=2)
>>> len(library.symbols)
5
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| symbols | List[str] | Token strings to include. Must be a subset of the default symbol names. | required |
| num_variables | int | Number of variable tokens ( | 25 |
Returns:
| Type | Description |
|---|---|
| SymbolLibrary | A SymbolLibrary restricted to the requested symbols and variables. |
Source code in SRToolkit/utils/symbol_library.py
default_symbols
staticmethod
Return a SymbolLibrary pre-populated with standard mathematical symbols.
Supported tokens:
- Operators ("op"): +, -, *, /, ^
- Functions ("fn"): u-, sqrt, sin, cos, exp, tan, arcsin, arccos, arctan, sinh, cosh, tanh, floor, ceil, ln, log, ^-1, ^2, ^3, ^4, ^5
- Literals ("lit"): pi, e
- Free constant ("const"): C
- Variables ("var"): X_0 through X_{num_variables-1}, mapped to columns of the input array in order.
Examples:
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| num_variables | int | Number of variable tokens to include. Default is | 25 |
Returns:
| Type | Description |
|---|---|
| SymbolLibrary | A SymbolLibrary populated with the symbols listed above. |
Source code in SRToolkit/utils/symbol_library.py
to_dict
Serialize the library to a JSON-safe dictionary.
Returns:
| Type | Description |
|---|---|
| dict | A dictionary suitable for passing to from_dict. |
Source code in SRToolkit/utils/symbol_library.py
from_dict
staticmethod
Reconstruct a SymbolLibrary from a dictionary produced by to_dict.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| d | dict | Dictionary representation of the library. | required |
Returns:
| Type | Description |
|---|---|
| SymbolLibrary | The reconstructed SymbolLibrary. |
Raises:
| Type | Description |
|---|---|
| ValueError | If |
Source code in SRToolkit/utils/symbol_library.py
__len__
Return the number of symbols currently in the library.
Examples:
>>> library = SymbolLibrary.default_symbols(5)
>>> len(library)
34
>>> library.add_symbol("a", "lit", 5, "a", "a")
>>> len(library)
35
Returns:
| Type | Description |
|---|---|
| int | Number of tokens registered in the library. |
Source code in SRToolkit/utils/symbol_library.py
__str__
Return a comma-separated string of all registered token strings.
Examples:
>>> library = SymbolLibrary()
>>> library.add_symbol("x", "var", 0, "x", "x")
>>> str(library)
'x'
>>> library.add_symbol("sin", "fn", 5, "{} = np.sin({})", r"\sin {}")
>>> str(library)
'x, sin'
Returns:
| Type | Description |
|---|---|
| str | All token names joined by |
Source code in SRToolkit/utils/symbol_library.py
__copy__
Return a copy of the library with independent copies of all attributes.
Examples:
>>> old_symbols = SymbolLibrary()
>>> old_symbols.add_symbol("x", "var", 0, "x", "x")
>>> print(old_symbols)
x
>>> new_symbols = copy.copy(old_symbols)
>>> new_symbols.add_symbol("sin", "fn", 5, "{} = np.sin({})", r"\sin {}")
>>> print(old_symbols)
x
>>> print(new_symbols)
x, sin
Returns:
| Type | Description |
|---|---|
| SymbolLibrary | A new SymbolLibrary instance with deep-copied symbols and preamble. |
Source code in SRToolkit/utils/symbol_library.py
EstimationSettings
Bases: TypedDict
Shared settings for parameter estimation and BED evaluation.
Passed as **kwargs to SR_dataset, SR_evaluator, and
ParameterEstimator. All fields are optional.
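A TypedDict whose fields are all optional can be unpacked straight into a function that accepts keyword arguments. The sketch below shows that pattern in isolation. SketchSettings and sketch_estimator are hypothetical, only three of the documented fields are included, and the fallback values are placeholders for illustration, not the library's defaults.

```python
from typing import TypedDict


class SketchSettings(TypedDict, total=False):
    # total=False makes every key optional, mirroring "All fields are optional".
    method: str
    tol: float
    max_iter: int


def sketch_estimator(**kwargs) -> dict:
    # Placeholder fallbacks for illustration only; NOT the library's defaults.
    resolved = {"method": "placeholder-method", "tol": 0.0, "max_iter": 1}
    # Caller-supplied settings override the fallbacks key by key.
    resolved.update(kwargs)
    return resolved


settings: SketchSettings = {"method": "L-BFGS-B", "max_iter": 200}
resolved = sketch_estimator(**settings)
```

Only the keys present in settings are overridden; tol keeps the placeholder fallback.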
Examples:
>>> settings: EstimationSettings = {"method": "L-BFGS-B", "max_iter": 200}
>>> settings.get("method")
'L-BFGS-B'
>>> settings.get("tol", 1e-6)
1e-06
Attributes:
| Name | Type | Description |
|---|---|---|
| method | str | Optimization algorithm for parameter fitting. Default: |
| tol | float | Termination tolerance for the optimizer. Default: |
| gtol | float | Gradient-norm termination tolerance. Default: |
| max_iter | int | Maximum optimizer iterations. Default: |
| constant_bounds | Union[Tuple[float, float]] | |
| initialization | str | Constant initialization strategy: |
| max_constants | int | Maximum number of free constants permitted in a single expression. Expressions exceeding this limit score |
| max_expr_length | int | Maximum expression length in tokens. |
| num_points_sampled | int | Number of domain points used when evaluating expression behavior for BED. |
| bed_X | Optional[ndarray] | Fixed evaluation points for BED. If |
| num_consts_sampled | int | Number of constant vectors sampled per expression for BED. Default: |
| domain_bounds | Optional[List[Tuple[float, float]]] | Per-variable |
EvalResult
dataclass
EvalResult(
min_error: float,
best_expr: str,
num_evaluated: int,
evaluation_calls: int,
top_models: List[ModelResult],
all_models: List[ModelResult],
approach_name: str,
success: bool,
dataset_name: Optional[str] = None,
metadata: Optional[dict] = None,
augmentations: Dict[str, Dict[str, Any]] = dict(),
)
Result for a single SR experiment, as returned by SR_results[i].
Examples:
>>> model = ModelResult(expr=["X_0"], error=0.05)
>>> result = EvalResult(
... min_error=0.05,
... best_expr="X_0",
... num_evaluated=500,
... evaluation_calls=612,
... top_models=[model],
... all_models=[model],
... approach_name="MyApproach",
... success=True,
... )
>>> result.min_error
0.05
>>> result.success
True
>>> result.dataset_name is None
True
Attributes:
| Name | Type | Description |
|---|---|---|
| min_error | float | Lowest error achieved across all evaluated expressions. |
| best_expr | str | String representation of the best expression found. |
| num_evaluated | int | Number of unique expressions evaluated. |
| evaluation_calls | int | Number of times |
| top_models | List[ModelResult] | Top-k models sorted by error. |
| all_models | List[ModelResult] | All evaluated models sorted by error. |
| approach_name | str | Name of the SR approach, or empty string if not provided. |
| success | bool | Whether |
| dataset_name | Optional[str] | Name of the dataset, extracted from metadata. |
| metadata | Optional[dict] | Remaining metadata dict after |
| augmentations | Dict[str, Dict[str, Any]] | Per-augmenter data keyed by augmenter name. Populated by ResultAugmenter subclasses via add_augmentation. |
add_augmentation
Attach augmentation data produced by a ResultAugmenter to this result.
If name is already present in augmentations, a numeric suffix is
appended (name_1, name_2, …) to avoid overwriting existing data.
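The suffixing rule amounts to probing name, name_1, name_2, and so on until a free key is found. The following is a minimal sketch of that rule on a plain dict. sketch_add_augmentation is a hypothetical helper; it ignores the aug_type argument and whatever extra bookkeeping the real method performs.

```python
from typing import Any, Dict


def sketch_add_augmentation(
    store: Dict[str, Dict[str, Any]], name: str, data: Dict[str, Any]
) -> str:
    # Probe name, name_1, name_2, ... until an unused key is found.
    key = name
    suffix = 1
    while key in store:
        key = f"{name}_{suffix}"
        suffix += 1
    # Store a shallow copy so later changes to `data` do not leak in.
    store[key] = dict(data)
    return key


augmentations: Dict[str, Dict[str, Any]] = {}
first_key = sketch_add_augmentation(augmentations, "complexity", {"value": 3})
second_key = sketch_add_augmentation(augmentations, "complexity", {"value": 5})
third_key = sketch_add_augmentation(augmentations, "complexity", {"value": 8})
```

The three calls land under complexity, complexity_1, and complexity_2, so no earlier entry is overwritten.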
Examples:
>>> model = ModelResult(expr=["X_0"], error=0.05)
>>> result = EvalResult(
... min_error=0.05, best_expr="X_0", num_evaluated=10,
... evaluation_calls=10, top_models=[model], all_models=[model],
... approach_name="MyApproach", success=True,
... )
>>> result.add_augmentation("complexity", {"value": 3}, "ComplexityAugmenter")
>>> result.augmentations["complexity"]["value"]
3
>>> result.add_augmentation("complexity", {"value": 5}, "ComplexityAugmenter")
>>> "complexity_1" in result.augmentations
True
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Key under which the augmentation is stored in | required |
| data | Dict[str, Any] | Arbitrary dict of augmentation data. A | required |
| aug_type | str | Augmenter class name, stored as | required |
Source code in SRToolkit/utils/types.py
to_dict
Serialize this evaluation result to a JSON-safe dictionary.
NumPy arrays and scalars within nested ModelResult entries are
converted to native Python types so the result can be passed directly
to json.dump.
Examples:
>>> model = ModelResult(expr=["X_0"], error=0.05)
>>> result = EvalResult(
... min_error=0.05, best_expr="X_0", num_evaluated=10,
... evaluation_calls=10, top_models=[model], all_models=[model],
... approach_name="MyApproach", success=True,
... )
>>> d = result.to_dict()
>>> d["min_error"]
0.05
>>> d["approach_name"]
'MyApproach'
>>> len(d["top_models"])
1
Returns:
| Type | Description |
|---|---|
| dict | A JSON-safe dictionary suitable for passing to |
Source code in SRToolkit/utils/types.py
from_dict
staticmethod
Reconstruct an EvalResult from a dictionary produced by to_dict.
Examples:
>>> model = ModelResult(expr=["X_0"], error=0.05)
>>> result = EvalResult(
... min_error=0.05, best_expr="X_0", num_evaluated=10,
... evaluation_calls=10, top_models=[model], all_models=[model],
... approach_name="MyApproach", success=True,
... )
>>> result2 = EvalResult.from_dict(result.to_dict())
>>> result2.min_error
0.05
>>> result2.best_expr
'X_0'
>>> len(result2.top_models)
1
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | dict | Dictionary representation of an | required |
Returns:
| Type | Description |
|---|---|
| EvalResult | The reconstructed |
Source code in SRToolkit/utils/types.py
ModelResult
dataclass
ModelResult(
expr: List[str],
error: float,
parameters: Optional[ndarray] = None,
augmentations: Dict[str, Dict[str, Any]] = dict(),
)
A single model entry in EvalResult.top_models and EvalResult.all_models.
Examples:
>>> result = ModelResult(expr=["C", "*", "X_0"], error=0.42)
>>> result.expr
['C', '*', 'X_0']
>>> result.error
0.42
>>> result.parameters is None
True
Attributes:
| Name | Type | Description |
|---|---|---|
| expr | List[str] | Token list representing the expression, e.g. |
| error | float | Numeric error under the ranking function (RMSE or BED). |
| parameters | Optional[ndarray] | Fitted constant values. Present for RMSE ranking only, |
| augmentations | Dict[str, Dict[str, Any]] | Per-augmenter data keyed by augmenter name. Populated by ResultAugmenter subclasses via add_augmentation. |
add_augmentation
Attach augmentation data produced by a ResultAugmenter to this result.
If name is already present in augmentations, a numeric suffix is
appended (name_1, name_2, …) to avoid overwriting existing data.
Examples:
>>> result = ModelResult(expr=["X_0"], error=0.1)
>>> result.add_augmentation("latex", {"value": "$X_0$"}, "LaTeXAugmenter")
>>> result.augmentations["latex"]["value"]
'$X_0$'
>>> result.add_augmentation("latex", {"value": "$X_0$"}, "LaTeXAugmenter")
>>> "latex_1" in result.augmentations
True
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Key under which the augmentation is stored in | required |
| data | Dict[str, Any] | Arbitrary dict of augmentation data. A | required |
| aug_type | str | Augmenter class name, stored as | required |
Source code in SRToolkit/utils/types.py
to_dict
Serialize this model result to a JSON-safe dictionary.
NumPy arrays and scalars are converted to native Python types so the
result can be passed directly to json.dump.
Examples:
>>> result = ModelResult(expr=["X_0", "+", "C"], error=0.25)
>>> d = result.to_dict()
>>> d["expr"]
['X_0', '+', 'C']
>>> d["error"]
0.25
>>> d["parameters"] is None
True
Returns:
| Type | Description |
|---|---|
| dict | A JSON-safe dictionary suitable for passing to |
Source code in SRToolkit/utils/types.py
from_dict
staticmethod
Reconstruct a ModelResult from a dictionary produced by to_dict.
Examples:
>>> result = ModelResult(expr=["X_0", "+", "C"], error=0.25)
>>> result2 = ModelResult.from_dict(result.to_dict())
>>> result2.expr
['X_0', '+', 'C']
>>> result2.error
0.25
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | dict | Dictionary representation of a | required |
Returns:
| Type | Description |
|---|---|
| ModelResult | The reconstructed |
Source code in SRToolkit/utils/types.py
expr_to_error_function
expr_to_error_function(
expr: Union[List[str], Node],
symbol_library: SymbolLibrary = SymbolLibrary.default_symbols(),
) -> Callable[[np.ndarray, np.ndarray, np.ndarray], float]
Compile an expression into a callable that computes the RMSE against target values.
To use a backend other than NumPy, set symbol_library.preamble to the required
import statements.
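The quantity returned by such a callable is a root-mean-square error over the rows of the input. The block below is a pure-Python sketch of that computation, without NumPy and without any code generation. sketch_error_function is a hypothetical helper, and the (X, C, y) argument order is taken from the doctests that follow.

```python
import math
from typing import Callable, Sequence


def sketch_error_function(
    model: Callable[[Sequence[float], Sequence[float]], float],
) -> Callable[[Sequence[Sequence[float]], Sequence[float], Sequence[float]], float]:
    def error(X, C, y) -> float:
        # Evaluate the model on every row, then take the root of the mean squared residual.
        predictions = [model(row, C) for row in X]
        squared = [(p - t) ** 2 for p, t in zip(predictions, y)]
        return math.sqrt(sum(squared) / len(y))

    return error


# Hand-written stand-in for the compiled expression X_0 + 1.
error_fn = sketch_error_function(lambda row, C: row[0] + 1)
exact_fit = error_fn([[1], [2], [3], [4]], [], [2, 3, 4, 5])
off_by_one = error_fn([[1], [2], [3], [4]], [], [3, 4, 5, 6])
```

exact_fit is 0.0 because the targets equal X_0 + 1, while off_by_one is 1.0 because every residual has magnitude 1.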
Examples:
>>> executable_fun = expr_to_error_function(["X_0", "+", "1"])
>>> print(float(executable_fun(np.array([[1], [2], [3], [4]]), np.array([]), np.array([2, 3, 4, 5]))))
0.0
>>> tree = tokens_to_tree(["X_0", "+", "1"], SymbolLibrary.default_symbols(1))
>>> executable_fun = expr_to_error_function(tree)
>>> print(float(executable_fun(np.array([[1], [2], [3], [4]]), np.array([]), np.array([2, 3, 4, 5]))))
0.0
>>> # If you need libraries other than NumPy to evaluate your expressions,
>>> # add them to the preamble of the SymbolLibrary. A preamble looks like this:
>>> symbol_library = SymbolLibrary.default_symbols(1)
>>> symbol_library.preamble = ["import numpy as np"]
>>> # Usually this is done when initializing the SymbolLibrary as SymbolLibrary(preamble=preamble)
>>> executable_fun = expr_to_error_function(tree, symbol_library)
>>> print(float(executable_fun(np.array([[1], [2], [3], [4]]), np.array([]), np.array([2, 3, 4, 5]))))
0.0
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| expr | Union[List[str], Node] | Expression as a token list in infix notation or a Node tree. | required |
| symbol_library | SymbolLibrary | Defines token semantics (NumPy function strings, preamble imports). Defaults to SymbolLibrary.default_symbols. | default_symbols() |
Returns:
| Type | Description |
|---|---|
| Callable[[ndarray, ndarray, ndarray], float] | A callable |
Raises:
| Type | Description |
|---|---|
| Exception | If |
Source code in SRToolkit/utils/expression_compiler.py
expr_to_executable_function
expr_to_executable_function(
expr: Union[List[str], Node],
symbol_library: SymbolLibrary = SymbolLibrary.default_symbols(),
) -> Callable[[np.ndarray, Optional[np.ndarray]], np.ndarray]
Compile an expression into an executable Python function.
The returned callable evaluates the expression over a batch of inputs and a vector
of constant values. To use a backend other than NumPy, set
symbol_library.preamble to the required import statements.
Examples:
>>> executable_fun = expr_to_executable_function(["X_0", "+", "1"])
>>> executable_fun(np.array([[1], [2], [3], [4]]), np.array([]))
array([2, 3, 4, 5])
>>> executable_fun = expr_to_executable_function(["pi"])
>>> executable_fun(np.array([[1], [2], [3], [4]]), np.array([1]))
array([3.14159265, 3.14159265, 3.14159265, 3.14159265])
>>> executable_fun = expr_to_executable_function(["C"])
>>> executable_fun(np.array([[1], [2], [3], [4]]), np.array([1]))
array([1, 1, 1, 1])
>>> tree = tokens_to_tree(["X_0", "+", "1"], SymbolLibrary.default_symbols(1))
>>> executable_fun = expr_to_executable_function(tree)
>>> executable_fun(np.array([[1], [2], [3], [4]]), np.array([]))
array([2, 3, 4, 5])
>>> # If you need libraries other than NumPy to evaluate your expressions,
>>> # add them to the preamble of the SymbolLibrary. A preamble looks like this:
>>> symbol_library = SymbolLibrary.default_symbols(1)
>>> symbol_library.preamble = ["import numpy as np"]
>>> # Usually this is done when initializing the SymbolLibrary as SymbolLibrary(preamble=preamble)
>>> executable_fun = expr_to_executable_function(tree, symbol_library)
>>> executable_fun(np.array([[1], [2], [3], [4]]), np.array([]))
array([2, 3, 4, 5])
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| expr | Union[List[str], Node] | Expression as a token list in infix notation or a Node tree. | required |
| symbol_library | SymbolLibrary | Defines token semantics (NumPy function strings, preamble imports). Defaults to SymbolLibrary.default_symbols. | default_symbols() |
Returns:
| Type | Description |
|---|---|
| Callable[[ndarray, Optional[ndarray]], ndarray] | A callable |
Raises:
| Type | Description |
|---|---|
| Exception | If |
Source code in SRToolkit/utils/expression_compiler.py
tree_to_function_rec
tree_to_function_rec(
tree: Node,
symbol_library: SymbolLibrary,
var_counter: int = 0,
const_counter: int = 0,
) -> Tuple[List[str], str, int, int]
Recursively convert a parse tree into lines of Python code for expression evaluation.
This is a low-level helper for expr_to_executable_function and expr_to_error_function. Call those functions directly unless you need fine-grained control over code generation.
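To make the idea of turning a tree into lines of Python code concrete, here is a small sketch of one way such a recursion could work. Everything in it is an assumption made for illustration: SketchNode, TEMPLATES, sketch_codegen, the y_N naming of intermediate variables, and the meaning assigned to the four returned values (code lines, name of the result, updated variable counter, updated constant counter) are not taken from the library.

```python
from typing import List, Optional, Tuple


class SketchNode:
    """Hypothetical stand-in for Node; arguments are symbol, right, left."""

    def __init__(self, symbol: str, right: Optional["SketchNode"] = None,
                 left: Optional["SketchNode"] = None) -> None:
        self.symbol, self.right, self.left = symbol, right, left


# Hypothetical token table: (type, code template with {} slots for operands).
TEMPLATES = {
    "+": ("op", "{} + {}"),
    "*": ("op", "{} * {}"),
    "u-": ("fn", "-{}"),
    "C": ("const", "C[{}]"),
    "X_0": ("var", "X[0]"),
}


def sketch_codegen(tree: SketchNode, var_counter: int = 0,
                   const_counter: int = 0) -> Tuple[List[str], str, int, int]:
    if tree.symbol not in TEMPLATES:
        float(tree.symbol)  # raises ValueError unless the token is numeric
        return [], tree.symbol, var_counter, const_counter
    kind, template = TEMPLATES[tree.symbol]
    if kind == "op":
        # Emit code for both operands first, threading the counters through.
        l_code, l_name, var_counter, const_counter = sketch_codegen(tree.left, var_counter, const_counter)
        r_code, r_name, var_counter, const_counter = sketch_codegen(tree.right, var_counter, const_counter)
        name = f"y_{var_counter}"
        line = f"{name} = " + template.format(l_name, r_name)
        return l_code + r_code + [line], name, var_counter + 1, const_counter
    if kind == "fn":
        code, operand, var_counter, const_counter = sketch_codegen(tree.left, var_counter, const_counter)
        name = f"y_{var_counter}"
        return code + [f"{name} = " + template.format(operand)], name, var_counter + 1, const_counter
    if kind == "const":
        # Each constant occurrence consumes the next slot of the constant vector.
        return [], template.format(const_counter), var_counter, const_counter + 1
    return [], template, var_counter, const_counter  # variable


# C * X_0 + C: left subtree is the product, right child is the second constant.
product = SketchNode("*", SketchNode("X_0"), SketchNode("C"))
lines, result_name, n_vars, n_consts = sketch_codegen(SketchNode("+", SketchNode("C"), product))

# Execute the generated lines with sample inputs to check the result.
namespace = {"X": [3.0], "C": [2.0, 1.0]}
exec("\n".join(lines), namespace)
value = namespace[result_name]
```

For this tree the sketch emits y_0 = C[0] * X[0] followed by y_1 = y_0 + C[1], and evaluating those lines with X = [3.0] and C = [2.0, 1.0] yields 7.0.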
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tree | Node | Root of the subtree to convert. | required |
| symbol_library | SymbolLibrary | Provides NumPy function strings for each token. | required |
| var_counter | int | Running count of intermediate variables, used to generate unique names. Default | 0 |
| const_counter | int | Running count of constants consumed; used to index into the | 0 |
Returns:
| Type | Description |
|---|---|
| Tuple[List[str], str, int, int] | A 4-tuple |
Raises:
| Type | Description |
|---|---|
| Exception | If the tree contains a token that is neither a recognized symbol nor a numeric literal. |
Source code in SRToolkit/utils/expression_compiler.py
create_generic_pcfg
Construct a generic Probabilistic Context-Free Grammar (PCFG) from a symbol library.
The grammar encodes standard mathematical operator precedence through a fixed non-terminal hierarchy:
- E: additive level (precedence 0 operators)
- F: multiplicative level (precedence 1 operators)
- B: power level (precedence 2 operators)
- T: terminal, one of function application (R), constant (C), or variable (V)
- R: unary functions (precedence 5) and parenthesised sub-expressions
- P: postfix functions (precedence -1, e.g. ^2)
The returned string is in NLTK PCFG format and can be passed directly to generate_from_pcfg or generate_n_expressions.
Examples:
>>> sl = SymbolLibrary.from_symbol_list(["+", "-", "*", "sin", "^2", "pi"], 2)
>>> print(create_generic_pcfg(sl))
E -> E '+' F [0.2]
E -> E '-' F [0.2]
E -> F [0.6]
F -> F '*' B [0.4]
F -> B [0.6]
B -> T [1.0]
T -> R [0.2]
T -> C [0.2]
T -> V [0.6]
C -> 'pi' [1.0]
R -> 'sin' '(' E ')' [0.4]
R -> P [0.15]
R -> '(' E ')' [0.45]
P -> '(' E ')' '^2' [1.0]
V -> 'X_0' [0.5]
V -> 'X_1' [0.5]
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| symbol_library | SymbolLibrary | Symbol library defining the available tokens, their types, and precedences. | required |
Returns:
| Type | Description |
|---|---|
| str | NLTK-formatted PCFG string with generic probabilities. |
Source code in SRToolkit/utils/expression_generator.py
generate_from_pcfg
generate_from_pcfg(
grammar_str: str,
start_symbol: str = "E",
max_depth: int = 40,
limit: int = 100,
) -> List[str]
Sample a single expression from a PCFG by Monte-Carlo tree expansion.
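Monte-Carlo expansion means repeatedly picking one production per non-terminal according to its probability until only terminals remain. The sketch below illustrates this with the standard library only. The grammar is given as a dict rather than an NLTK string, and sketch_sample, sketch_generate, and SketchDepthExceeded are hypothetical names. Treating an over-deep derivation as a failed attempt that is retried up to limit times is an assumption, not a statement about the library's behaviour.

```python
import random
from typing import Dict, List, Optional, Tuple

# Non-terminal -> list of (right-hand side, probability).
Grammar = Dict[str, List[Tuple[List[str], float]]]


class SketchDepthExceeded(Exception):
    """Raised when a derivation grows deeper than max_depth."""


def sketch_sample(grammar: Grammar, symbol: str, max_depth: int,
                  rng: random.Random, depth: int = 0) -> List[str]:
    # Anything without productions is a terminal token.
    if symbol not in grammar:
        return [symbol]
    if depth >= max_depth:
        raise SketchDepthExceeded(symbol)
    rules = grammar[symbol]
    # Draw one production with probability proportional to its weight.
    rhs = rng.choices([r for r, _ in rules], weights=[p for _, p in rules], k=1)[0]
    tokens: List[str] = []
    for part in rhs:
        tokens.extend(sketch_sample(grammar, part, max_depth, rng, depth + 1))
    return tokens


def sketch_generate(grammar: Grammar, start_symbol: str = "E", max_depth: int = 40,
                    limit: int = 100, seed: Optional[int] = None) -> List[str]:
    rng = random.Random(seed)
    for _ in range(limit):
        try:
            return sketch_sample(grammar, start_symbol, max_depth, rng)
        except SketchDepthExceeded:
            continue  # discard this attempt and sample again
    raise Exception(f"no expression produced within {limit} attempts")


trivial = sketch_generate({"E": [(["1"], 1.0)]})
grammar: Grammar = {
    "E": [(["E", "+", "F"], 0.4), (["F"], 0.6)],
    "F": [(["X_0"], 0.5), (["C"], 0.5)],
}
sampled = sketch_generate(grammar, seed=0)
```

The trivial grammar always yields ['1']; the second grammar yields sums of X_0 and C whose length varies from draw to draw.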
Examples:
>>> generate_from_pcfg("E -> '1' [1.0]")
['1']
>>> grammar = create_generic_pcfg(SymbolLibrary.default_symbols())
>>> len(generate_from_pcfg(grammar)) > 0
True
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| grammar_str | str | Grammar in NLTK PCFG notation. | required |
| start_symbol | str | Non-terminal from which expansion begins. Default | 'E' |
| max_depth | int | Maximum parse-tree depth. Values below | 40 |
| limit | int | Maximum number of sampling attempts before raising an exception. Default | 100 |
Returns:
| Type | Description |
|---|---|
| List[str] | A single expression as a list of string tokens in infix notation. |
Raises:
| Type | Description |
|---|---|
| Exception | If a valid expression cannot be produced within |
Source code in SRToolkit/utils/expression_generator.py
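As an illustration of Monte-Carlo tree expansion, the sketch below samples one token list from a PCFG string by repeatedly choosing a production according to its probability, and retries when the depth bound is exceeded. It is a simplified stand-in under stated assumptions, not the library's implementation: `parse_pcfg`, `expand`, `sample_expression`, and the `seed` argument are hypothetical, terminals are assumed to be single-quoted and free of whitespace, and any special handling of small `max_depth` values is not modeled.

```python
import random


class DepthExceeded(Exception):
    """Raised when an expansion grows deeper than the allowed depth."""


def parse_pcfg(grammar_str):
    """Parse "LHS -> sym sym [p]" lines into {lhs: [(rhs_symbols, p), ...]}."""
    rules = {}
    for line in grammar_str.strip().splitlines():
        if not line.strip():
            continue
        lhs, rhs = line.split("->", 1)
        body, prob = rhs.rsplit("[", 1)
        rules.setdefault(lhs.strip(), []).append((body.split(), float(prob.rstrip("] \t"))))
    return rules


def expand(rules, symbol, depth, max_depth, rng):
    """Recursively expand one symbol into a flat list of terminal tokens."""
    if symbol.startswith("'"):
        return [symbol.strip("'")]  # quoted symbols are terminals
    if depth > max_depth:
        raise DepthExceeded(symbol)
    bodies, weights = zip(*rules[symbol])
    body = rng.choices(bodies, weights=weights, k=1)[0]
    tokens = []
    for child in body:
        tokens.extend(expand(rules, child, depth + 1, max_depth, rng))
    return tokens


def sample_expression(grammar_str, start_symbol="E", max_depth=40, limit=100, seed=None):
    """Try up to `limit` times to sample an expression within `max_depth`."""
    rules = parse_pcfg(grammar_str)
    rng = random.Random(seed)
    for _ in range(limit):
        try:
            return expand(rules, start_symbol, 0, max_depth, rng)
        except DepthExceeded:
            continue  # discard the over-deep sample and try again
    raise Exception(f"no valid expression found within {limit} attempts")


print(sample_expression("E -> '1' [1.0]"))  # ['1']
```

Rejecting and resampling keeps the sketch simple; a grammar whose recursive rules carry too much probability mass would hit the attempt limit often.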
generate_n_expressions
generate_n_expressions(
expression_description: Union[str, SymbolLibrary],
num_expressions: int,
unique: bool = True,
max_expression_length: int = 50,
verbose: bool = True,
) -> List[List[str]]
Sample num_expressions expressions from a grammar or symbol library.
Examples:
>>> len(generate_n_expressions(SymbolLibrary.default_symbols(5), 100, verbose=False))
100
>>> generate_n_expressions(SymbolLibrary.from_symbol_list([], 1), 3, unique=False, verbose=False, max_expression_length=1)
[['X_0'], ['X_0'], ['X_0']]
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| expression_description | Union[str, SymbolLibrary] | Grammar source as an NLTK PCFG string or a SymbolLibrary (a generic PCFG is built automatically via create_generic_pcfg). | required |
| num_expressions | int | Number of expressions to generate. | required |
| unique | bool | If | True |
| max_expression_length | int | Maximum token count per expression. Values below | 50 |
| verbose | bool | Display a progress bar. Default: True. | True |
Returns:
| Type | Description |
|---|---|
| List[List[str]] | List of expressions, each represented as a list of string tokens in infix notation. |
Source code in SRToolkit/utils/expression_generator.py
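The general shape of such a batch sampler can be sketched as a rejection loop around any single-expression sampler. This is an illustration only: `sample_many`, `toy_sampler`, and the `max_attempts` cap are hypothetical, and the sketch assumes that over-long expressions are simply discarded and that duplicates are skipped when `unique` is set, which may differ from the library's actual behaviour.

```python
import random


def sample_many(sampler, num_expressions, unique=True,
                max_expression_length=50, max_attempts=10_000):
    """Collect expressions from `sampler`, filtering by length and uniqueness."""
    results, seen = [], set()
    for _ in range(max_attempts):
        if len(results) == num_expressions:
            break
        tokens = sampler()
        if len(tokens) > max_expression_length:
            continue  # assumed policy: drop expressions that are too long
        key = tuple(tokens)
        if unique and key in seen:
            continue  # assumed policy: skip duplicates
        seen.add(key)
        results.append(tokens)
    return results


rng = random.Random(0)


def toy_sampler():
    """Stand-in sampler with only six distinct outcomes."""
    return [rng.choice(["X_0", "X_1"]), "+", rng.choice(["1", "2", "3"])]


exprs = sample_many(toy_sampler, 4)
assert len({tuple(e) for e in exprs}) == 4  # all four are distinct
```

The attempt cap matters when uniqueness is requested from a grammar that can only produce a handful of expressions; without it the loop would never terminate.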
simplify
simplify(
expr: Union[List[str], Node],
symbol_library: SymbolLibrary = SymbolLibrary.default_symbols(),
) -> Union[List[str], Node]
Simplify an expression algebraically.
Two successive steps are applied:
- SymPy simplification: expands and reduces the expression algebraically (e.g. X_0 * X_1 / X_0 → X_1).
- Constant folding: collapses any sub-expression containing no variables into a single free constant C (e.g. C * C + C → C).
Examples:
>>> expr = ["C", "+", "C", "*", "C", "+", "X_0", "*", "X_1", "/", "X_0"]
>>> print("".join(simplify(expr)))
C+X_1
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| expr | Union[List[str], Node] | Expression as a token list in infix notation or a Node tree. | required |
| symbol_library | SymbolLibrary | Symbol library defining variables and constants. Defaults to SymbolLibrary.default_symbols. | default_symbols() |
Returns:
| Type | Description |
|---|---|
| Union[List[str], Node] | The simplified expression in the same form as the input (list if a list was given, Node if a tree was given). |
Raises:
| Type | Description |
|---|---|
| Exception | If simplification fails or the result contains tokens absent from |
Source code in SRToolkit/utils/expression_simplifier.py
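The constant-folding step can be illustrated on a plain nested-tuple tree, independent of SymPy. In this sketch, which is not the library's code, a leaf is a string, an inner node is `(symbol, child, ...)`, variables are assumed to be the leaves starting with `X_`, and a lone leaf is left unchanged; `has_variable` and `fold_constants` are hypothetical names.

```python
def has_variable(tree):
    """Return True if any leaf of the tree is a variable (assumed prefix "X_")."""
    if isinstance(tree, str):
        return tree.startswith("X_")
    return any(has_variable(child) for child in tree[1:])


def fold_constants(tree):
    """Replace every variable-free compound sub-expression with a single "C"."""
    if isinstance(tree, str):
        return tree  # leaves are kept as they are
    if not has_variable(tree):
        return "C"  # the whole subtree depends only on constants
    return (tree[0],) + tuple(fold_constants(child) for child in tree[1:])


# (C + C * C) + X_1, i.e. the example above after X_0 has been cancelled
tree = ("+", ("+", "C", ("*", "C", "C")), "X_1")
print(fold_constants(tree))  # ('+', 'C', 'X_1')
```

Folding is sound for free constants because any combination of fitted constants can be represented by one fitted constant.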
expr_to_latex
Convert an expression to a LaTeX string.
Examples:
>>> expr_to_latex(["(", "X_0", "+", "X_1", ")"], SymbolLibrary.default_symbols())
'$X_{0} + X_{1}$'
>>> expr = Node("+", Node("X_0"), Node("1"))
>>> expr_to_latex(expr, SymbolLibrary.default_symbols())
'$1 + X_{0}$'
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| expr | Union[Node, List[str]] | Expression as a token list or a Node tree. | required |
| symbol_library | SymbolLibrary | Symbol library providing LaTeX templates. | required |
Returns:
| Type | Description |
|---|---|
| str | A LaTeX string of the form |
Source code in SRToolkit/utils/expression_tree.py
is_float
Return True if element can be interpreted as a floating-point number.
Examples:
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| element | Any | Value to test. | required |
Returns:
| Type | Description |
|---|---|
| bool | |
Source code in SRToolkit/utils/expression_tree.py
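A common way to implement this kind of check is to attempt the conversion and catch the failure. The sketch below shows that pattern; `is_float_sketch` is a hypothetical name and the library's own function may treat edge cases differently.

```python
def is_float_sketch(element):
    """Return True if float(element) succeeds, False otherwise."""
    try:
        float(element)
    except (TypeError, ValueError, OverflowError):
        return False
    return True


print(is_float_sketch("1.5"))   # True
print(is_float_sketch("X_0"))   # False
print(is_float_sketch(None))    # False
```

Note that with this approach strings such as "nan" and "inf" also count as floats, since Python's `float` accepts them.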
tokens_to_tree
Parse a token list into an expression tree using the shunting-yard algorithm.
Examples:
>>> tree = tokens_to_tree(["(", "X_0", "+", "X_1", ")"], SymbolLibrary.default_symbols())
>>> len(tree)
3
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tokens | List[str] | Token list in infix notation. | required |
| sl | SymbolLibrary | Symbol library used to resolve token types and precedences. | required |
Returns:
| Type | Description |
|---|---|
| Node | Root Node of the parsed expression tree. |
Raises:
| Type | Description |
|---|---|
| Exception | If a token is absent from |
Source code in SRToolkit/utils/expression_tree.py
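For readers unfamiliar with the shunting-yard algorithm, the following is a minimal sketch that builds nested tuples rather than Node objects. The precedence table, the function set, and the name `to_tree` are assumptions made for this illustration; postfix tokens such as ^2 and symbol-library lookups are not modeled.

```python
PRECEDENCE = {"+": 0, "-": 0, "*": 1, "/": 1, "^": 2}  # assumed operator table
RIGHT_ASSOC = {"^"}
FUNCTIONS = {"sin", "cos", "exp"}  # assumed unary functions


def to_tree(tokens):
    """Parse infix tokens into (op, left, right) / (fn, arg) tuples."""
    output, stack = [], []

    def reduce_top():
        # Pop one operator or function and attach its operands from the output.
        op = stack.pop()
        if op in FUNCTIONS:
            output.append((op, output.pop()))
        else:
            right = output.pop()
            left = output.pop()
            output.append((op, left, right))

    for tok in tokens:
        if tok in FUNCTIONS or tok == "(":
            stack.append(tok)
        elif tok in PRECEDENCE:
            # Reduce anything on the stack that binds at least as tightly.
            while stack and stack[-1] != "(" and (
                stack[-1] in FUNCTIONS
                or PRECEDENCE[stack[-1]] > PRECEDENCE[tok]
                or (PRECEDENCE[stack[-1]] == PRECEDENCE[tok] and tok not in RIGHT_ASSOC)
            ):
                reduce_top()
            stack.append(tok)
        elif tok == ")":
            while stack and stack[-1] != "(":
                reduce_top()
            if not stack:
                raise ValueError("unbalanced parentheses")
            stack.pop()  # discard the matching "("
            if stack and stack[-1] in FUNCTIONS:
                reduce_top()  # apply the function the parentheses belong to
        else:
            output.append(tok)  # variable, constant, or number
    while stack:
        if stack[-1] == "(":
            raise ValueError("unbalanced parentheses")
        reduce_top()
    if len(output) != 1:
        raise ValueError("malformed expression")
    return output[0]


print(to_tree(["X_0", "+", "X_1", "*", "C"]))  # ('+', 'X_0', ('*', 'X_1', 'C'))
```

Building the tree directly on the output stack avoids a separate postfix-to-tree pass.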
bed
bed(
expr1: Union[Node, List[str], ndarray],
expr2: Union[Node, List[str], ndarray],
X: Optional[ndarray] = None,
num_consts_sampled: int = 32,
num_points_sampled: int = 64,
domain_bounds: Optional[List[Tuple[float, float]]] = None,
consts_bounds: Tuple[float, float] = (-5, 5),
symbol_library: SymbolLibrary = SymbolLibrary.default_symbols(),
seed: Optional[int] = None,
) -> float
Compute the Behavior-aware Expression Distance (BED) between two expressions.
BED measures how similarly two expressions behave over a domain by comparing their output distributions point-by-point using the Wasserstein distance. Free constants are marginalised by sampling multiple constant vectors via Latin Hypercube Sampling.
Either X or domain_bounds must be provided when expressions are given as
token lists or Node trees. Pre-computed behavior matrices can be passed
directly to avoid redundant evaluation.
Examples:
>>> X = np.random.rand(10, 2) - 0.5
>>> expr1 = ["X_0", "+", "C"] # instances of SRToolkit.utils.expression_tree.Node work as well
>>> expr2 = ["X_1", "+", "C"]
>>> bed(expr1, expr2, X) < 1
True
>>> # Changing the number of sampled constants
>>> bed(expr1, expr2, X, num_consts_sampled=128, consts_bounds=(-2, 2)) < 1
True
>>> # Sampling X instead of giving it directly by defining a domain
>>> bed(expr1, expr2, domain_bounds=[(0, 1), (0, 1)]) < 1
True
>>> bed(expr1, expr2, domain_bounds=[(0, 1), (0, 1)], num_points_sampled=128) < 1
True
>>> # You can pass behavior matrices instead of expressions, which can save computation when the same expression is compared multiple times
>>> bm1 = create_behavior_matrix(expr1, X)
>>> bed(bm1, expr2, X) < 1
True
>>> bm2 = create_behavior_matrix(expr2, X)
>>> bed(bm1, bm2) < 1
True
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| expr1 | Union[Node, List[str], ndarray] | First expression as a token list, a Node tree, or a pre-computed behavior matrix of shape | required |
| expr2 | Union[Node, List[str], ndarray] | Second expression in the same format as | required |
| X | Optional[ndarray] | Evaluation points of shape | None |
| num_consts_sampled | int | Number of constant vectors sampled per expression. Default: 32. | 32 |
| num_points_sampled | int | Number of points sampled from | 64 |
| domain_bounds | Optional[List[Tuple[float, float]]] | Per-variable | None |
| consts_bounds | Tuple[float, float] | | (-5, 5) |
| symbol_library | SymbolLibrary | Symbol library used to compile expressions. Defaults to SymbolLibrary.default_symbols. | default_symbols() |
| seed | Optional[int] | Random seed for reproducible sampling. Default: None. | None |
Returns:
| Type | Description |
|---|---|
| float | BED between the expressions, given as a float. |
Raises:
| Type | Description |
|---|---|
| Exception | If |
Source code in SRToolkit/utils/measures.py
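The point-by-point comparison can be sketched in plain Python on two behavior matrices, where each row holds the outputs of one expression at one evaluation point across the sampled constants. For two equal-size samples with uniform weights, the 1-D Wasserstein distance equals the mean absolute difference of the sorted values. Averaging the per-point distances is an assumption of this sketch, and `wasserstein_1d` and `bed_sketch` are hypothetical names rather than SRToolkit functions.

```python
def wasserstein_1d(a, b):
    """1-D Wasserstein distance between two equal-size empirical samples."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def bed_sketch(bm1, bm2):
    """Average the per-point distances over the rows of two behavior matrices."""
    if len(bm1) != len(bm2):
        raise ValueError("both matrices must cover the same evaluation points")
    return sum(wasserstein_1d(r1, r2) for r1, r2 in zip(bm1, bm2)) / len(bm1)


# Rows: evaluation points x = 0.0 and x = 0.5; columns: constants C = -1, 0, 1.
bm_a = [[-1.0, 0.0, 1.0], [-0.5, 0.5, 1.5]]  # outputs of X_0 + C
bm_b = [[0.0, 1.0, 2.0], [0.5, 1.5, 2.5]]    # outputs of X_0 + C + 1
print(bed_sketch(bm_a, bm_b))  # 1.0
print(bed_sketch(bm_a, bm_a))  # 0.0
```

Because each row is compared as a distribution, the order of the sampled constants within a row does not affect the result.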
create_behavior_matrix
create_behavior_matrix(
expr: Union[Node, List[str]],
X: ndarray,
num_consts_sampled: int = 32,
consts_bounds: Tuple[float, float] = (-5, 5),
symbol_library: SymbolLibrary = SymbolLibrary.default_symbols(),
seed: Optional[int] = None,
) -> np.ndarray
Evaluate an expression over multiple constant samples to produce a behavior matrix.
For expressions with free constants, constants are drawn via Latin Hypercube Sampling
within consts_bounds. For constant-free expressions, all columns are identical.
Examples:
>>> X = np.random.rand(10, 2) - 0.5
>>> create_behavior_matrix(["X_0", "+", "C"], X, num_consts_sampled=32).shape
(10, 32)
>>> mean_0_1 = np.mean(create_behavior_matrix(["X_0", "+", "C"], X, num_consts_sampled=32, consts_bounds=(0, 1)))
>>> mean_1_5 = np.mean(create_behavior_matrix(["X_0", "+", "C"], X, num_consts_sampled=32, consts_bounds=(1, 5)))
>>> print(bool(mean_0_1 < mean_1_5))
True
>>> # Deterministic expressions always produce the same behavior matrix
>>> bm1 = create_behavior_matrix(["X_0", "+", "X_1"], X)
>>> bm2 = create_behavior_matrix(["X_0", "+", "X_1"], X)
>>> print(bool(np.array_equal(bm1, bm2)))
True
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| expr | Union[Node, List[str]] | Expression as a token list or a Node tree. | required |
| X | ndarray | Input data of shape | required |
| num_consts_sampled | int | Number of constant vectors to sample; sets the number of output columns. Default: 32. | 32 |
| consts_bounds | Tuple[float, float] | | (-5, 5) |
| symbol_library | SymbolLibrary | Symbol library used to compile the expression. Defaults to SymbolLibrary.default_symbols. | default_symbols() |
| seed | Optional[int] | Random seed for reproducible constant sampling. Default: None. | None |
Returns:
| Type | Description |
|---|---|
| ndarray | Behavior matrix of shape |
Raises:
| Type | Description |
|---|---|
| Exception | If |
Source code in SRToolkit/utils/measures.py
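To illustrate how constant sampling shapes the matrix, the sketch below draws constant vectors with a basic Latin Hypercube scheme (one draw per equal-width stratum in each dimension, then a shuffle) and evaluates a Python callable at every point for every constant vector. All names here are hypothetical, the expression is passed as a plain function `f(x, c)` instead of a token list, and the sampling details may differ from the library's.

```python
import random


def latin_hypercube(n_samples, n_dims, bounds=(-5.0, 5.0), seed=None):
    """Return n_samples points in n_dims dimensions, one per stratum per axis."""
    rng = random.Random(seed)
    low, high = bounds
    columns = []
    for _ in range(n_dims):
        # One uniform draw inside each of the n_samples equal-width strata.
        column = [low + (high - low) * (i + rng.random()) / n_samples
                  for i in range(n_samples)]
        rng.shuffle(column)  # decouple the strata across dimensions
        columns.append(column)
    return [list(sample) for sample in zip(*columns)]


def behavior_matrix_sketch(f, X, n_consts, num_consts_sampled=32,
                           consts_bounds=(-5.0, 5.0), seed=None):
    """Rows follow the points in X; columns follow the sampled constant vectors."""
    if n_consts == 0:
        # No free constants: every column repeats the same evaluation.
        consts = [[] for _ in range(num_consts_sampled)]
    else:
        consts = latin_hypercube(num_consts_sampled, n_consts, consts_bounds, seed)
    return [[f(x, c) for c in consts] for x in X]


X = [[0.0, 1.0], [0.5, 2.0], [1.0, 3.0]]
bm = behavior_matrix_sketch(lambda x, c: x[0] + c[0], X, n_consts=1,
                            num_consts_sampled=8, seed=1)
print(len(bm), len(bm[0]))  # 3 8
```

Stratified sampling covers the constant range more evenly than independent uniform draws for the same number of samples.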
edit_distance
edit_distance(
expr1: Union[List[str], Node],
expr2: Union[List[str], Node],
notation: str = "postfix",
symbol_library: SymbolLibrary = SymbolLibrary.default_symbols(),
) -> int
Compute the edit distance between two expressions.
Both expressions are normalised to the requested notation before computing Levenshtein distance, making the result independent of input serialisation.
Examples:
>>> edit_distance(["X_0", "+", "1"], ["X_0", "+", "1"])
0
>>> edit_distance(["X_0", "+", "1"], ["X_0", "-", "1"])
1
>>> edit_distance(tokens_to_tree(["X_0", "+", "1"], SymbolLibrary.default_symbols(1)), tokens_to_tree(["X_0", "-", "1"], SymbolLibrary.default_symbols(1)))
1
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| expr1 | Union[List[str], Node] | First expression as a token list or a Node tree. | required |
| expr2 | Union[List[str], Node] | Second expression as a token list or a Node tree. | required |
| notation | str | Notation used for comparison: | 'postfix' |
| symbol_library | SymbolLibrary | Symbol library used when converting expressions to the target notation. Defaults to SymbolLibrary.default_symbols. | default_symbols() |
Returns:
| Type | Description |
|---|---|
| int | Integer edit distance between the two serialised expressions. |
Source code in SRToolkit/utils/measures.py
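The Levenshtein step itself operates on whole tokens rather than characters. A minimal standard-library sketch of the dynamic program follows; `levenshtein` is a hypothetical helper and the notation conversion that precedes it is not shown.

```python
def levenshtein(a, b):
    """Token-level Levenshtein distance using two rolling rows."""
    previous = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, tok_a in enumerate(a, start=1):
        current = [i]
        for j, tok_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                      # delete tok_a
                current[j - 1] + 1,                   # insert tok_b
                previous[j - 1] + (tok_a != tok_b),   # substitute or match
            ))
        previous = current
    return previous[-1]


# Postfix forms of X_0 + 1 and X_0 - 1 differ in one token.
print(levenshtein(["X_0", "1", "+"], ["X_0", "1", "-"]))  # 1
```

Comparing in postfix or prefix form avoids counting parentheses, which exist only in infix serialisations.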
tree_edit_distance
tree_edit_distance(
expr1: Union[Node, List[str]],
expr2: Union[Node, List[str]],
symbol_library: SymbolLibrary = SymbolLibrary.default_symbols(),
) -> int
Compute the Zhang-Shasha tree edit distance between two expressions.
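As a rough illustration of what an ordered tree edit distance counts (node insertions, deletions, and relabelings), the sketch below computes it with a naive memoised forest recursion. This is deliberately not the Zhang-Shasha algorithm, which obtains the same distance far more efficiently; trees are assumed to be hashable `(label, children)` tuples, and `size`, `forest_distance`, and `tree_distance` are hypothetical names.

```python
from functools import lru_cache


def size(forest):
    """Number of nodes in a forest (a tuple of (label, children) trees)."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Edit distance between two ordered forests with unit costs."""
    if not f and not g:
        return 0
    if not f:
        return size(g)  # insert every node of g
    if not g:
        return size(f)  # delete every node of f
    (label_f, kids_f), (label_g, kids_g) = f[-1], g[-1]
    return min(
        # Delete the rightmost root of f; its children take its place.
        forest_distance(f[:-1] + kids_f, g) + 1,
        # Insert the rightmost root of g.
        forest_distance(f, g[:-1] + kids_g) + 1,
        # Match the two rightmost roots, relabeling if they differ.
        forest_distance(kids_f, kids_g)
        + forest_distance(f[:-1], g[:-1])
        + (label_f != label_g),
    )


def tree_distance(t1, t2):
    return forest_distance((t1,), (t2,))


leaves = (("X_0", ()), ("1", ()))
plus, minus = ("+", leaves), ("-", leaves)
print(tree_distance(plus, minus))  # 1
```

The recursion is only practical for small trees; it is meant to make the definition concrete, not to replace an efficient implementation.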
Examples:
>>> tree_edit_distance(["X_0", "+", "1"], ["X_0", "+", "1"])
0
>>> tree_edit_distance(["X_0", "+", "1"], ["X_0", "-", "1"])
1
>>> tree_edit_distance(tokens_to_tree(["X_0", "+", "1"], SymbolLibrary.default_symbols(1)), tokens_to_tree(["X_0", "-", "1"], SymbolLibrary.default_symbols(1)))
1
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| expr1 | Union[Node, List[str]] | First expression as a token list or a Node tree. | required |
| expr2 | Union[Node, List[str]] | Second expression as a token list or a Node tree. | required |
| symbol_library | SymbolLibrary | Symbol library used when converting token lists to trees. Defaults to SymbolLibrary.default_symbols. | default_symbols() |
Returns:
| Type | Description |
|---|---|
| int | Integer tree edit distance. |