
Expression Manipulation

Expressions in SRToolkit are represented as infix token lists — plain Python lists of strings. This format is the common currency between the symbol library, the parser, the compiler, and the SR approaches.

Token lists

A token list is an ordered sequence of strings in infix notation:

expr = ["X_0", "+", "C", "*", "sin", "(", "X_1", ")"]

Token types:

Type    Examples                     Description
op      + - * / ^                    Binary operators
fn      sin cos exp sqrt ln ^2 ^3    Unary functions and postfix powers
var     X_0 X_1                      Input variables, mapped to columns of X in order
const   C                            Free constant optimised during parameter estimation
lit     pi e                         Fixed numeric literals

Postfix power tokens (^2, ^3, ^4, ^5, ^-1) are written after their operand:

["X_0", "^2", "+", "X_1", "^3"]   # x0² + x1³
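To make the token types listed above concrete, here is a minimal pure-Python sketch that labels each token in a list. It is an illustration only: TOKEN_TYPES, VARIABLE and classify are hypothetical names, not part of SRToolkit, and the real library presumably derives this information from the SymbolLibrary rather than from a hard-coded table.

```python
# Illustrative only: a hand-written lookup mirroring the token-type table.
# TOKEN_TYPES, VARIABLE and classify are hypothetical, not SRToolkit API.
import re

TOKEN_TYPES = {
    "op": {"+", "-", "*", "/", "^"},
    "fn": {"sin", "cos", "exp", "sqrt", "ln", "^2", "^3", "^4", "^5", "^-1"},
    "const": {"C"},
    "lit": {"pi", "e"},
}

# Variables follow the X_<index> naming pattern used in this guide.
VARIABLE = re.compile(r"X_\d+")


def classify(token):
    """Return the type of a token, or None if it is not in the table."""
    if VARIABLE.fullmatch(token):
        return "var"
    for token_type, members in TOKEN_TYPES.items():
        if token in members:
            return token_type
    return None  # parentheses and unknown tokens end up here


expr = ["X_0", "^2", "+", "C", "*", "sin", "(", "X_1", ")"]
types = [classify(t) for t in expr]
print(list(zip(expr, types)))
```

Parentheses are grouping markers rather than symbols with a type, which is why the sketch returns None for them.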

Symbol library

A SymbolLibrary defines which tokens are valid in an expression and how they compile to NumPy. Most use cases are covered by the two factory methods:

from SRToolkit.utils import SymbolLibrary

# Full default set (all operators, functions, C, pi, e) with 2 variables
sl = SymbolLibrary.default_symbols(num_variables=2)

# Restrict to a specific subset
sl = SymbolLibrary.from_symbol_list(
    ["+", "-", "*", "/", "sin", "cos", "exp", "sqrt", "^2", "C"],
    num_variables=3,
)

Custom symbols can be added for non-standard backends:

sl = SymbolLibrary(preamble=["import numpy as np", "import scipy.special as sp"])
sl.add_symbol("erf", "fn", precedence=5, np_fn="sp.erf({})", latex_str=r"\mathrm{erf}\,{}")

Symbol library context

Passing sl to every function call is verbose. Two alternatives let you set it once.

Context manager — active for the duration of the with block:

sl = SymbolLibrary.default_symbols(num_variables=2)

with sl:
    tree  = tokens_to_tree(["X_0", "+", "X_1", "*", "C"])
    f     = compile_expr(["X_0", "*", "C"])
    latex = expr_to_latex(["sin", "(", "X_0", ")", "+", "X_1"])
    d     = edit_distance(["X_0", "+", "1"], ["X_0", "-", "1"])

Module-level default — persists for the whole session, useful in scripts and notebooks:

SymbolLibrary.set_default(SymbolLibrary.default_symbols(num_variables=3))

# No sl argument needed anywhere below this point
tree = tokens_to_tree(["X_0", "+", "X_1"])
f    = compile_expr(["X_0", "*", "C"])

SymbolLibrary.set_default(None)   # clear when done

A library is resolved in this order: explicit argument → context manager → module default → default_symbols(). The last step does not apply to functions that parse token vocabularies (tokens_to_tree, expr_to_latex): these raise RuntimeError if no library was supplied through one of the first three routes. All other functions fall back to default_symbols().
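The lookup chain can be sketched in a few lines of plain Python. This is not SRToolkit's implementation: use_library, set_default, resolve and the module-level variables are hypothetical stand-ins, and strings are used in place of real SymbolLibrary objects so the example runs on its own.

```python
# Sketch of the lookup chain only. All names are hypothetical and
# SRToolkit's internals may differ.
from contextlib import contextmanager

_context_stack = []      # libraries activated by a with block
_module_default = None   # library installed by a set_default-style call


@contextmanager
def use_library(library):
    """Activate library for the duration of a with block."""
    _context_stack.append(library)
    try:
        yield library
    finally:
        _context_stack.pop()


def set_default(library):
    """Install (or clear, with None) the module-level default."""
    global _module_default
    _module_default = library


def resolve(explicit=None, fallback=None):
    """Pick a library: explicit argument, context, module default, fallback."""
    if explicit is not None:
        return explicit
    if _context_stack:
        return _context_stack[-1]
    if _module_default is not None:
        return _module_default
    if fallback is not None:
        return fallback()  # stands in for default_symbols()
    raise RuntimeError("no symbol library available")


set_default("module")
with use_library("context"):
    inside = resolve()                       # context beats module default
    overridden = resolve(explicit="explicit")  # explicit beats everything
outside = resolve()                          # module default after the block
set_default(None)
last_resort = resolve(fallback=lambda: "built-in")
```

In this sketch, a function that must not guess a vocabulary would call resolve() without a fallback and so raise RuntimeError, while the others would pass a fallback factory.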

Expression trees

tokens_to_tree parses a token list into a binary Node tree:

from SRToolkit.utils import SymbolLibrary, tokens_to_tree

sl = SymbolLibrary.default_symbols(num_variables=2)
tree = tokens_to_tree(["X_0", "+", "X_1", "*", "C"], sl)

Convert back to a token list in any notation:

tree.to_list(sl, notation="infix")    # ['X_0', '+', 'X_1', '*', 'C']
tree.to_list(notation="prefix")       # ['+', 'X_0', '*', 'X_1', 'C']
tree.to_list(notation="postfix")      # ['X_0', 'X_1', 'C', '*', '+']
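For readers curious how an infix token list relates to its postfix form, the following is a minimal shunting-yard sketch in plain Python. It is not how SRToolkit necessarily does it (the library goes through a Node tree), and the precedence table, FUNCTIONS set and infix_to_postfix name are assumptions made for this example. It also assumes that functions are always followed by a parenthesised argument and that there is no unary minus.

```python
# Shunting-yard sketch. Precedences and names are assumptions for this
# example, not values taken from SRToolkit.
BINARY = {"+": (1, "L"), "-": (1, "L"), "*": (2, "L"), "/": (2, "L"), "^": (3, "R")}
FUNCTIONS = {"sin", "cos", "exp", "sqrt", "ln"}


def infix_to_postfix(tokens):
    """Convert an infix token list to postfix order."""
    output, stack = [], []
    for tok in tokens:
        if tok in FUNCTIONS or tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("unbalanced parentheses")
            stack.pop()  # discard the matching "("
            if stack and stack[-1] in FUNCTIONS:
                output.append(stack.pop())  # function applies to the group
        elif tok in BINARY:
            prec, assoc = BINARY[tok]
            while stack and stack[-1] in BINARY:
                top_prec = BINARY[stack[-1]][0]
                if top_prec > prec or (top_prec == prec and assoc == "L"):
                    output.append(stack.pop())
                else:
                    break
            stack.append(tok)
        else:
            # Variables, constants, literals, and postfix powers such as
            # "^2" go straight to the output: a postfix power already
            # follows its operand.
            output.append(tok)
    while stack:
        top = stack.pop()
        if top == "(":
            raise ValueError("unbalanced parentheses")
        output.append(top)
    return output


print(infix_to_postfix(["X_0", "+", "X_1", "*", "C"]))
# ['X_0', 'X_1', 'C', '*', '+']
```

The result for this input agrees with the postfix list shown above for the same expression.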

Executable functions

compile_expr compiles an expression into a fast callable f(X, C):

import numpy as np
from SRToolkit.utils import compile_expr

f = compile_expr(["X_0", "*", "C", "+", "X_1"])

X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
C = np.array([2.0])       # one free constant

print(f(X, C))            # [ 4. 10. 16.]

X must have shape (n_samples, n_features). C is a 1-D array with one entry per C token in the expression. Pass an empty array (np.array([])) when there are no constants.
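As a sanity check on how X and C are consumed, here is a hand-written pure-Python equivalent of the expression X_0 * C + X_1, evaluated one row at a time. It uses nested lists instead of NumPy arrays so that it runs without dependencies. The helper count_constants is a hypothetical name introduced here, not an SRToolkit function.

```python
def count_constants(expr):
    """Number of entries C needs: one per "C" token in the expression."""
    return expr.count("C")


expr = ["X_0", "*", "C", "+", "X_1"]

# Rows are samples, columns are features: X_0 is column 0, X_1 is column 1.
X = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]
C = [2.0]

# The constants vector must match the number of C tokens.
assert len(C) == count_constants(expr)

# Hand-written equivalent of X_0 * C + X_1 for each sample.
result = [row[0] * C[0] + row[1] for row in X]
print(result)  # [4.0, 10.0, 16.0]
```

An expression with no C token gives count_constants(expr) == 0, which corresponds to passing an empty array.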

LaTeX rendering

expr_to_latex renders a token list as a LaTeX string:

from SRToolkit.utils import SymbolLibrary, expr_to_latex

sl = SymbolLibrary.default_symbols(num_variables=2)
latex = expr_to_latex(["sin", "(", "X_0", ")", "+", "X_1", "^2"], sl)
print(latex)  # $\sin X_{0} + X_{1}^2$

Simplification

simplify applies algebraic simplification followed by constant folding:

from SRToolkit.utils import SymbolLibrary
from SRToolkit.utils.expression_simplifier import simplify

sl = SymbolLibrary.default_symbols(num_variables=2)

# Algebraic reduction + constant folding
simplified = simplify(["C", "+", "C", "*", "C", "+", "X_0", "*", "X_1", "/", "X_0"], sl)
print(simplified)  # ['C', '+', 'X_1']

Note

Simplification requires SymPy. It may fail for expressions containing tokens outside the default symbol set — wrap calls in a try/except when batch-processing large result sets.
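One way to follow that advice is a small wrapper that keeps the original expression whenever simplification fails. The sketch below is an assumption-laden illustration: simplify_all is a hypothetical helper, and the simplifier is passed in as a callable so the example runs without SRToolkit or SymPy installed. The exception is caught broadly because the specific exception types are not documented here.

```python
def simplify_all(expressions, simplify_fn):
    """Apply simplify_fn to each expression, keeping the original on failure.

    Returns (results, failures), where failures is a list of
    (index, exception) pairs for the expressions that could not be simplified.
    """
    results, failures = [], []
    for index, expr in enumerate(expressions):
        try:
            results.append(simplify_fn(expr))
        except Exception as error:  # broad on purpose, see the note above
            results.append(expr)
            failures.append((index, error))
    return results, failures


# Stand-in simplifier for demonstration: rejects a token it does not know.
def demo_simplifier(expr):
    if "erf" in expr:
        raise ValueError("unsupported token: erf")
    return expr


batch = [["X_0", "+", "C"], ["erf", "(", "X_0", ")"]]
results, failures = simplify_all(batch, demo_simplifier)
print(len(failures))  # 1
```

With the real library, the callable would presumably be something like lambda e: simplify(e, sl), using the simplify function and symbol library shown in the example above.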