Expression Compiler

SRToolkit.utils.expression_compiler

Functions for compiling symbolic expressions into executable Python callables.

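The primary interface consists of two functions:

  • compile_expr: compiles an expression into a callable f(X, C) → np.ndarray.
  • compile_expr_rmse: compiles an expression into an RMSE callable f(X, C, y) → float.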

Both accept a backend parameter that selects the evaluation engine:

  • "stack" (default): postfix stack-machine evaluator backed by Cython; falls back to pure-Python automatically when the compiled extension is unavailable.
  • "codegen": generates Python/NumPy source code via exec(). Compatible with any custom symbol library and requires no compiled extensions.
  • "stack_py": pure-Python stack-machine evaluator; avoids the Cython→Python boundary overhead when the library contains many custom symbols.
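To make the "postfix stack-machine evaluator" idea concrete, here is a minimal pure-Python sketch of the general technique. It is not SRToolkit's implementation: the operator tables, the `eval_postfix` helper, and the `C_<i>` naming for constants are assumptions made for illustration, and it processes one sample at a time rather than whole arrays.

```python
import math

# Hypothetical operator tables for this sketch; not SRToolkit's symbol library.
BINARY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}
UNARY = {"sin": math.sin, "cos": math.cos, "sqrt": math.sqrt}


def eval_postfix(tokens, row, constants=()):
    """Evaluate a postfix token list for one sample using a value stack."""
    stack = []
    for tok in tokens:
        if tok in BINARY:
            right = stack.pop()  # the right operand was pushed last
            left = stack.pop()
            stack.append(BINARY[tok](left, right))
        elif tok in UNARY:
            stack.append(UNARY[tok](stack.pop()))
        elif tok.startswith("X_"):
            stack.append(row[int(tok[2:])])        # input variable
        elif tok.startswith("C_"):
            stack.append(constants[int(tok[2:])])  # assumed constant naming
        else:
            stack.append(float(tok))               # numeric literal
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


# Infix "X_0 + 1" becomes "X_0 1 +" in postfix.
postfix = ["X_0", "1", "+"]
results = [eval_postfix(postfix, row) for row in [[1.0], [2.0], [3.0]]]
print(results)  # [2.0, 3.0, 4.0]
```

Each token costs one dispatch step in the loop, which is the per-instruction overhead the backend descriptions refer to.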

The lower-level per-backend functions are internal and prefixed with _; select an engine through the backend parameter rather than calling them directly.

Use "stack" (the default) for best performance in most cases. Use "stack_py" when the library contains many custom symbols to avoid Cython→Python call overhead per instruction. Use "codegen" when evaluating on large datasets, as it generates a single vectorised NumPy function with less per-instruction dispatch overhead.
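The code-generation approach can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's generator: the `codegen_compile` helper is hypothetical, it takes a postfix token list, and it emits scalar Python code. The description above indicates the real backend emits NumPy code that operates on whole arrays.

```python
import math


def codegen_compile(postfix_tokens):
    """Build Python source from a postfix token list, then exec() it once."""
    binary = {"+", "-", "*", "/"}
    unary = {"sin": "math.sin", "cos": "math.cos"}
    stack = []  # holds source-code fragments, not numeric values
    for tok in postfix_tokens:
        if tok in binary:
            right = stack.pop()
            left = stack.pop()
            stack.append(f"({left} {tok} {right})")
        elif tok in unary:
            stack.append(f"{unary[tok]}({stack.pop()})")
        elif tok.startswith("X_"):
            stack.append(f"row[{int(tok[2:])}]")
        else:
            stack.append(repr(float(tok)))
    source = f"def _f(row):\n    return {stack.pop()}\n"
    namespace = {"math": math}
    exec(source, namespace)  # compile the generated function a single time
    return namespace["_f"], source


f, source = codegen_compile(["X_0", "1", "+"])
print(source)
values = [f(row) for row in [[1.0], [2.0], [3.0]]]
print(values)  # [2.0, 3.0, 4.0]
```

Because the whole expression collapses into one function body, calling it involves no per-token dispatch. The cost is paid once, at compile time.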

compile_expr

compile_expr(expr: Union[List[str], Node], symbol_library: Optional[SymbolLibrary] = None, backend: str = 'stack') -> Callable[[np.ndarray, Optional[np.ndarray]], np.ndarray]

Compile an expression into a callable f(X, C) → np.ndarray.

Examples:

>>> f = compile_expr(["X_0", "+", "1"])
>>> f(np.array([[1.0], [2.0], [3.0]]), None)
array([2., 3., 4.])
>>> f = compile_expr(["X_0", "+", "1"], backend="codegen")
>>> f(np.array([[1], [2], [3]]), np.array([]))
array([2, 3, 4])

Parameters:

Name Type Description Default
expr Union[List[str], Node]

Expression as a token list in infix notation or a Node tree.

required
symbol_library Optional[SymbolLibrary]

Symbol library used to look up token types. Defaults to SymbolLibrary.default_symbols.

None
backend str

Evaluation backend. One of:

  • "stack" (default): postfix stack-machine evaluator backed by Cython; falls back to pure-Python when the compiled extension is unavailable.
  • "codegen": generates Python/NumPy source via exec(); compatible with any custom symbol library.
  • "stack_py": pure-Python stack-machine evaluator; avoids the Cython→Python boundary overhead when the library contains many custom symbols.
'stack'

Returns:

Type Description
Callable[[ndarray, Optional[ndarray]], ndarray]

A callable f(X, C) where X is a 2-D array of shape (n_samples, n_features) and C is a 1-D array of constant values (pass None or an empty array for constant-free expressions). Returns a 1-D output array of shape (n_samples,).

Raises:

Type Description
ValueError

If backend is not one of the supported values.

Source code in SRToolkit/utils/expression_compiler.py
def compile_expr(
    expr: Union[List[str], Node],
    symbol_library: Optional[SymbolLibrary] = None,
    backend: str = "stack",
) -> Callable[[np.ndarray, Optional[np.ndarray]], np.ndarray]:
    """
    Compile an expression into a callable ``f(X, C) → np.ndarray``.

    Examples:
        >>> f = compile_expr(["X_0", "+", "1"])
        >>> f(np.array([[1.0], [2.0], [3.0]]), None)
        array([2., 3., 4.])
        >>> f = compile_expr(["X_0", "+", "1"], backend="codegen")
        >>> f(np.array([[1], [2], [3]]), np.array([]))
        array([2, 3, 4])

    Args:
        expr: Expression as a token list in infix notation or a
            [Node][SRToolkit.utils.expression_tree.Node] tree.
        symbol_library: Symbol library used to look up token types.
            Defaults to [SymbolLibrary.default_symbols][SRToolkit.utils.symbol_library.SymbolLibrary.default_symbols].
        backend: Evaluation backend. One of:

            - ``"stack"`` (default): postfix stack-machine evaluator backed by
              Cython; falls back to pure-Python when the compiled extension is
              unavailable.
            - ``"codegen"``: generates Python/NumPy source via ``exec()``;
              compatible with any custom symbol library.
            - ``"stack_py"``: pure-Python stack-machine evaluator; avoids the
              Cython→Python boundary overhead when the library contains many
              custom symbols.

    Returns:
        A callable ``f(X, C)`` where ``X`` is a 2-D array of shape
        ``(n_samples, n_features)`` and ``C`` is a 1-D array of constant values
        (pass ``None`` or an empty array for constant-free expressions).
        Returns a 1-D output array of shape ``(n_samples,)``.

    Raises:
        ValueError: If ``backend`` is not one of the supported values.
    """
    if symbol_library is None:
        symbol_library = SymbolLibrary.get_or_default()
    if backend == "stack":
        return _expr_to_cython_callable(expr, symbol_library)
    elif backend == "codegen":
        return _expr_to_executable_function(expr, symbol_library)
    elif backend == "stack_py":
        return _expr_to_python_callable(expr, symbol_library)
    else:
        raise ValueError(f"Unknown backend '{backend}'. Must be one of: 'stack', 'codegen', 'stack_py'.")

compile_expr_rmse

compile_expr_rmse(expr: Union[List[str], Node], symbol_library: Optional[SymbolLibrary] = None, backend: str = 'stack', X: Optional[ndarray] = None) -> Callable[[np.ndarray, np.ndarray, np.ndarray], float]

Compile an expression into an RMSE callable f(X, C, y) → float.
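For reference, the standard definition of root-mean-square error is given below. The notation is introduced here and does not come from the library: $\hat{f}$ is the compiled expression, $x_i$ is row $i$ of X, $y_i$ is the matching target, and $n$ is the number of samples.

```latex
\mathrm{RMSE}(C) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \bigl( \hat{f}(x_i, C) - y_i \bigr)^{2}}
```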

Examples:

>>> f = compile_expr_rmse(["X_0", "+", "1"])
>>> f(np.array([[1.0], [2.0], [3.0]]), np.array([]), np.array([2.0, 3.0, 4.0]))
0.0
>>> f = compile_expr_rmse(["X_0", "+", "1"], backend="codegen")
>>> print(float(f(np.array([[1], [2], [3]]), np.array([]), np.array([2, 3, 4]))))
0.0

Parameters:

Name Type Description Default
expr Union[List[str], Node]

Expression as a token list in infix notation or a Node tree.

required
symbol_library Optional[SymbolLibrary]

Symbol library used to look up token types. Defaults to SymbolLibrary.default_symbols.

None
backend str

Evaluation backend. One of:

  • "stack" (default): postfix stack-machine evaluator backed by Cython; RMSE is computed in C without an intermediate output array. Falls back to pure-Python when the compiled extension is unavailable.
  • "codegen": generates Python/NumPy source via exec(); compatible with any custom symbol library.
  • "stack_py": pure-Python stack-machine evaluator; avoids the Cython→Python boundary overhead when the library contains many custom symbols.
'stack'
X Optional[ndarray]

Optional input data of shape (n_samples, n_features). When provided with backend="stack" or backend="stack_py", all constant-free subtrees are pre-evaluated against X at compile time — a significant speedup when the same X is reused across many calls with varying C (e.g. inside an optimiser loop). Ignored for backend="codegen".

None
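The pre-evaluation idea described for X can be illustrated with a small pure-Python sketch. Everything here is an assumption made for illustration: the tuple-based tree encoding, the `("cached", values)` node, and the helper names are hypothetical and say nothing about how SRToolkit stores or folds expressions.

```python
def is_const_free(tree):
    """True if the subtree contains no constant placeholder ("C", i)."""
    if tree[0] == "C":
        return False
    if tree[0] in ("X", "num"):
        return True
    return all(is_const_free(child) for child in tree[1:])


def evaluate(tree, X, C):
    """Evaluate a tree for every row of X; returns a list of floats."""
    kind = tree[0]
    if kind == "cached":
        return tree[1]  # values computed once at "compile time"
    if kind == "X":
        return [row[tree[1]] for row in X]
    if kind == "C":
        return [C[tree[1]]] * len(X)
    if kind == "num":
        return [tree[1]] * len(X)
    left = evaluate(tree[1], X, C)
    right = evaluate(tree[2], X, C)
    if kind == "+":
        return [a + b for a, b in zip(left, right)]
    if kind == "*":
        return [a * b for a, b in zip(left, right)]
    raise ValueError(f"unknown node {kind!r}")


def precompute(tree, X):
    """Replace every maximal constant-free subtree with its cached values."""
    if is_const_free(tree):
        return ("cached", evaluate(tree, X, None))
    if tree[0] == "C":
        return tree
    return (tree[0],) + tuple(precompute(child, X) for child in tree[1:])


X = [[1.0, 2.0], [3.0, 4.0]]
tree = ("*", ("C", 0), ("+", ("X", 0), ("X", 1)))  # C_0 * (X_0 + X_1)
folded = precompute(tree, X)
print(folded)                         # ('*', ('C', 0), ('cached', [3.0, 7.0]))
print(evaluate(folded, X, [2.0]))     # [6.0, 14.0]
```

Inside an optimiser loop only C changes, so the cached subtree `X_0 + X_1` is never recomputed. The saving grows with the share of the expression that does not depend on C.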

Returns:

Type Description
Callable[[ndarray, ndarray, ndarray], float]

A callable f(X, C, y) returning the scalar RMSE as a float.
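The "stack" backend description mentions computing RMSE without an intermediate output array. The following is a minimal pure-Python sketch of that general idea: it accumulates the squared error row by row instead of materialising all predictions first. The `rmse_streaming` and `predict_row` names are hypothetical, and this is not the library's C routine.

```python
import math


def rmse_streaming(predict_row, X, C, y):
    """Accumulate squared error per row; no prediction list is stored."""
    total = 0.0
    for row, target in zip(X, y):
        diff = predict_row(row, C) - target
        total += diff * diff
    return math.sqrt(total / len(y))


def predict_row(row, C):
    """Stand-in for a compiled expression: X_0 + 1 (C is unused here)."""
    return row[0] + 1.0


X = [[1.0], [2.0], [3.0]]
perfect = rmse_streaming(predict_row, X, (), [2.0, 3.0, 4.0])
off_by_one = rmse_streaming(predict_row, X, (), [3.0, 4.0, 5.0])
print(perfect, off_by_one)  # 0.0 1.0
```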

Raises:

Type Description
ValueError

If backend is not one of the supported values.

Source code in SRToolkit/utils/expression_compiler.py
def compile_expr_rmse(
    expr: Union[List[str], Node],
    symbol_library: Optional[SymbolLibrary] = None,
    backend: str = "stack",
    X: Optional[np.ndarray] = None,
) -> Callable[[np.ndarray, np.ndarray, np.ndarray], float]:
    """
    Compile an expression into an RMSE callable ``f(X, C, y) → float``.

    Examples:
        >>> f = compile_expr_rmse(["X_0", "+", "1"])
        >>> f(np.array([[1.0], [2.0], [3.0]]), np.array([]), np.array([2.0, 3.0, 4.0]))
        0.0
        >>> f = compile_expr_rmse(["X_0", "+", "1"], backend="codegen")
        >>> print(float(f(np.array([[1], [2], [3]]), np.array([]), np.array([2, 3, 4]))))
        0.0

    Args:
        expr: Expression as a token list in infix notation or a
            [Node][SRToolkit.utils.expression_tree.Node] tree.
        symbol_library: Symbol library used to look up token types.
            Defaults to [SymbolLibrary.default_symbols][SRToolkit.utils.symbol_library.SymbolLibrary.default_symbols].
        backend: Evaluation backend. One of:

            - ``"stack"`` (default): postfix stack-machine evaluator backed by
              Cython; RMSE is computed in C without an intermediate output array.
              Falls back to pure-Python when the compiled extension is unavailable.
            - ``"codegen"``: generates Python/NumPy source via ``exec()``;
              compatible with any custom symbol library.
            - ``"stack_py"``: pure-Python stack-machine evaluator; avoids the
              Cython→Python boundary overhead when the library contains many
              custom symbols.

        X: Optional input data of shape ``(n_samples, n_features)``. When provided
            with ``backend="stack"`` or ``backend="stack_py"``, all constant-free
            subtrees are pre-evaluated against *X* at compile time — a significant
            speedup when the same *X* is reused across many calls with varying *C*
            (e.g. inside an optimiser loop). Ignored for ``backend="codegen"``.

    Returns:
        A callable ``f(X, C, y)`` returning the scalar RMSE as a float.

    Raises:
        ValueError: If ``backend`` is not one of the supported values.
    """
    if symbol_library is None:
        symbol_library = SymbolLibrary.get_or_default()
    if backend == "stack":
        return _expr_to_cython_error_callable(expr, symbol_library, X)
    elif backend == "codegen":
        return _expr_to_error_function(expr, symbol_library)
    elif backend == "stack_py":
        return _expr_to_python_stack_error_callable(expr, symbol_library, X)
    else:
        raise ValueError(f"Unknown backend '{backend}'. Must be one of: 'stack', 'codegen', 'stack_py'.")