Grammar

SRToolkit.utils.grammar

CFG/PCFG grammar representation, constraint protocol, and stateful derivation engine.

Provides

Rule, ParseTreeNode, ParseTree, Grammar, Constraint, AncestorInfo, ExpansionContext, MaxDepth, MaxNodes, MaxOccurrences, NoNested, DimensionalConsistency, Derivation.

AncestorInfo `dataclass`

AncestorInfo(nonterminal: str, rule: Rule, child_index: int)

One entry in an ExpansionContext's ancestor chain.

Attributes:

Name	Type	Description
`nonterminal`	`str`	The non-terminal symbol at this ancestor position.
`rule`	`Rule`	The rule applied at this position.
`child_index`	`int`	The index within `rule.rhs` that leads toward the current slot, counting only non-terminal positions.

Constraint

Bases: Generic[L, G]

Base class for derivation constraints (hard filters).

Subclass and override the methods you need. Constraints carry only construction-time configuration; all per-derivation state is managed by the engine and threaded through method arguments.

Scoping — set nonterminals and/or rule_names to restrict allows to a subset of slots or rules. A scope miss is treated as implicit acceptance. update is always called regardless of scope so that global counters stay accurate.

Examples:

>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> g = Grammar([Rule("E", ["x"]), Rule("E", ["y"])])
>>> class RejectY(Constraint):
...     def allows(self, slot, rule, global_): return rule.rhs != ["y"]
>>> g.add_constraint(RejectY())
>>> d = g.start_derivation("E")
>>> [r.rhs for r in d.options()]
[['x']]

to_dict

to_dict() -> dict

Serialise this constraint to a JSON-safe dictionary.

The base implementation stores only the fully-qualified class path under constraint_class. Subclasses should call super().to_dict() and add their own constructor arguments so that from_dict can reconstruct the instance faithfully. See from_dict for an example.

Returns:

Type	Description
`dict`	Dictionary with at least the key `constraint_class`.

Source code in SRToolkit/utils/grammar/constraints.py

def to_dict(self) -> dict:
    """
    Serialise this constraint to a JSON-safe dictionary.

    The base implementation stores only the fully-qualified class path under
    ``constraint_class``. Subclasses should call ``super().to_dict()`` and
    add their own constructor arguments so that ``from_dict`` can reconstruct
    the instance faithfully. See
    [from_dict][SRToolkit.utils.grammar.constraints.Constraint.from_dict]
    for an example.

    Returns:
        Dictionary with at least the key ``constraint_class``.
    """
    return {"constraint_class": f"{self.__class__.__module__}.{self.__class__.__qualname__}"}

from_dict `classmethod`

from_dict(d: dict) -> 'Constraint'

Reconstruct a constraint from a dictionary produced by to_dict.

When called on the base Constraint class, dispatches to the correct subclass via the constraint_class key using importlib — both built-in and user-defined subclasses are supported. When called on a concrete subclass, the subclass must override this method.

The dictionary must contain at minimum the key constraint_class, whose value is the fully-qualified class path (e.g. "mymodule.MyConstraint"). Any additional keys are forwarded to the subclass override.

To make a custom subclass serialisable, override both to_dict and from_dict::

class MyConstraint(Constraint):
    def __init__(self, threshold: float) -> None:
        self.threshold = threshold

    def to_dict(self) -> dict:
        return {**super().to_dict(), "threshold": self.threshold}

    @classmethod
    def from_dict(cls, d: dict) -> "MyConstraint":
        return cls(d["threshold"])

Examples:

>>> from SRToolkit.utils.grammar import Constraint, MaxDepth
>>> c = Constraint.from_dict(MaxDepth(5).to_dict())
>>> isinstance(c, MaxDepth) and c.limit == 5
True

Parameters:

Name	Type	Description	Default
`d`	`dict`	Dictionary previously returned by `to_dict`.	required

Returns:

Type	Description
`'Constraint'`	A reconstructed Constraint instance.

Raises:

Type	Description
`KeyError`	If `constraint_class` is missing from `d` (dispatch path).
`ImportError`	If the module cannot be imported (dispatch path).
`AttributeError`	If the class cannot be found in the module (dispatch path).
`NotImplementedError`	If called on a subclass that has not overridden this method.

Source code in SRToolkit/utils/grammar/constraints.py

@classmethod
def from_dict(cls, d: dict) -> "Constraint":
    """
    Reconstruct a constraint from a dictionary produced by ``to_dict``.

    When called on the base [Constraint][SRToolkit.utils.grammar.constraints.Constraint]
    class, dispatches to the correct subclass via the ``constraint_class`` key using
    `importlib` — both built-in and user-defined subclasses are supported.
    When called on a concrete subclass, the subclass must override this method.

    The dictionary must contain at minimum the key ``constraint_class``, whose value
    is the fully-qualified class path (e.g. ``"mymodule.MyConstraint"``).  Any
    additional keys are forwarded to the subclass override.

    To make a custom subclass serialisable, override both ``to_dict`` and
    ``from_dict``::

        class MyConstraint(Constraint):
            def __init__(self, threshold: float) -> None:
                self.threshold = threshold

            def to_dict(self) -> dict:
                return {**super().to_dict(), "threshold": self.threshold}

            @classmethod
            def from_dict(cls, d: dict) -> "MyConstraint":
                return cls(d["threshold"])

    Examples:
        >>> from SRToolkit.utils.grammar import Constraint, MaxDepth
        >>> c = Constraint.from_dict(MaxDepth(5).to_dict())
        >>> isinstance(c, MaxDepth) and c.limit == 5
        True

    Args:
        d: Dictionary previously returned by ``to_dict``.

    Returns:
        A reconstructed [Constraint][SRToolkit.utils.grammar.constraints.Constraint] instance.

    Raises:
        KeyError: If ``constraint_class`` is missing from ``d`` (dispatch path).
        ImportError: If the module cannot be imported (dispatch path).
        AttributeError: If the class cannot be found in the module (dispatch path).
        NotImplementedError: If called on a subclass that has not overridden this method.
    """
    if cls is Constraint:
        class_path = d["constraint_class"]
        module_path, cls_name = class_path.rsplit(".", 1)
        resolved = getattr(importlib.import_module(module_path), cls_name)
        return resolved.from_dict(d)
    raise NotImplementedError(f"{cls.__name__}.from_dict is not implemented.")

initial_local

initial_local(start: str) -> L

Return the initial local state for the constraint.

Source code in SRToolkit/utils/grammar/constraints.py

def initial_local(self, start: str) -> L:
    """Return the initial local state for the constraint."""
    return None  # type: ignore[return-value]

initial_global

initial_global() -> G

Return the initial global state for the constraint.

Source code in SRToolkit/utils/grammar/constraints.py

def initial_global(self) -> G:
    """Return the initial global state for the constraint."""
    return None  # type: ignore[return-value]

allows

allows(slot: ExpansionContext[L], rule: Rule, global_: G) -> bool

Return True if rule may be applied at slot.

Called only when the slot's non-terminal and rule name are within scope.

Parameters:

Name	Type	Description	Default
`slot`	`ExpansionContext[L]`	Current derivation position with this constraint's local state.	required
`rule`	`Rule`	Candidate production rule.	required
`global_`	`G`	Current per-derivation global state.	required

Returns:

Type	Description
`bool`	`True` to keep the rule in the candidate set.

Source code in SRToolkit/utils/grammar/constraints.py

def allows(self, slot: ExpansionContext[L], rule: Rule, global_: G) -> bool:
    """
    Return ``True`` if ``rule`` may be applied at ``slot``.

    Called only when the slot's non-terminal and rule name are within scope.

    Args:
        slot: Current derivation position with this constraint's local state.
        rule: Candidate production rule.
        global_: Current per-derivation global state.

    Returns:
        ``True`` to keep the rule in the candidate set.
    """
    return True

update

update(slot: ExpansionContext[L], rule: Rule, global_: G) -> tuple[list[L], G]

Called after rule is applied at slot.

Returns per-child local states (one per non-terminal in rule.rhs, in order) and the new global state. May be used to update global counters by returning a new global_ value.

Parameters:

Name	Type	Description	Default
`slot`	`ExpansionContext[L]`	Derivation position immediately before the rule was applied.	required
`rule`	`Rule`	The rule that was applied.	required
`global_`	`G`	Global state before this application.	required

Returns:

Type	Description
`list[L]`	`(child_locals, new_global)` where `len(child_locals)` equals
`G`	the number of non-terminals in `rule.rhs`.

Source code in SRToolkit/utils/grammar/constraints.py

def update(self, slot: ExpansionContext[L], rule: Rule, global_: G) -> tuple[list[L], G]:
    """
    Called after ``rule`` is applied at ``slot``.

    Returns per-child local states (one per non-terminal in ``rule.rhs``,
    in order) and the new global state. May be used to update global
    counters by returning a new ``global_`` value.

    Args:
        slot: Derivation position immediately before the rule was applied.
        rule: The rule that was applied.
        global_: Global state before this application.

    Returns:
        ``(child_locals, new_global)`` where ``len(child_locals)`` equals
        the number of non-terminals in ``rule.rhs``.
    """
    n = sum(1 for s in rule.rhs if s in slot.nonterminals)
    return [slot.local] * n, global_

DimensionalConsistency

DimensionalConsistency(variable_units: dict[str, dict], target_unit: dict, constant_units: Optional[dict[str, dict]] = None, symbol_library: Optional[SymbolLibrary] = None, allow_unit_polymorphic_constants: bool = False)

Bases: Constraint[Optional[Unit], None]

Stateful constraint that enforces dimensional analysis during expression generation.

Local state at each slot is the required unit (Optional[Unit]). None means "any unit is acceptable here" — used for the children of multiplicative operators, whose individual units are underdetermined.

Unit representation: units are dict[str, Fraction] mapping base dimension names (e.g. "m", "s", "kg") to rational exponents. Dimensionless is {} (empty dict).

Free constants (const type): treated as dimensionless by default. Set allow_unit_polymorphic_constants=True to let them absorb any required unit.

Named physical constants (lit type, e.g. "g", "c"): declared via constant_units; checked like variables.

Undeclared variables or literals: conservatively accepted.

Rule classification: uses rule.name when available (see Grammar.from_symbol_library for the naming scheme). Falls back to rhs-shape heuristics for unnamed rules.

Examples:

>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> from SRToolkit.utils.grammar import DimensionalConsistency
>>> from fractions import Fraction
>>> g = Grammar()
>>> g.add_rule(Rule("E", ["E", "+", "F"], weight=0.6, name="E_add_+"))
>>> g.add_rule(Rule("E", ["F"], weight=0.4, name="E_to_F"))
>>> g.add_rule(Rule("F", ["v"], weight=0.5, name="F_v"))
>>> g.add_rule(Rule("F", ["t"], weight=0.5, name="F_t"))
>>> dc = DimensionalConsistency(
...     variable_units={"v": {"m": 1, "s": -1}, "t": {"s": 1}},
...     target_unit={"m": 1, "s": -1},
... )
>>> g.add_constraint(dc)
>>> d = g.start_derivation("E")
>>> all(r.rhs != ["t"] for r in d.options())
True

Parameters:

Name	Type	Description	Default
`variable_units`	`dict[str, dict]`	Mapping from variable token to its unit.	required
`target_unit`	`dict`	Required unit of the generated expression.	required
`constant_units`	`Optional[dict[str, dict]]`	Units for named physical constants (`lit` type).	`None`
`symbol_library`	`Optional[SymbolLibrary]`	Used to classify tokens by type and precedence.	`None`
`allow_unit_polymorphic_constants`	`bool`	If `True`, free constants absorb whatever unit their slot requires. Default `False`.	`False`

Source code in SRToolkit/utils/grammar/constraints.py

def __init__(
    self,
    variable_units: dict[str, dict],
    target_unit: dict,
    constant_units: Optional[dict[str, dict]] = None,
    symbol_library: Optional[SymbolLibrary] = None,
    allow_unit_polymorphic_constants: bool = False,
) -> None:
    self._var_units: dict[str, Unit] = {k: _to_unit(v) for k, v in variable_units.items()}
    self._target: Unit = _to_unit(target_unit)
    self._const_units: dict[str, Unit] = {k: _to_unit(v) for k, v in (constant_units or {}).items()}
    self._sl = symbol_library
    self._allow_poly_const = allow_unit_polymorphic_constants

ExpansionContext `dataclass`

ExpansionContext(nonterminal: str, local: L, steps: int, parent_rule: Optional[Rule], child_index: Optional[int], ancestors: tuple[AncestorInfo, ...], partial_tree: ParseTreeNode, nonterminals: frozenset[str], frontier_size: int = 1)

Bases: Generic[L]

Read-only view of the current derivation position passed to constraints.

The engine constructs an ExpansionContext on demand for each (constraint, candidate_rule) pair evaluated in Derivation.options and for each (constraint, selected_rule) pair in Derivation.apply.

Attributes:

Name	Type	Description
`nonterminal`	`str`	The non-terminal symbol being expanded at this position.
`local`	`L`	This constraint's per-slot inherited state.
`steps`	`int`	Number of rule applications made so far in the derivation (i.e. how many times `apply()` has been called).
`parent_rule`	`Optional[Rule]`	The rule whose application created this slot (`None` for the start symbol).
`child_index`	`Optional[int]`	Index of this slot in `parent_rule.rhs` counting only non-terminal positions (`None` for the start symbol).
`ancestors`	`tuple[AncestorInfo, ...]`	Ancestor chain from the root to this slot's immediate parent, in root-first order.
`partial_tree`	`ParseTreeNode`	Root of the partially-built parse tree. Read-only by contract — constraints must not mutate it.
`nonterminals`	`frozenset[str]`	Frozen set of all non-terminal symbols in the grammar.
`frontier_size`	`int`	Number of open (unexpanded) non-terminal slots in the current derivation frontier, including this slot. Useful for budget constraints such as MaxNodes that need to account for the minimum number of rule applications still required to complete the derivation.

MaxDepth

MaxDepth(limit: int)

Bases: Constraint[int, None]

Hard limit on derivation depth.

Local state is the remaining depth budget at each slot. A rule is rejected when its application would require at least one recursive non-terminal child but the budget has reached zero.

Examples:

>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> g = Grammar([
...     Rule("E", ["E", "+", "E"]),
...     Rule("E", ["x"]),
... ])
>>> g.add_constraint(MaxDepth(0))
>>> d = g.start_derivation("E")
>>> all(r.rhs == ["x"] for r in d.options())
True

Parameters:

Name	Type	Description	Default
`limit`	`int`	Maximum nesting depth (number of non-terminal expansion levels).	required

Source code in SRToolkit/utils/grammar/constraints.py

def __init__(self, limit: int) -> None:
    self.limit = limit

MaxNodes

MaxNodes(limit: int)

Bases: Constraint[None, int]

Hard limit on the total number of rule applications in a derivation.

Global state is a running count of applications so far. A rule is rejected when applying it would push the count over the limit taking into account the non-terminal children it introduces (each of which will require at least one more application).

Examples:

>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> g = Grammar([
...     Rule("E", ["E", "+", "E"]),
...     Rule("E", ["x"]),
... ])
>>> g.add_constraint(MaxNodes(3))
>>> d = g.start_derivation("E")
>>> # At node-count 3 only the terminal rule survives
>>> tokens = d.generate()
>>> len(tokens) <= 5
True

Parameters:

Name	Type	Description	Default
`limit`	`int`	Maximum number of rule applications.	required

Source code in SRToolkit/utils/grammar/constraints.py

def __init__(self, limit: int) -> None:
    self.limit = limit

MaxOccurrences

MaxOccurrences(symbol: str, limit: int)

Bases: Constraint[None, int]

Hard limit on how many times a specific terminal symbol may appear.

Global state counts occurrences committed so far. A rule whose rhs contains the tracked symbol is rejected once the count equals the limit.

Examples:

>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> g = Grammar([
...     Rule("E", ["E", "+", "E"]),
...     Rule("E", ["x"]),
...     Rule("E", ["y"]),
... ])
>>> g.add_constraint(MaxOccurrences("x", 1))
>>> d = g.start_derivation("E")
>>> tokens = d.generate(limit=200)
>>> tokens.count("x") <= 1
True

Parameters:

Name	Type	Description	Default
`symbol`	`str`	Terminal token to track.	required
`limit`	`int`	Maximum allowed occurrences.	required

Source code in SRToolkit/utils/grammar/constraints.py

def __init__(self, symbol: str, limit: int) -> None:
    self.symbol = symbol
    self.limit = limit

NoNested

NoNested(symbols: str | Iterable[str])

Bases: Constraint[bool, None]

Prevent any symbol in a group from appearing nested inside any other symbol in the same group.

Local state is a boolean "currently under any symbol in the group". Any rule whose rhs contains a group symbol is rejected while the local state is True. Children inherit True once such a rule is applied.

Pass a single string to forbid self-nesting only; pass multiple symbols to forbid cross-nesting within the group (e.g. NoNested(TRIG_FNS) prevents sin(cos(x)) as well as sin(sin(x))).

Warning

This constraint propagates the "inside" flag to all non-terminal children of any rule whose rhs contains a group symbol. It works correctly when each group symbol appears in a rule with exactly one non-terminal child (typical prefix-function rules such as E -> sin ( E )) and for infix operators where every child should be blocked (e.g. E -> E + E). It does not work correctly for mixed rules that combine a function call with additional non-terminal siblings in a single production (e.g. E -> sin ( E ) + F): the sibling F will be incorrectly treated as inside the group. If your grammar uses such rules, implement a custom Constraint instead.

Examples:

>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> g = Grammar([
...     Rule("E", ["sin", "(", "E", ")"]),
...     Rule("E", ["cos", "(", "E", ")"]),
...     Rule("E", ["x"]),
... ])
>>> g.add_constraint(NoNested(["sin", "cos"]))
>>> d = g.start_derivation("E")
>>> d.apply(g.rules_for("E")[0])   # apply sin(E)
>>> # Now inside sin; both sin and cos rules should be gone
>>> opts = d.options()
>>> all("sin" not in r.rhs and "cos" not in r.rhs for r in opts)
True

Parameters:

Name	Type	Description	Default
`symbols`	`str \| Iterable[str]`	A single terminal token or an iterable of terminal tokens that form the nesting group.	required

Source code in SRToolkit/utils/grammar/constraints.py

def __init__(self, symbols: str | Iterable[str]) -> None:
    if isinstance(symbols, str):
        self.symbols: frozenset[str] = frozenset({symbols})
    else:
        self.symbols = frozenset(symbols)

Derivation

Derivation(grammar: Grammar, start: str)

Stateful leftmost derivation over a Grammar.

A derivation begins at a start non-terminal and proceeds by repeatedly choosing a rule to apply to the leftmost unexpanded non-terminal. options returns candidate rules filtered by all registered constraints; apply advances the derivation by one production.

Per-slot local state and per-derivation global state for each registered constraint are maintained internally; constraint instances carry only construction-time configuration and are safe to share across parallel derivations.

Obtain a Derivation via Grammar.start_derivation.

Examples:

>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> g = Grammar()
>>> g.add_rule(Rule("E", ["x"]))
>>> d = g.start_derivation("E")
>>> d.complete
False
>>> opts = d.options()
>>> len(opts)
1
>>> d.apply(opts[0])
>>> d.complete
True
>>> d.to_token_list()
['x']

Source code in SRToolkit/utils/grammar/derivation.py

def __init__(self, grammar: Grammar, start: str) -> None:
    self._grammar = grammar
    self._nonterminals: frozenset[str] = frozenset(grammar.nonterminals)
    self._steps: int = 0
    self._root: ParseTreeNode = ParseTreeNode(start, None)

    initial_frame = _Frame(
        nonterminal=start,
        node=self._root,
        ancestors=(),
        parent_rule=None,
        child_index=None,
    )
    self._globals: dict[int, Any] = {}
    for c in grammar._constraints:
        cid = id(c)
        initial_frame.locals[cid] = c.initial_local(start)
        self._globals[cid] = c.initial_global()
    # _frames[-1] is the current leftmost open slot.
    self._frames: list[_Frame] = [initial_frame]

complete `property`

complete: bool

True when no unexpanded non-terminals remain.

Examples:

>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> g = Grammar()
>>> g.add_rule(Rule("E", ["x"]))
>>> d = g.start_derivation("E")
>>> d.complete
False
>>> d.apply(d.options()[0])
>>> d.complete
True

local_stack

local_stack(constraint: Constraint) -> list[Any]

Return the local state stack for constraint across the open frontier, leftmost slot first.

Parameters:

Name	Type	Description	Default
`constraint`	`Constraint`	A constraint previously registered on the grammar.	required

Returns:

Type	Description
`list[Any]`	One entry per open non-terminal in left-to-right order.

Source code in SRToolkit/utils/grammar/derivation.py

def local_stack(self, constraint: Constraint) -> list[Any]:
    """
    Return the local state stack for ``constraint`` across the open frontier,
    leftmost slot first.

    Args:
        constraint: A constraint previously registered on the grammar.

    Returns:
        One entry per open non-terminal in left-to-right order.
    """
    cid = id(constraint)
    return [f.locals[cid] for f in reversed(self._frames)]

global_state

global_state(constraint: Constraint) -> Any

Return the per-derivation global state for constraint.

Parameters:

Name	Type	Description	Default
`constraint`	`Constraint`	A constraint previously registered on the grammar.	required

Returns:

Type	Description
`Any`	The current global value owned by this derivation for `constraint`.

Source code in SRToolkit/utils/grammar/derivation.py

def global_state(self, constraint: Constraint) -> Any:
    """
    Return the per-derivation global state for ``constraint``.

    Args:
        constraint: A constraint previously registered on the grammar.

    Returns:
        The current global value owned by this derivation for ``constraint``.
    """
    return self._globals[id(constraint)]

options

options() -> list[Rule]

Return candidate rules for the current leftmost unexpanded non-terminal, filtered by every registered constraint's allows.

Examples:

>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> g = Grammar()
>>> g.add_rule(Rule("E", ["x"]))
>>> g.add_rule(Rule("E", ["y"]))
>>> d = g.start_derivation("E")
>>> len(d.options())
2

Returns:

Type	Description
`list[Rule]`	List of Rule objects that every
`list[Rule]`	constraint accepts.

Raises:

Type	Description
`RuntimeError`	If the derivation is already complete.

Source code in SRToolkit/utils/grammar/derivation.py

def options(self) -> list[Rule]:
    """
    Return candidate rules for the current leftmost unexpanded
    non-terminal, filtered by every registered constraint's
    [allows][SRToolkit.utils.grammar.constraints.Constraint.allows].

    Examples:
        >>> from SRToolkit.utils.grammar import Grammar, Rule
        >>> g = Grammar()
        >>> g.add_rule(Rule("E", ["x"]))
        >>> g.add_rule(Rule("E", ["y"]))
        >>> d = g.start_derivation("E")
        >>> len(d.options())
        2

    Returns:
        List of [Rule][SRToolkit.utils.grammar.Rule] objects that every
        constraint accepts.

    Raises:
        RuntimeError: If the derivation is already complete.
    """
    if self.complete:
        raise RuntimeError("Derivation is already complete; no options remain.")

    top = self._frames[-1]
    candidates = self._grammar.rules_for(top.nonterminal)

    for c in self._grammar._constraints:
        cid = id(c)
        slot = self._slot_for(top, top.locals[cid])
        global_ = self._globals[cid]
        surviving: list[Rule] = []
        for rule in candidates:
            if _scope_miss(c, top.nonterminal, rule):
                surviving.append(rule)
                continue
            if c.allows(slot, rule, global_):
                surviving.append(rule)
        candidates = surviving
        if not candidates:
            break

    return candidates

apply

apply(rule: Rule) -> None

Apply rule to the current leftmost unexpanded non-terminal.

Examples:

>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> g = Grammar()
>>> g.add_rule(Rule("E", ["E", "+", "F"]))
>>> g.add_rule(Rule("E", ["x"]))
>>> g.add_rule(Rule("F", ["y"]))
>>> d = g.start_derivation("E")
>>> d.apply(g.rules_for("E")[0])   # E -> E + F
>>> d.apply(g.rules_for("E")[1])   # E -> x
>>> d.apply(g.rules_for("F")[0])   # F -> y
>>> d.to_token_list()
['x', '+', 'y']

Parameters:

Name	Type	Description	Default
`rule`	`Rule`	A rule whose `lhs` matches the current leftmost non-terminal.	required

Raises:

Type	Description
`RuntimeError`	If the derivation is already complete.
`ValueError`	If `rule.lhs` does not match the current non-terminal.

Source code in SRToolkit/utils/grammar/derivation.py

def apply(self, rule: Rule) -> None:
    """
    Apply ``rule`` to the current leftmost unexpanded non-terminal.

    Examples:
        >>> from SRToolkit.utils.grammar import Grammar, Rule
        >>> g = Grammar()
        >>> g.add_rule(Rule("E", ["E", "+", "F"]))
        >>> g.add_rule(Rule("E", ["x"]))
        >>> g.add_rule(Rule("F", ["y"]))
        >>> d = g.start_derivation("E")
        >>> d.apply(g.rules_for("E")[0])   # E -> E + F
        >>> d.apply(g.rules_for("E")[1])   # E -> x
        >>> d.apply(g.rules_for("F")[0])   # F -> y
        >>> d.to_token_list()
        ['x', '+', 'y']

    Args:
        rule: A rule whose ``lhs`` matches the current leftmost
            non-terminal.

    Raises:
        RuntimeError: If the derivation is already complete.
        ValueError: If ``rule.lhs`` does not match the current
            non-terminal.
    """
    if self.complete:
        raise RuntimeError("Derivation is already complete.")

    top = self._frames[-1]
    if rule.lhs != top.nonterminal:
        raise ValueError(f"Rule lhs '{rule.lhs}' does not match current non-terminal '{top.nonterminal}'.")

    # Build child parse-tree nodes and the new frames for each NT child.
    child_frames: list[_Frame] = []
    nt_child_index = 0
    for sym in rule.rhs:
        child_node = ParseTreeNode(sym, None)
        top.node.children.append(child_node)
        if sym in self._nonterminals:
            frame = AncestorInfo(top.nonterminal, rule, nt_child_index)
            child_frames.append(
                _Frame(
                    nonterminal=sym,
                    node=child_node,
                    ancestors=top.ancestors + (frame,),
                    parent_rule=rule,
                    child_index=nt_child_index,
                )
            )
            nt_child_index += 1

    top.node.rule_applied = rule
    n_nt_children = len(child_frames)

    # Thread state through every constraint using the pre-apply slot view.
    pre_frontier_size = len(self._frames)
    for c in self._grammar._constraints:
        cid = id(c)
        slot = self._slot_for(top, top.locals[cid], frontier_size=pre_frontier_size)
        child_locals, new_global = c.update(slot, rule, self._globals[cid])
        if len(child_locals) != n_nt_children:
            raise RuntimeError(
                f"Constraint {c!r}.update() returned {len(child_locals)} child locals "
                f"but rule '{rule.lhs} -> {rule.rhs}' has {n_nt_children} "
                f"non-terminal children."
            )
        for i, cf in enumerate(child_frames):
            cf.locals[cid] = child_locals[i]
        self._globals[cid] = new_global

    # Pop the top frame and push children so that the leftmost child ends
    # up at the end of the list (i.e. on top of the stack).
    self._frames.pop()
    for cf in reversed(child_frames):
        self._frames.append(cf)
    self._steps += 1

sample

sample() -> None

Apply one rule chosen proportionally by weight (PCFG) or uniformly (CFG) from the surviving candidates.

Examples:

>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> g = Grammar()
>>> g.add_rule(Rule("E", ["x"]))
>>> d = g.start_derivation("E")
>>> d.sample()
>>> d.complete
True

Raises:

Type	Description
`RuntimeError`	If the derivation is already complete.
`RuntimeError`	If all candidate rules are filtered out by allows.

Source code in SRToolkit/utils/grammar/derivation.py

def sample(self) -> None:
    """
    Apply one rule chosen proportionally by weight (PCFG) or uniformly
    (CFG) from the surviving candidates.

    Examples:
        >>> from SRToolkit.utils.grammar import Grammar, Rule
        >>> g = Grammar()
        >>> g.add_rule(Rule("E", ["x"]))
        >>> d = g.start_derivation("E")
        >>> d.sample()
        >>> d.complete
        True

    Raises:
        RuntimeError: If the derivation is already complete.
        RuntimeError: If all candidate rules are filtered out by
            [allows][SRToolkit.utils.grammar.constraints.Constraint.allows].
    """
    candidates = self.options()
    if not candidates:
        raise RuntimeError(
            "No valid rules available for the current non-terminal after applying constraint filters."
        )

    weights = [float(r.weight) for r in candidates]
    total = sum(weights)
    probs = [w / total for w in weights]
    self.apply(candidates[np.random.choice(len(candidates), p=probs)])

generate

generate(limit: int = 1000) -> list[str]

Run the derivation to completion and return the token list.

Examples:

>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> g = Grammar()
>>> g.add_rule(Rule("E", ["x"]))
>>> g.start_derivation("E").generate()
['x']

Parameters:

Name	Type	Description	Default
`limit`	`int`	Maximum number of rule applications. A negative value means unlimited. Default `1000`.	`1000`

Returns:

Type	Description
`list[str]`	Flat list of terminal tokens in left-to-right order.

Raises:

Type	Description
`RuntimeError`	If the derivation does not complete within `limit` steps (only when `limit >= 0`).

Source code in SRToolkit/utils/grammar/derivation.py

def generate(self, limit: int = 1000) -> list[str]:
    """
    Run the derivation to completion and return the token list.

    Examples:
        >>> from SRToolkit.utils.grammar import Grammar, Rule
        >>> g = Grammar()
        >>> g.add_rule(Rule("E", ["x"]))
        >>> g.start_derivation("E").generate()
        ['x']

    Args:
        limit: Maximum number of rule applications. A negative value means
            unlimited. Default ``1000``.

    Returns:
        Flat list of terminal tokens in left-to-right order.

    Raises:
        RuntimeError: If the derivation does not complete within
            ``limit`` steps (only when ``limit >= 0``).
    """
    steps = 0
    while not self.complete:
        if steps >= limit >= 0:
            raise RuntimeError(f"Derivation did not complete within {limit} rule applications.")
        self.sample()
        steps += 1
    return self.to_token_list()

to_token_list

to_token_list() -> list[str]

Return the completed expression as a flat token list.

Examples:

>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> g = Grammar()
>>> g.add_rule(Rule("E", ["x"]))
>>> d = g.start_derivation("E")
>>> d.apply(d.options()[0])
>>> d.to_token_list()
['x']

Returns:

Type	Description
`list[str]`	Flat list of terminal tokens in left-to-right order.

Raises:

Type	Description
`RuntimeError`	If the derivation is not yet complete.

Source code in SRToolkit/utils/grammar/derivation.py

def to_token_list(self) -> list[str]:
    """
    Return the completed expression as a flat token list.

    Examples:
        >>> from SRToolkit.utils.grammar import Grammar, Rule
        >>> g = Grammar()
        >>> g.add_rule(Rule("E", ["x"]))
        >>> d = g.start_derivation("E")
        >>> d.apply(d.options()[0])
        >>> d.to_token_list()
        ['x']

    Returns:
        Flat list of terminal tokens in left-to-right order.

    Raises:
        RuntimeError: If the derivation is not yet complete.
    """
    if not self.complete:
        raise RuntimeError("Derivation is not yet complete.")
    return self.to_parse_tree().to_token_list()

to_parse_tree

to_parse_tree() -> ParseTree

Return the completed derivation as a ParseTree.

Examples:

>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> g = Grammar()
>>> g.add_rule(Rule("E", ["x"]))
>>> d = g.start_derivation("E")
>>> d.apply(d.options()[0])
>>> isinstance(d.to_parse_tree(), ParseTree)
True

Returns:

Type	Description
`ParseTree`	The ParseTree rooted at the
`ParseTree`	start symbol.

Raises:

Type	Description
`RuntimeError`	If the derivation is not yet complete.

Source code in SRToolkit/utils/grammar/derivation.py

def to_parse_tree(self) -> ParseTree:
    """
    Return the completed derivation as a
    [ParseTree][SRToolkit.utils.grammar.ParseTree].

    Examples:
        >>> from SRToolkit.utils.grammar import Grammar, Rule
        >>> g = Grammar()
        >>> g.add_rule(Rule("E", ["x"]))
        >>> d = g.start_derivation("E")
        >>> d.apply(d.options()[0])
        >>> isinstance(d.to_parse_tree(), ParseTree)
        True

    Returns:
        The [ParseTree][SRToolkit.utils.grammar.ParseTree] rooted at the
        start symbol.

    Raises:
        RuntimeError: If the derivation is not yet complete.
    """
    if not self.complete:
        raise RuntimeError("Derivation is not yet complete.")
    return ParseTree(self._root)

Grammar

Grammar(rules: Optional[list[Rule]] = None, start: Optional[str] = None)

A context-free grammar (CFG) or probabilistic context-free grammar (PCFG).

Rules are added via add_rule. The set of non-terminals is derived automatically: a symbol is a non-terminal if and only if it appears as the lhs of at least one rule.

A grammar is a CFG when every rule carries the default weight (1.0), making sampling uniform within each group. It is a PCFG when any rule has a weight that differs from 1.0.

Constraints are registered via add_constraint. During a derivation, options returns only rules that every constraint's allows accepts.

Examples:

>>> g = Grammar([
...     Rule("E", ["E", "+", "F"], weight=0.4),
...     Rule("E", ["F"], weight=0.6),
...     Rule("F", ["x"]),
... ])
>>> "E" in g.nonterminals
True
>>> g.is_pcfg()
True
>>> len(g.rules_for("E"))
2

Parameters:

Name	Type	Description	Default
`rules`	`Optional[list[Rule]]`	Optional list of Rule objects to add at construction time. Equivalent to calling add_rule for each entry.	`None`
`start`	`Optional[str]`	Default start non-terminal used by start_derivation when no `start` argument is given.	`None`

Source code in SRToolkit/utils/grammar/grammar.py

def __init__(self, rules: Optional[list[Rule]] = None, start: Optional[str] = None) -> None:
    """
    Args:
        rules: Optional list of [Rule][SRToolkit.utils.grammar.Rule] objects to add at
            construction time. Equivalent to calling
            [add_rule][SRToolkit.utils.grammar.Grammar.add_rule] for each entry.
        start: Default start non-terminal used by
            [start_derivation][SRToolkit.utils.grammar.Grammar.start_derivation]
            when no ``start`` argument is given.
    """
    self.start: Optional[str] = start
    self._rules: list[Rule] = []
    self._rules_by_lhs: dict[str, list[Rule]] = {}
    self._constraints: list = []
    for rule in rules or []:
        self.add_rule(rule)

nonterminals `property`

nonterminals: set[str]

Set of all non-terminal symbols in the grammar.

A symbol is a non-terminal if and only if it appears as the lhs of at least one rule.

Examples:

>>> g = Grammar()
>>> g.add_rule(Rule("E", ["F"]))
>>> g.add_rule(Rule("F", ["x"]))
>>> g.nonterminals == {"E", "F"}
True

add_rule

add_rule(rule: Rule) -> None

Add a production rule to the grammar.

Examples:

>>> g = Grammar()
>>> g.add_rule(Rule("E", ["x"]))
>>> len(g.rules_for("E"))
1

Parameters:

Name	Type	Description	Default
`rule`	`Rule`	The Rule to register.	required

Source code in SRToolkit/utils/grammar/grammar.py

def add_rule(self, rule: Rule) -> None:
    """
    Add a production rule to the grammar.

    Examples:
        >>> g = Grammar()
        >>> g.add_rule(Rule("E", ["x"]))
        >>> len(g.rules_for("E"))
        1

    Args:
        rule: The [Rule][SRToolkit.utils.grammar.Rule] to register.
    """
    self._rules.append(rule)
    self._rules_by_lhs.setdefault(rule.lhs, []).append(rule)

add_constraint

add_constraint(constraint: 'Constraint') -> None

Register a constraint applied at each derivation step.

A rule is offered as an option only when every registered constraint's allows returns True for it.

Examples:

>>> from SRToolkit.utils.grammar import MaxDepth
>>> g = Grammar([Rule("E", ["E", "+", "E"]), Rule("E", ["x"])])
>>> g.add_constraint(MaxDepth(0))
>>> d = g.start_derivation("E")
>>> [r.rhs for r in d.options()]
[['x']]

Parameters:

Name	Type	Description	Default
`constraint`	`'Constraint'`	A Constraint instance — typically a built-in such as MaxDepth, MaxNodes, MaxOccurrences, NoNested, or DimensionalConsistency, or a user-defined subclass.	required

Source code in SRToolkit/utils/grammar/grammar.py

def add_constraint(self, constraint: "Constraint") -> None:
    """
    Register a [constraint][SRToolkit.utils.grammar.constraints.Constraint] applied at each derivation step.

    A rule is offered as an option only when every registered constraint's
    [allows][SRToolkit.utils.grammar.constraints.Constraint.allows] returns ``True``
    for it.

    Examples:
        >>> from SRToolkit.utils.grammar import MaxDepth
        >>> g = Grammar([Rule("E", ["E", "+", "E"]), Rule("E", ["x"])])
        >>> g.add_constraint(MaxDepth(0))
        >>> d = g.start_derivation("E")
        >>> [r.rhs for r in d.options()]
        [['x']]

    Args:
        constraint: A [Constraint][SRToolkit.utils.grammar.constraints.Constraint]
            instance — typically a built-in such as
            [MaxDepth][SRToolkit.utils.grammar.constraints.MaxDepth],
            [MaxNodes][SRToolkit.utils.grammar.constraints.MaxNodes],
            [MaxOccurrences][SRToolkit.utils.grammar.constraints.MaxOccurrences],
            [NoNested][SRToolkit.utils.grammar.constraints.NoNested], or
            [DimensionalConsistency][SRToolkit.utils.grammar.constraints.DimensionalConsistency],
            or a user-defined subclass.
    """
    self._constraints.append(constraint)

rules_for

rules_for(nonterminal: str) -> list[Rule]

Return all rules whose lhs matches nonterminal.

Examples:

>>> g = Grammar()
>>> g.add_rule(Rule("E", ["F"]))
>>> g.add_rule(Rule("E", ["x"]))
>>> len(g.rules_for("E"))
2
>>> g.rules_for("Z")
[]

Parameters:

Name	Type	Description	Default
`nonterminal`	`str`	Non-terminal to look up.	required

Returns:

Type	Description
`list[Rule]`	List of matching Rule objects in insertion order.

Source code in SRToolkit/utils/grammar/grammar.py

def rules_for(self, nonterminal: str) -> list[Rule]:
    """
    Return all rules whose ``lhs`` matches ``nonterminal``.

    Examples:
        >>> g = Grammar()
        >>> g.add_rule(Rule("E", ["F"]))
        >>> g.add_rule(Rule("E", ["x"]))
        >>> len(g.rules_for("E"))
        2
        >>> g.rules_for("Z")
        []

    Args:
        nonterminal: Non-terminal to look up.

    Returns:
        List of matching [Rule][SRToolkit.utils.grammar.Rule] objects in insertion order.
    """
    return list(self._rules_by_lhs.get(nonterminal, []))

is_pcfg

is_pcfg() -> bool

Return True if any rule deviates from the default weight of 1.0.

Examples:

>>> g = Grammar()
>>> g.add_rule(Rule("E", ["x"]))
>>> g.add_rule(Rule("E", ["y"]))
>>> g.is_pcfg()
False
>>> g.add_rule(Rule("F", ["z"], weight=0.3))
>>> g.is_pcfg()
True

Returns:

Type	Description
`bool`	`True` if at least one rule has a weight other than `1.0`.

Source code in SRToolkit/utils/grammar/grammar.py

def is_pcfg(self) -> bool:
    """
    Return ``True`` if any rule deviates from the default weight of ``1.0``.

    Examples:
        >>> g = Grammar()
        >>> g.add_rule(Rule("E", ["x"]))
        >>> g.add_rule(Rule("E", ["y"]))
        >>> g.is_pcfg()
        False
        >>> g.add_rule(Rule("F", ["z"], weight=0.3))
        >>> g.is_pcfg()
        True

    Returns:
        ``True`` if at least one rule has a weight other than ``1.0``.
    """
    return any(r.weight != 1.0 for r in self._rules)

validate

validate(parse_tree: ParseTree, require_start: bool = False) -> bool

Validate that parse_tree is structurally valid, uses only productions from this grammar, and satisfies every registered constraint.

By default the root of the tree may be any non-terminal in the grammar, not just self.start. This lets you validate sub-trees (e.g. an F-rooted fragment) in addition to full expressions. Pass require_start=True to additionally enforce that the root symbol equals self.start (useful when validating complete, top-level expressions).

The check replays the parse tree through a fresh Derivation rooted at parse_tree.root.symbol, walking nodes in leftmost order (mirroring the derivation frontier). At each internal node it checks that:

The applied rule is permitted at the current frontier position — meaning it exists in the grammar and every registered constraint accepts it.
The number of children equals len(rule.rhs).
Each children[i].symbol equals rule.rhs[i].

Terminal leaves are accepted when their symbol is not a non-terminal in this grammar. A non-terminal symbol appearing as a leaf (unexpanded) is rejected.

Examples:

>>> g = Grammar(start="E")
>>> r = Rule("E", ["x"])
>>> g.add_rule(r)
>>> leaf = ParseTreeNode("x", None)
>>> root = ParseTreeNode("E", r, [leaf])
>>> g.validate(ParseTree(root))
True
>>> foreign = Rule("E", ["y"])
>>> root2 = ParseTreeNode("E", foreign, [ParseTreeNode("y", None)])
>>> g.validate(ParseTree(root2))
False
>>> g.add_rule(Rule("F", ["x"]))
>>> r_f = Rule("F", ["x"])
>>> g.add_rule(r_f)
>>> f_root = ParseTreeNode("F", r_f, [ParseTreeNode("x", None)])
>>> g.validate(ParseTree(f_root))
True
>>> g.validate(ParseTree(f_root), require_start=True)
False

Parameters:

Name	Type	Description	Default
`parse_tree`	`ParseTree`	The ParseTree to validate.	required
`require_start`	`bool`	When `True`, return `False` if the root symbol does not equal `self.start`. Raises `ValueError` if `self.start` is `None` and `require_start` is `True`.	`False`

Returns:

Type	Description
`bool`	`True` if the tree is structurally consistent, every production exists
`bool`	in this grammar, and all constraints would have permitted the derivation
`bool`	(and the root matches `self.start` when `require_start=True`).

Raises:

Type	Description
`ValueError`	If `require_start=True` but `self.start` is `None`.

Source code in SRToolkit/utils/grammar/grammar.py

def validate(self, parse_tree: ParseTree, require_start: bool = False) -> bool:
    """
    Validate that ``parse_tree`` is structurally valid, uses only productions
    from this grammar, and satisfies every registered constraint.

    By default the root of the tree may be **any non-terminal** in the grammar,
    not just ``self.start``.  This lets you validate sub-trees (e.g. an
    ``F``-rooted fragment) in addition to full expressions.  Pass
    ``require_start=True`` to additionally enforce that the root symbol equals
    ``self.start`` (useful when validating complete, top-level expressions).

    The check replays the parse tree through a fresh
    [Derivation][SRToolkit.utils.grammar.derivation.Derivation] rooted at
    ``parse_tree.root.symbol``, walking nodes in leftmost order (mirroring the
    derivation frontier).  At each internal node it checks that:

    - The applied rule is permitted at the current frontier position — meaning
      it exists in the grammar **and** every registered constraint accepts it.
    - The number of children equals ``len(rule.rhs)``.
    - Each ``children[i].symbol`` equals ``rule.rhs[i]``.

    Terminal leaves are accepted when their symbol is not a non-terminal in
    this grammar.  A non-terminal symbol appearing as a leaf (unexpanded) is
    rejected.

    Examples:
        >>> g = Grammar(start="E")
        >>> r = Rule("E", ["x"])
        >>> g.add_rule(r)
        >>> leaf = ParseTreeNode("x", None)
        >>> root = ParseTreeNode("E", r, [leaf])
        >>> g.validate(ParseTree(root))
        True
        >>> foreign = Rule("E", ["y"])
        >>> root2 = ParseTreeNode("E", foreign, [ParseTreeNode("y", None)])
        >>> g.validate(ParseTree(root2))
        False
        >>> g.add_rule(Rule("F", ["x"]))
        >>> r_f = Rule("F", ["x"])
        >>> g.add_rule(r_f)
        >>> f_root = ParseTreeNode("F", r_f, [ParseTreeNode("x", None)])
        >>> g.validate(ParseTree(f_root))
        True
        >>> g.validate(ParseTree(f_root), require_start=True)
        False

    Args:
        parse_tree: The [ParseTree][SRToolkit.utils.grammar.ParseTree] to validate.
        require_start: When ``True``, return ``False`` if the root symbol does
            not equal ``self.start``.  Raises ``ValueError`` if ``self.start``
            is ``None`` and ``require_start`` is ``True``.

    Returns:
        ``True`` if the tree is structurally consistent, every production exists
        in this grammar, and all constraints would have permitted the derivation
        (and the root matches ``self.start`` when ``require_start=True``).

    Raises:
        ValueError: If ``require_start=True`` but ``self.start`` is ``None``.
    """
    if require_start:
        if self.start is None:
            raise ValueError("require_start=True but this grammar has no start symbol.")
        if parse_tree.root.symbol != self.start:
            return False
    root = parse_tree.root
    if root.rule_applied is None:
        return len(root.children) == 0 and root.symbol not in self.nonterminals
    try:
        d = self.start_derivation(root.symbol)
    except ValueError:
        return False
    stack = [root]
    while stack:
        node = stack.pop()
        if node.rule_applied is None:
            if node.symbol in self.nonterminals:
                return False
            continue
        if d.complete:
            return False
        if node.rule_applied not in d.options():
            return False
        if len(node.children) != len(node.rule_applied.rhs):
            return False
        for child, expected in zip(node.children, node.rule_applied.rhs):
            if child.symbol != expected:
                return False
        d.apply(node.rule_applied)
        for child in reversed(node.children):
            if child.symbol in self.nonterminals:
                stack.append(child)
    return d.complete

start_derivation

start_derivation(start: Optional[str] = None) -> 'Derivation'

Begin a new derivation.

Examples:

>>> g = Grammar(start="E")
>>> g.add_rule(Rule("E", ["x"]))
>>> d = g.start_derivation()
>>> d.complete
False

Parameters:

Name	Type	Description	Default
`start`	`Optional[str]`	Start non-terminal. Defaults to `self.start` when `None`.	`None`

Returns:

Type	Description
`'Derivation'`	A Derivation at the first expansion step.

Raises:

Type	Description
`ValueError`	If the resolved start symbol is not a non-terminal in this grammar.

Source code in SRToolkit/utils/grammar/grammar.py

def start_derivation(self, start: Optional[str] = None) -> "Derivation":
    """
    Begin a new derivation.

    Examples:
        >>> g = Grammar(start="E")
        >>> g.add_rule(Rule("E", ["x"]))
        >>> d = g.start_derivation()
        >>> d.complete
        False

    Args:
        start: Start non-terminal.  Defaults to
            ``self.start`` when
            ``None``.

    Returns:
        A [Derivation][SRToolkit.utils.grammar.derivation.Derivation] at the first expansion step.

    Raises:
        ValueError: If the resolved start symbol is not a non-terminal in
            this grammar.
    """
    from .derivation import Derivation

    s = self.start if start is None else start

    if s is None:
        raise ValueError(
            "No start symbol. Either add it thorough the constructor or as parameter "
            "in the Grammar.start_derivation method."
        )
    if s not in self.nonterminals:
        raise ValueError(f"Start symbol '{s}' is not a non-terminal in this grammar.")

    return Derivation(self, s)

generate_one

generate_one(start: Optional[str] = None, max_steps: int = 1000, max_retries: int = 10) -> Optional[list[str]]

Generate a single expression by sampling the grammar to completion.

Convenience wrapper around start_derivation and Derivation.generate.

Each attempt runs a fresh derivation from scratch. When a derivation exceeds max_steps rule applications (e.g. because random sampling kept choosing recursive rules), the attempt is discarded and a new one starts. None is returned only when every attempt fails, signalling that the caller should either relax the grammar, add more liberal constraints, or increase max_steps / max_retries.

Examples:

>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> g = Grammar(start="E")
>>> g.add_rule(Rule("E", ["x"]))
>>> g.generate_one()
['x']
>>> g.generate_one(max_steps=0, max_retries=1) is None
True

Parameters:

Name	Type	Description	Default
`start`	`Optional[str]`	Start non-terminal. Defaults to `self.start` when `None`.	`None`
`max_steps`	`int`	Maximum rule applications per attempt. A negative number means unlimited (no retry logic applies).	`1000`
`max_retries`	`int`	Maximum number of fresh attempts before returning `None`. Must be at least `1`.	`10`

Returns:

Type	Description
`Optional[list[str]]`	A list of terminal tokens in left-to-right order, or `None` if
`Optional[list[str]]`	every attempt exceeded `max_steps`.

Raises:

Type	Description
`ValueError`	If the resolved start symbol is not a non-terminal in this grammar.

Source code in SRToolkit/utils/grammar/grammar.py

def generate_one(
    self,
    start: Optional[str] = None,
    max_steps: int = 1000,
    max_retries: int = 10,
) -> Optional[list[str]]:
    """
    Generate a single expression by sampling the grammar to completion.

    Convenience wrapper around
    [start_derivation][SRToolkit.utils.grammar.grammar.Grammar.start_derivation]
    and [Derivation.generate][SRToolkit.utils.grammar.derivation.Derivation.generate].

    Each attempt runs a fresh derivation from scratch.  When a derivation
    exceeds ``max_steps`` rule applications (e.g. because random sampling
    kept choosing recursive rules), the attempt is discarded and a new one
    starts.  ``None`` is returned only when every attempt fails, signalling
    that the caller should either relax the grammar, add more liberal
    constraints, or increase ``max_steps`` / ``max_retries``.

    Examples:
        >>> from SRToolkit.utils.grammar import Grammar, Rule
        >>> g = Grammar(start="E")
        >>> g.add_rule(Rule("E", ["x"]))
        >>> g.generate_one()
        ['x']
        >>> g.generate_one(max_steps=0, max_retries=1) is None
        True

    Args:
        start: Start non-terminal. Defaults to ``self.start`` when ``None``.
        max_steps: Maximum rule applications per attempt. A negative number
            means unlimited (no retry logic applies).
        max_retries: Maximum number of fresh attempts before returning
            ``None``. Must be at least ``1``.

    Returns:
        A list of terminal tokens in left-to-right order, or ``None`` if
        every attempt exceeded ``max_steps``.

    Raises:
        ValueError: If the resolved start symbol is not a non-terminal in
            this grammar.
    """
    for _ in range(max(1, max_retries)):
        try:
            return self.start_derivation(start).generate(limit=max_steps)
        except RuntimeError:
            continue
    return None

to_dict

to_dict() -> dict

Serialise this grammar to a JSON-safe dictionary.

Constraints are serialised via their own to_dict method. User-defined constraint subclasses must implement to_dict/from_dict to survive the round-trip; built-in constraints are fully supported.

Returns:

Type	Description
`dict`	Dictionary with keys `start`, `rules`, and `constraints`.

Source code in SRToolkit/utils/grammar/grammar.py

def to_dict(self) -> dict:
    """
    Serialise this grammar to a JSON-safe dictionary.

    Constraints are serialised via their own ``to_dict`` method.  User-defined
    constraint subclasses must implement ``to_dict``/``from_dict`` to survive the
    round-trip; built-in constraints are fully supported.

    Returns:
        Dictionary with keys ``start``, ``rules``, and ``constraints``.
    """
    return {
        "start": self.start,
        "rules": [r.to_dict() for r in self._rules],
        "constraints": [c.to_dict() for c in self._constraints],
    }

from_dict `classmethod`

from_dict(d: dict) -> 'Grammar'

Reconstruct a Grammar from a dictionary produced by to_dict.

Parameters:

Name	Type	Description	Default
`d`	`dict`	Dictionary with keys `start`, `rules`, and `constraints`.	required

Returns:

Type	Description
`'Grammar'`	A new Grammar with all rules and
`'Grammar'`	constraints registered.

Source code in SRToolkit/utils/grammar/grammar.py

@classmethod
def from_dict(cls, d: dict) -> "Grammar":
    """
    Reconstruct a [Grammar][SRToolkit.utils.grammar.Grammar] from a dictionary
    produced by [to_dict][SRToolkit.utils.grammar.Grammar.to_dict].

    Args:
        d: Dictionary with keys ``start``, ``rules``, and ``constraints``.

    Returns:
        A new [Grammar][SRToolkit.utils.grammar.Grammar] with all rules and
        constraints registered.
    """
    from .constraints import Constraint

    rules = [Rule.from_dict(r) for r in d.get("rules", [])]
    g = cls(rules, start=d.get("start"))
    for cd in d.get("constraints", []):
        g.add_constraint(Constraint.from_dict(cd))
    return g

from_symbol_library `classmethod`

from_symbol_library(symbol_library: Optional[SymbolLibrary] = None, start: Optional[str] = None) -> 'Grammar'

Build a PCFG from a SymbolLibrary using a generic operator-precedence non-terminal hierarchy.

One non-terminal is created per unique OP precedence level present in the library. Known precedences map to readable names; custom precedences fall back to L_{p}:

OP_ADDITIVE → E
OP_MULTIPLICATIVE → F
OP_POWER → B
custom precedence p → L_{p}

The chain runs from the lowest-precedence level (the grammar start) down to T, K, R, and V. Only levels that have at least one operator are generated; there are no empty pass-through non-terminals.

Heuristic weights are calibrated against empirical expression distributions (e.g. Wikipedia mathematical formulae): E recursion ~30 %, F recursion ~40 %, B recursion ~5 %, custom-level recursion ~40 %.

Examples:

>>> sl = SymbolLibrary.from_symbol_list(["+", "-", "*", "sin", "^2"], 2)
>>> g = Grammar.from_symbol_library(sl)
>>> "E" in g.nonterminals and "V" in g.nonterminals
True
>>> g.is_pcfg()
True

Parameters:

Name	Type	Description	Default
`symbol_library`	`Optional[SymbolLibrary]`	Token vocabulary. Falls back to the active library from the context manager when `None`.	`None`
`start`	`Optional[str]`	Override the name of the lowest-precedence non-terminal (and `g.start`). When `None` (default) the name is auto-inferred from the precedence constants (`"E"` for standard libraries).	`None`

Returns:

Type	Description
`'Grammar'`	A Grammar with heuristic PCFG weights.

Raises:

Type	Description
`ValueError`	If the symbol library contains no variables, constants, or literals, making it impossible to generate any terminal expression.

Source code in SRToolkit/utils/grammar/grammar.py

@classmethod
def from_symbol_library(
    cls,
    symbol_library: Optional[SymbolLibrary] = None,
    start: Optional[str] = None,
) -> "Grammar":
    """
    Build a PCFG from a [SymbolLibrary][SRToolkit.utils.symbol_library.SymbolLibrary]
    using a generic operator-precedence non-terminal hierarchy.

    One non-terminal is created per unique ``OP`` precedence level present in
    the library.  Known precedences map to readable names; custom precedences
    fall back to ``L_{p}``:

    - ``OP_ADDITIVE`` → ``E``
    - ``OP_MULTIPLICATIVE`` → ``F``
    - ``OP_POWER`` → ``B``
    - custom precedence *p* → ``L_{p}``

    The chain runs from the lowest-precedence level (the grammar start) down
    to ``T``, ``K``, ``R``, and ``V``.  Only levels that have at least one
    operator are generated; there are no empty pass-through non-terminals.

    Heuristic weights are calibrated against empirical expression distributions
    (e.g. Wikipedia mathematical formulae): E recursion ~30 %, F recursion ~40 %,
    B recursion ~5 %, custom-level recursion ~40 %.

    Examples:
        >>> sl = SymbolLibrary.from_symbol_list(["+", "-", "*", "sin", "^2"], 2)
        >>> g = Grammar.from_symbol_library(sl)
        >>> "E" in g.nonterminals and "V" in g.nonterminals
        True
        >>> g.is_pcfg()
        True

    Args:
        symbol_library: Token vocabulary.  Falls back to the active library
            from the context manager when ``None``.
        start: Override the name of the lowest-precedence non-terminal (and
            ``g.start``).  When ``None`` (default) the name is auto-inferred
            from the precedence constants (``"E"`` for standard libraries).

    Returns:
        A [Grammar][SRToolkit.utils.grammar.Grammar] with heuristic PCFG weights.

    Raises:
        ValueError: If the symbol library contains no variables, constants, or
            literals, making it impossible to generate any terminal expression.
    """
    if symbol_library is None:
        symbol_library = SymbolLibrary.get_active()

    symbols = list(symbol_library.symbols.values())

    # Group operators by precedence level
    op_groups: dict[int, list[str]] = {}
    for s in symbols:
        if s["type"] == OP:
            op_groups.setdefault(s["precedence"], []).append(s["symbol"])
    sorted_precs = sorted(op_groups)

    R_fns = [s["symbol"] for s in symbols if s["type"] == FN and s["precedence"] == FN_PREFIX]
    P_fns = [s["symbol"] for s in symbols if s["type"] == FN and s["precedence"] == FN_POSTFIX]
    variables = [s["symbol"] for s in symbols if s["type"] == VAR]
    consts = [s["symbol"] for s in symbols if s["type"] == CONST]
    lits = [s["symbol"] for s in symbols if s["type"] == LIT]

    if not variables and not consts and not lits:
        raise ValueError(
            "Symbol library has no variables, constants, or literals; "
            "the grammar cannot generate any terminal expression."
        )

    # Collect terminal names so NT names can be made unique against them.
    all_terminals: set[str] = {sym for ops_list in op_groups.values() for sym in ops_list}
    all_terminals.update(R_fns + P_fns + variables + consts + lits)

    # If the caller explicitly chose a start name that is also a terminal symbol,
    # that is an unresolvable conflict — raise immediately.
    if start is not None and start in all_terminals:
        raise ValueError(
            f"start={start!r} is also a terminal symbol in the SymbolLibrary; "
            f"choose a different start symbol (e.g. {start + '_'!r})."
        )

    def _unique_nt(base: str, taken: set[str]) -> str:
        name = base
        while name in taken:
            name += "_"
        return name

    level_nts: dict[int, str] = {}
    for prec in sorted_precs:
        level_nts[prec] = _unique_nt(_level_name(prec), all_terminals)

    t_nt = _unique_nt("T", all_terminals)
    k_nt = _unique_nt("K", all_terminals)
    r_nt = _unique_nt("R", all_terminals)
    v_nt = _unique_nt("V", all_terminals)

    # top_nt is the topmost level of the hierarchy; functions recurse back here.
    top_nt = level_nts[sorted_precs[0]] if sorted_precs else t_nt

    g = cls(start=start if start is not None else top_nt)

    # Build operator-precedence chain.  Level i expands to itself OP level i+1;
    # the highest level chains to T.
    for i, prec in enumerate(sorted_precs):
        nt = level_nts[prec]
        next_nt = level_nts[sorted_precs[i + 1]] if i + 1 < len(sorted_precs) else t_nt
        ops = op_groups[prec]
        weight = _LEVEL_REC_WEIGHTS.get(prec, 0.4)
        infix = _LEVEL_RULE_INFIXES.get(prec, "op")

        for sym in ops:
            # Power is right-associative (a^b^c = a^(b^c)), so recurse on the right.
            if prec == OP_POWER:
                g.add_rule(Rule(nt, [next_nt, sym, nt], weight=weight / len(ops), name=f"{nt}_{infix}_{sym}"))
            else:
                g.add_rule(Rule(nt, [nt, sym, next_nt], weight=weight / len(ops), name=f"{nt}_{infix}_{sym}"))
        g.add_rule(Rule(nt, [next_nt], weight=1.0 - weight, name=f"{nt}_to_{next_nt}"))

    # T level — leaf dispatcher; K and V branches added only when their symbols exist.
    if variables and (consts or lits):
        g.add_rule(Rule(t_nt, [r_nt], weight=0.2, name=f"{t_nt}_to_{r_nt}"))
        g.add_rule(Rule(t_nt, [k_nt], weight=0.2, name=f"{t_nt}_to_{k_nt}"))
        g.add_rule(Rule(t_nt, [v_nt], weight=0.6, name=f"{t_nt}_to_{v_nt}"))
    elif consts or lits:
        g.add_rule(Rule(t_nt, [r_nt], weight=0.3, name=f"{t_nt}_to_{r_nt}"))
        g.add_rule(Rule(t_nt, [k_nt], weight=0.7, name=f"{t_nt}_to_{k_nt}"))
    else:
        g.add_rule(Rule(t_nt, [r_nt], weight=0.3, name=f"{t_nt}_to_{r_nt}"))
        g.add_rule(Rule(t_nt, [v_nt], weight=0.7, name=f"{t_nt}_to_{v_nt}"))

    # K level — constants and literals
    if lits and consts:
        for sym in lits:
            g.add_rule(Rule(k_nt, [sym], weight=0.2 / len(lits), name=f"{k_nt}_{sym}"))
        for sym in consts:
            g.add_rule(Rule(k_nt, [sym], weight=0.8 / len(consts), name=f"{k_nt}_{sym}"))
    elif lits:
        for sym in lits:
            g.add_rule(Rule(k_nt, [sym], weight=1.0 / len(lits), name=f"{k_nt}_{sym}"))
    elif consts:
        for sym in consts:
            g.add_rule(Rule(k_nt, [sym], weight=1.0 / len(consts), name=f"{k_nt}_{sym}"))

    # R level — prefix functions, postfix functions (top_nt sym), and parens.
    # Postfix rules live here directly (no separate P non-terminal).
    if R_fns:
        for sym in R_fns:
            g.add_rule(Rule(r_nt, [sym, "(", top_nt, ")"], weight=0.4 / len(R_fns), name=f"{r_nt}_fn_{sym}"))
        if P_fns:
            for sym in P_fns:
                g.add_rule(Rule(r_nt, [top_nt, sym], weight=0.05 / len(P_fns), name=f"{r_nt}_postfix_{sym}"))
            g.add_rule(Rule(r_nt, ["(", top_nt, ")"], weight=0.55, name=f"{r_nt}_paren"))
        else:
            g.add_rule(Rule(r_nt, ["(", top_nt, ")"], weight=0.6, name=f"{r_nt}_paren"))
    else:
        if P_fns:
            for sym in P_fns:
                g.add_rule(Rule(r_nt, [top_nt, sym], weight=0.05 / len(P_fns), name=f"{r_nt}_postfix_{sym}"))
            g.add_rule(Rule(r_nt, ["(", top_nt, ")"], weight=0.95, name=f"{r_nt}_paren"))
        else:
            g.add_rule(Rule(r_nt, ["(", top_nt, ")"], weight=1.0, name=f"{r_nt}_paren"))

    # V level — variables
    if variables:
        for sym in variables:
            g.add_rule(Rule(v_nt, [sym], weight=1.0 / len(variables), name=f"{v_nt}_{sym}"))

    return g

from_grammar_string `classmethod`

from_grammar_string(text: str, start: Optional[str] = None) -> 'Grammar'

Construct a Grammar from a string in NLTK production-rule notation.

Each non-empty, non-comment line must have the form::

LHS -> RHS_1 | RHS_2 | ...

where each alternative is a space-separated sequence of symbols. Terminals are enclosed in single quotes (e.g. '+'); unquoted tokens are treated as non-terminals. An optional weight in square brackets at the end of an alternative is parsed as a float (e.g. E '+' F [0.4]); alternatives without a weight default to 1.0.

The start symbol may be embedded as a comment header emitted by to_grammar_string::

# start: E

The explicit start parameter takes precedence over this comment. When start is None, only the first # start: comment encountered is used; subsequent ones are ignored along with all other comment lines.

Rule names are not preserved in NLTK notation and are set to None on all returned Rule objects.

Examples:

>>> g = Grammar.from_grammar_string("E -> E '+' F | F\nF -> 'x'", start="E")
>>> sorted(g.nonterminals)
['E', 'F']
>>> g.rules_for("F")[0].rhs
['x']
>>> g.start
'E'

>>> g2 = Grammar.from_grammar_string(
...     "# start: E\nE -> E '+' F [0.4] | F [0.6]\nF -> 'x' [1.0]"
... )
>>> g2.start
'E'
>>> g2.is_pcfg()
True

Parameters:

Name	Type	Description	Default
`text`	`str`	Grammar specification in NLTK notation, optionally with a `# start: <symbol>` header line.	required
`start`	`Optional[str]`	Start non-terminal stored on the returned grammar. Takes precedence over a `# start:` comment in `text`. Required when neither the parameter nor the comment is present.	`None`

Returns:

Type	Description
`'Grammar'`	A new Grammar populated with
`'Grammar'`	the parsed rules.

Raises:

Type	Description
`ValueError`	If a content line contains no `->`.
`ValueError`	If a weight token cannot be converted to `float`.
`ValueError`	If `text` contains no parseable rules.
`ValueError`	If the start symbol cannot be determined from either the parameter or the `# start:` comment.

Source code in SRToolkit/utils/grammar/grammar.py

@classmethod
def from_grammar_string(cls, text: str, start: Optional[str] = None) -> "Grammar":
    """
    Construct a [Grammar][SRToolkit.utils.grammar.Grammar] from a string in
    NLTK production-rule notation.

    Each non-empty, non-comment line must have the form::

        LHS -> RHS_1 | RHS_2 | ...

    where each alternative is a space-separated sequence of symbols.
    Terminals are enclosed in single quotes (e.g. ``'+'``); unquoted
    tokens are treated as non-terminals.  An optional weight in square
    brackets at the end of an alternative is parsed as a
    ``float`` (e.g. ``E '+' F [0.4]``); alternatives without a weight
    default to ``1.0``.

    The start symbol may be embedded as a comment header emitted by
    [to_grammar_string][SRToolkit.utils.grammar.Grammar.to_grammar_string]::

        # start: E

    The explicit ``start`` parameter takes precedence over this comment.
    When ``start`` is ``None``, only the first ``# start:`` comment encountered
    is used; subsequent ones are ignored along with all other comment lines.

    Rule names are not preserved in NLTK notation and are set to ``None``
    on all returned [Rule][SRToolkit.utils.grammar.Rule] objects.

    Examples:
        >>> g = Grammar.from_grammar_string("E -> E '+' F | F\\nF -> 'x'", start="E")
        >>> sorted(g.nonterminals)
        ['E', 'F']
        >>> g.rules_for("F")[0].rhs
        ['x']
        >>> g.start
        'E'

        >>> g2 = Grammar.from_grammar_string(
        ...     "# start: E\\nE -> E '+' F [0.4] | F [0.6]\\nF -> 'x' [1.0]"
        ... )
        >>> g2.start
        'E'
        >>> g2.is_pcfg()
        True

    Args:
        text: Grammar specification in NLTK notation, optionally with a
            ``# start: <symbol>`` header line.
        start: Start non-terminal stored on the returned grammar.  Takes
            precedence over a ``# start:`` comment in ``text``.  Required
            when neither the parameter nor the comment is present.

    Returns:
        A new [Grammar][SRToolkit.utils.grammar.Grammar] populated with
        the parsed rules.

    Raises:
        ValueError: If a content line contains no ``->``.
        ValueError: If a weight token cannot be converted to ``float``.
        ValueError: If ``text`` contains no parseable rules.
        ValueError: If the start symbol cannot be determined from either
            the parameter or the ``# start:`` comment.
    """
    resolved_start = start
    rules: list[Rule] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("#"):
            if resolved_start is None and line.startswith("# start:"):
                resolved_start = line[len("# start:") :].strip()
            continue
        rules.extend(Rule.from_line(line))
    if not rules:
        raise ValueError("No rules parsed from grammar string.")
    if resolved_start is None:
        raise ValueError(
            "No start symbol found. Either pass start=<nonterminal> or include a '# start: <symbol>' line."
        )
    return cls(rules, start=resolved_start)

to_grammar_string

to_grammar_string() -> str

Serialise this grammar to a string in NLTK production-rule notation.

All rules sharing the same left-hand side are written on a single line, separated by |. Symbols that are non-terminals (i.e. appear as the lhs of at least one rule) are written unquoted; all other symbols are enclosed in single quotes. When is_pcfg returns True, the probability of each alternative (its weight divided by the sum of weights for that non-terminal) is appended in square brackets.

Rule names and registered constraints are not included in the output. When self.start is set, a # start: <symbol> comment is prepended so that from_grammar_string can reconstruct the grammar without requiring an explicit start argument.

Examples:

>>> g = Grammar([
...     Rule("E", ["E", "+", "F"]),
...     Rule("E", ["F"]),
...     Rule("F", ["x"]),
... ], start="E")
>>> print(g.to_grammar_string())
# start: E
E -> E '+' F | F
F -> 'x'

>>> gp = Grammar([
...     Rule("E", ["x"], weight=1.0),
...     Rule("E", ["y"], weight=3.0),
... ])
>>> print(gp.to_grammar_string())
E -> 'x' [0.25] | 'y' [0.75]

Returns:

Type	Description
`str`	Multi-line string in NLTK production-rule notation, optionally
`str`	preceded by a `# start:` header.

Source code in SRToolkit/utils/grammar/grammar.py

def to_grammar_string(self) -> str:
    """
    Serialise this grammar to a string in NLTK production-rule notation.

    All rules sharing the same left-hand side are written on a single
    line, separated by `` | ``.  Symbols that are non-terminals (i.e.
    appear as the ``lhs`` of at least one rule) are written unquoted;
    all other symbols are enclosed in single quotes.  When
    [is_pcfg][SRToolkit.utils.grammar.Grammar.is_pcfg] returns ``True``,
    the probability of each alternative (its weight divided by the sum of
    weights for that non-terminal) is appended in square brackets.

    Rule names and registered constraints are not included in the output.
    When ``self.start`` is set, a ``# start: <symbol>`` comment is
    prepended so that
    [from_grammar_string][SRToolkit.utils.grammar.Grammar.from_grammar_string]
    can reconstruct the grammar without requiring an explicit ``start``
    argument.

    Examples:
        >>> g = Grammar([
        ...     Rule("E", ["E", "+", "F"]),
        ...     Rule("E", ["F"]),
        ...     Rule("F", ["x"]),
        ... ], start="E")
        >>> print(g.to_grammar_string())
        # start: E
        E -> E '+' F | F
        F -> 'x'

        >>> gp = Grammar([
        ...     Rule("E", ["x"], weight=1.0),
        ...     Rule("E", ["y"], weight=3.0),
        ... ])
        >>> print(gp.to_grammar_string())
        E -> 'x' [0.25] | 'y' [0.75]

    Returns:
        Multi-line string in NLTK production-rule notation, optionally
        preceded by a ``# start:`` header.
    """
    nts = self.nonterminals
    pcfg = self.is_pcfg()
    lines: list[str] = []
    if self.start is not None:
        lines.append(f"# start: {self.start}")
    for lhs, rules in self._rules_by_lhs.items():
        parts: list[str] = []
        if pcfg:
            total = sum(r.weight for r in rules)
        for rule in rules:
            rhs_tokens = [sym if sym in nts else f"'{sym}'" for sym in rule.rhs]
            alt = " ".join(rhs_tokens)
            if pcfg:
                alt += f" [{rule.weight / total}]"
            parts.append(alt)
        lines.append(f"{lhs} -> {' | '.join(parts)}")
    return "\n".join(lines)

ParseTree

ParseTree(root: ParseTreeNode)

Full derivation history of an expression under a grammar.

Unlike Node, which holds only terminal symbols for expression evaluation, a ParseTree retains every non-terminal and every production applied during the derivation.

Examples:

>>> r = Rule("E", ["x"])
>>> leaf = ParseTreeNode("x", None)
>>> root = ParseTreeNode("E", r, [leaf])
>>> pt = ParseTree(root)
>>> pt.to_token_list()
['x']
>>> pt.productions_used()
[Rule(lhs='E', rhs=['x'], weight=1.0, name=None)]

Attributes:

Name	Type	Description
`root`		The root node of the parse tree.

Source code in SRToolkit/utils/grammar/grammar.py

def __init__(self, root: ParseTreeNode) -> None:
    self.root = root

to_token_list

to_token_list() -> list[str]

Collect all terminal tokens in left-to-right order.

Examples:

>>> leaf1 = ParseTreeNode("a", None)
>>> leaf2 = ParseTreeNode("+", None)
>>> leaf3 = ParseTreeNode("b", None)
>>> root = ParseTreeNode("E", Rule("E", ["a", "+", "b"]), [leaf1, leaf2, leaf3])
>>> ParseTree(root).to_token_list()
['a', '+', 'b']

Returns:

Type	Description
`list[str]`	Flat list of terminal tokens in left-to-right order.

Source code in SRToolkit/utils/grammar/grammar.py

def to_token_list(self) -> list[str]:
    """
    Collect all terminal tokens in left-to-right order.

    Examples:
        >>> leaf1 = ParseTreeNode("a", None)
        >>> leaf2 = ParseTreeNode("+", None)
        >>> leaf3 = ParseTreeNode("b", None)
        >>> root = ParseTreeNode("E", Rule("E", ["a", "+", "b"]), [leaf1, leaf2, leaf3])
        >>> ParseTree(root).to_token_list()
        ['a', '+', 'b']

    Returns:
        Flat list of terminal tokens in left-to-right order.
    """
    tokens: list[str] = []
    self._collect_tokens(self.root, tokens)
    return tokens

productions_used

productions_used() -> list[Rule]

Return all rules applied in the derivation, in pre-order.

Examples:

>>> leaf = ParseTreeNode("x", None)
>>> r = Rule("E", ["x"])
>>> root = ParseTreeNode("E", r, [leaf])
>>> ParseTree(root).productions_used()
[Rule(lhs='E', rhs=['x'], weight=1.0, name=None)]

Returns:

Type	Description
`list[Rule]`	List of Rule objects in pre-order traversal order.

Source code in SRToolkit/utils/grammar/grammar.py

def productions_used(self) -> list[Rule]:
    """
    Return all rules applied in the derivation, in pre-order.

    Examples:
        >>> leaf = ParseTreeNode("x", None)
        >>> r = Rule("E", ["x"])
        >>> root = ParseTreeNode("E", r, [leaf])
        >>> ParseTree(root).productions_used()
        [Rule(lhs='E', rhs=['x'], weight=1.0, name=None)]

    Returns:
        List of [Rule][SRToolkit.utils.grammar.Rule] objects in pre-order traversal order.
    """
    rules: list[Rule] = []
    self._collect_rules(self.root, rules)
    return rules

ParseTreeNode `dataclass`

ParseTreeNode(symbol: str, rule_applied: Optional[Rule], children: list[ParseTreeNode] = list())

A node in a derivation parse tree.

Terminal leaves have rule_applied=None and children=[]. Internal nodes store the rule that expanded the non-terminal at this position.

Attributes:

Name	Type	Description
`symbol`	`str`	The grammar symbol at this node (terminal or non-terminal name).
`rule_applied`	`Optional[Rule]`	The Rule used to expand this node. `None` for terminal leaves.
`children`	`list[ParseTreeNode]`	Ordered child nodes corresponding to the symbols in `rule_applied.rhs`.

Rule `dataclass`

Rule(lhs: str, rhs: list[str], weight: float = 1.0, name: Optional[str] = None)

A single production rule in a grammar.

A rule is a CFG production when weights remain 1 for all rules in the grammar; when weights differ across rules, the grammar is treated as a PCFG and productions are sampled proportionally. Weights are unnormalised — the grammar normalises them at sampling time.

Examples:

>>> r = Rule("E", ["E", "+", "F"], weight=0.4, name="E_add_+")
>>> r.lhs
'E'
>>> r.rhs
['E', '+', 'F']
>>> r.weight
0.4
>>> r.name
'E_add_+'
>>> Rule("E", ["F"]).weight
1.0

Attributes:

Name	Type	Description
`lhs`	`str`	The non-terminal being expanded, e.g. `"E"`.
`rhs`	`list[str]`	Ordered sequence of symbols the non-terminal expands to. Each element is either a terminal token (e.g. `"+"` or `"sin"`) or the name of another non-terminal. A symbol is treated as a non-terminal if and only if it appears as the `lhs` of at least one rule in the grammar.
`weight`	`float`	Unnormalised sampling weight. Defaults to `1.0`, which produces uniform sampling within a group when all rules share the same weight.
`name`	`Optional[str]`	Optional stable identifier for this rule. Used by constraints for scoping and identification. `None` by default.

from_line `classmethod`

from_line(line: str) -> list['Rule']

Parse one NLTK production line into a list of Rule objects.

The line must have the form::

LHS -> RHS_1 | RHS_2 | ...

where each alternative is a space-separated sequence of symbols. Terminals are enclosed in single quotes; unquoted tokens are non-terminals. An optional weight in square brackets at the end of an alternative is parsed as a float; alternatives without a weight default to 1.0.

Examples:

>>> Rule.from_line("E -> E '+' F [0.4] | F [0.6]")
[Rule(lhs='E', rhs=['E', '+', 'F'], weight=0.4, name=None), Rule(lhs='E', rhs=['F'], weight=0.6, name=None)]
>>> Rule.from_line("F -> 'x'")
[Rule(lhs='F', rhs=['x'], weight=1.0, name=None)]

Parameters:

Name	Type	Description	Default
`line`	`str`	A single non-empty, non-comment production line.	required

Returns:

Type	Description
`list['Rule']`	List of Rule objects, one per alternative.

Raises:

Type	Description
`ValueError`	If `line` contains no `->`.
`ValueError`	If the left-hand side is empty.
`ValueError`	If a weight token cannot be converted to `float`.
`ValueError`	If no alternatives are parsed from the right-hand side.

Source code in SRToolkit/utils/grammar/grammar.py

@classmethod
def from_line(cls, line: str) -> list["Rule"]:
    """
    Parse one NLTK production line into a list of [Rule][SRToolkit.utils.grammar.Rule] objects.

    The line must have the form::

        LHS -> RHS_1 | RHS_2 | ...

    where each alternative is a space-separated sequence of symbols.
    Terminals are enclosed in single quotes; unquoted tokens are non-terminals.
    An optional weight in square brackets at the end of an alternative is parsed
    as a ``float``; alternatives without a weight default to ``1.0``.

    Examples:
        >>> Rule.from_line("E -> E '+' F [0.4] | F [0.6]")
        [Rule(lhs='E', rhs=['E', '+', 'F'], weight=0.4, name=None), Rule(lhs='E', rhs=['F'], weight=0.6, name=None)]
        >>> Rule.from_line("F -> 'x'")
        [Rule(lhs='F', rhs=['x'], weight=1.0, name=None)]

    Args:
        line: A single non-empty, non-comment production line.

    Returns:
        List of [Rule][SRToolkit.utils.grammar.Rule] objects, one per alternative.

    Raises:
        ValueError: If ``line`` contains no ``->``.
        ValueError: If the left-hand side is empty.
        ValueError: If a weight token cannot be converted to ``float``.
        ValueError: If no alternatives are parsed from the right-hand side.
    """
    line = line.strip()
    if "->" not in line:
        raise ValueError(f"Expected '->' in grammar line: {line!r}")
    lhs, rhs_part = line.split("->", 1)
    lhs = lhs.strip()
    if not lhs:
        raise ValueError(f"Empty left-hand side in grammar line: {line!r}")
    rules = []
    for alt in rhs_part.split("|"):
        alt = alt.strip()
        if not alt:
            continue
        weight = 1.0
        m = re.search(r"\[([^\]]+)\]\s*$", alt)
        if m:
            try:
                weight = float(m.group(1))
            except ValueError:
                raise ValueError(f"Invalid weight {m.group(1)!r} in grammar line: {line!r}")
            alt = alt[: m.start()].strip()
        tokens = [t[1:-1] if t.startswith("'") and t.endswith("'") else t for t in re.findall(r"'[^']*'|\S+", alt)]
        if tokens:
            rules.append(cls(lhs, tokens, weight=weight))
    if not rules:
        raise ValueError(f"No alternatives parsed from grammar line: {line!r}")
    return rules

Grammar

SRToolkit.utils.grammar

AncestorInfo dataclass

Constraint

to_dict

from_dict classmethod

initial_local

initial_global

allows

update

DimensionalConsistency

ExpansionContext dataclass

MaxDepth

MaxNodes

MaxOccurrences

NoNested

Derivation

complete property

local_stack

global_state

options

apply

sample

generate

to_token_list

to_parse_tree

Grammar

nonterminals property

add_rule

add_constraint

rules_for

is_pcfg

validate

start_derivation

generate_one

to_dict

from_dict classmethod

from_symbol_library classmethod

from_grammar_string classmethod

to_grammar_string

ParseTree

to_token_list

productions_used

ParseTreeNode dataclass

Rule dataclass

from_line classmethod

AncestorInfo `dataclass`

from_dict `classmethod`

ExpansionContext `dataclass`

complete `property`

nonterminals `property`

from_dict `classmethod`

from_symbol_library `classmethod`

from_grammar_string `classmethod`

ParseTreeNode `dataclass`

Rule `dataclass`

from_line `classmethod`