Skip to content

Grammar

SRToolkit.utils.grammar

CFG/PCFG grammar representation, constraint protocol, and stateful derivation engine.

Provides

Rule, ParseTreeNode, ParseTree, Grammar, Constraint, AncestorInfo, ExpansionContext, MaxDepth, MaxNodes, MaxOccurrences, NoNested, DimensionalConsistency, Derivation.

AncestorInfo dataclass

AncestorInfo(nonterminal: str, rule: Rule, child_index: int)

One entry in an ExpansionContext's ancestor chain.

Attributes:

Name Type Description
nonterminal str

The non-terminal symbol at this ancestor position.

rule Rule

The rule applied at this position.

child_index int

The index within rule.rhs that leads toward the current slot, counting only non-terminal positions.

Constraint

Bases: Generic[L, G]

Base class for derivation constraints (hard filters).

Subclass and override the methods you need. Constraints carry only construction-time configuration; all per-derivation state is managed by the engine and threaded through method arguments.

Scoping — set nonterminals and/or rule_names to restrict allows to a subset of slots or rules. A scope miss is treated as implicit acceptance. update is always called regardless of scope so that global counters stay accurate.

Examples:

>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> g = Grammar([Rule("E", ["x"]), Rule("E", ["y"])])
>>> class RejectY(Constraint):
...     def allows(self, slot, rule, global_): return rule.rhs != ["y"]
>>> g.add_constraint(RejectY())
>>> d = g.start_derivation("E")
>>> [r.rhs for r in d.options()]
[['x']]

to_dict

to_dict() -> dict

Serialise this constraint to a JSON-safe dictionary.

The base implementation stores only the fully-qualified class path under constraint_class. Subclasses should call super().to_dict() and add their own constructor arguments so that from_dict can reconstruct the instance faithfully. See from_dict for an example.

Returns:

Type Description
dict

Dictionary with at least the key constraint_class.

Source code in SRToolkit/utils/grammar/constraints.py
def to_dict(self) -> dict:
    """
    Serialise this constraint to a JSON-safe dictionary.

    The base implementation stores only the fully-qualified class path under
    ``constraint_class``. Subclasses should call ``super().to_dict()`` and
    add their own constructor arguments so that ``from_dict`` can reconstruct
    the instance faithfully. See
    [from_dict][SRToolkit.utils.grammar.constraints.Constraint.from_dict]
    for an example.

    Returns:
        Dictionary with at least the key ``constraint_class``.
    """
    return {"constraint_class": f"{self.__class__.__module__}.{self.__class__.__qualname__}"}

from_dict classmethod

from_dict(d: dict) -> 'Constraint'

Reconstruct a constraint from a dictionary produced by to_dict.

When called on the base Constraint class, dispatches to the correct subclass via the constraint_class key using importlib — both built-in and user-defined subclasses are supported. When called on a concrete subclass, the subclass must override this method.

The dictionary must contain at minimum the key constraint_class, whose value is the fully-qualified class path (e.g. "mymodule.MyConstraint"). Any additional keys are forwarded to the subclass override.

To make a custom subclass serialisable, override both to_dict and from_dict::

class MyConstraint(Constraint):
    def __init__(self, threshold: float) -> None:
        self.threshold = threshold

    def to_dict(self) -> dict:
        return {**super().to_dict(), "threshold": self.threshold}

    @classmethod
    def from_dict(cls, d: dict) -> "MyConstraint":
        return cls(d["threshold"])

Examples:

>>> from SRToolkit.utils.grammar import Constraint, MaxDepth
>>> c = Constraint.from_dict(MaxDepth(5).to_dict())
>>> isinstance(c, MaxDepth) and c.limit == 5
True

Parameters:

Name Type Description Default
d dict

Dictionary previously returned by to_dict.

required

Returns:

Type Description
'Constraint'

A reconstructed Constraint instance.

Raises:

Type Description
KeyError

If constraint_class is missing from d (dispatch path).

ImportError

If the module cannot be imported (dispatch path).

AttributeError

If the class cannot be found in the module (dispatch path).

NotImplementedError

If called on a subclass that has not overridden this method.

Source code in SRToolkit/utils/grammar/constraints.py
@classmethod
def from_dict(cls, d: dict) -> "Constraint":
    """
    Reconstruct a constraint from a dictionary produced by ``to_dict``.

    When called on the base [Constraint][SRToolkit.utils.grammar.constraints.Constraint]
    class, dispatches to the correct subclass via the ``constraint_class`` key using
    `importlib` — both built-in and user-defined subclasses are supported.
    When called on a concrete subclass, the subclass must override this method.

    The dictionary must contain at minimum the key ``constraint_class``, whose value
    is the fully-qualified class path (e.g. ``"mymodule.MyConstraint"``).  Any
    additional keys are forwarded to the subclass override.

    To make a custom subclass serialisable, override both ``to_dict`` and
    ``from_dict``::

        class MyConstraint(Constraint):
            def __init__(self, threshold: float) -> None:
                self.threshold = threshold

            def to_dict(self) -> dict:
                return {**super().to_dict(), "threshold": self.threshold}

            @classmethod
            def from_dict(cls, d: dict) -> "MyConstraint":
                return cls(d["threshold"])

    Examples:
        >>> from SRToolkit.utils.grammar import Constraint, MaxDepth
        >>> c = Constraint.from_dict(MaxDepth(5).to_dict())
        >>> isinstance(c, MaxDepth) and c.limit == 5
        True

    Args:
        d: Dictionary previously returned by ``to_dict``.

    Returns:
        A reconstructed [Constraint][SRToolkit.utils.grammar.constraints.Constraint] instance.

    Raises:
        KeyError: If ``constraint_class`` is missing from ``d`` (dispatch path).
        ImportError: If the module cannot be imported (dispatch path).
        AttributeError: If the class cannot be found in the module (dispatch path).
        NotImplementedError: If called on a subclass that has not overridden this method.
    """
    if cls is Constraint:
        class_path = d["constraint_class"]
        module_path, cls_name = class_path.rsplit(".", 1)
        resolved = getattr(importlib.import_module(module_path), cls_name)
        return resolved.from_dict(d)
    raise NotImplementedError(f"{cls.__name__}.from_dict is not implemented.")

initial_local

initial_local(start: str) -> L

Return the initial local state for the constraint.

Source code in SRToolkit/utils/grammar/constraints.py
def initial_local(self, start: str) -> L:
    """Return the initial local state for the constraint."""
    return None  # type: ignore[return-value]

initial_global

initial_global() -> G

Return the initial global state for the constraint.

Source code in SRToolkit/utils/grammar/constraints.py
def initial_global(self) -> G:
    """Return the initial global state for the constraint."""
    return None  # type: ignore[return-value]

allows

allows(slot: ExpansionContext[L], rule: Rule, global_: G) -> bool

Return True if rule may be applied at slot.

Called only when the slot's non-terminal and rule name are within scope.

Parameters:

Name Type Description Default
slot ExpansionContext[L]

Current derivation position with this constraint's local state.

required
rule Rule

Candidate production rule.

required
global_ G

Current per-derivation global state.

required

Returns:

Type Description
bool

True to keep the rule in the candidate set.

Source code in SRToolkit/utils/grammar/constraints.py
def allows(self, slot: ExpansionContext[L], rule: Rule, global_: G) -> bool:
    """
    Return ``True`` if ``rule`` may be applied at ``slot``.

    Called only when the slot's non-terminal and rule name are within scope.

    Args:
        slot: Current derivation position with this constraint's local state.
        rule: Candidate production rule.
        global_: Current per-derivation global state.

    Returns:
        ``True`` to keep the rule in the candidate set.
    """
    return True

update

update(slot: ExpansionContext[L], rule: Rule, global_: G) -> tuple[list[L], G]

Called after rule is applied at slot.

Returns per-child local states (one per non-terminal in rule.rhs, in order) and the new global state. May be used to update global counters by returning a new global_ value.

Parameters:

Name Type Description Default
slot ExpansionContext[L]

Derivation position immediately before the rule was applied.

required
rule Rule

The rule that was applied.

required
global_ G

Global state before this application.

required

Returns:

Type Description
list[L]

(child_locals, new_global) where len(child_locals) equals

G

the number of non-terminals in rule.rhs.

Source code in SRToolkit/utils/grammar/constraints.py
def update(self, slot: ExpansionContext[L], rule: Rule, global_: G) -> tuple[list[L], G]:
    """
    Called after ``rule`` is applied at ``slot``.

    Returns per-child local states (one per non-terminal in ``rule.rhs``,
    in order) and the new global state. May be used to update global
    counters by returning a new ``global_`` value.

    Args:
        slot: Derivation position immediately before the rule was applied.
        rule: The rule that was applied.
        global_: Global state before this application.

    Returns:
        ``(child_locals, new_global)`` where ``len(child_locals)`` equals
        the number of non-terminals in ``rule.rhs``.
    """
    n = sum(1 for s in rule.rhs if s in slot.nonterminals)
    return [slot.local] * n, global_

DimensionalConsistency

DimensionalConsistency(variable_units: dict[str, dict], target_unit: dict, constant_units: Optional[dict[str, dict]] = None, symbol_library: Optional[SymbolLibrary] = None, allow_unit_polymorphic_constants: bool = False)

Bases: Constraint[Optional[Unit], None]

Stateful constraint that enforces dimensional analysis during expression generation.

Local state at each slot is the required unit (Optional[Unit]). None means "any unit is acceptable here" — used for the children of multiplicative operators, whose individual units are underdetermined.

Unit representation: units are dict[str, Fraction] mapping base dimension names (e.g. "m", "s", "kg") to rational exponents. Dimensionless is {} (empty dict).

Free constants (const type): treated as dimensionless by default. Set allow_unit_polymorphic_constants=True to let them absorb any required unit.

Named physical constants (lit type, e.g. "g", "c"): declared via constant_units; checked like variables.

Undeclared variables or literals: conservatively accepted.

Rule classification: uses rule.name when available (see Grammar.from_symbol_library for the naming scheme). Falls back to rhs-shape heuristics for unnamed rules.

Examples:

>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> from SRToolkit.utils.grammar import DimensionalConsistency
>>> from fractions import Fraction
>>> g = Grammar()
>>> g.add_rule(Rule("E", ["E", "+", "F"], weight=0.6, name="E_add_+"))
>>> g.add_rule(Rule("E", ["F"], weight=0.4, name="E_to_F"))
>>> g.add_rule(Rule("F", ["v"], weight=0.5, name="F_v"))
>>> g.add_rule(Rule("F", ["t"], weight=0.5, name="F_t"))
>>> dc = DimensionalConsistency(
...     variable_units={"v": {"m": 1, "s": -1}, "t": {"s": 1}},
...     target_unit={"m": 1, "s": -1},
... )
>>> g.add_constraint(dc)
>>> d = g.start_derivation("E")
>>> all(r.rhs != ["t"] for r in d.options())
True

Parameters:

Name Type Description Default
variable_units dict[str, dict]

Mapping from variable token to its unit.

required
target_unit dict

Required unit of the generated expression.

required
constant_units Optional[dict[str, dict]]

Units for named physical constants (lit type).

None
symbol_library Optional[SymbolLibrary]

Used to classify tokens by type and precedence.

None
allow_unit_polymorphic_constants bool

If True, free constants absorb whatever unit their slot requires. Default False.

False
Source code in SRToolkit/utils/grammar/constraints.py
def __init__(
    self,
    variable_units: dict[str, dict],
    target_unit: dict,
    constant_units: Optional[dict[str, dict]] = None,
    symbol_library: Optional[SymbolLibrary] = None,
    allow_unit_polymorphic_constants: bool = False,
) -> None:
    self._var_units: dict[str, Unit] = {k: _to_unit(v) for k, v in variable_units.items()}
    self._target: Unit = _to_unit(target_unit)
    self._const_units: dict[str, Unit] = {k: _to_unit(v) for k, v in (constant_units or {}).items()}
    self._sl = symbol_library
    self._allow_poly_const = allow_unit_polymorphic_constants

ExpansionContext dataclass

ExpansionContext(nonterminal: str, local: L, steps: int, parent_rule: Optional[Rule], child_index: Optional[int], ancestors: tuple[AncestorInfo, ...], partial_tree: ParseTreeNode, nonterminals: frozenset[str], frontier_size: int = 1)

Bases: Generic[L]

Read-only view of the current derivation position passed to constraints.

The engine constructs an ExpansionContext on demand for each (constraint, candidate_rule) pair evaluated in Derivation.options and for each (constraint, selected_rule) pair in Derivation.apply.

Attributes:

Name Type Description
nonterminal str

The non-terminal symbol being expanded at this position.

local L

This constraint's per-slot inherited state.

steps int

Number of rule applications made so far in the derivation (i.e. how many times apply() has been called).

parent_rule Optional[Rule]

The rule whose application created this slot (None for the start symbol).

child_index Optional[int]

Index of this slot in parent_rule.rhs counting only non-terminal positions (None for the start symbol).

ancestors tuple[AncestorInfo, ...]

Ancestor chain from the root to this slot's immediate parent, in root-first order.

partial_tree ParseTreeNode

Root of the partially-built parse tree. Read-only by contract — constraints must not mutate it.

nonterminals frozenset[str]

Frozen set of all non-terminal symbols in the grammar.

frontier_size int

Number of open (unexpanded) non-terminal slots in the current derivation frontier, including this slot. Useful for budget constraints such as MaxNodes that need to account for the minimum number of rule applications still required to complete the derivation.

MaxDepth

MaxDepth(limit: int)

Bases: Constraint[int, None]

Hard limit on derivation depth.

Local state is the remaining depth budget at each slot. A rule is rejected when its application would require at least one recursive non-terminal child but the budget has reached zero.

Examples:

>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> g = Grammar([
...     Rule("E", ["E", "+", "E"]),
...     Rule("E", ["x"]),
... ])
>>> g.add_constraint(MaxDepth(0))
>>> d = g.start_derivation("E")
>>> all(r.rhs == ["x"] for r in d.options())
True

Parameters:

Name Type Description Default
limit int

Maximum nesting depth (number of non-terminal expansion levels).

required
Source code in SRToolkit/utils/grammar/constraints.py
def __init__(self, limit: int) -> None:
    self.limit = limit

MaxNodes

MaxNodes(limit: int)

Bases: Constraint[None, int]

Hard limit on the total number of rule applications in a derivation.

Global state is a running count of applications so far. A rule is rejected when applying it would push the count over the limit taking into account the non-terminal children it introduces (each of which will require at least one more application).

Examples:

>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> g = Grammar([
...     Rule("E", ["E", "+", "E"]),
...     Rule("E", ["x"]),
... ])
>>> g.add_constraint(MaxNodes(3))
>>> d = g.start_derivation("E")
>>> # At node-count 3 only the terminal rule survives
>>> tokens = d.generate()
>>> len(tokens) <= 5
True

Parameters:

Name Type Description Default
limit int

Maximum number of rule applications.

required
Source code in SRToolkit/utils/grammar/constraints.py
def __init__(self, limit: int) -> None:
    self.limit = limit

MaxOccurrences

MaxOccurrences(symbol: str, limit: int)

Bases: Constraint[None, int]

Hard limit on how many times a specific terminal symbol may appear.

Global state counts occurrences committed so far. A rule whose rhs contains the tracked symbol is rejected once the count equals the limit.

Examples:

>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> g = Grammar([
...     Rule("E", ["E", "+", "E"]),
...     Rule("E", ["x"]),
...     Rule("E", ["y"]),
... ])
>>> g.add_constraint(MaxOccurrences("x", 1))
>>> d = g.start_derivation("E")
>>> tokens = d.generate(limit=200)
>>> tokens.count("x") <= 1
True

Parameters:

Name Type Description Default
symbol str

Terminal token to track.

required
limit int

Maximum allowed occurrences.

required
Source code in SRToolkit/utils/grammar/constraints.py
def __init__(self, symbol: str, limit: int) -> None:
    self.symbol = symbol
    self.limit = limit

NoNested

NoNested(symbols: str | Iterable[str])

Bases: Constraint[bool, None]

Prevent any symbol in a group from appearing nested inside any other symbol in the same group.

Local state is a boolean "currently under any symbol in the group". Any rule whose rhs contains a group symbol is rejected while the local state is True. Children inherit True once such a rule is applied.

Pass a single string to forbid self-nesting only; pass multiple symbols to forbid cross-nesting within the group (e.g. NoNested(TRIG_FNS) prevents sin(cos(x)) as well as sin(sin(x))).

Warning

This constraint propagates the "inside" flag to all non-terminal children of any rule whose rhs contains a group symbol. It works correctly when each group symbol appears in a rule with exactly one non-terminal child (typical prefix-function rules such as E -> sin ( E )) and for infix operators where every child should be blocked (e.g. E -> E + E). It does not work correctly for mixed rules that combine a function call with additional non-terminal siblings in a single production (e.g. E -> sin ( E ) + F): the sibling F will be incorrectly treated as inside the group. If your grammar uses such rules, implement a custom Constraint instead.

Examples:

>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> g = Grammar([
...     Rule("E", ["sin", "(", "E", ")"]),
...     Rule("E", ["cos", "(", "E", ")"]),
...     Rule("E", ["x"]),
... ])
>>> g.add_constraint(NoNested(["sin", "cos"]))
>>> d = g.start_derivation("E")
>>> d.apply(g.rules_for("E")[0])   # apply sin(E)
>>> # Now inside sin; both sin and cos rules should be gone
>>> opts = d.options()
>>> all("sin" not in r.rhs and "cos" not in r.rhs for r in opts)
True

Parameters:

Name Type Description Default
symbols str | Iterable[str]

A single terminal token or an iterable of terminal tokens that form the nesting group.

required
Source code in SRToolkit/utils/grammar/constraints.py
def __init__(self, symbols: str | Iterable[str]) -> None:
    if isinstance(symbols, str):
        self.symbols: frozenset[str] = frozenset({symbols})
    else:
        self.symbols = frozenset(symbols)

Derivation

Derivation(grammar: Grammar, start: str)

Stateful leftmost derivation over a Grammar.

A derivation begins at a start non-terminal and proceeds by repeatedly choosing a rule to apply to the leftmost unexpanded non-terminal. options returns candidate rules filtered by all registered constraints; apply advances the derivation by one production.

Per-slot local state and per-derivation global state for each registered constraint are maintained internally; constraint instances carry only construction-time configuration and are safe to share across parallel derivations.

Obtain a Derivation via Grammar.start_derivation.

Examples:

>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> g = Grammar()
>>> g.add_rule(Rule("E", ["x"]))
>>> d = g.start_derivation("E")
>>> d.complete
False
>>> opts = d.options()
>>> len(opts)
1
>>> d.apply(opts[0])
>>> d.complete
True
>>> d.to_token_list()
['x']
Source code in SRToolkit/utils/grammar/derivation.py
def __init__(self, grammar: Grammar, start: str) -> None:
    self._grammar = grammar
    self._nonterminals: frozenset[str] = frozenset(grammar.nonterminals)
    self._steps: int = 0
    self._root: ParseTreeNode = ParseTreeNode(start, None)

    initial_frame = _Frame(
        nonterminal=start,
        node=self._root,
        ancestors=(),
        parent_rule=None,
        child_index=None,
    )
    self._globals: dict[int, Any] = {}
    for c in grammar._constraints:
        cid = id(c)
        initial_frame.locals[cid] = c.initial_local(start)
        self._globals[cid] = c.initial_global()
    # _frames[-1] is the current leftmost open slot.
    self._frames: list[_Frame] = [initial_frame]

complete property

complete: bool

True when no unexpanded non-terminals remain.

Examples:

>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> g = Grammar()
>>> g.add_rule(Rule("E", ["x"]))
>>> d = g.start_derivation("E")
>>> d.complete
False
>>> d.apply(d.options()[0])
>>> d.complete
True

local_stack

local_stack(constraint: Constraint) -> list[Any]

Return the local state stack for constraint across the open frontier, leftmost slot first.

Parameters:

Name Type Description Default
constraint Constraint

A constraint previously registered on the grammar.

required

Returns:

Type Description
list[Any]

One entry per open non-terminal in left-to-right order.

Source code in SRToolkit/utils/grammar/derivation.py
def local_stack(self, constraint: Constraint) -> list[Any]:
    """
    Return the local state stack for ``constraint`` across the open frontier,
    leftmost slot first.

    Args:
        constraint: A constraint previously registered on the grammar.

    Returns:
        One entry per open non-terminal in left-to-right order.
    """
    cid = id(constraint)
    return [f.locals[cid] for f in reversed(self._frames)]

global_state

global_state(constraint: Constraint) -> Any

Return the per-derivation global state for constraint.

Parameters:

Name Type Description Default
constraint Constraint

A constraint previously registered on the grammar.

required

Returns:

Type Description
Any

The current global value owned by this derivation for constraint.

Source code in SRToolkit/utils/grammar/derivation.py
def global_state(self, constraint: Constraint) -> Any:
    """
    Return the per-derivation global state for ``constraint``.

    Args:
        constraint: A constraint previously registered on the grammar.

    Returns:
        The current global value owned by this derivation for ``constraint``.
    """
    return self._globals[id(constraint)]

options

options() -> list[Rule]

Return candidate rules for the current leftmost unexpanded non-terminal, filtered by every registered constraint's allows.

Examples:

>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> g = Grammar()
>>> g.add_rule(Rule("E", ["x"]))
>>> g.add_rule(Rule("E", ["y"]))
>>> d = g.start_derivation("E")
>>> len(d.options())
2

Returns:

Type Description
list[Rule]

List of Rule objects that every

list[Rule]

constraint accepts.

Raises:

Type Description
RuntimeError

If the derivation is already complete.

Source code in SRToolkit/utils/grammar/derivation.py
def options(self) -> list[Rule]:
    """
    Return candidate rules for the current leftmost unexpanded
    non-terminal, filtered by every registered constraint's
    [allows][SRToolkit.utils.grammar.constraints.Constraint.allows].

    Examples:
        >>> from SRToolkit.utils.grammar import Grammar, Rule
        >>> g = Grammar()
        >>> g.add_rule(Rule("E", ["x"]))
        >>> g.add_rule(Rule("E", ["y"]))
        >>> d = g.start_derivation("E")
        >>> len(d.options())
        2

    Returns:
        List of [Rule][SRToolkit.utils.grammar.Rule] objects that every
        constraint accepts.

    Raises:
        RuntimeError: If the derivation is already complete.
    """
    if self.complete:
        raise RuntimeError("Derivation is already complete; no options remain.")

    top = self._frames[-1]
    candidates = self._grammar.rules_for(top.nonterminal)

    for c in self._grammar._constraints:
        cid = id(c)
        slot = self._slot_for(top, top.locals[cid])
        global_ = self._globals[cid]
        surviving: list[Rule] = []
        for rule in candidates:
            if _scope_miss(c, top.nonterminal, rule):
                surviving.append(rule)
                continue
            if c.allows(slot, rule, global_):
                surviving.append(rule)
        candidates = surviving
        if not candidates:
            break

    return candidates

apply

apply(rule: Rule) -> None

Apply rule to the current leftmost unexpanded non-terminal.

Examples:

>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> g = Grammar()
>>> g.add_rule(Rule("E", ["E", "+", "F"]))
>>> g.add_rule(Rule("E", ["x"]))
>>> g.add_rule(Rule("F", ["y"]))
>>> d = g.start_derivation("E")
>>> d.apply(g.rules_for("E")[0])   # E -> E + F
>>> d.apply(g.rules_for("E")[1])   # E -> x
>>> d.apply(g.rules_for("F")[0])   # F -> y
>>> d.to_token_list()
['x', '+', 'y']

Parameters:

Name Type Description Default
rule Rule

A rule whose lhs matches the current leftmost non-terminal.

required

Raises:

Type Description
RuntimeError

If the derivation is already complete.

ValueError

If rule.lhs does not match the current non-terminal.

Source code in SRToolkit/utils/grammar/derivation.py
def apply(self, rule: Rule) -> None:
    """
    Apply ``rule`` to the current leftmost unexpanded non-terminal.

    Examples:
        >>> from SRToolkit.utils.grammar import Grammar, Rule
        >>> g = Grammar()
        >>> g.add_rule(Rule("E", ["E", "+", "F"]))
        >>> g.add_rule(Rule("E", ["x"]))
        >>> g.add_rule(Rule("F", ["y"]))
        >>> d = g.start_derivation("E")
        >>> d.apply(g.rules_for("E")[0])   # E -> E + F
        >>> d.apply(g.rules_for("E")[1])   # E -> x
        >>> d.apply(g.rules_for("F")[0])   # F -> y
        >>> d.to_token_list()
        ['x', '+', 'y']

    Args:
        rule: A rule whose ``lhs`` matches the current leftmost
            non-terminal.

    Raises:
        RuntimeError: If the derivation is already complete.
        ValueError: If ``rule.lhs`` does not match the current
            non-terminal.
    """
    if self.complete:
        raise RuntimeError("Derivation is already complete.")

    top = self._frames[-1]
    if rule.lhs != top.nonterminal:
        raise ValueError(f"Rule lhs '{rule.lhs}' does not match current non-terminal '{top.nonterminal}'.")

    # Build child parse-tree nodes and the new frames for each NT child.
    child_frames: list[_Frame] = []
    nt_child_index = 0
    for sym in rule.rhs:
        child_node = ParseTreeNode(sym, None)
        top.node.children.append(child_node)
        if sym in self._nonterminals:
            frame = AncestorInfo(top.nonterminal, rule, nt_child_index)
            child_frames.append(
                _Frame(
                    nonterminal=sym,
                    node=child_node,
                    ancestors=top.ancestors + (frame,),
                    parent_rule=rule,
                    child_index=nt_child_index,
                )
            )
            nt_child_index += 1

    top.node.rule_applied = rule
    n_nt_children = len(child_frames)

    # Thread state through every constraint using the pre-apply slot view.
    pre_frontier_size = len(self._frames)
    for c in self._grammar._constraints:
        cid = id(c)
        slot = self._slot_for(top, top.locals[cid], frontier_size=pre_frontier_size)
        child_locals, new_global = c.update(slot, rule, self._globals[cid])
        if len(child_locals) != n_nt_children:
            raise RuntimeError(
                f"Constraint {c!r}.update() returned {len(child_locals)} child locals "
                f"but rule '{rule.lhs} -> {rule.rhs}' has {n_nt_children} "
                f"non-terminal children."
            )
        for i, cf in enumerate(child_frames):
            cf.locals[cid] = child_locals[i]
        self._globals[cid] = new_global

    # Pop the top frame and push children so that the leftmost child ends
    # up at the end of the list (i.e. on top of the stack).
    self._frames.pop()
    for cf in reversed(child_frames):
        self._frames.append(cf)
    self._steps += 1

sample

sample() -> None

Apply one rule chosen proportionally by weight (PCFG) or uniformly (CFG) from the surviving candidates.

Examples:

>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> g = Grammar()
>>> g.add_rule(Rule("E", ["x"]))
>>> d = g.start_derivation("E")
>>> d.sample()
>>> d.complete
True

Raises:

Type Description
RuntimeError

If the derivation is already complete.

RuntimeError

If all candidate rules are filtered out by allows.

Source code in SRToolkit/utils/grammar/derivation.py
def sample(self) -> None:
    """
    Apply one rule chosen proportionally by weight (PCFG) or uniformly
    (CFG) from the surviving candidates.

    Examples:
        >>> from SRToolkit.utils.grammar import Grammar, Rule
        >>> g = Grammar()
        >>> g.add_rule(Rule("E", ["x"]))
        >>> d = g.start_derivation("E")
        >>> d.sample()
        >>> d.complete
        True

    Raises:
        RuntimeError: If the derivation is already complete.
        RuntimeError: If all candidate rules are filtered out by
            [allows][SRToolkit.utils.grammar.constraints.Constraint.allows].
    """
    candidates = self.options()
    if not candidates:
        raise RuntimeError(
            "No valid rules available for the current non-terminal after applying constraint filters."
        )

    weights = [float(r.weight) for r in candidates]
    total = sum(weights)
    probs = [w / total for w in weights]
    self.apply(candidates[np.random.choice(len(candidates), p=probs)])

generate

generate(limit: int = 1000) -> list[str]

Run the derivation to completion and return the token list.

Examples:

>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> g = Grammar()
>>> g.add_rule(Rule("E", ["x"]))
>>> g.start_derivation("E").generate()
['x']

Parameters:

Name Type Description Default
limit int

Maximum number of rule applications. A negative value means unlimited. Default 1000.

1000

Returns:

Type Description
list[str]

Flat list of terminal tokens in left-to-right order.

Raises:

Type Description
RuntimeError

If the derivation does not complete within limit steps (only when limit >= 0).

Source code in SRToolkit/utils/grammar/derivation.py
def generate(self, limit: int = 1000) -> list[str]:
    """
    Run the derivation to completion and return the token list.

    Examples:
        >>> from SRToolkit.utils.grammar import Grammar, Rule
        >>> g = Grammar()
        >>> g.add_rule(Rule("E", ["x"]))
        >>> g.start_derivation("E").generate()
        ['x']

    Args:
        limit: Maximum number of rule applications. A negative value means
            unlimited. Default ``1000``.

    Returns:
        Flat list of terminal tokens in left-to-right order.

    Raises:
        RuntimeError: If the derivation does not complete within
            ``limit`` steps (only when ``limit >= 0``).
    """
    steps = 0
    while not self.complete:
        if steps >= limit >= 0:
            raise RuntimeError(f"Derivation did not complete within {limit} rule applications.")
        self.sample()
        steps += 1
    return self.to_token_list()

to_token_list

to_token_list() -> list[str]

Return the completed expression as a flat token list.

Examples:

>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> g = Grammar()
>>> g.add_rule(Rule("E", ["x"]))
>>> d = g.start_derivation("E")
>>> d.apply(d.options()[0])
>>> d.to_token_list()
['x']

Returns:

Type Description
list[str]

Flat list of terminal tokens in left-to-right order.

Raises:

Type Description
RuntimeError

If the derivation is not yet complete.

Source code in SRToolkit/utils/grammar/derivation.py
def to_token_list(self) -> list[str]:
    """
    Return the completed expression as a flat token list.

    Examples:
        >>> from SRToolkit.utils.grammar import Grammar, Rule
        >>> g = Grammar()
        >>> g.add_rule(Rule("E", ["x"]))
        >>> d = g.start_derivation("E")
        >>> d.apply(d.options()[0])
        >>> d.to_token_list()
        ['x']

    Returns:
        Flat list of terminal tokens in left-to-right order.

    Raises:
        RuntimeError: If the derivation is not yet complete.
    """
    if not self.complete:
        raise RuntimeError("Derivation is not yet complete.")
    return self.to_parse_tree().to_token_list()

to_parse_tree

to_parse_tree() -> ParseTree

Return the completed derivation as a ParseTree.

Examples:

>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> g = Grammar()
>>> g.add_rule(Rule("E", ["x"]))
>>> d = g.start_derivation("E")
>>> d.apply(d.options()[0])
>>> isinstance(d.to_parse_tree(), ParseTree)
True

Returns:

Type Description
ParseTree

The ParseTree rooted at the

ParseTree

start symbol.

Raises:

Type Description
RuntimeError

If the derivation is not yet complete.

Source code in SRToolkit/utils/grammar/derivation.py
def to_parse_tree(self) -> ParseTree:
    """
    Return the completed derivation as a
    [ParseTree][SRToolkit.utils.grammar.ParseTree].

    Examples:
        >>> from SRToolkit.utils.grammar import Grammar, Rule
        >>> g = Grammar()
        >>> g.add_rule(Rule("E", ["x"]))
        >>> d = g.start_derivation("E")
        >>> d.apply(d.options()[0])
        >>> isinstance(d.to_parse_tree(), ParseTree)
        True

    Returns:
        The [ParseTree][SRToolkit.utils.grammar.ParseTree] rooted at the
        start symbol.

    Raises:
        RuntimeError: If the derivation is not yet complete.
    """
    if not self.complete:
        raise RuntimeError("Derivation is not yet complete.")
    return ParseTree(self._root)

Grammar

Grammar(rules: Optional[list[Rule]] = None, start: Optional[str] = None)

A context-free grammar (CFG) or probabilistic context-free grammar (PCFG).

Rules are added via add_rule. The set of non-terminals is derived automatically: a symbol is a non-terminal if and only if it appears as the lhs of at least one rule.

A grammar is a CFG when every rule carries the default weight (1.0), making sampling uniform within each group. It is a PCFG when any rule has a weight that differs from 1.0.

Constraints are registered via add_constraint. During a derivation, options returns only rules that every constraint's allows accepts.

Examples:

>>> g = Grammar([
...     Rule("E", ["E", "+", "F"], weight=0.4),
...     Rule("E", ["F"], weight=0.6),
...     Rule("F", ["x"]),
... ])
>>> "E" in g.nonterminals
True
>>> g.is_pcfg()
True
>>> len(g.rules_for("E"))
2

Parameters:

Name Type Description Default
rules Optional[list[Rule]]

Optional list of Rule objects to add at construction time. Equivalent to calling add_rule for each entry.

None
start Optional[str]

Default start non-terminal used by start_derivation when no start argument is given.

None
Source code in SRToolkit/utils/grammar/grammar.py
def __init__(self, rules: Optional[list[Rule]] = None, start: Optional[str] = None) -> None:
    """
    Args:
        rules: Optional list of [Rule][SRToolkit.utils.grammar.Rule] objects to add at
            construction time. Equivalent to calling
            [add_rule][SRToolkit.utils.grammar.Grammar.add_rule] for each entry.
        start: Default start non-terminal used by
            [start_derivation][SRToolkit.utils.grammar.Grammar.start_derivation]
            when no ``start`` argument is given.
    """
    self.start: Optional[str] = start
    self._rules: list[Rule] = []
    self._rules_by_lhs: dict[str, list[Rule]] = {}
    self._constraints: list = []
    for rule in rules or []:
        self.add_rule(rule)

nonterminals property

nonterminals: set[str]

Set of all non-terminal symbols in the grammar.

A symbol is a non-terminal if and only if it appears as the lhs of at least one rule.

Examples:

>>> g = Grammar()
>>> g.add_rule(Rule("E", ["F"]))
>>> g.add_rule(Rule("F", ["x"]))
>>> g.nonterminals == {"E", "F"}
True

add_rule

add_rule(rule: Rule) -> None

Add a production rule to the grammar.

Examples:

>>> g = Grammar()
>>> g.add_rule(Rule("E", ["x"]))
>>> len(g.rules_for("E"))
1

Parameters:

Name Type Description Default
rule Rule

The Rule to register.

required
Source code in SRToolkit/utils/grammar/grammar.py
def add_rule(self, rule: Rule) -> None:
    """
    Add a production rule to the grammar.

    Examples:
        >>> g = Grammar()
        >>> g.add_rule(Rule("E", ["x"]))
        >>> len(g.rules_for("E"))
        1

    Args:
        rule: The [Rule][SRToolkit.utils.grammar.Rule] to register.
    """
    self._rules.append(rule)
    self._rules_by_lhs.setdefault(rule.lhs, []).append(rule)

add_constraint

add_constraint(constraint: 'Constraint') -> None

Register a constraint applied at each derivation step.

A rule is offered as an option only when every registered constraint's allows returns True for it.

Examples:

>>> from SRToolkit.utils.grammar import MaxDepth
>>> g = Grammar([Rule("E", ["E", "+", "E"]), Rule("E", ["x"])])
>>> g.add_constraint(MaxDepth(0))
>>> d = g.start_derivation("E")
>>> [r.rhs for r in d.options()]
[['x']]

Parameters:

Name Type Description Default
constraint 'Constraint'

A Constraint instance — typically a built-in such as MaxDepth, MaxNodes, MaxOccurrences, NoNested, or DimensionalConsistency, or a user-defined subclass.

required
Source code in SRToolkit/utils/grammar/grammar.py
def add_constraint(self, constraint: "Constraint") -> None:
    """
    Register a [constraint][SRToolkit.utils.grammar.constraints.Constraint] applied at each derivation step.

    A rule is offered as an option only when every registered constraint's
    [allows][SRToolkit.utils.grammar.constraints.Constraint.allows] returns ``True``
    for it.

    Examples:
        >>> from SRToolkit.utils.grammar import MaxDepth
        >>> g = Grammar([Rule("E", ["E", "+", "E"]), Rule("E", ["x"])])
        >>> g.add_constraint(MaxDepth(0))
        >>> d = g.start_derivation("E")
        >>> [r.rhs for r in d.options()]
        [['x']]

    Args:
        constraint: A [Constraint][SRToolkit.utils.grammar.constraints.Constraint]
            instance — typically a built-in such as
            [MaxDepth][SRToolkit.utils.grammar.constraints.MaxDepth],
            [MaxNodes][SRToolkit.utils.grammar.constraints.MaxNodes],
            [MaxOccurrences][SRToolkit.utils.grammar.constraints.MaxOccurrences],
            [NoNested][SRToolkit.utils.grammar.constraints.NoNested], or
            [DimensionalConsistency][SRToolkit.utils.grammar.constraints.DimensionalConsistency],
            or a user-defined subclass.
    """
    self._constraints.append(constraint)

rules_for

rules_for(nonterminal: str) -> list[Rule]

Return all rules whose lhs matches nonterminal.

Examples:

>>> g = Grammar()
>>> g.add_rule(Rule("E", ["F"]))
>>> g.add_rule(Rule("E", ["x"]))
>>> len(g.rules_for("E"))
2
>>> g.rules_for("Z")
[]

Parameters:

Name Type Description Default
nonterminal str

Non-terminal to look up.

required

Returns:

Type Description
list[Rule]

List of matching Rule objects in insertion order.

Source code in SRToolkit/utils/grammar/grammar.py
def rules_for(self, nonterminal: str) -> list[Rule]:
    """
    Return all rules whose ``lhs`` matches ``nonterminal``.

    Examples:
        >>> g = Grammar()
        >>> g.add_rule(Rule("E", ["F"]))
        >>> g.add_rule(Rule("E", ["x"]))
        >>> len(g.rules_for("E"))
        2
        >>> g.rules_for("Z")
        []

    Args:
        nonterminal: Non-terminal to look up.

    Returns:
        List of matching [Rule][SRToolkit.utils.grammar.Rule] objects in insertion order.
    """
    return list(self._rules_by_lhs.get(nonterminal, []))

is_pcfg

is_pcfg() -> bool

Return True if any rule deviates from the default weight of 1.0.

Examples:

>>> g = Grammar()
>>> g.add_rule(Rule("E", ["x"]))
>>> g.add_rule(Rule("E", ["y"]))
>>> g.is_pcfg()
False
>>> g.add_rule(Rule("F", ["z"], weight=0.3))
>>> g.is_pcfg()
True

Returns:

Type Description
bool

True if at least one rule has a weight other than 1.0.

Source code in SRToolkit/utils/grammar/grammar.py
def is_pcfg(self) -> bool:
    """
    Return ``True`` if any rule deviates from the default weight of ``1.0``.

    Examples:
        >>> g = Grammar()
        >>> g.add_rule(Rule("E", ["x"]))
        >>> g.add_rule(Rule("E", ["y"]))
        >>> g.is_pcfg()
        False
        >>> g.add_rule(Rule("F", ["z"], weight=0.3))
        >>> g.is_pcfg()
        True

    Returns:
        ``True`` if at least one rule has a weight other than ``1.0``.
    """
    return any(r.weight != 1.0 for r in self._rules)

validate

validate(parse_tree: ParseTree, require_start: bool = False) -> bool

Validate that parse_tree is structurally valid, uses only productions from this grammar, and satisfies every registered constraint.

By default the root of the tree may be any non-terminal in the grammar, not just self.start. This lets you validate sub-trees (e.g. an F-rooted fragment) in addition to full expressions. Pass require_start=True to additionally enforce that the root symbol equals self.start (useful when validating complete, top-level expressions).

The check replays the parse tree through a fresh Derivation rooted at parse_tree.root.symbol, walking nodes in leftmost order (mirroring the derivation frontier). At each internal node it checks that:

  • The applied rule is permitted at the current frontier position — meaning it exists in the grammar and every registered constraint accepts it.
  • The number of children equals len(rule.rhs).
  • Each children[i].symbol equals rule.rhs[i].

Terminal leaves are accepted when their symbol is not a non-terminal in this grammar. A non-terminal symbol appearing as a leaf (unexpanded) is rejected.

Examples:

>>> g = Grammar(start="E")
>>> r = Rule("E", ["x"])
>>> g.add_rule(r)
>>> leaf = ParseTreeNode("x", None)
>>> root = ParseTreeNode("E", r, [leaf])
>>> g.validate(ParseTree(root))
True
>>> foreign = Rule("E", ["y"])
>>> root2 = ParseTreeNode("E", foreign, [ParseTreeNode("y", None)])
>>> g.validate(ParseTree(root2))
False
>>> g.add_rule(Rule("F", ["x"]))
>>> r_f = Rule("F", ["x"])
>>> g.add_rule(r_f)
>>> f_root = ParseTreeNode("F", r_f, [ParseTreeNode("x", None)])
>>> g.validate(ParseTree(f_root))
True
>>> g.validate(ParseTree(f_root), require_start=True)
False

Parameters:

Name Type Description Default
parse_tree ParseTree

The ParseTree to validate.

required
require_start bool

When True, return False if the root symbol does not equal self.start. Raises ValueError if self.start is None and require_start is True.

False

Returns:

Type Description
bool

True if the tree is structurally consistent, every production exists

bool

in this grammar, and all constraints would have permitted the derivation

bool

(and the root matches self.start when require_start=True).

Raises:

Type Description
ValueError

If require_start=True but self.start is None.

Source code in SRToolkit/utils/grammar/grammar.py
def validate(self, parse_tree: ParseTree, require_start: bool = False) -> bool:
    """
    Validate that ``parse_tree`` is structurally valid, uses only productions
    from this grammar, and satisfies every registered constraint.

    By default the root of the tree may be **any non-terminal** in the grammar,
    not just ``self.start``.  This lets you validate sub-trees (e.g. an
    ``F``-rooted fragment) in addition to full expressions.  Pass
    ``require_start=True`` to additionally enforce that the root symbol equals
    ``self.start`` (useful when validating complete, top-level expressions).

    The check replays the parse tree through a fresh
    [Derivation][SRToolkit.utils.grammar.derivation.Derivation] rooted at
    ``parse_tree.root.symbol``, walking nodes in leftmost order (mirroring the
    derivation frontier).  At each internal node it checks that:

    - The applied rule is permitted at the current frontier position — meaning
      it exists in the grammar **and** every registered constraint accepts it.
    - The number of children equals ``len(rule.rhs)``.
    - Each ``children[i].symbol`` equals ``rule.rhs[i]``.

    Terminal leaves are accepted when their symbol is not a non-terminal in
    this grammar.  A non-terminal symbol appearing as a leaf (unexpanded) is
    rejected.

    Examples:
        >>> g = Grammar(start="E")
        >>> r = Rule("E", ["x"])
        >>> g.add_rule(r)
        >>> leaf = ParseTreeNode("x", None)
        >>> root = ParseTreeNode("E", r, [leaf])
        >>> g.validate(ParseTree(root))
        True
        >>> foreign = Rule("E", ["y"])
        >>> root2 = ParseTreeNode("E", foreign, [ParseTreeNode("y", None)])
        >>> g.validate(ParseTree(root2))
        False
        >>> g.add_rule(Rule("F", ["x"]))
        >>> r_f = Rule("F", ["x"])
        >>> g.add_rule(r_f)
        >>> f_root = ParseTreeNode("F", r_f, [ParseTreeNode("x", None)])
        >>> g.validate(ParseTree(f_root))
        True
        >>> g.validate(ParseTree(f_root), require_start=True)
        False

    Args:
        parse_tree: The [ParseTree][SRToolkit.utils.grammar.ParseTree] to validate.
        require_start: When ``True``, return ``False`` if the root symbol does
            not equal ``self.start``.  Raises ``ValueError`` if ``self.start``
            is ``None`` and ``require_start`` is ``True``.

    Returns:
        ``True`` if the tree is structurally consistent, every production exists
        in this grammar, and all constraints would have permitted the derivation
        (and the root matches ``self.start`` when ``require_start=True``).

    Raises:
        ValueError: If ``require_start=True`` but ``self.start`` is ``None``.
    """
    if require_start:
        if self.start is None:
            raise ValueError("require_start=True but this grammar has no start symbol.")
        if parse_tree.root.symbol != self.start:
            return False
    root = parse_tree.root
    if root.rule_applied is None:
        return len(root.children) == 0 and root.symbol not in self.nonterminals
    try:
        d = self.start_derivation(root.symbol)
    except ValueError:
        return False
    stack = [root]
    while stack:
        node = stack.pop()
        if node.rule_applied is None:
            if node.symbol in self.nonterminals:
                return False
            continue
        if d.complete:
            return False
        if node.rule_applied not in d.options():
            return False
        if len(node.children) != len(node.rule_applied.rhs):
            return False
        for child, expected in zip(node.children, node.rule_applied.rhs):
            if child.symbol != expected:
                return False
        d.apply(node.rule_applied)
        for child in reversed(node.children):
            if child.symbol in self.nonterminals:
                stack.append(child)
    return d.complete

start_derivation

start_derivation(start: Optional[str] = None) -> 'Derivation'

Begin a new derivation.

Examples:

>>> g = Grammar(start="E")
>>> g.add_rule(Rule("E", ["x"]))
>>> d = g.start_derivation()
>>> d.complete
False

Parameters:

Name Type Description Default
start Optional[str]

Start non-terminal. Defaults to self.start when None.

None

Returns:

Type Description
'Derivation'

A Derivation at the first expansion step.

Raises:

Type Description
ValueError

If the resolved start symbol is not a non-terminal in this grammar.

Source code in SRToolkit/utils/grammar/grammar.py
def start_derivation(self, start: Optional[str] = None) -> "Derivation":
    """
    Begin a new derivation.

    Examples:
        >>> g = Grammar(start="E")
        >>> g.add_rule(Rule("E", ["x"]))
        >>> d = g.start_derivation()
        >>> d.complete
        False

    Args:
        start: Start non-terminal.  Defaults to
            ``self.start`` when
            ``None``.

    Returns:
        A [Derivation][SRToolkit.utils.grammar.derivation.Derivation] at the first expansion step.

    Raises:
        ValueError: If the resolved start symbol is not a non-terminal in
            this grammar.
    """
    from .derivation import Derivation

    s = self.start if start is None else start

    if s is None:
        raise ValueError(
            "No start symbol. Either add it thorough the constructor or as parameter "
            "in the Grammar.start_derivation method."
        )
    if s not in self.nonterminals:
        raise ValueError(f"Start symbol '{s}' is not a non-terminal in this grammar.")

    return Derivation(self, s)

generate_one

generate_one(start: Optional[str] = None, max_steps: int = 1000, max_retries: int = 10) -> Optional[list[str]]

Generate a single expression by sampling the grammar to completion.

Convenience wrapper around start_derivation and Derivation.generate.

Each attempt runs a fresh derivation from scratch. When a derivation exceeds max_steps rule applications (e.g. because random sampling kept choosing recursive rules), the attempt is discarded and a new one starts. None is returned only when every attempt fails, signalling that the caller should either relax the grammar, add more liberal constraints, or increase max_steps / max_retries.

Examples:

>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> g = Grammar(start="E")
>>> g.add_rule(Rule("E", ["x"]))
>>> g.generate_one()
['x']
>>> g.generate_one(max_steps=0, max_retries=1) is None
True

Parameters:

Name Type Description Default
start Optional[str]

Start non-terminal. Defaults to self.start when None.

None
max_steps int

Maximum rule applications per attempt. A negative number means unlimited (no retry logic applies).

1000
max_retries int

Maximum number of fresh attempts before returning None. Must be at least 1.

10

Returns:

Type Description
Optional[list[str]]

A list of terminal tokens in left-to-right order, or None if

Optional[list[str]]

every attempt exceeded max_steps.

Raises:

Type Description
ValueError

If the resolved start symbol is not a non-terminal in this grammar.

Source code in SRToolkit/utils/grammar/grammar.py
def generate_one(
    self,
    start: Optional[str] = None,
    max_steps: int = 1000,
    max_retries: int = 10,
) -> Optional[list[str]]:
    """
    Generate a single expression by sampling the grammar to completion.

    Convenience wrapper around
    [start_derivation][SRToolkit.utils.grammar.grammar.Grammar.start_derivation]
    and [Derivation.generate][SRToolkit.utils.grammar.derivation.Derivation.generate].

    Each attempt runs a fresh derivation from scratch.  When a derivation
    exceeds ``max_steps`` rule applications (e.g. because random sampling
    kept choosing recursive rules), the attempt is discarded and a new one
    starts.  ``None`` is returned only when every attempt fails, signalling
    that the caller should either relax the grammar, add more liberal
    constraints, or increase ``max_steps`` / ``max_retries``.

    Examples:
        >>> from SRToolkit.utils.grammar import Grammar, Rule
        >>> g = Grammar(start="E")
        >>> g.add_rule(Rule("E", ["x"]))
        >>> g.generate_one()
        ['x']
        >>> g.generate_one(max_steps=0, max_retries=1) is None
        True

    Args:
        start: Start non-terminal. Defaults to ``self.start`` when ``None``.
        max_steps: Maximum rule applications per attempt. A negative number
            means unlimited (no retry logic applies).
        max_retries: Maximum number of fresh attempts before returning
            ``None``. Must be at least ``1``.

    Returns:
        A list of terminal tokens in left-to-right order, or ``None`` if
        every attempt exceeded ``max_steps``.

    Raises:
        ValueError: If the resolved start symbol is not a non-terminal in
            this grammar.
    """
    for _ in range(max(1, max_retries)):
        try:
            return self.start_derivation(start).generate(limit=max_steps)
        except RuntimeError:
            continue
    return None

to_dict

to_dict() -> dict

Serialise this grammar to a JSON-safe dictionary.

Constraints are serialised via their own to_dict method. User-defined constraint subclasses must implement to_dict/from_dict to survive the round-trip; built-in constraints are fully supported.

Returns:

Type Description
dict

Dictionary with keys start, rules, and constraints.

Source code in SRToolkit/utils/grammar/grammar.py
def to_dict(self) -> dict:
    """
    Serialise this grammar to a JSON-safe dictionary.

    Constraints are serialised via their own ``to_dict`` method.  User-defined
    constraint subclasses must implement ``to_dict``/``from_dict`` to survive the
    round-trip; built-in constraints are fully supported.

    Returns:
        Dictionary with keys ``start``, ``rules``, and ``constraints``.
    """
    return {
        "start": self.start,
        "rules": [r.to_dict() for r in self._rules],
        "constraints": [c.to_dict() for c in self._constraints],
    }

from_dict classmethod

from_dict(d: dict) -> 'Grammar'

Reconstruct a Grammar from a dictionary produced by to_dict.

Parameters:

Name Type Description Default
d dict

Dictionary with keys start, rules, and constraints.

required

Returns:

Type Description
'Grammar'

A new Grammar with all rules and

'Grammar'

constraints registered.

Source code in SRToolkit/utils/grammar/grammar.py
@classmethod
def from_dict(cls, d: dict) -> "Grammar":
    """
    Reconstruct a [Grammar][SRToolkit.utils.grammar.Grammar] from a dictionary
    produced by [to_dict][SRToolkit.utils.grammar.Grammar.to_dict].

    Args:
        d: Dictionary with keys ``start``, ``rules``, and ``constraints``.

    Returns:
        A new [Grammar][SRToolkit.utils.grammar.Grammar] with all rules and
        constraints registered.
    """
    from .constraints import Constraint

    rules = [Rule.from_dict(r) for r in d.get("rules", [])]
    g = cls(rules, start=d.get("start"))
    for cd in d.get("constraints", []):
        g.add_constraint(Constraint.from_dict(cd))
    return g

from_symbol_library classmethod

from_symbol_library(symbol_library: Optional[SymbolLibrary] = None, start: Optional[str] = None) -> 'Grammar'

Build a PCFG from a SymbolLibrary using a generic operator-precedence non-terminal hierarchy.

One non-terminal is created per unique OP precedence level present in the library. Known precedences map to readable names; custom precedences fall back to L_{p}:

  • OP_ADDITIVEE
  • OP_MULTIPLICATIVEF
  • OP_POWERB
  • custom precedence pL_{p}

The chain runs from the lowest-precedence level (the grammar start) down to T, K, R, and V. Only levels that have at least one operator are generated; there are no empty pass-through non-terminals.

Heuristic weights are calibrated against empirical expression distributions (e.g. Wikipedia mathematical formulae): E recursion ~30 %, F recursion ~40 %, B recursion ~5 %, custom-level recursion ~40 %.

Examples:

>>> sl = SymbolLibrary.from_symbol_list(["+", "-", "*", "sin", "^2"], 2)
>>> g = Grammar.from_symbol_library(sl)
>>> "E" in g.nonterminals and "V" in g.nonterminals
True
>>> g.is_pcfg()
True

Parameters:

Name Type Description Default
symbol_library Optional[SymbolLibrary]

Token vocabulary. Falls back to the active library from the context manager when None.

None
start Optional[str]

Override the name of the lowest-precedence non-terminal (and g.start). When None (default) the name is auto-inferred from the precedence constants ("E" for standard libraries).

None

Returns:

Type Description
'Grammar'

A Grammar with heuristic PCFG weights.

Raises:

Type Description
ValueError

If the symbol library contains no variables, constants, or literals, making it impossible to generate any terminal expression.

Source code in SRToolkit/utils/grammar/grammar.py
@classmethod
def from_symbol_library(
    cls,
    symbol_library: Optional[SymbolLibrary] = None,
    start: Optional[str] = None,
) -> "Grammar":
    """
    Build a PCFG from a [SymbolLibrary][SRToolkit.utils.symbol_library.SymbolLibrary]
    using a generic operator-precedence non-terminal hierarchy.

    One non-terminal is created per unique ``OP`` precedence level present in
    the library.  Known precedences map to readable names; custom precedences
    fall back to ``L_{p}``:

    - ``OP_ADDITIVE`` → ``E``
    - ``OP_MULTIPLICATIVE`` → ``F``
    - ``OP_POWER`` → ``B``
    - custom precedence *p* → ``L_{p}``

    The chain runs from the lowest-precedence level (the grammar start) down
    to ``T``, ``K``, ``R``, and ``V``.  Only levels that have at least one
    operator are generated; there are no empty pass-through non-terminals.

    Heuristic weights are calibrated against empirical expression distributions
    (e.g. Wikipedia mathematical formulae): E recursion ~30 %, F recursion ~40 %,
    B recursion ~5 %, custom-level recursion ~40 %.

    Examples:
        >>> sl = SymbolLibrary.from_symbol_list(["+", "-", "*", "sin", "^2"], 2)
        >>> g = Grammar.from_symbol_library(sl)
        >>> "E" in g.nonterminals and "V" in g.nonterminals
        True
        >>> g.is_pcfg()
        True

    Args:
        symbol_library: Token vocabulary.  Falls back to the active library
            from the context manager when ``None``.
        start: Override the name of the lowest-precedence non-terminal (and
            ``g.start``).  When ``None`` (default) the name is auto-inferred
            from the precedence constants (``"E"`` for standard libraries).

    Returns:
        A [Grammar][SRToolkit.utils.grammar.Grammar] with heuristic PCFG weights.

    Raises:
        ValueError: If the symbol library contains no variables, constants, or
            literals, making it impossible to generate any terminal expression.
    """
    if symbol_library is None:
        symbol_library = SymbolLibrary.get_active()

    symbols = list(symbol_library.symbols.values())

    # Group operators by precedence level
    op_groups: dict[int, list[str]] = {}
    for s in symbols:
        if s["type"] == OP:
            op_groups.setdefault(s["precedence"], []).append(s["symbol"])
    sorted_precs = sorted(op_groups)

    R_fns = [s["symbol"] for s in symbols if s["type"] == FN and s["precedence"] == FN_PREFIX]
    P_fns = [s["symbol"] for s in symbols if s["type"] == FN and s["precedence"] == FN_POSTFIX]
    variables = [s["symbol"] for s in symbols if s["type"] == VAR]
    consts = [s["symbol"] for s in symbols if s["type"] == CONST]
    lits = [s["symbol"] for s in symbols if s["type"] == LIT]

    if not variables and not consts and not lits:
        raise ValueError(
            "Symbol library has no variables, constants, or literals; "
            "the grammar cannot generate any terminal expression."
        )

    # Collect terminal names so NT names can be made unique against them.
    all_terminals: set[str] = {sym for ops_list in op_groups.values() for sym in ops_list}
    all_terminals.update(R_fns + P_fns + variables + consts + lits)

    # If the caller explicitly chose a start name that is also a terminal symbol,
    # that is an unresolvable conflict — raise immediately.
    if start is not None and start in all_terminals:
        raise ValueError(
            f"start={start!r} is also a terminal symbol in the SymbolLibrary; "
            f"choose a different start symbol (e.g. {start + '_'!r})."
        )

    def _unique_nt(base: str, taken: set[str]) -> str:
        name = base
        while name in taken:
            name += "_"
        return name

    level_nts: dict[int, str] = {}
    for prec in sorted_precs:
        level_nts[prec] = _unique_nt(_level_name(prec), all_terminals)

    t_nt = _unique_nt("T", all_terminals)
    k_nt = _unique_nt("K", all_terminals)
    r_nt = _unique_nt("R", all_terminals)
    v_nt = _unique_nt("V", all_terminals)

    # top_nt is the topmost level of the hierarchy; functions recurse back here.
    top_nt = level_nts[sorted_precs[0]] if sorted_precs else t_nt

    g = cls(start=start if start is not None else top_nt)

    # Build operator-precedence chain.  Level i expands to itself OP level i+1;
    # the highest level chains to T.
    for i, prec in enumerate(sorted_precs):
        nt = level_nts[prec]
        next_nt = level_nts[sorted_precs[i + 1]] if i + 1 < len(sorted_precs) else t_nt
        ops = op_groups[prec]
        weight = _LEVEL_REC_WEIGHTS.get(prec, 0.4)
        infix = _LEVEL_RULE_INFIXES.get(prec, "op")

        for sym in ops:
            # Power is right-associative (a^b^c = a^(b^c)), so recurse on the right.
            if prec == OP_POWER:
                g.add_rule(Rule(nt, [next_nt, sym, nt], weight=weight / len(ops), name=f"{nt}_{infix}_{sym}"))
            else:
                g.add_rule(Rule(nt, [nt, sym, next_nt], weight=weight / len(ops), name=f"{nt}_{infix}_{sym}"))
        g.add_rule(Rule(nt, [next_nt], weight=1.0 - weight, name=f"{nt}_to_{next_nt}"))

    # T level — leaf dispatcher; K and V branches added only when their symbols exist.
    if variables and (consts or lits):
        g.add_rule(Rule(t_nt, [r_nt], weight=0.2, name=f"{t_nt}_to_{r_nt}"))
        g.add_rule(Rule(t_nt, [k_nt], weight=0.2, name=f"{t_nt}_to_{k_nt}"))
        g.add_rule(Rule(t_nt, [v_nt], weight=0.6, name=f"{t_nt}_to_{v_nt}"))
    elif consts or lits:
        g.add_rule(Rule(t_nt, [r_nt], weight=0.3, name=f"{t_nt}_to_{r_nt}"))
        g.add_rule(Rule(t_nt, [k_nt], weight=0.7, name=f"{t_nt}_to_{k_nt}"))
    else:
        g.add_rule(Rule(t_nt, [r_nt], weight=0.3, name=f"{t_nt}_to_{r_nt}"))
        g.add_rule(Rule(t_nt, [v_nt], weight=0.7, name=f"{t_nt}_to_{v_nt}"))

    # K level — constants and literals
    if lits and consts:
        for sym in lits:
            g.add_rule(Rule(k_nt, [sym], weight=0.2 / len(lits), name=f"{k_nt}_{sym}"))
        for sym in consts:
            g.add_rule(Rule(k_nt, [sym], weight=0.8 / len(consts), name=f"{k_nt}_{sym}"))
    elif lits:
        for sym in lits:
            g.add_rule(Rule(k_nt, [sym], weight=1.0 / len(lits), name=f"{k_nt}_{sym}"))
    elif consts:
        for sym in consts:
            g.add_rule(Rule(k_nt, [sym], weight=1.0 / len(consts), name=f"{k_nt}_{sym}"))

    # R level — prefix functions, postfix functions (top_nt sym), and parens.
    # Postfix rules live here directly (no separate P non-terminal).
    if R_fns:
        for sym in R_fns:
            g.add_rule(Rule(r_nt, [sym, "(", top_nt, ")"], weight=0.4 / len(R_fns), name=f"{r_nt}_fn_{sym}"))
        if P_fns:
            for sym in P_fns:
                g.add_rule(Rule(r_nt, [top_nt, sym], weight=0.05 / len(P_fns), name=f"{r_nt}_postfix_{sym}"))
            g.add_rule(Rule(r_nt, ["(", top_nt, ")"], weight=0.55, name=f"{r_nt}_paren"))
        else:
            g.add_rule(Rule(r_nt, ["(", top_nt, ")"], weight=0.6, name=f"{r_nt}_paren"))
    else:
        if P_fns:
            for sym in P_fns:
                g.add_rule(Rule(r_nt, [top_nt, sym], weight=0.05 / len(P_fns), name=f"{r_nt}_postfix_{sym}"))
            g.add_rule(Rule(r_nt, ["(", top_nt, ")"], weight=0.95, name=f"{r_nt}_paren"))
        else:
            g.add_rule(Rule(r_nt, ["(", top_nt, ")"], weight=1.0, name=f"{r_nt}_paren"))

    # V level — variables
    if variables:
        for sym in variables:
            g.add_rule(Rule(v_nt, [sym], weight=1.0 / len(variables), name=f"{v_nt}_{sym}"))

    return g

from_grammar_string classmethod

from_grammar_string(text: str, start: Optional[str] = None) -> 'Grammar'

Construct a Grammar from a string in NLTK production-rule notation.

Each non-empty, non-comment line must have the form::

LHS -> RHS_1 | RHS_2 | ...

where each alternative is a space-separated sequence of symbols. Terminals are enclosed in single quotes (e.g. '+'); unquoted tokens are treated as non-terminals. An optional weight in square brackets at the end of an alternative is parsed as a float (e.g. E '+' F [0.4]); alternatives without a weight default to 1.0.

The start symbol may be embedded as a comment header emitted by to_grammar_string::

# start: E

The explicit start parameter takes precedence over this comment. When start is None, only the first # start: comment encountered is used; subsequent ones are ignored along with all other comment lines.

Rule names are not preserved in NLTK notation and are set to None on all returned Rule objects.

Examples:

>>> g = Grammar.from_grammar_string("E -> E '+' F | F\nF -> 'x'", start="E")
>>> sorted(g.nonterminals)
['E', 'F']
>>> g.rules_for("F")[0].rhs
['x']
>>> g.start
'E'
>>> g2 = Grammar.from_grammar_string(
...     "# start: E\nE -> E '+' F [0.4] | F [0.6]\nF -> 'x' [1.0]"
... )
>>> g2.start
'E'
>>> g2.is_pcfg()
True

Parameters:

Name Type Description Default
text str

Grammar specification in NLTK notation, optionally with a # start: <symbol> header line.

required
start Optional[str]

Start non-terminal stored on the returned grammar. Takes precedence over a # start: comment in text. Required when neither the parameter nor the comment is present.

None

Returns:

Type Description
'Grammar'

A new Grammar populated with

'Grammar'

the parsed rules.

Raises:

Type Description
ValueError

If a content line contains no ->.

ValueError

If a weight token cannot be converted to float.

ValueError

If text contains no parseable rules.

ValueError

If the start symbol cannot be determined from either the parameter or the # start: comment.

Source code in SRToolkit/utils/grammar/grammar.py
@classmethod
def from_grammar_string(cls, text: str, start: Optional[str] = None) -> "Grammar":
    """
    Construct a [Grammar][SRToolkit.utils.grammar.Grammar] from a string in
    NLTK production-rule notation.

    Each non-empty, non-comment line must have the form::

        LHS -> RHS_1 | RHS_2 | ...

    where each alternative is a space-separated sequence of symbols.
    Terminals are enclosed in single quotes (e.g. ``'+'``); unquoted
    tokens are treated as non-terminals.  An optional weight in square
    brackets at the end of an alternative is parsed as a
    ``float`` (e.g. ``E '+' F [0.4]``); alternatives without a weight
    default to ``1.0``.

    The start symbol may be embedded as a comment header emitted by
    [to_grammar_string][SRToolkit.utils.grammar.Grammar.to_grammar_string]::

        # start: E

    The explicit ``start`` parameter takes precedence over this comment.
    When ``start`` is ``None``, only the first ``# start:`` comment encountered
    is used; subsequent ones are ignored along with all other comment lines.

    Rule names are not preserved in NLTK notation and are set to ``None``
    on all returned [Rule][SRToolkit.utils.grammar.Rule] objects.

    Examples:
        >>> g = Grammar.from_grammar_string("E -> E '+' F | F\\nF -> 'x'", start="E")
        >>> sorted(g.nonterminals)
        ['E', 'F']
        >>> g.rules_for("F")[0].rhs
        ['x']
        >>> g.start
        'E'

        >>> g2 = Grammar.from_grammar_string(
        ...     "# start: E\\nE -> E '+' F [0.4] | F [0.6]\\nF -> 'x' [1.0]"
        ... )
        >>> g2.start
        'E'
        >>> g2.is_pcfg()
        True

    Args:
        text: Grammar specification in NLTK notation, optionally with a
            ``# start: <symbol>`` header line.
        start: Start non-terminal stored on the returned grammar.  Takes
            precedence over a ``# start:`` comment in ``text``.  Required
            when neither the parameter nor the comment is present.

    Returns:
        A new [Grammar][SRToolkit.utils.grammar.Grammar] populated with
        the parsed rules.

    Raises:
        ValueError: If a content line contains no ``->``.
        ValueError: If a weight token cannot be converted to ``float``.
        ValueError: If ``text`` contains no parseable rules.
        ValueError: If the start symbol cannot be determined from either
            the parameter or the ``# start:`` comment.
    """
    resolved_start = start
    rules: list[Rule] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("#"):
            if resolved_start is None and line.startswith("# start:"):
                resolved_start = line[len("# start:") :].strip()
            continue
        rules.extend(Rule.from_line(line))
    if not rules:
        raise ValueError("No rules parsed from grammar string.")
    if resolved_start is None:
        raise ValueError(
            "No start symbol found. Either pass start=<nonterminal> or include a '# start: <symbol>' line."
        )
    return cls(rules, start=resolved_start)

to_grammar_string

to_grammar_string() -> str

Serialise this grammar to a string in NLTK production-rule notation.

All rules sharing the same left-hand side are written on a single line, separated by |. Symbols that are non-terminals (i.e. appear as the lhs of at least one rule) are written unquoted; all other symbols are enclosed in single quotes. When is_pcfg returns True, the probability of each alternative (its weight divided by the sum of weights for that non-terminal) is appended in square brackets.

Rule names and registered constraints are not included in the output. When self.start is set, a # start: <symbol> comment is prepended so that from_grammar_string can reconstruct the grammar without requiring an explicit start argument.

Examples:

>>> g = Grammar([
...     Rule("E", ["E", "+", "F"]),
...     Rule("E", ["F"]),
...     Rule("F", ["x"]),
... ], start="E")
>>> print(g.to_grammar_string())
# start: E
E -> E '+' F | F
F -> 'x'
>>> gp = Grammar([
...     Rule("E", ["x"], weight=1.0),
...     Rule("E", ["y"], weight=3.0),
... ])
>>> print(gp.to_grammar_string())
E -> 'x' [0.25] | 'y' [0.75]

Returns:

Type Description
str

Multi-line string in NLTK production-rule notation, optionally

str

preceded by a # start: header.

Source code in SRToolkit/utils/grammar/grammar.py
def to_grammar_string(self) -> str:
    """
    Serialise this grammar to a string in NLTK production-rule notation.

    All rules sharing the same left-hand side are written on a single
    line, separated by `` | ``.  Symbols that are non-terminals (i.e.
    appear as the ``lhs`` of at least one rule) are written unquoted;
    all other symbols are enclosed in single quotes.  When
    [is_pcfg][SRToolkit.utils.grammar.Grammar.is_pcfg] returns ``True``,
    the probability of each alternative (its weight divided by the sum of
    weights for that non-terminal) is appended in square brackets.

    Rule names and registered constraints are not included in the output.
    When ``self.start`` is set, a ``# start: <symbol>`` comment is
    prepended so that
    [from_grammar_string][SRToolkit.utils.grammar.Grammar.from_grammar_string]
    can reconstruct the grammar without requiring an explicit ``start``
    argument.

    Examples:
        >>> g = Grammar([
        ...     Rule("E", ["E", "+", "F"]),
        ...     Rule("E", ["F"]),
        ...     Rule("F", ["x"]),
        ... ], start="E")
        >>> print(g.to_grammar_string())
        # start: E
        E -> E '+' F | F
        F -> 'x'

        >>> gp = Grammar([
        ...     Rule("E", ["x"], weight=1.0),
        ...     Rule("E", ["y"], weight=3.0),
        ... ])
        >>> print(gp.to_grammar_string())
        E -> 'x' [0.25] | 'y' [0.75]

    Returns:
        Multi-line string in NLTK production-rule notation, optionally
        preceded by a ``# start:`` header.
    """
    nts = self.nonterminals
    pcfg = self.is_pcfg()
    lines: list[str] = []
    if self.start is not None:
        lines.append(f"# start: {self.start}")
    for lhs, rules in self._rules_by_lhs.items():
        parts: list[str] = []
        if pcfg:
            total = sum(r.weight for r in rules)
        for rule in rules:
            rhs_tokens = [sym if sym in nts else f"'{sym}'" for sym in rule.rhs]
            alt = " ".join(rhs_tokens)
            if pcfg:
                alt += f" [{rule.weight / total}]"
            parts.append(alt)
        lines.append(f"{lhs} -> {' | '.join(parts)}")
    return "\n".join(lines)

ParseTree

ParseTree(root: ParseTreeNode)

Full derivation history of an expression under a grammar.

Unlike Node, which holds only terminal symbols for expression evaluation, a ParseTree retains every non-terminal and every production applied during the derivation.

Examples:

>>> r = Rule("E", ["x"])
>>> leaf = ParseTreeNode("x", None)
>>> root = ParseTreeNode("E", r, [leaf])
>>> pt = ParseTree(root)
>>> pt.to_token_list()
['x']
>>> pt.productions_used()
[Rule(lhs='E', rhs=['x'], weight=1.0, name=None)]

Attributes:

Name Type Description
root

The root node of the parse tree.

Source code in SRToolkit/utils/grammar/grammar.py
def __init__(self, root: ParseTreeNode) -> None:
    self.root = root

to_token_list

to_token_list() -> list[str]

Collect all terminal tokens in left-to-right order.

Examples:

>>> leaf1 = ParseTreeNode("a", None)
>>> leaf2 = ParseTreeNode("+", None)
>>> leaf3 = ParseTreeNode("b", None)
>>> root = ParseTreeNode("E", Rule("E", ["a", "+", "b"]), [leaf1, leaf2, leaf3])
>>> ParseTree(root).to_token_list()
['a', '+', 'b']

Returns:

Type Description
list[str]

Flat list of terminal tokens in left-to-right order.

Source code in SRToolkit/utils/grammar/grammar.py
def to_token_list(self) -> list[str]:
    """
    Collect all terminal tokens in left-to-right order.

    Examples:
        >>> leaf1 = ParseTreeNode("a", None)
        >>> leaf2 = ParseTreeNode("+", None)
        >>> leaf3 = ParseTreeNode("b", None)
        >>> root = ParseTreeNode("E", Rule("E", ["a", "+", "b"]), [leaf1, leaf2, leaf3])
        >>> ParseTree(root).to_token_list()
        ['a', '+', 'b']

    Returns:
        Flat list of terminal tokens in left-to-right order.
    """
    tokens: list[str] = []
    self._collect_tokens(self.root, tokens)
    return tokens

productions_used

productions_used() -> list[Rule]

Return all rules applied in the derivation, in pre-order.

Examples:

>>> leaf = ParseTreeNode("x", None)
>>> r = Rule("E", ["x"])
>>> root = ParseTreeNode("E", r, [leaf])
>>> ParseTree(root).productions_used()
[Rule(lhs='E', rhs=['x'], weight=1.0, name=None)]

Returns:

Type Description
list[Rule]

List of Rule objects in pre-order traversal order.

Source code in SRToolkit/utils/grammar/grammar.py
def productions_used(self) -> list[Rule]:
    """
    Return all rules applied in the derivation, in pre-order.

    Examples:
        >>> leaf = ParseTreeNode("x", None)
        >>> r = Rule("E", ["x"])
        >>> root = ParseTreeNode("E", r, [leaf])
        >>> ParseTree(root).productions_used()
        [Rule(lhs='E', rhs=['x'], weight=1.0, name=None)]

    Returns:
        List of [Rule][SRToolkit.utils.grammar.Rule] objects in pre-order traversal order.
    """
    rules: list[Rule] = []
    self._collect_rules(self.root, rules)
    return rules

ParseTreeNode dataclass

ParseTreeNode(symbol: str, rule_applied: Optional[Rule], children: list[ParseTreeNode] = list())

A node in a derivation parse tree.

Terminal leaves have rule_applied=None and children=[]. Internal nodes store the rule that expanded the non-terminal at this position.

Attributes:

Name Type Description
symbol str

The grammar symbol at this node (terminal or non-terminal name).

rule_applied Optional[Rule]

The Rule used to expand this node. None for terminal leaves.

children list[ParseTreeNode]

Ordered child nodes corresponding to the symbols in rule_applied.rhs.

Rule dataclass

Rule(lhs: str, rhs: list[str], weight: float = 1.0, name: Optional[str] = None)

A single production rule in a grammar.

A rule is a CFG production when weights remain 1 for all rules in the grammar; when weights differ across rules, the grammar is treated as a PCFG and productions are sampled proportionally. Weights are unnormalised — the grammar normalises them at sampling time.

Examples:

>>> r = Rule("E", ["E", "+", "F"], weight=0.4, name="E_add_+")
>>> r.lhs
'E'
>>> r.rhs
['E', '+', 'F']
>>> r.weight
0.4
>>> r.name
'E_add_+'
>>> Rule("E", ["F"]).weight
1.0

Attributes:

Name Type Description
lhs str

The non-terminal being expanded, e.g. "E".

rhs list[str]

Ordered sequence of symbols the non-terminal expands to. Each element is either a terminal token (e.g. "+" or "sin") or the name of another non-terminal. A symbol is treated as a non-terminal if and only if it appears as the lhs of at least one rule in the grammar.

weight float

Unnormalised sampling weight. Defaults to 1.0, which produces uniform sampling within a group when all rules share the same weight.

name Optional[str]

Optional stable identifier for this rule. Used by constraints for scoping and identification. None by default.

from_line classmethod

from_line(line: str) -> list['Rule']

Parse one NLTK production line into a list of Rule objects.

The line must have the form::

LHS -> RHS_1 | RHS_2 | ...

where each alternative is a space-separated sequence of symbols. Terminals are enclosed in single quotes; unquoted tokens are non-terminals. An optional weight in square brackets at the end of an alternative is parsed as a float; alternatives without a weight default to 1.0.

Examples:

>>> Rule.from_line("E -> E '+' F [0.4] | F [0.6]")
[Rule(lhs='E', rhs=['E', '+', 'F'], weight=0.4, name=None), Rule(lhs='E', rhs=['F'], weight=0.6, name=None)]
>>> Rule.from_line("F -> 'x'")
[Rule(lhs='F', rhs=['x'], weight=1.0, name=None)]

Parameters:

Name Type Description Default
line str

A single non-empty, non-comment production line.

required

Returns:

Type Description
list['Rule']

List of Rule objects, one per alternative.

Raises:

Type Description
ValueError

If line contains no ->.

ValueError

If the left-hand side is empty.

ValueError

If a weight token cannot be converted to float.

ValueError

If no alternatives are parsed from the right-hand side.

Source code in SRToolkit/utils/grammar/grammar.py
@classmethod
def from_line(cls, line: str) -> list["Rule"]:
    """
    Parse one NLTK production line into a list of [Rule][SRToolkit.utils.grammar.Rule] objects.

    The line must have the form::

        LHS -> RHS_1 | RHS_2 | ...

    where each alternative is a space-separated sequence of symbols.
    Terminals are enclosed in single quotes; unquoted tokens are non-terminals.
    An optional weight in square brackets at the end of an alternative is parsed
    as a ``float``; alternatives without a weight default to ``1.0``.

    Examples:
        >>> Rule.from_line("E -> E '+' F [0.4] | F [0.6]")
        [Rule(lhs='E', rhs=['E', '+', 'F'], weight=0.4, name=None), Rule(lhs='E', rhs=['F'], weight=0.6, name=None)]
        >>> Rule.from_line("F -> 'x'")
        [Rule(lhs='F', rhs=['x'], weight=1.0, name=None)]

    Args:
        line: A single non-empty, non-comment production line.

    Returns:
        List of [Rule][SRToolkit.utils.grammar.Rule] objects, one per alternative.

    Raises:
        ValueError: If ``line`` contains no ``->``.
        ValueError: If the left-hand side is empty.
        ValueError: If a weight token cannot be converted to ``float``.
        ValueError: If no alternatives are parsed from the right-hand side.
    """
    line = line.strip()
    if "->" not in line:
        raise ValueError(f"Expected '->' in grammar line: {line!r}")
    lhs, rhs_part = line.split("->", 1)
    lhs = lhs.strip()
    if not lhs:
        raise ValueError(f"Empty left-hand side in grammar line: {line!r}")
    rules = []
    for alt in rhs_part.split("|"):
        alt = alt.strip()
        if not alt:
            continue
        weight = 1.0
        m = re.search(r"\[([^\]]+)\]\s*$", alt)
        if m:
            try:
                weight = float(m.group(1))
            except ValueError:
                raise ValueError(f"Invalid weight {m.group(1)!r} in grammar line: {line!r}")
            alt = alt[: m.start()].strip()
        tokens = [t[1:-1] if t.startswith("'") and t.endswith("'") else t for t in re.findall(r"'[^']*'|\S+", alt)]
        if tokens:
            rules.append(cls(lhs, tokens, weight=weight))
    if not rules:
        raise ValueError(f"No alternatives parsed from grammar line: {line!r}")
    return rules