Grammar
SRToolkit.utils.grammar
CFG/PCFG grammar representation, constraint protocol, and stateful derivation engine.
Provides
Rule, ParseTreeNode, ParseTree, Grammar, Constraint, AncestorInfo, ExpansionContext, MaxDepth, MaxNodes, MaxOccurrences, NoNested, DimensionalConsistency, Derivation.
AncestorInfo
dataclass
One entry in an ExpansionContext's ancestor chain.
Attributes:
| Name | Type | Description |
|---|---|---|
nonterminal |
str
|
The non-terminal symbol at this ancestor position. |
rule |
Rule
|
The rule applied at this position. |
child_index |
int
|
The index within |
Constraint
Bases: Generic[L, G]
Base class for derivation constraints (hard filters).
Subclass and override the methods you need. Constraints carry only construction-time configuration; all per-derivation state is managed by the engine and threaded through method arguments.
Scoping — set nonterminals and/or rule_names to restrict
allows to a subset of slots
or rules. A scope miss is treated as implicit acceptance.
update is always called
regardless of scope so that global counters stay accurate.
Examples:
>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> g = Grammar([Rule("E", ["x"]), Rule("E", ["y"])])
>>> class RejectY(Constraint):
... def allows(self, slot, rule, global_): return rule.rhs != ["y"]
>>> g.add_constraint(RejectY())
>>> d = g.start_derivation("E")
>>> [r.rhs for r in d.options()]
[['x']]
to_dict
Serialise this constraint to a JSON-safe dictionary.
The base implementation stores only the fully-qualified class path under
constraint_class. Subclasses should call super().to_dict() and
add their own constructor arguments so that from_dict can reconstruct
the instance faithfully. See
from_dict
for an example.
Returns:
| Type | Description |
|---|---|
dict
|
Dictionary with at least the key |
Source code in SRToolkit/utils/grammar/constraints.py
from_dict
classmethod
Reconstruct a constraint from a dictionary produced by to_dict.
When called on the base Constraint
class, dispatches to the correct subclass via the constraint_class key using
importlib — both built-in and user-defined subclasses are supported.
When called on a concrete subclass, the subclass must override this method.
The dictionary must contain at minimum the key constraint_class, whose value
is the fully-qualified class path (e.g. "mymodule.MyConstraint"). Any
additional keys are forwarded to the subclass override.
To make a custom subclass serialisable, override both to_dict and
from_dict::
class MyConstraint(Constraint):
def __init__(self, threshold: float) -> None:
self.threshold = threshold
def to_dict(self) -> dict:
return {**super().to_dict(), "threshold": self.threshold}
@classmethod
def from_dict(cls, d: dict) -> "MyConstraint":
return cls(d["threshold"])
Examples:
>>> from SRToolkit.utils.grammar import Constraint, MaxDepth
>>> c = Constraint.from_dict(MaxDepth(5).to_dict())
>>> isinstance(c, MaxDepth) and c.limit == 5
True
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
d
|
dict
|
Dictionary previously returned by |
required |
Returns:
| Type | Description |
|---|---|
'Constraint'
|
A reconstructed Constraint instance. |
Raises:
| Type | Description |
|---|---|
KeyError
|
If |
ImportError
|
If the module cannot be imported (dispatch path). |
AttributeError
|
If the class cannot be found in the module (dispatch path). |
NotImplementedError
|
If called on a subclass that has not overridden this method. |
Source code in SRToolkit/utils/grammar/constraints.py
initial_local
initial_global
allows
Return True if rule may be applied at slot.
Called only when the slot's non-terminal and rule name are within scope.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
slot
|
ExpansionContext[L]
|
Current derivation position with this constraint's local state. |
required |
rule
|
Rule
|
Candidate production rule. |
required |
global_
|
G
|
Current per-derivation global state. |
required |
Returns:
| Type | Description |
|---|---|
bool
|
|
Source code in SRToolkit/utils/grammar/constraints.py
update
Called after rule is applied at slot.
Returns per-child local states (one per non-terminal in rule.rhs,
in order) and the new global state. May be used to update global
counters by returning a new global_ value.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
slot
|
ExpansionContext[L]
|
Derivation position immediately before the rule was applied. |
required |
rule
|
Rule
|
The rule that was applied. |
required |
global_
|
G
|
Global state before this application. |
required |
Returns:
| Type | Description |
|---|---|
list[L]
|
|
G
|
the number of non-terminals in |
Source code in SRToolkit/utils/grammar/constraints.py
DimensionalConsistency
DimensionalConsistency(variable_units: dict[str, dict], target_unit: dict, constant_units: Optional[dict[str, dict]] = None, symbol_library: Optional[SymbolLibrary] = None, allow_unit_polymorphic_constants: bool = False)
Bases: Constraint[Optional[Unit], None]
Stateful constraint that enforces dimensional analysis during expression generation.
Local state at each slot is the required unit (Optional[Unit]).
None means "any unit is acceptable here" — used for the children of
multiplicative operators, whose individual units are underdetermined.
Unit representation: units are dict[str, Fraction] mapping base
dimension names (e.g. "m", "s", "kg") to rational exponents.
Dimensionless is {} (empty dict).
Free constants (const type): treated as dimensionless by default.
Set allow_unit_polymorphic_constants=True to let them absorb any
required unit.
Named physical constants (lit type, e.g. "g", "c"):
declared via constant_units; checked like variables.
Undeclared variables or literals: conservatively accepted.
Rule classification: uses rule.name when available (see
Grammar.from_symbol_library
for the naming scheme). Falls back to rhs-shape heuristics for
unnamed rules.
Examples:
>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> from SRToolkit.utils.grammar import DimensionalConsistency
>>> from fractions import Fraction
>>> g = Grammar()
>>> g.add_rule(Rule("E", ["E", "+", "F"], weight=0.6, name="E_add_+"))
>>> g.add_rule(Rule("E", ["F"], weight=0.4, name="E_to_F"))
>>> g.add_rule(Rule("F", ["v"], weight=0.5, name="F_v"))
>>> g.add_rule(Rule("F", ["t"], weight=0.5, name="F_t"))
>>> dc = DimensionalConsistency(
... variable_units={"v": {"m": 1, "s": -1}, "t": {"s": 1}},
... target_unit={"m": 1, "s": -1},
... )
>>> g.add_constraint(dc)
>>> d = g.start_derivation("E")
>>> all(r.rhs != ["t"] for r in d.options())
True
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
variable_units
|
dict[str, dict]
|
Mapping from variable token to its unit. |
required |
target_unit
|
dict
|
Required unit of the generated expression. |
required |
constant_units
|
Optional[dict[str, dict]]
|
Units for named physical constants ( |
None
|
symbol_library
|
Optional[SymbolLibrary]
|
Used to classify tokens by type and precedence. |
None
|
allow_unit_polymorphic_constants
|
bool
|
If |
False
|
Source code in SRToolkit/utils/grammar/constraints.py
ExpansionContext
dataclass
ExpansionContext(nonterminal: str, local: L, steps: int, parent_rule: Optional[Rule], child_index: Optional[int], ancestors: tuple[AncestorInfo, ...], partial_tree: ParseTreeNode, nonterminals: frozenset[str], frontier_size: int = 1)
Bases: Generic[L]
Read-only view of the current derivation position passed to constraints.
The engine constructs an ExpansionContext on demand
for each (constraint, candidate_rule) pair evaluated in
Derivation.options and for
each (constraint, selected_rule) pair in
Derivation.apply.
Attributes:
| Name | Type | Description |
|---|---|---|
nonterminal |
str
|
The non-terminal symbol being expanded at this position. |
local |
L
|
This constraint's per-slot inherited state. |
steps |
int
|
Number of rule applications made so far in the derivation
(i.e. how many times |
parent_rule |
Optional[Rule]
|
The rule whose application created this slot ( |
child_index |
Optional[int]
|
Index of this slot in |
ancestors |
tuple[AncestorInfo, ...]
|
Ancestor chain from the root to this slot's immediate parent, in root-first order. |
partial_tree |
ParseTreeNode
|
Root of the partially-built parse tree. Read-only by contract — constraints must not mutate it. |
nonterminals |
frozenset[str]
|
Frozen set of all non-terminal symbols in the grammar. |
frontier_size |
int
|
Number of open (unexpanded) non-terminal slots in the current derivation frontier, including this slot. Useful for budget constraints such as MaxNodes that need to account for the minimum number of rule applications still required to complete the derivation. |
MaxDepth
Bases: Constraint[int, None]
Hard limit on derivation depth.
Local state is the remaining depth budget at each slot. A rule is rejected when its application would require at least one recursive non-terminal child but the budget has reached zero.
Examples:
>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> g = Grammar([
... Rule("E", ["E", "+", "E"]),
... Rule("E", ["x"]),
... ])
>>> g.add_constraint(MaxDepth(0))
>>> d = g.start_derivation("E")
>>> all(r.rhs == ["x"] for r in d.options())
True
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
limit
|
int
|
Maximum nesting depth (number of non-terminal expansion levels). |
required |
Source code in SRToolkit/utils/grammar/constraints.py
MaxNodes
Bases: Constraint[None, int]
Hard limit on the total number of rule applications in a derivation.
Global state is a running count of applications so far. A rule is rejected when applying it would push the count over the limit taking into account the non-terminal children it introduces (each of which will require at least one more application).
Examples:
>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> g = Grammar([
... Rule("E", ["E", "+", "E"]),
... Rule("E", ["x"]),
... ])
>>> g.add_constraint(MaxNodes(3))
>>> d = g.start_derivation("E")
>>> # At node-count 3 only the terminal rule survives
>>> tokens = d.generate()
>>> len(tokens) <= 5
True
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
limit
|
int
|
Maximum number of rule applications. |
required |
Source code in SRToolkit/utils/grammar/constraints.py
MaxOccurrences
Bases: Constraint[None, int]
Hard limit on how many times a specific terminal symbol may appear.
Global state counts occurrences committed so far. A rule whose rhs
contains the tracked symbol is rejected once the count equals the limit.
Examples:
>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> g = Grammar([
... Rule("E", ["E", "+", "E"]),
... Rule("E", ["x"]),
... Rule("E", ["y"]),
... ])
>>> g.add_constraint(MaxOccurrences("x", 1))
>>> d = g.start_derivation("E")
>>> tokens = d.generate(limit=200)
>>> tokens.count("x") <= 1
True
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
symbol
|
str
|
Terminal token to track. |
required |
limit
|
int
|
Maximum allowed occurrences. |
required |
Source code in SRToolkit/utils/grammar/constraints.py
NoNested
Bases: Constraint[bool, None]
Prevent any symbol in a group from appearing nested inside any other symbol in the same group.
Local state is a boolean "currently under any symbol in the group". Any
rule whose rhs contains a group symbol is rejected while the local
state is True. Children inherit True once such a rule is applied.
Pass a single string to forbid self-nesting only; pass multiple symbols to
forbid cross-nesting within the group (e.g. NoNested(TRIG_FNS) prevents
sin(cos(x)) as well as sin(sin(x))).
Warning
This constraint propagates the "inside" flag to all non-terminal
children of any rule whose rhs contains a group symbol. It works
correctly when each group symbol appears in a rule with exactly one
non-terminal child (typical prefix-function rules such as
E -> sin ( E )) and for infix operators where every child should be
blocked (e.g. E -> E + E). It does not work correctly for
mixed rules that combine a function call with additional non-terminal
siblings in a single production (e.g. E -> sin ( E ) + F): the
sibling F will be incorrectly treated as inside the group. If your
grammar uses such rules, implement a custom
Constraint instead.
Examples:
>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> g = Grammar([
... Rule("E", ["sin", "(", "E", ")"]),
... Rule("E", ["cos", "(", "E", ")"]),
... Rule("E", ["x"]),
... ])
>>> g.add_constraint(NoNested(["sin", "cos"]))
>>> d = g.start_derivation("E")
>>> d.apply(g.rules_for("E")[0]) # apply sin(E)
>>> # Now inside sin; both sin and cos rules should be gone
>>> opts = d.options()
>>> all("sin" not in r.rhs and "cos" not in r.rhs for r in opts)
True
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
symbols
|
str | Iterable[str]
|
A single terminal token or an iterable of terminal tokens that form the nesting group. |
required |
Source code in SRToolkit/utils/grammar/constraints.py
Derivation
Stateful leftmost derivation over a Grammar.
A derivation begins at a start non-terminal and proceeds by repeatedly choosing a rule to apply to the leftmost unexpanded non-terminal. options returns candidate rules filtered by all registered constraints; apply advances the derivation by one production.
Per-slot local state and per-derivation global state for each registered constraint are maintained internally; constraint instances carry only construction-time configuration and are safe to share across parallel derivations.
Obtain a Derivation via Grammar.start_derivation.
Examples:
>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> g = Grammar()
>>> g.add_rule(Rule("E", ["x"]))
>>> d = g.start_derivation("E")
>>> d.complete
False
>>> opts = d.options()
>>> len(opts)
1
>>> d.apply(opts[0])
>>> d.complete
True
>>> d.to_token_list()
['x']
Source code in SRToolkit/utils/grammar/derivation.py
complete
property
local_stack
Return the local state stack for constraint across the open frontier,
leftmost slot first.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
constraint
|
Constraint
|
A constraint previously registered on the grammar. |
required |
Returns:
| Type | Description |
|---|---|
list[Any]
|
One entry per open non-terminal in left-to-right order. |
Source code in SRToolkit/utils/grammar/derivation.py
global_state
Return the per-derivation global state for constraint.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
constraint
|
Constraint
|
A constraint previously registered on the grammar. |
required |
Returns:
| Type | Description |
|---|---|
Any
|
The current global value owned by this derivation for |
Source code in SRToolkit/utils/grammar/derivation.py
options
Return candidate rules for the current leftmost unexpanded non-terminal, filtered by every registered constraint's allows.
Examples:
>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> g = Grammar()
>>> g.add_rule(Rule("E", ["x"]))
>>> g.add_rule(Rule("E", ["y"]))
>>> d = g.start_derivation("E")
>>> len(d.options())
2
Returns:
| Type | Description |
|---|---|
list[Rule]
|
List of Rule objects that every |
list[Rule]
|
constraint accepts. |
Raises:
| Type | Description |
|---|---|
RuntimeError
|
If the derivation is already complete. |
Source code in SRToolkit/utils/grammar/derivation.py
apply
Apply rule to the current leftmost unexpanded non-terminal.
Examples:
>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> g = Grammar()
>>> g.add_rule(Rule("E", ["E", "+", "F"]))
>>> g.add_rule(Rule("E", ["x"]))
>>> g.add_rule(Rule("F", ["y"]))
>>> d = g.start_derivation("E")
>>> d.apply(g.rules_for("E")[0]) # E -> E + F
>>> d.apply(g.rules_for("E")[1]) # E -> x
>>> d.apply(g.rules_for("F")[0]) # F -> y
>>> d.to_token_list()
['x', '+', 'y']
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
rule
|
Rule
|
A rule whose |
required |
Raises:
| Type | Description |
|---|---|
RuntimeError
|
If the derivation is already complete. |
ValueError
|
If |
Source code in SRToolkit/utils/grammar/derivation.py
203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 | |
sample
Apply one rule chosen proportionally by weight (PCFG) or uniformly (CFG) from the surviving candidates.
Examples:
>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> g = Grammar()
>>> g.add_rule(Rule("E", ["x"]))
>>> d = g.start_derivation("E")
>>> d.sample()
>>> d.complete
True
Raises:
| Type | Description |
|---|---|
RuntimeError
|
If the derivation is already complete. |
RuntimeError
|
If all candidate rules are filtered out by allows. |
Source code in SRToolkit/utils/grammar/derivation.py
generate
Run the derivation to completion and return the token list.
Examples:
>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> g = Grammar()
>>> g.add_rule(Rule("E", ["x"]))
>>> g.start_derivation("E").generate()
['x']
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
limit
|
int
|
Maximum number of rule applications. A negative value means
unlimited. Default |
1000
|
Returns:
| Type | Description |
|---|---|
list[str]
|
Flat list of terminal tokens in left-to-right order. |
Raises:
| Type | Description |
|---|---|
RuntimeError
|
If the derivation does not complete within
|
Source code in SRToolkit/utils/grammar/derivation.py
to_token_list
Return the completed expression as a flat token list.
Examples:
>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> g = Grammar()
>>> g.add_rule(Rule("E", ["x"]))
>>> d = g.start_derivation("E")
>>> d.apply(d.options()[0])
>>> d.to_token_list()
['x']
Returns:
| Type | Description |
|---|---|
list[str]
|
Flat list of terminal tokens in left-to-right order. |
Raises:
| Type | Description |
|---|---|
RuntimeError
|
If the derivation is not yet complete. |
Source code in SRToolkit/utils/grammar/derivation.py
to_parse_tree
Return the completed derivation as a ParseTree.
Examples:
>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> g = Grammar()
>>> g.add_rule(Rule("E", ["x"]))
>>> d = g.start_derivation("E")
>>> d.apply(d.options()[0])
>>> isinstance(d.to_parse_tree(), ParseTree)
True
Returns:
| Type | Description |
|---|---|
ParseTree
|
The ParseTree rooted at the |
ParseTree
|
start symbol. |
Raises:
| Type | Description |
|---|---|
RuntimeError
|
If the derivation is not yet complete. |
Source code in SRToolkit/utils/grammar/derivation.py
Grammar
A context-free grammar (CFG) or probabilistic context-free grammar (PCFG).
Rules are added via add_rule. The set of
non-terminals is derived automatically: a symbol is a non-terminal if and only
if it appears as the lhs of at least one rule.
A grammar is a CFG when every rule carries the default weight (1.0), making
sampling uniform within each group. It is a PCFG when any rule has a weight
that differs from 1.0.
Constraints are registered via add_constraint. During a derivation, options returns only rules that every constraint's allows accepts.
Examples:
>>> g = Grammar([
... Rule("E", ["E", "+", "F"], weight=0.4),
... Rule("E", ["F"], weight=0.6),
... Rule("F", ["x"]),
... ])
>>> "E" in g.nonterminals
True
>>> g.is_pcfg()
True
>>> len(g.rules_for("E"))
2
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
rules
|
Optional[list[Rule]]
|
None
|
|
start
|
Optional[str]
|
Default start non-terminal used by
start_derivation
when no |
None
|
Source code in SRToolkit/utils/grammar/grammar.py
nonterminals
property
add_rule
Add a production rule to the grammar.
Examples:
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
rule
|
Rule
|
The Rule to register. |
required |
Source code in SRToolkit/utils/grammar/grammar.py
add_constraint
Register a constraint applied at each derivation step.
A rule is offered as an option only when every registered constraint's
allows returns True
for it.
Examples:
>>> from SRToolkit.utils.grammar import MaxDepth
>>> g = Grammar([Rule("E", ["E", "+", "E"]), Rule("E", ["x"])])
>>> g.add_constraint(MaxDepth(0))
>>> d = g.start_derivation("E")
>>> [r.rhs for r in d.options()]
[['x']]
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
constraint
|
'Constraint'
|
A Constraint instance — typically a built-in such as MaxDepth, MaxNodes, MaxOccurrences, NoNested, or DimensionalConsistency, or a user-defined subclass. |
required |
Source code in SRToolkit/utils/grammar/grammar.py
rules_for
Return all rules whose lhs matches nonterminal.
Examples:
>>> g = Grammar()
>>> g.add_rule(Rule("E", ["F"]))
>>> g.add_rule(Rule("E", ["x"]))
>>> len(g.rules_for("E"))
2
>>> g.rules_for("Z")
[]
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
nonterminal
|
str
|
Non-terminal to look up. |
required |
Returns:
| Type | Description |
|---|---|
list[Rule]
|
List of matching Rule objects in insertion order. |
Source code in SRToolkit/utils/grammar/grammar.py
is_pcfg
Return True if any rule deviates from the default weight of 1.0.
Examples:
>>> g = Grammar()
>>> g.add_rule(Rule("E", ["x"]))
>>> g.add_rule(Rule("E", ["y"]))
>>> g.is_pcfg()
False
>>> g.add_rule(Rule("F", ["z"], weight=0.3))
>>> g.is_pcfg()
True
Returns:
| Type | Description |
|---|---|
bool
|
|
Source code in SRToolkit/utils/grammar/grammar.py
validate
Validate that parse_tree is structurally valid, uses only productions
from this grammar, and satisfies every registered constraint.
By default the root of the tree may be any non-terminal in the grammar,
not just self.start. This lets you validate sub-trees (e.g. an
F-rooted fragment) in addition to full expressions. Pass
require_start=True to additionally enforce that the root symbol equals
self.start (useful when validating complete, top-level expressions).
The check replays the parse tree through a fresh
Derivation rooted at
parse_tree.root.symbol, walking nodes in leftmost order (mirroring the
derivation frontier). At each internal node it checks that:
- The applied rule is permitted at the current frontier position — meaning it exists in the grammar and every registered constraint accepts it.
- The number of children equals
len(rule.rhs). - Each
children[i].symbolequalsrule.rhs[i].
Terminal leaves are accepted when their symbol is not a non-terminal in this grammar. A non-terminal symbol appearing as a leaf (unexpanded) is rejected.
Examples:
>>> g = Grammar(start="E")
>>> r = Rule("E", ["x"])
>>> g.add_rule(r)
>>> leaf = ParseTreeNode("x", None)
>>> root = ParseTreeNode("E", r, [leaf])
>>> g.validate(ParseTree(root))
True
>>> foreign = Rule("E", ["y"])
>>> root2 = ParseTreeNode("E", foreign, [ParseTreeNode("y", None)])
>>> g.validate(ParseTree(root2))
False
>>> g.add_rule(Rule("F", ["x"]))
>>> r_f = Rule("F", ["x"])
>>> g.add_rule(r_f)
>>> f_root = ParseTreeNode("F", r_f, [ParseTreeNode("x", None)])
>>> g.validate(ParseTree(f_root))
True
>>> g.validate(ParseTree(f_root), require_start=True)
False
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
parse_tree
|
ParseTree
|
The ParseTree to validate. |
required |
require_start
|
bool
|
When |
False
|
Returns:
| Type | Description |
|---|---|
bool
|
|
bool
|
in this grammar, and all constraints would have permitted the derivation |
bool
|
(and the root matches |
Raises:
| Type | Description |
|---|---|
ValueError
|
If |
Source code in SRToolkit/utils/grammar/grammar.py
403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 | |
start_derivation
Begin a new derivation.
Examples:
>>> g = Grammar(start="E")
>>> g.add_rule(Rule("E", ["x"]))
>>> d = g.start_derivation()
>>> d.complete
False
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
start
|
Optional[str]
|
Start non-terminal. Defaults to
|
None
|
Returns:
| Type | Description |
|---|---|
'Derivation'
|
A Derivation at the first expansion step. |
Raises:
| Type | Description |
|---|---|
ValueError
|
If the resolved start symbol is not a non-terminal in this grammar. |
Source code in SRToolkit/utils/grammar/grammar.py
generate_one
generate_one(start: Optional[str] = None, max_steps: int = 1000, max_retries: int = 10) -> Optional[list[str]]
Generate a single expression by sampling the grammar to completion.
Convenience wrapper around start_derivation and Derivation.generate.
Each attempt runs a fresh derivation from scratch. When a derivation
exceeds max_steps rule applications (e.g. because random sampling
kept choosing recursive rules), the attempt is discarded and a new one
starts. None is returned only when every attempt fails, signalling
that the caller should either relax the grammar, add more liberal
constraints, or increase max_steps / max_retries.
Examples:
>>> from SRToolkit.utils.grammar import Grammar, Rule
>>> g = Grammar(start="E")
>>> g.add_rule(Rule("E", ["x"]))
>>> g.generate_one()
['x']
>>> g.generate_one(max_steps=0, max_retries=1) is None
True
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
start
|
Optional[str]
|
Start non-terminal. Defaults to |
None
|
max_steps
|
int
|
Maximum rule applications per attempt. A negative number means unlimited (no retry logic applies). |
1000
|
max_retries
|
int
|
Maximum number of fresh attempts before returning
|
10
|
Returns:
| Type | Description |
|---|---|
Optional[list[str]]
|
A list of terminal tokens in left-to-right order, or |
Optional[list[str]]
|
every attempt exceeded |
Raises:
| Type | Description |
|---|---|
ValueError
|
If the resolved start symbol is not a non-terminal in this grammar. |
Source code in SRToolkit/utils/grammar/grammar.py
to_dict
Serialise this grammar to a JSON-safe dictionary.
Constraints are serialised via their own to_dict method. User-defined
constraint subclasses must implement to_dict/from_dict to survive the
round-trip; built-in constraints are fully supported.
Returns:
| Type | Description |
|---|---|
dict
|
Dictionary with keys |
Source code in SRToolkit/utils/grammar/grammar.py
from_dict
classmethod
Reconstruct a Grammar from a dictionary produced by to_dict.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
d
|
dict
|
Dictionary with keys |
required |
Returns:
| Type | Description |
|---|---|
'Grammar'
|
A new Grammar with all rules and |
'Grammar'
|
constraints registered. |
Source code in SRToolkit/utils/grammar/grammar.py
from_symbol_library
classmethod
from_symbol_library(symbol_library: Optional[SymbolLibrary] = None, start: Optional[str] = None) -> 'Grammar'
Build a PCFG from a SymbolLibrary using a generic operator-precedence non-terminal hierarchy.
One non-terminal is created per unique OP precedence level present in
the library. Known precedences map to readable names; custom precedences
fall back to L_{p}:
OP_ADDITIVE→EOP_MULTIPLICATIVE→FOP_POWER→B- custom precedence p →
L_{p}
The chain runs from the lowest-precedence level (the grammar start) down
to T, K, R, and V. Only levels that have at least one
operator are generated; there are no empty pass-through non-terminals.
Heuristic weights are calibrated against empirical expression distributions (e.g. Wikipedia mathematical formulae): E recursion ~30 %, F recursion ~40 %, B recursion ~5 %, custom-level recursion ~40 %.
Examples:
>>> sl = SymbolLibrary.from_symbol_list(["+", "-", "*", "sin", "^2"], 2)
>>> g = Grammar.from_symbol_library(sl)
>>> "E" in g.nonterminals and "V" in g.nonterminals
True
>>> g.is_pcfg()
True
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
symbol_library
|
Optional[SymbolLibrary]
|
Token vocabulary. Falls back to the active library
from the context manager when |
None
|
start
|
Optional[str]
|
Override the name of the lowest-precedence non-terminal (and
|
None
|
Returns:
| Type | Description |
|---|---|
'Grammar'
|
A Grammar with heuristic PCFG weights. |
Raises:
| Type | Description |
|---|---|
ValueError
|
If the symbol library contains no variables, constants, or literals, making it impossible to generate any terminal expression. |
Source code in SRToolkit/utils/grammar/grammar.py
623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 | |
from_grammar_string
classmethod
Construct a Grammar from a string in NLTK production-rule notation.
Each non-empty, non-comment line must have the form::
LHS -> RHS_1 | RHS_2 | ...
where each alternative is a space-separated sequence of symbols.
Terminals are enclosed in single quotes (e.g. '+'); unquoted
tokens are treated as non-terminals. An optional weight in square
brackets at the end of an alternative is parsed as a
float (e.g. E '+' F [0.4]); alternatives without a weight
default to 1.0.
The start symbol may be embedded as a comment header emitted by to_grammar_string::
# start: E
The explicit start parameter takes precedence over this comment.
When start is None, only the first # start: comment encountered
is used; subsequent ones are ignored along with all other comment lines.
Rule names are not preserved in NLTK notation and are set to None
on all returned Rule objects.
Examples:
>>> g = Grammar.from_grammar_string("E -> E '+' F | F\nF -> 'x'", start="E")
>>> sorted(g.nonterminals)
['E', 'F']
>>> g.rules_for("F")[0].rhs
['x']
>>> g.start
'E'
>>> g2 = Grammar.from_grammar_string(
... "# start: E\nE -> E '+' F [0.4] | F [0.6]\nF -> 'x' [1.0]"
... )
>>> g2.start
'E'
>>> g2.is_pcfg()
True
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
text
|
str
|
Grammar specification in NLTK notation, optionally with a
|
required |
start
|
Optional[str]
|
Start non-terminal stored on the returned grammar. Takes
precedence over a |
None
|
Returns:
| Type | Description |
|---|---|
'Grammar'
|
A new Grammar populated with |
'Grammar'
|
the parsed rules. |
Raises:
| Type | Description |
|---|---|
ValueError
|
If a content line contains no |
ValueError
|
If a weight token cannot be converted to |
ValueError
|
If |
ValueError
|
If the start symbol cannot be determined from either
the parameter or the |
Source code in SRToolkit/utils/grammar/grammar.py
796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 | |
to_grammar_string
Serialise this grammar to a string in NLTK production-rule notation.
All rules sharing the same left-hand side are written on a single
line, separated by |. Symbols that are non-terminals (i.e.
appear as the lhs of at least one rule) are written unquoted;
all other symbols are enclosed in single quotes. When
is_pcfg returns True,
the probability of each alternative (its weight divided by the sum of
weights for that non-terminal) is appended in square brackets.
Rule names and registered constraints are not included in the output.
When self.start is set, a # start: <symbol> comment is
prepended so that
from_grammar_string
can reconstruct the grammar without requiring an explicit start
argument.
Examples:
>>> g = Grammar([
... Rule("E", ["E", "+", "F"]),
... Rule("E", ["F"]),
... Rule("F", ["x"]),
... ], start="E")
>>> print(g.to_grammar_string())
# start: E
E -> E '+' F | F
F -> 'x'
>>> gp = Grammar([
... Rule("E", ["x"], weight=1.0),
... Rule("E", ["y"], weight=3.0),
... ])
>>> print(gp.to_grammar_string())
E -> 'x' [0.25] | 'y' [0.75]
Returns:
| Type | Description |
|---|---|
str
|
Multi-line string in NLTK production-rule notation, optionally |
str
|
preceded by a |
Source code in SRToolkit/utils/grammar/grammar.py
ParseTree
Full derivation history of an expression under a grammar.
Unlike Node, which holds only terminal symbols for expression evaluation, a ParseTree retains every non-terminal and every production applied during the derivation.
Examples:
>>> r = Rule("E", ["x"])
>>> leaf = ParseTreeNode("x", None)
>>> root = ParseTreeNode("E", r, [leaf])
>>> pt = ParseTree(root)
>>> pt.to_token_list()
['x']
>>> pt.productions_used()
[Rule(lhs='E', rhs=['x'], weight=1.0, name=None)]
Attributes:
| Name | Type | Description |
|---|---|---|
root |
The root node of the parse tree. |
Source code in SRToolkit/utils/grammar/grammar.py
to_token_list
Collect all terminal tokens in left-to-right order.
Examples:
>>> leaf1 = ParseTreeNode("a", None)
>>> leaf2 = ParseTreeNode("+", None)
>>> leaf3 = ParseTreeNode("b", None)
>>> root = ParseTreeNode("E", Rule("E", ["a", "+", "b"]), [leaf1, leaf2, leaf3])
>>> ParseTree(root).to_token_list()
['a', '+', 'b']
Returns:
| Type | Description |
|---|---|
list[str]
|
Flat list of terminal tokens in left-to-right order. |
Source code in SRToolkit/utils/grammar/grammar.py
productions_used
Return all rules applied in the derivation, in pre-order.
Examples:
>>> leaf = ParseTreeNode("x", None)
>>> r = Rule("E", ["x"])
>>> root = ParseTreeNode("E", r, [leaf])
>>> ParseTree(root).productions_used()
[Rule(lhs='E', rhs=['x'], weight=1.0, name=None)]
Returns:
| Type | Description |
|---|---|
list[Rule]
|
List of Rule objects in pre-order traversal order. |
Source code in SRToolkit/utils/grammar/grammar.py
ParseTreeNode
dataclass
A node in a derivation parse tree.
Terminal leaves have rule_applied=None and children=[].
Internal nodes store the rule that expanded the non-terminal at this position.
Attributes:
| Name | Type | Description |
|---|---|---|
symbol |
str
|
The grammar symbol at this node (terminal or non-terminal name). |
rule_applied |
Optional[Rule]
|
The Rule used to expand this node.
|
children |
list[ParseTreeNode]
|
Ordered child nodes corresponding to the symbols in
|
Rule
dataclass
A single production rule in a grammar.
A rule is a CFG production when weights remain 1 for all rules in the grammar; when weights differ across rules, the grammar is treated as a PCFG and productions are sampled proportionally. Weights are unnormalised — the grammar normalises them at sampling time.
Examples:
>>> r = Rule("E", ["E", "+", "F"], weight=0.4, name="E_add_+")
>>> r.lhs
'E'
>>> r.rhs
['E', '+', 'F']
>>> r.weight
0.4
>>> r.name
'E_add_+'
>>> Rule("E", ["F"]).weight
1.0
Attributes:
| Name | Type | Description |
|---|---|---|
lhs |
str
|
The non-terminal being expanded, e.g. |
rhs |
list[str]
|
Ordered sequence of symbols the non-terminal expands to. Each
element is either a terminal token (e.g. |
weight |
float
|
Unnormalised sampling weight. Defaults to |
name |
Optional[str]
|
Optional stable identifier for this rule. Used by constraints
for scoping and identification. |
from_line
classmethod
Parse one NLTK production line into a list of Rule objects.
The line must have the form::
LHS -> RHS_1 | RHS_2 | ...
where each alternative is a space-separated sequence of symbols.
Terminals are enclosed in single quotes; unquoted tokens are non-terminals.
An optional weight in square brackets at the end of an alternative is parsed
as a float; alternatives without a weight default to 1.0.
Examples:
>>> Rule.from_line("E -> E '+' F [0.4] | F [0.6]")
[Rule(lhs='E', rhs=['E', '+', 'F'], weight=0.4, name=None), Rule(lhs='E', rhs=['F'], weight=0.6, name=None)]
>>> Rule.from_line("F -> 'x'")
[Rule(lhs='F', rhs=['x'], weight=1.0, name=None)]
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
line
|
str
|
A single non-empty, non-comment production line. |
required |
Returns:
| Type | Description |
|---|---|
list['Rule']
|
List of Rule objects, one per alternative. |
Raises:
| Type | Description |
|---|---|
ValueError
|
If |
ValueError
|
If the left-hand side is empty. |
ValueError
|
If a weight token cannot be converted to |
ValueError
|
If no alternatives are parsed from the right-hand side. |