Expression Generator
SRToolkit.utils.expression_generator
PCFG construction from a SymbolLibrary and Monte-Carlo sampling of symbolic expressions.
create_generic_pcfg
Construct a generic Probabilistic Context-Free Grammar (PCFG) from a symbol library.
The grammar encodes standard mathematical operator precedence through a fixed non-terminal hierarchy:
- E: additive level (precedence 0 operators)
- F: multiplicative level (precedence 1 operators)
- B: power level (precedence 2 operators)
- T: terminal: function application (R), constant (C), or variable (V)
- R: unary functions (precedence 5) and parenthesised sub-expressions
- P: postfix functions (precedence -1, e.g. ^2)
The returned string is in NLTK PCFG format and can be passed directly to generate_from_pcfg or generate_n_expressions.
Examples:
>>> sl = SymbolLibrary.from_symbol_list(["+", "-", "*", "sin", "^2", "pi"], 2)
>>> print(create_generic_pcfg(sl))
E -> E '+' F [0.2]
E -> E '-' F [0.2]
E -> F [0.6]
F -> F '*' B [0.4]
F -> B [0.6]
B -> T [1.0]
T -> R [0.2]
T -> C [0.2]
T -> V [0.6]
C -> 'pi' [1.0]
R -> 'sin' '(' E ')' [0.4]
R -> P [0.15]
R -> '(' E ')' [0.45]
P -> '(' E ')' '^2' [1.0]
V -> 'X_0' [0.5]
V -> 'X_1' [0.5]
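A quick sanity check on a grammar string like the one above is that the probabilities of all rules sharing a left-hand side sum to 1. The following is a minimal sketch using only the standard library; `parse_rules` and `probability_mass` are hypothetical helpers written for this illustration and are not part of SRToolkit or NLTK, and the simple whitespace split assumes that no quoted terminal contains a space.

```python
import re
from collections import defaultdict

# Grammar text copied from the example output above (NLTK PCFG notation).
GRAMMAR = """
E -> E '+' F [0.2]
E -> E '-' F [0.2]
E -> F [0.6]
F -> F '*' B [0.4]
F -> B [0.6]
B -> T [1.0]
T -> R [0.2]
T -> C [0.2]
T -> V [0.6]
C -> 'pi' [1.0]
R -> 'sin' '(' E ')' [0.4]
R -> P [0.15]
R -> '(' E ')' [0.45]
P -> '(' E ')' '^2' [1.0]
V -> 'X_0' [0.5]
V -> 'X_1' [0.5]
"""

# One rule per line: LHS -> RHS symbols [probability]
RULE = re.compile(r"^\s*(\w+)\s*->\s*(.*?)\s*\[([0-9.eE+-]+)\]\s*$")


def parse_rules(grammar_str):
    """Hypothetical helper: split each line into (lhs, rhs_symbols, probability)."""
    rules = []
    for line in grammar_str.strip().splitlines():
        match = RULE.match(line)
        if match is None:
            raise ValueError(f"Unrecognised rule: {line!r}")
        lhs, rhs, prob = match.groups()
        # Assumption: quoted terminals contain no whitespace.
        rules.append((lhs, rhs.split(), float(prob)))
    return rules


def probability_mass(rules):
    """Sum the rule probabilities for every left-hand-side non-terminal."""
    totals = defaultdict(float)
    for lhs, _, prob in rules:
        totals[lhs] += prob
    return dict(totals)


rules = parse_rules(GRAMMAR)
totals = probability_mass(rules)
for lhs, total in totals.items():
    print(f"{lhs}: {total:.2f}")
```

Each non-terminal should report a total of 1.00 up to floating-point rounding; a different total would indicate a hand-edited grammar that needs renormalising before sampling.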
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| symbol_library | SymbolLibrary | Symbol library defining the available tokens, their types, and precedences. | required |
Returns:
| Type | Description |
|---|---|
| str | NLTK-formatted PCFG string with generic probabilities. |
generate_from_pcfg
generate_from_pcfg(grammar_str: str, start_symbol: str = 'E', max_depth: int = 40, limit: int = 100) -> List[str]
Sample a single expression from a PCFG by Monte-Carlo tree expansion.
Examples:
>>> generate_from_pcfg("E -> '1' [1.0]")
['1']
>>> grammar = create_generic_pcfg(SymbolLibrary.default_symbols())
>>> len(generate_from_pcfg(grammar)) > 0
True
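To illustrate what Monte-Carlo tree expansion with a depth cap and a retry limit can look like, here is a minimal standard-library sketch. It is not the SRToolkit implementation: the dictionary grammar format, `expand`, `sample_expression`, `DepthExceeded`, and the `seed` argument are all assumptions made for this example, and the parameter names `start_symbol`, `max_depth`, and `limit` are borrowed from the signature above only for readability.

```python
import random

# Toy grammar as {non-terminal: [(right-hand side, probability), ...]}.
# Symbols that are not keys of the dictionary are treated as terminals.
GRAMMAR = {
    "E": [(["E", "+", "F"], 0.4), (["F"], 0.6)],
    "F": [(["F", "*", "T"], 0.4), (["T"], 0.6)],
    "T": [(["(", "E", ")"], 0.2), (["X_0"], 0.5), (["pi"], 0.3)],
}


class DepthExceeded(Exception):
    """Raised when one sampling attempt grows deeper than max_depth."""


def expand(symbol, grammar, rng, depth, max_depth):
    """Recursively expand `symbol` into a flat list of terminal tokens."""
    if symbol not in grammar:
        return [symbol]  # terminal: emit as-is
    if depth > max_depth:
        raise DepthExceeded(symbol)
    productions = grammar[symbol]
    # Draw one production according to the rule probabilities.
    rhs = rng.choices(
        [rhs for rhs, _ in productions],
        weights=[prob for _, prob in productions],
    )[0]
    tokens = []
    for child in rhs:
        tokens.extend(expand(child, grammar, rng, depth + 1, max_depth))
    return tokens


def sample_expression(grammar, start_symbol="E", max_depth=40, limit=100, seed=None):
    """Retry up to `limit` times until one attempt stays within max_depth."""
    rng = random.Random(seed)
    for _ in range(limit):
        try:
            return expand(start_symbol, grammar, rng, 0, max_depth)
        except DepthExceeded:
            continue  # discard the attempt and sample again
    raise RuntimeError(f"No expression produced within {limit} attempts")


expression = sample_expression(GRAMMAR, seed=0)
print(" ".join(expression))
```

Discarding an over-deep attempt and resampling from the start symbol keeps each accepted sample a complete derivation, at the cost of wasted work when the grammar strongly favours recursive rules.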
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| grammar_str | str | Grammar in NLTK PCFG notation. | required |
| start_symbol | str | Non-terminal from which expansion begins. Default 'E'. | 'E' |
| max_depth | int | Maximum parse-tree depth. Values below … | 40 |
| limit | int | Maximum number of sampling attempts before raising an exception. Default 100. | 100 |
Returns:
| Type | Description |
|---|---|
| List[str] | A single expression as a list of string tokens in infix notation. |
Raises:
| Type | Description |
|---|---|
| Exception | If a valid expression cannot be produced within … |
generate_n_expressions
generate_n_expressions(expression_description: Union[str, SymbolLibrary], num_expressions: int, unique: bool = True, max_expression_length: int = 50, verbose: bool = True) -> List[List[str]]
Sample num_expressions expressions from a grammar or symbol library.
Examples:
>>> len(generate_n_expressions(SymbolLibrary.default_symbols(5), 100, verbose=False))
100
>>> generate_n_expressions(SymbolLibrary.from_symbol_list([], 1), 3, unique=False, verbose=False, max_expression_length=1)
[['X_0'], ['X_0'], ['X_0']]
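The interplay of `unique` and `max_expression_length` can be pictured as a rejection loop around a single-expression sampler. The sketch below is an assumption about the general pattern, not SRToolkit's code: `collect_expressions`, `toy_sampler`, and the `max_attempts` safety cap are hypothetical names introduced here, and the sketch simply rejects any sample longer than `max_expression_length`.

```python
import random


def collect_expressions(sampler, num_expressions, unique=True,
                        max_expression_length=50, max_attempts=10_000):
    """Call `sampler` repeatedly, rejecting long or duplicate expressions."""
    results = []
    seen = set()
    for _ in range(max_attempts):
        if len(results) == num_expressions:
            break
        tokens = sampler()
        if len(tokens) > max_expression_length:
            continue  # too long: reject and resample
        key = tuple(tokens)  # lists are unhashable, tuples are not
        if unique and key in seen:
            continue  # duplicate: reject and resample
        seen.add(key)
        results.append(tokens)
    if len(results) < num_expressions:
        raise RuntimeError("Could not collect enough expressions")
    return results


rng = random.Random(0)
OPERANDS = ["X_0", "X_1", "pi"]


def toy_sampler():
    """Stand-in sampler: alternating operands and operators, 1 to 5 tokens."""
    length = rng.choice([1, 3, 5])
    tokens = [rng.choice(OPERANDS)]
    while len(tokens) < length:
        tokens += [rng.choice(["+", "*"]), rng.choice(OPERANDS)]
    return tokens


expressions = collect_expressions(toy_sampler, 10, max_expression_length=3)
for tokens in expressions:
    print(" ".join(tokens))
```

Note that requesting more unique expressions than the grammar can produce under the length cap would never terminate without some bound, which is why the sketch adds the hypothetical `max_attempts` guard.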
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| expression_description | Union[str, SymbolLibrary] | Grammar source as an NLTK PCFG string or a SymbolLibrary (a generic PCFG is built automatically via create_generic_pcfg). | required |
| num_expressions | int | Number of expressions to generate. | required |
| unique | bool | If … | True |
| max_expression_length | int | Maximum token count per expression. Values below … | 50 |
| verbose | bool | Display a progress bar. Default True. | True |
Returns:
| Type | Description |
|---|---|
| List[List[str]] | List of expressions, each represented as a list of string tokens in infix notation. |