Expression Generator Module
SRToolkit.utils.expression_generator
This module contains helper functions for creating a PCFG (probabilistic context-free grammar) with generic probabilities from a SymbolLibrary and for using it to generate random expressions.
create_generic_pcfg(symbol_library)
Creates a generic PCFG from the SymbolLibrary.
Examples:
>>> sl = SymbolLibrary.from_symbol_list(["+", "-", "*", "sin", "^2", "pi"], 2)
>>> print(create_generic_pcfg(sl))
E -> E '+' F [0.2]
E -> E '-' F [0.2]
E -> F [0.6]
F -> F '*' B [0.4]
F -> B [0.6]
B -> T [1.0]
T -> R [0.2]
T -> C [0.2]
T -> V [0.6]
C -> 'pi' [1.0]
R -> 'sin' '(' E ')' [0.4]
R -> P [0.15]
R -> '(' E ')' [0.45]
P -> '(' E ')' '^2' [1.0]
V -> 'X_0' [0.5]
V -> 'X_1' [0.5]
Parameters:
Name | Type | Description | Default |
---|---|---|---|
symbol_library | SymbolLibrary | The symbol library to use. Defaults to SymbolLibrary.default_symbols(). | required |
Returns:
Type | Description |
---|---|
str | A PCFG with generic probabilities, written as a string. |
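The returned string lists one production per line in the form `LHS -> symbols [probability]`, and the probabilities of all productions sharing a left-hand side sum to 1. The following is a minimal sketch of reading that format using only the standard library. `parse_pcfg` is a hypothetical helper written for illustration, not part of SRToolkit, and it assumes that terminals contain neither whitespace nor square brackets.

```python
import re
from collections import defaultdict

# Hypothetical helper (not part of SRToolkit): matches "LHS -> sym sym ... [prob]".
RULE = re.compile(r"^\s*(\S+)\s*->\s*(.*?)\s*\[([0-9.eE+-]+)\]\s*$")


def parse_pcfg(grammar_str):
    """Parse a PCFG string into {lhs: [(rhs_symbols, probability), ...]}."""
    rules = defaultdict(list)
    for line in grammar_str.strip().splitlines():
        if not line.strip():
            continue  # skip blank lines
        match = RULE.match(line)
        if match is None:
            raise ValueError(f"Cannot parse rule: {line!r}")
        lhs, rhs, prob = match.groups()
        # Quoted symbols such as '+' are terminals; bare symbols are non-terminals.
        rules[lhs].append((rhs.split(), float(prob)))
    return dict(rules)


# The grammar printed in the example above.
grammar = """
E -> E '+' F [0.2]
E -> E '-' F [0.2]
E -> F [0.6]
F -> F '*' B [0.4]
F -> B [0.6]
B -> T [1.0]
T -> R [0.2]
T -> C [0.2]
T -> V [0.6]
C -> 'pi' [1.0]
R -> 'sin' '(' E ')' [0.4]
R -> P [0.15]
R -> '(' E ')' [0.45]
P -> '(' E ')' '^2' [1.0]
V -> 'X_0' [0.5]
V -> 'X_1' [0.5]
"""

rules = parse_pcfg(grammar)
# Total probability mass per non-terminal; each should be 1 up to rounding.
totals = {lhs: sum(p for _, p in alts) for lhs, alts in rules.items()}
```

Checking `totals` is a quick sanity test when editing a grammar by hand before passing it to `generate_from_pcfg`.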
generate_from_pcfg(grammar_str, start_symbol='E', max_depth=40, limit=100)
Generates a random expression from a PCFG using Monte Carlo sampling.
Examples:
>>> generate_from_pcfg("E -> '1' [1.0]")
['1']
>>> grammar = create_generic_pcfg(SymbolLibrary.default_symbols())
>>> len(generate_from_pcfg(grammar)) > 0
True
Parameters:
Name | Type | Description | Default |
---|---|---|---|
grammar_str | str | Grammar given as a string in the NLTK notation | required |
start_symbol | | Non-terminal symbol used as the starting point | 'E' |
max_depth | | Maximum depth of the generated parse trees. If less than 0, expressions can have arbitrary depth | 40 |
limit | | Number of times the function tries to generate a valid expression before raising an Exception. | 100 |
Raises:
Type | Description |
---|---|
Exception | If the maximum number of tries is reached without generating a valid expression |
Returns:
Type | Description |
---|---|
List[str] | An expression written as a list of string tokens in the infix notation. |
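To illustrate how the `max_depth` and `limit` parameters interact, here is a minimal sketch of Monte Carlo sampling from a PCFG. It is not the SRToolkit implementation: `sample_expression` and its `rules` dictionary format (`{lhs: [(rhs_symbols, probability), ...]}`, with terminals written in single quotes) are assumptions made for this example. An attempt is abandoned as soon as the parse tree exceeds `max_depth`, and the sampler retries up to `limit` times.

```python
import random


def sample_expression(rules, start_symbol="E", max_depth=40, limit=100, rng=random):
    """Sample one token list from a PCFG (illustrative sketch, not SRToolkit code).

    rules: {lhs: [(rhs_symbols, probability), ...]}, terminals quoted like "'+'".
    """

    def expand(symbol, depth):
        if symbol.startswith("'"):
            return [symbol[1:-1]]  # terminal: strip the quotes
        if max_depth >= 0 and depth > max_depth:
            return None  # tree too deep, abandon this attempt
        alternatives = rules[symbol]
        weights = [prob for _, prob in alternatives]
        # Pick one production according to its probability.
        rhs, _ = rng.choices(alternatives, weights=weights, k=1)[0]
        tokens = []
        for child in rhs:
            sub = expand(child, depth + 1)
            if sub is None:
                return None
            tokens.extend(sub)
        return tokens

    for _ in range(limit):
        tokens = expand(start_symbol, 0)
        if tokens is not None:
            return tokens
    raise Exception(f"No valid expression generated in {limit} tries")


# Trivial grammar, equivalent to "E -> '1' [1.0]".
print(sample_expression({"E": [(["'1'"], 1.0)]}))  # prints ['1']

# A small recursive grammar; the output depends on the random draws.
toy_rules = {
    "E": [(["E", "'+'", "V"], 0.4), (["V"], 0.6)],
    "V": [(["'X_0'"], 1.0)],
}
print(sample_expression(toy_rules, rng=random.Random(0)))
```

Because the sketch is recursive, a negative `max_depth` combined with a strongly recursive grammar could hit Python's recursion limit; a depth bound keeps each attempt cheap and lets the retry loop handle the failures.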
generate_n_expressions(expression_description, num_expressions, unique=True, max_expression_length=50, verbose=False)
Generates a list of num_expressions expressions.
Examples:
>>> len(generate_n_expressions(SymbolLibrary.default_symbols(5), 100, verbose=False))
100
>>> generate_n_expressions(SymbolLibrary.from_symbol_list([], 1), 3, unique=False, verbose=False, max_expression_length=1)
[['X_0'], ['X_0'], ['X_0']]
Parameters:
Name | Type | Description | Default |
---|---|---|---|
expression_description | Union[str, SymbolLibrary] | Description of expressions, given as either a grammar in the NLTK notation or a SymbolLibrary instance | required |
num_expressions | int | Number of generated expressions | required |
unique | | When True, each generated expression will be unique (though not necessarily inequivalent to the others) | True |
max_expression_length | | Generated expressions will have at most "max_expression_length" tokens. If less than 0, expressions can be of arbitrary size. | 50 |
verbose | | If True, adds a progress bar | False |
Returns:
Type | Description |
---|---|
List[List[str]] | A list of expressions represented as lists of tokens |
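The effect of `unique` and `max_expression_length` can be sketched as a rejection loop around any expression sampler. The code below is an illustration under stated assumptions, not the library's implementation: `collect_expressions`, the `sampler` callable, and the `max_attempts` safety cap are all hypothetical names introduced here.

```python
from itertools import cycle


def collect_expressions(sampler, num_expressions, unique=True,
                        max_expression_length=50, max_attempts=10_000):
    """Collect token lists from sampler(), rejecting long or duplicate ones.

    Illustrative sketch; max_attempts is a hypothetical guard against
    looping forever when too few distinct expressions exist.
    """
    expressions = []
    seen = set()
    attempts = 0
    while len(expressions) < num_expressions:
        attempts += 1
        if attempts > max_attempts:
            raise RuntimeError("Could not collect enough expressions")
        tokens = sampler()
        # A negative max_expression_length disables the length check.
        if max_expression_length >= 0 and len(tokens) > max_expression_length:
            continue
        key = tuple(tokens)  # uniqueness is decided on the exact token sequence
        if unique and key in seen:
            continue
        seen.add(key)
        expressions.append(tokens)
    return expressions


def make_sampler():
    # Deterministic stand-in for a random sampler, used only for this demo.
    samples = cycle([["X_0"], ["X_0"], ["X_0", "+", "X_1"], ["X_1"]])
    return lambda: list(next(samples))


print(collect_expressions(make_sampler(), 2, unique=True, max_expression_length=1))
# prints [['X_0'], ['X_1']]
print(collect_expressions(make_sampler(), 3, unique=False, max_expression_length=-1))
# prints [['X_0'], ['X_0'], ['X_0', '+', 'X_1']]
```

Comparing token sequences means that `['X_0', '+', 'X_1']` and `['X_1', '+', 'X_0']` count as different expressions in this sketch even though they are mathematically equivalent, which matches the caveat in the description of `unique`.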