Measures Module
SRToolkit.utils.measures
This module contains measures for evaluating the similarity between two expressions.
edit_distance(expr1, expr2, notation='postfix', symbol_library=SymbolLibrary.default_symbols())
Calculates the edit distance between two expressions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
expr1 | Union[List[str], Node] | Expression given as a list of tokens in the infix notation or as an instance of SRToolkit.utils.expression_tree.Node. | required |
expr2 | Union[List[str], Node] | Expression given as a list of tokens in the infix notation or as an instance of SRToolkit.utils.expression_tree.Node. | required |
notation | str | The notation in which the distance between the two expressions is computed. One of "infix", "postfix", or "prefix". Defaults to "postfix", which avoids inconsistencies caused by parentheses. | 'postfix' |
symbol_library | SymbolLibrary | The symbol library to use when converting the expressions to lists of tokens and vice versa. Defaults to SymbolLibrary.default_symbols(). | default_symbols() |
Returns:
Type | Description |
---|---|
float | The edit distance between the two expressions written in the given notation. |
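
The reason "postfix" is the default can be illustrated with a minimal sketch of a token-level edit distance. This is not the library's implementation: `token_edit_distance` is a hypothetical helper written for this example, and it assumes plain Levenshtein distance with unit costs over token lists.

```python
from typing import Sequence


def token_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (unit costs).

    Hypothetical helper for illustration; not part of SRToolkit.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# The same expression X_0 + X_1, with and without redundant parentheses.
infix_plain = ["X_0", "+", "X_1"]
infix_parens = ["(", "X_0", "+", "X_1", ")"]
# In postfix notation both variants are written as the same token list.
postfix = ["X_0", "X_1", "+"]
postfix_other = ["X_0", "X_1", "*"]

infix_gap = token_edit_distance(infix_plain, infix_parens)  # parentheses count as edits
postfix_gap = token_edit_distance(postfix, postfix)         # identical token lists
op_change = token_edit_distance(postfix, postfix_other)     # one operator differs
```

In infix notation the two spellings of the same expression are two edits apart, while in postfix they coincide, so only genuine differences such as a changed operator contribute to the distance.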
Source code in SRToolkit/utils/measures.py
tree_edit_distance(expr1, expr2, symbol_library=SymbolLibrary.default_symbols())
Calculates the tree edit distance between two expressions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
expr1 | Union[Node, List[str]] | Expression given as a list of tokens in the infix notation or as an instance of SRToolkit.utils.expression_tree.Node. | required |
expr2 | Union[Node, List[str]] | Expression given as a list of tokens in the infix notation or as an instance of SRToolkit.utils.expression_tree.Node. | required |
symbol_library | SymbolLibrary | Symbol library to use when converting the lists of tokens into an instance of SRToolkit.utils.expression_tree.Node. | default_symbols() |
Returns:
Type | Description |
---|---|
float | The tree edit distance between the two expressions. |
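
As a rough illustration of what a tree edit distance measures, the sketch below computes the ordered tree edit distance with unit costs for node insertion, deletion, and relabeling. It is an assumption-laden example rather than the library's code: the tuple-based tree encoding and the helpers `leaf`, `forest_distance`, and `tree_distance` are hypothetical, and the cost model used by SRToolkit may differ.

```python
from functools import lru_cache

# Hypothetical encoding: a tree is (label, children) and a forest is a tuple of trees.


def leaf(label: str) -> tuple:
    return (label, ())


def size(forest: tuple) -> int:
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f1: tuple, f2: tuple) -> int:
    """Ordered forest edit distance with unit costs (memoized recursion)."""
    if not f1:
        return size(f2)  # insert every remaining node
    if not f2:
        return size(f1)  # delete every remaining node
    (label1, kids1), (label2, kids2) = f1[-1], f2[-1]
    # Delete the root of the rightmost tree in f1; its children take its place.
    delete = forest_distance(f1[:-1] + kids1, f2) + 1
    # Insert the root of the rightmost tree in f2.
    insert = forest_distance(f1, f2[:-1] + kids2) + 1
    # Map the two rightmost roots onto each other, relabeling if they differ.
    match = (forest_distance(f1[:-1], f2[:-1])
             + forest_distance(kids1, kids2)
             + (0 if label1 == label2 else 1))
    return min(delete, insert, match)


def tree_distance(t1: tuple, t2: tuple) -> int:
    return forest_distance((t1,), (t2,))


plus = ("+", (leaf("X_0"), leaf("X_1")))    # X_0 + X_1
times = ("*", (leaf("X_0"), leaf("X_1")))   # X_0 * X_1
sin_plus = ("sin", (plus,))                 # sin(X_0 + X_1)

relabel_cost = tree_distance(plus, times)   # one relabel: + -> *
wrap_cost = tree_distance(sin_plus, plus)   # one deletion: remove sin
```

Unlike a token-level distance, this measure works on the expression structure, so wrapping an expression in a function costs a single node edit regardless of how many tokens the written form gains.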
Source code in SRToolkit/utils/measures.py
create_behavior_matrix(expr, X, num_consts_sampled=32, consts_bounds=(-5, 5), symbol_library=SymbolLibrary.default_symbols(), seed=None)
Creates a behavior matrix from an expression with free parameters.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
expr | Union[Node, List[str]] | An expression given as a list of tokens in the infix notation or as an instance of SRToolkit.utils.expression_tree.Node. | required |
X | ndarray | Points on which the expression is evaluated to determine its behavior. | required |
num_consts_sampled | int | Number of sets of constants sampled. | 32 |
consts_bounds | Tuple[float, float] | Lower and upper bounds between which constant values are sampled. | (-5, 5) |
symbol_library | SymbolLibrary | Symbol library used to transform the expression into an executable function. | default_symbols() |
seed | int | Random seed. If None, sampling is not seeded and results differ between calls. | None |
Raises:
Type | Description |
---|---|
Exception | If expr is not a list of tokens or an instance of SRToolkit.utils.expression_tree.Node. |
Returns:
Type | Description |
---|---|
ndarray | A matrix of size (X.shape[0], num_consts_sampled) that represents the behavior of an expression. |
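
The shape of the result can be made concrete with a small pure-Python sketch. It assumes that constants are drawn uniformly within consts_bounds and that the expression is already available as a callable; `behavior_matrix_sketch`, `func`, `num_constants`, and `linear` are hypothetical names introduced here, and the library's actual sampling scheme may differ.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple


def behavior_matrix_sketch(
    func: Callable[[Sequence[float], Sequence[float]], float],
    X: Sequence[Sequence[float]],
    num_constants: int,
    num_consts_sampled: int = 32,
    consts_bounds: Tuple[float, float] = (-5.0, 5.0),
    seed: Optional[int] = None,
) -> List[List[float]]:
    """Evaluate func at every point for every sampled set of constants.

    Hypothetical illustration (uniform sampling assumed); not SRToolkit code.
    Returns a nested list with len(X) rows and num_consts_sampled columns.
    """
    rng = random.Random(seed)  # seed=None gives a different sample on each call
    low, high = consts_bounds
    constant_sets = [
        [rng.uniform(low, high) for _ in range(num_constants)]
        for _ in range(num_consts_sampled)
    ]
    # Row i holds the outputs at point i, one column per set of constants.
    return [[func(point, consts) for consts in constant_sets] for point in X]


def linear(point: Sequence[float], consts: Sequence[float]) -> float:
    """Stand-in for the expression C * X_0 + C with two free constants."""
    return consts[0] * point[0] + consts[1]


X = [[0.0], [1.0], [2.0]]
matrix = behavior_matrix_sketch(linear, X, num_constants=2,
                                num_consts_sampled=4, seed=0)
same_seed = behavior_matrix_sketch(linear, X, num_constants=2,
                                   num_consts_sampled=4, seed=0)
```

Each row therefore describes the distribution of outputs the expression can produce at one point as its constants vary, and fixing the seed makes the matrix reproducible.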
Source code in SRToolkit/utils/measures.py
bed(expr1, expr2, X=None, num_consts_sampled=32, num_points_sampled=64, domain_bounds=None, consts_bounds=(-5, 5), symbol_library=SymbolLibrary.default_symbols(), seed=None)
Computes the Behavioral Embedding Distance (BED) between two expressions or behavior matrices over a given dataset or domain, using Wasserstein distance as a metric.
The BED is computed either by using precomputed behavior matrices or by sampling points from a domain and evaluating the expressions over them. For behavior evaluation, constants can be sampled based on the specified bounds and symbols used in the expressions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
expr1 | Union[Node, List[str], ndarray] | The first expression or behavior matrix. An expression must be given as a Node or as a list of tokens. A behavior matrix must be a numpy array of size (num_points_sampled, num_consts_sampled). | required |
expr2 | Union[Node, List[str], ndarray] | The second expression or behavior matrix, given in the same way as expr1: a Node, a list of tokens, or a numpy array representing the behavior matrix. | required |
X | Optional[ndarray] | Array of points over which behavior is evaluated. If not provided, points are sampled using domain_bounds. | None |
num_consts_sampled | int | Number of sets of constants sampled for behavior evaluation when expressions are given as Nodes or lists rather than matrices. | 32 |
num_points_sampled | int | Number of points sampled from the domain if X is not provided. | 64 |
domain_bounds | Optional[List[Tuple[float, float]]] | The bounds of the domain for sampling points when X is not given. Each tuple holds the lower and upper bound of one domain feature (e.g., [(0, 1), (0, 2)]). | None |
consts_bounds | Tuple[float, float] | The lower and upper bounds for sampling constants when evaluating expressions. | (-5, 5) |
symbol_library | SymbolLibrary | The library of symbols used to parse and evaluate expressions. Defaults to SymbolLibrary.default_symbols(). | default_symbols() |
seed | int | Seed for random number generation during sampling; set it for deterministic results. | None |
Returns:
Type | Description |
---|---|
float | The mean Wasserstein distance computed between the behaviors of the two expressions or matrices over the sampled points. |
Raises:
Type | Description |
---|---|
Exception | If neither X nor domain_bounds is provided, since points for behavior evaluation cannot then be sampled. |
AssertionError | If the shapes of the behavior matrices or sampling points do not match the expected dimensions. |
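
The computation described above, a Wasserstein distance per point averaged over all points, can be sketched for two precomputed behavior matrices as follows. This is a simplified illustration under stated assumptions, not the library's implementation: `wasserstein_1d` and `bed_sketch` are hypothetical helpers, the matrices are plain nested lists, and both rows are assumed to hold the same number of samples, in which case the 1-D Wasserstein distance reduces to the mean absolute difference of the sorted values.

```python
from typing import Sequence


def wasserstein_1d(a: Sequence[float], b: Sequence[float]) -> float:
    """1-D Wasserstein-1 distance between two equally sized samples."""
    if len(a) != len(b):
        raise ValueError("this sketch only handles samples of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def bed_sketch(m1: Sequence[Sequence[float]],
               m2: Sequence[Sequence[float]]) -> float:
    """Mean per-point Wasserstein distance between two behavior matrices.

    Hypothetical illustration; rows are points, columns are sampled constants.
    """
    assert len(m1) == len(m2), "both matrices need one row per point"
    return sum(wasserstein_1d(r1, r2) for r1, r2 in zip(m1, m2)) / len(m1)


behavior = [[1.0, 2.0, 3.0],
            [0.0, 0.0, 1.0]]
shifted = [[v + 2.0 for v in row] for row in behavior]   # every output moved by 2
shuffled = [[3.0, 1.0, 2.0],
            [1.0, 0.0, 0.0]]                             # same values, other order

d_same = bed_sketch(behavior, behavior)
d_shifted = bed_sketch(behavior, shifted)
d_shuffled = bed_sketch(behavior, shuffled)
```

Because each row is treated as a distribution, the order of the sampled constants within a row does not matter, while a uniform shift of all outputs by 2 gives a distance of 2.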