Substructure Search Methods¶
A set of algorithms for finding similar structures and substructures.
Inclusion Comparison Algorithms¶
Inclusion-based search asks whether one graph is completely contained in another sub-graph,
using either Exact Matching or Topological Matching according to the chosen
approach. These comparisons can be made fuzzy, using commutative_similarity()
to evaluate matches between individual nodes in a graph.
Direct Comparators¶
These direct algorithms compare two structures starting from the provided root nodes, using
their particular matching strategy. They are useful for immediate comparisons of similarity,
returning the maximal commutative_similarity() of their traversal, but they
do not consider non-root-to-root comparisons.
- glypy.algorithms.subtree_search.topological_inclusion(target, reference, substituents=True, tolerance=0, visited=None)¶
A generalization of
topological_equality()which allows fortargetto be matched toreference, but forreferenceto include more. Consequently, this method is not commutative.
- Parameters:
target (
Monosaccharide) – The monosaccharide to comparereference (
Monosaccharide) – The monosaccharide to compare againstsubstituents (
bool, optional) – Whether or not to compare with the substituents of each node. Defaults toTrue.tolerance (
float, optional) – The maximum difference to permit between nodes for inclusion. Defaults to 0- Returns:
The inclusion score. Greater than 0 indicates topological inclusion, though larger corresponds to better alignment.
- Return type:
- glypy.algorithms.subtree_search.exact_ordering_inclusion(target, reference, substituents=True, tolerance=0, visited=None)[source]¶
A generalization of
exact_ordering_equality()which allows fortargetto be matched toreference, but forreferenceto include more. Consequently, this method is not commutative.
- Parameters:
target (
Monosaccharide) – The monosaccharide to comparereference (
Monosaccharide) – The monosaccharide to compare againstsubstituents (
bool, optional) – Whether or not to compare with the substituents of each node. Defaults toTrue.tolerance (
float, optional) – The maximum difference to permit between nodes for inclusion. Defaults to 0- Returns:
The similarity score between the two structures. A score of 0 means no inclusion, and a non-zero score means inclusion, with larger scores indicating more node pairs matching.
- Return type:
See also
commutative_similarity_score_with_tolerance,topological_inclusion
Substructure Inclusion-based Algorithms¶
Building off the Direct Comparators, these algorithms answer higher level questions about whether one structure is included in another.
- glypy.algorithms.subtree_search.subtree_of(subtree, tree, exact=False, include_substituents=True, tolerance=0)[source]¶
Test to see if
subtreeis included intreeanywhere. Returns the node id number of the first occurence ofsubtreeincluded intreeorNoneif it is not found.
- Parameters:
subtree (
Glycan) – The structure to search for. The search attempts to match the complete structure of subtree.tree (
Glycan) – The sturcture to search in. The search iterates over each residue intreeand calls a comparator function, comparing thesubtreeto the substructure rooted at that residue.exact (
bool) – IfTrue, useexact_ordering_inclusion()to compare nodes. Otherwise usetopological_inclusion(). Defaults toFalse.- Return type:
intorNoneif no match
- glypy.algorithms.subtree_search.find_matching_subtree_roots(subtree, tree, exact=False, include_substituents=True, tolerance=0)[source]¶
Find the list of nodes where occurences of
subtreeincluded intreeare rooted.
- Parameters:
subtree (
Glycan) – The structure to search for. The search attempts to match the complete structure of subtree.tree (
Glycan) – The sturcture to search in. The search iterates over each residue intreeand calls a comparator function, comparing thesubtreeto the substructure rooted at that residue.exact (
bool) – IfTrue, useexact_ordering_inclusion()to compare nodes. Otherwise usetopological_inclusion(). Defaults toFalse.- Return type:
Partial Subgraph Walks¶
- glypy.algorithms.subtree_search.walk_with(query, reference, visited=None, comparator=<function commutative_similarity>, include_substituents=True)[source]¶
Walk the
queryalongreference, yielding successive matched nodes along a subtree ofreferenceusing a Exact Matching traversal.This function provides slightly more detail than
subtree_of()as it exposes every step along the subgraph traversal, rather than just the root.
- Parameters:
query (
Glycan) – The query structure to search withreference (
Glycan) – The reference structure to search invisited (
set, optional) – The set of node id pairs to ignorecomparator (
Callable, optional) – TheCallableobject which can compare twoMonosaccharide- Yields:
(
Monosaccharide,Monosaccharide) – Pairs of matched nodes along a path
Common Substructure-based Algorithms¶
- glypy.algorithms.subtree_search.maximum_common_subgraph(seq_a, seq_b, exact=True)¶
Find the maximum common subgraph between
seq_aandseq_b.
- Parameters:
- Return type:
References
[1] K. F. Aoki, A. Yamaguchi, Y. Okuno, T. Akutsu, N. Ueda, M. Kanehisa, and H. Mamitsuka, “Efficient tree-matching methods for accurate carbohydrate database queries.” Genome Inform. Jan. 2003.
- glypy.algorithms.subtree_search.n_saccharide_similarity(self, other, n=2, exact=False)[source]¶
Calculate n-saccharide similarity between two structures
- Parameters:
- Returns:
score – How similar these structures are at the n-saccharide level. Ranges between 0 and 1.0 where 1.0 is exactly the same, while 0.0 means no shared n-saccharides.
- Return type:
References
[1] K. F. Aoki, A. Yamaguchi, Y. Okuno, T. Akutsu, N. Ueda, M. Kanehisa, and H. Mamitsuka, “Efficient tree-matching methods for accurate carbohydrate database queries.” Genome Inform. Jan. 2003.
Implementation Details of maximum_common_subgraph()¶
- class glypy.algorithms.subtree_search.MaximumCommonSubgraphSolver(seq_a, seq_b, exact=True)[source]¶
Find the maximum common subgraph between
seq_aandseq_b.
- Variables:
References
[1] K. F. Aoki, A. Yamaguchi, Y. Okuno, T. Akutsu, N. Ueda, M. Kanehisa, and H. Mamitsuka, “Efficient tree-matching methods for accurate carbohydrate database queries.” Genome Inform. Jan. 2003.
Treelets¶
- glypy.algorithms.subtree_search.treelets(glycan, k, distinct=True)[source]¶
Iterator over all distinct \(k\)-treelets of
tree, all unique sub-trees oftreewithkmonosaccharides.
- glypy.algorithms.subtree_search.treelet_enrichment(cond1, cond2, k, distinct=True)¶
Perform a test for each treelet to determine if it is enriched in cond1 compared to cond2.
- Parameters:
cond1 (list of
Glycan) – Glycans from the first condition, the condition to be tested for enrichmentcond2 (list of
Glycan) – Glycans from the second condition, the background conditionk (int) – The size of the treelet to use
distinct (bool) – Whether or not to count redundant treelets. Defaults to
True- Returns:
A mapping from treelet to p value from Fisher’s Exact Test for that treelet being found with its observed frequency in cond1 if it is as common in cond1 as in cond2.
- Return type:
References
Pevzner, P., & Shamir, R. (2011). Bioinformatics for Biologists. New York, NY, USA: Cambridge University Press.
Implementation Details¶
Treelet methods are built atop a more complex object system
- class glypy.algorithms.subtree_search.Treelet(subtree, frontier_ids)[source]¶
Represents a subgraph of a larger
Glycan, with a frontier of node ids which are children of the current subgraph.