Glycan Structures
Represent polysaccharide molecules and their associated functions.
Represent a sugar graph with pseudo-directed edges.
- class glypy.structure.glycan.Glycan(root=None, index_method='dfs', canonicalize=False)
Represents a full graph of connected Monosaccharide objects and their connecting bonds.
- Variables:
root (Monosaccharide) – The first monosaccharide unit of the glycan, and the reducing end if present.
index (list) – A list of the Monosaccharide instances in self, in the order they are encountered by traversal by traversal_methods[index_method].
link_index (list) – A list of the Link instances connecting the Monosaccharide instances in self, in the order they are encountered by traversal by traversal_methods[index_method].
reducing_end (ReducedEnd or None) – The reducing end on root.
branch_lengths (dict) – A dictionary mapping branch symbols to their lengths.
branch_parent_map (dict) – A dictionary mapping branch symbols to their parent branch symbols.
Indexing
Glycans support __getitem__() on index, as well as several other
methods for finding elements and for building and maintaining unique indices.
- Glycan.__getitem__(ix)
Fetch a Monosaccharide from index.
- Return type:
Monosaccharide
- Raises:
IndexError – If the provided ix exceeds the length of the index, or if index has not been populated.
- Glycan.get(ix)
Get a Monosaccharide from this structure by its id value.
If index is populated it will be iterated over, otherwise __iter__() will be called.
- Parameters:
ix – The id value to look up.
- Return type:
Monosaccharide
- Raises:
IndexError – If the id value is not found.
- Glycan.get_link(ix)
Search for a Link by id value.
This will use iterlinks() to iterate over the linkages in the structure.
Sizing
- Glycan.order(deep=False)
The number of nodes in the graph. __len__() is an alias of this method.
- Return type:
int
Different branches may have different lengths. An indexed Glycan’s branch_lengths dict
holds a mapping from branch label to length. When an existing branch forks, each child branch is given a new label,
but the parent branch is as long as its longest child, and each child branch is at least as long as its parent + 1.
Ordering and Index Building
- Glycan.reroot(index_method='dfs')
Set root to the node with the lowest id.
Should only be used if the glycan has been indexed.
- Glycan.reindex(method='dfs')
Traverse the graph using the function specified by method. The order of traversal defines the new id value for each Monosaccharide and Link.
The order of traversal also defines the ordering of the Monosaccharide instances in index and the Link instances in link_index.
Prior to constructing a Glycan instance, component Monosaccharide instances may be labeled, converting their id field into a tuple.
Calls label_branches() after indexing is complete.
- Returns:
self
- Return type:
Glycan
- Glycan.deindex()
When combining two Glycan structures, their component ids will very often overlap, making it impossible to differentiate between a cycle and the new graph. This function mangles all of the node and link ids so that they are distinct from the pre-existing nodes.
- Returns:
self
- Return type:
Glycan
- Glycan.label_branches()
Labels each branch point with an alphabetical symbol. Also computes each branch's length and stores it in branch_lengths. Sets branch_lengths of self and Link.label for each link attached to self. Also populates branch_parent_map.
Branch symbols are increasing alphabetical characters. The root branch is denoted '-', though glycans having a root with multiple children will not have any actual branches with that label.
Link.label updates use the current branch symbol and the index of that link along that branch.
Note
Labeling always uses a depth-first traversal of nodes.
Traversal
Glycan structures may be linear or branching, and can be traversed many ways. By
default, Glycan.depth_first_traversal() is used, which will fully traverse
one branch before visiting another, but other methods are available. Some methods
simply control the behavior of the iterator but do not control the order of iteration,
and take a method argument where either the name of the traversal method or a callable
is specified.
Glycan objects implement the Iterable interface, and their
__iter__() method uses Glycan.depth_first_traversal().
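The difference between depth-first and breadth-first order can be seen on a toy tree. The sketch below is only an illustration using plain dictionaries and integer node ids; TREE, depth_first and breadth_first are hypothetical names, not glypy objects, and the child ordering here is arbitrary rather than based on bond order.

```python
from collections import deque

# Toy tree: each node id maps to its ordered list of child ids.
# This is a stand-in for a glycan graph, not a glypy data structure.
TREE = {1: [2, 3], 2: [4, 5], 3: [6], 4: [], 5: [], 6: []}


def depth_first(tree, root):
    """Yield node ids, finishing one branch before starting the next."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        # Push children reversed so the first child is visited first.
        stack.extend(reversed(tree[node]))


def breadth_first(tree, root):
    """Yield node ids level by level."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        yield node
        queue.extend(tree[node])


dfs_order = list(depth_first(TREE, 1))
bfs_order = list(breadth_first(TREE, 1))
print(dfs_order)  # [1, 2, 4, 5, 3, 6]
print(bfs_order)  # [1, 2, 3, 4, 5, 6]
```

Both orders visit every node exactly once; only the sequence differs, which is why the choice of index_method changes the id values assigned during indexing.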
- Glycan.depth_first_traversal(from_node=None, apply_fn=<function identity>, visited=None)
Make a depth-first traversal of the glycan graph. Children are explored in descending bond-order.
This is the default traversal method for all Glycan objects. dfs() is an alias of this method. Both names can be used to specify this strategy to _get_traversal_method().
When selecting an iteration strategy, this strategy is specified as "dfs".
- Parameters:
from_node (None or Monosaccharide) – If from_node is None, then traversal starts from the root node. Otherwise it begins from the given node.
apply_fn (function) – A function applied to each node on arrival. If this function returns a non-None value, the result is yielded from the generator, otherwise it is ignored. Defaults to identity().
visited (set or None) – A set of node ID values to ignore. If None, defaults to the empty set.
- Yields:
Return value of apply_fn, by default Monosaccharide
- Glycan.breadth_first_traversal(from_node=None, apply_fn=<function identity>, visited=None)
Make a breadth-first traversal of the glycan graph. Children are explored in descending bond-order.
When selecting an iteration strategy, this strategy is specified as "bfs".
- Parameters:
from_node (None or Monosaccharide) – If from_node is None, then traversal starts from the root node. Otherwise it begins from the given node.
apply_fn (function) – A function applied to each node on arrival. If this function returns a non-None value, the result is yielded from the generator, otherwise it is ignored. Defaults to identity().
visited (set or None) – A set of node ID values to ignore. If None, defaults to the empty set.
- Yields:
Return value of apply_fn, by default Monosaccharide
- Glycan.indexed_traversal(from_node=None, apply_fn=<function identity>, visited=None)
Traverse the glycan structure along index.
This is substantially faster than other traversal methods for complete traversals, at the cost of a) requiring a call to reindex() to populate index if it has not been called, and b) not being automatically updated if the structure is modified.
When selecting an iteration strategy, this strategy is specified as "index".
- Parameters:
from_node (None or Monosaccharide) – If from_node is None, then traversal starts from the root node. Otherwise it begins from the given node.
apply_fn (function) – A function applied to each node on arrival. If this function returns a non-None value, the result is yielded from the generator, otherwise it is ignored. Defaults to identity().
visited (set or None) – A set of node ID values to ignore. If None, defaults to the empty set.
- Yields:
Return value of apply_fn, by default Monosaccharide
- Glycan.iternodes(from_node=None, apply_fn=<function identity>, method='dfs', visited=None)
Generic iterator over nodes dispatching to a strategy given by method, defaulting to depth_first_traversal().
- Parameters:
from_node (None or Monosaccharide) – If from_node is None, then traversal starts from the root node. Otherwise it begins from the given node.
apply_fn (function) – A function applied to each node on arrival. If this function returns a non-None value, the result is yielded from the generator, otherwise it is ignored. Defaults to identity().
method (str or function) – Traversal method to use. See _get_traversal_method().
visited (set or None) – A set of node ID values to ignore. If None, defaults to the empty set.
- Yields:
Return value of apply_fn, by default Monosaccharide
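The dispatch-by-name-or-callable behavior and the apply_fn filtering rule described above can be sketched on a toy tree. This is a hedged illustration only: STRATEGIES, the free function iternodes, and the dictionary tree are hypothetical stand-ins, not glypy's internals.

```python
def identity(node):
    return node


def dfs(tree, root):
    """Depth-first traversal over a dict-of-children toy tree."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(reversed(tree[node]))


# Hypothetical name-to-strategy table, standing in for traversal_methods.
STRATEGIES = {"dfs": dfs}


def iternodes(tree, root, apply_fn=identity, method="dfs"):
    """Resolve `method` by name or accept a callable, then filter results."""
    traverse = STRATEGIES[method] if isinstance(method, str) else method
    for node in traverse(tree, root):
        result = apply_fn(node)
        # Only non-None results are yielded, mirroring the documented rule.
        if result is not None:
            yield result


TREE = {1: [2, 3], 2: [4, 5], 3: [6], 4: [], 5: [], 6: []}

# apply_fn returning None for internal nodes keeps only the leaves.
leaves_only = list(
    iternodes(TREE, 1, apply_fn=lambda n: n if not TREE[n] else None)
)


# A callable can be passed instead of a strategy name.
def highest_id_first(tree, root):
    return iter(sorted(tree, reverse=True))


custom_order = list(iternodes(TREE, 1, method=highest_id_first))
print(leaves_only)   # [4, 5, 6]
print(custom_order)  # [6, 5, 4, 3, 2, 1]
```

Because filtering happens through the return value of apply_fn, the same iterator can both transform and select nodes in a single pass.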
- Glycan.iterlinks(apply_fn=<function identity>, substituents=False, method='dfs', visited=None)
Iterates over all Link objects in Glycan.
- Parameters:
substituents (bool) – If substituents is True, then include the Link objects in substituent_links on each Monosaccharide.
method (str or function) – The traversal method controlling the order of the nodes visited.
visited (None or set) – The collection of id values to ignore when traversing.
- Yields:
Link
- Glycan._get_traversal_method(method)
An internal helper method used to resolve traversal methods by name or alias.
Specialized Traversals
- Glycan.leaves(bidirectional=False, method='dfs', visited=None)
Iterates over all Monosaccharide objects in Glycan, yielding only those that have no child nodes.
- Parameters:
bidirectional (bool) – If bidirectional is True, then only Monosaccharide objects with a single entry in links are yielded.
method (str or function) – The traversal method controlling the order of the nodes visited.
visited (None or set) – The collection of id values to ignore when traversing.
- Yields:
Monosaccharide
Canonicalization
The same glycan structure can be constructed/written multiple ways, but they should all have the same representation. That representation is derived by applying a canonicalization algorithm to the structure, which will sort the branches of each node according to the order they should be traversed in.
If a structure has been constructed manually, the user should call Glycan.canonicalize()
before assuming that identical structures will have the same traversal paths.
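To illustrate why canonicalization makes traversal paths comparable, the sketch below sorts the children of every node recursively in a toy tree of (name, children) pairs. It uses plain lexical ordering as a stand-in; it is not the GlycoCT canonicalization algorithm, and canonical_form and traversal_path are hypothetical helper names.

```python
def canonical_form(node):
    """Return a nested tuple with each node's children sorted recursively."""
    name, children = node
    return (name, tuple(sorted(canonical_form(child) for child in children)))


def traversal_path(canon):
    """Depth-first list of names from a canonical nested tuple."""
    name, children = canon
    path = [name]
    for child in children:
        path.extend(traversal_path(child))
    return path


# The same toy topology written with branches in two different orders.
built_one_way = ("GlcNAc", [("Man", [("Gal", []), ("Fuc", [])]), ("Fuc", [])])
built_another = ("GlcNAc", [("Fuc", []), ("Man", [("Fuc", []), ("Gal", [])])])

same_after_sorting = canonical_form(built_one_way) == canonical_form(built_another)
path = traversal_path(canonical_form(built_one_way))
print(same_after_sorting)  # True
print(path)                # ['GlcNAc', 'Fuc', 'Man', 'Fuc', 'Gal']
```

Without the sorting step, a depth-first walk of the two inputs would visit the branches in different orders even though they describe the same tree.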
- Glycan.canonicalize(canonicalizer=None, **kwargs)
Canonicalize this glycan, sorting the order in which its links from the same monosaccharide are traversed.
This currently uses the GlycoCT canonicalization algorithm.
- Parameters:
canonicalizer (subclass of CanonicalizerBase, optional) – The canonicalization algorithm to use.
**kwargs – Forwarded to the canonicalizer.
- Returns:
This glycan, reordered in place.
- Return type:
Glycan
Equality Comparison
Glycan objects support the equality comparison operators == and !=. They also support hashing,
using the hash() value of the canonical GlycoCT representation of the structure.
- Glycan.exact_ordering_equality(other)
Two glycans are considered equal if their nodes are identically ordered.
See also
glypy.structure.Monosaccharide.exact_ordering_equality(), Exact Matching
- Glycan.topological_equality(other)
Two glycans are considered equal if they are topologically equal.
See also
glypy.structure.Monosaccharide.topological_equality(), Topological Matching
Ambiguous Structures
When a structure has unknown or ambiguous connections between its nodes, AmbiguousLink instances
may be used to express the possible options, or their locations may be expressed with an unknown position
constant, represented with -1. Two methods are included to detect these scenarios, and one is used to iterate
over possible configuration states described by AmbiguousLink.
Support for ambiguous connections is only partial. For instance, glypy can read UND sections from
GlycoCT, but does not attempt to render them.
- Glycan.ambiguous_links()
Locate all links which are AmbiguousLink objects.
- Returns:
list of ambiguous links
- Return type:
list
- Glycan.has_undefined_linkages()
Check if this structure has undefined or ambiguous connectivity between its nodes.
- Returns:
Whether any of its links are AmbiguousLink instances, or have unknown positions (-1).
- Return type:
bool
- Glycan.iterconfiguration()
Iterate over all valid configurations of ambiguous linkages.
During calculation, the AmbiguousLink objects may be mutated, but by the time a new configuration is yielded all changes should be reversed. If an error occurs during configuration adjustment, it may not be possible to restore the object to its original state.
- Yields:
tuple of (AmbiguousLink, Monosaccharide, Monosaccharide, int, int) – The ambiguous link, the parent chosen, the child chosen, the parent linkage site chosen, and the child linkage site chosen
Examples
>>> from glypy.io import glyspace
>>> structure_record = glyspace.get("G81339YK")
>>> structure = structure_record.structure_
>>> configurations = []
>>> for config_list in structure.iterconfiguration():
...     instance = structure.clone()
...     for link, conf in config_list:
...         link = instance.get_link(link.id)
...         parent = instance.get(conf[0].id)
...         child = instance.get(conf[1].id)
...         link.reconfigure(parent, child, conf[2], conf[3])
...     configurations.append(instance)
>>> len(configurations)
4
Serialization
There are many ways to write glycan structures as text. By default, glypy will render
Glycan instances using GlycoCT, but the Glycan.serialize() method can
be used to specify different serialization formats. For more information on those options, see
glypy.io.
When converting a Glycan to a string, Glycan.serialize() will be used with its
default argument.
- Glycan.serialize(name='glycoct')
Convert the structure to text.
The serialization format is selected by name from available_serializers().
- classmethod Glycan.register_serializer(name, method)
Add method as name to the set of serializers to pick from in serialize().
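The name-based lookup described for serialize() and register_serializer() follows a common registry pattern. The sketch below shows that pattern on a toy class; Structure, its _serializers table, and the "plain" and "reversed" format names are all hypothetical and do not reflect how glypy stores its serializers.

```python
class Structure:
    """Toy stand-in for a glycan; not a glypy class."""

    # Class-level table mapping a format name to a callable.
    _serializers = {}

    def __init__(self, residues):
        self.residues = list(residues)

    @classmethod
    def register_serializer(cls, name, method):
        """Make `method` available under `name` for every instance."""
        cls._serializers[name] = method

    def serialize(self, name="plain"):
        """Look up the serializer by name and apply it to this object."""
        return self._serializers[name](self)

    def __str__(self):
        # String conversion uses serialize() with its default argument.
        return self.serialize()


Structure.register_serializer("plain", lambda s: "-".join(s.residues))
Structure.register_serializer(
    "reversed", lambda s: "-".join(reversed(s.residues))
)

toy = Structure(["Gal", "GlcNAc", "Man"])
plain_text = str(toy)
reversed_text = toy.serialize("reversed")
print(plain_text)     # Gal-GlcNAc-Man
print(reversed_text)  # Man-GlcNAc-Gal
```

Registering on the class rather than the instance means a format added once is available to every structure, which matches register_serializer() being a classmethod.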
Mass Spectrometry Utilities
glypy was originally written to support software for mass spectrometry experiments on
glycans. Like all molecular objects in the library, Glycan instances support the Glycan.mass() and
Glycan.total_composition() methods. Additionally, they can generate glycosidic and cross-ring
fragments, as well as internal fragments caused by any combination of the two.
- Glycan.total_composition(method='dfs')
Computes the sum of the composition of all Monosaccharide objects in self.
- Return type:
Composition
- Glycan.mass(average=False, charge=0, mass_data=None, method='dfs')
Calculates the total mass of the intact graph by querying each node for its mass.
- Parameters:
average (bool) – Whether or not to use the average isotopic composition when calculating masses. When average == False, masses are calculated using monoisotopic mass.
charge (int) – If charge is non-zero, m/z is calculated, where m is the theoretical mass and z is charge.
mass_data (dict) – If mass_data is None, standard NIST mass and isotopic abundance data are used. Otherwise the contents of mass_data are assumed to contain elemental mass and isotopic abundance information.
- Return type:
float
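The effect of a non-zero charge can be sketched with a common protonation convention: add one proton mass per positive charge (or remove one per negative charge), then divide by the absolute charge. This convention, the PROTON_MASS constant, and the mass_to_charge helper are assumptions made for illustration; the exact adjustment glypy applies is not stated here and may differ.

```python
# Approximate proton mass in daltons (assumed constant for this sketch).
PROTON_MASS = 1.00727646688


def mass_to_charge(neutral_mass, charge):
    """Return the neutral mass when charge is 0, otherwise an m/z value.

    Assumes charge carriers are protons: gained for positive charge,
    lost for negative charge.
    """
    if charge == 0:
        return neutral_mass
    return (neutral_mass + charge * PROTON_MASS) / abs(charge)


# Monoisotopic mass of a free hexose, C6H12O6, rounded to five decimals.
hexose = 180.06339

neutral = mass_to_charge(hexose, 0)
singly_protonated = mass_to_charge(hexose, 1)
doubly_protonated = mass_to_charge(hexose, 2)
deprotonated = mass_to_charge(hexose, -1)
print(round(singly_protonated, 4))  # 181.0707
print(round(doubly_protonated, 4))  # 91.039
print(round(deprotonated, 4))       # 179.0561
```

Note that the value returned for a non-zero charge is a mass-to-charge ratio rather than a mass, so results computed with different charge values are not directly comparable.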
Fragmentation
- Glycan.fragments(kind='BY', max_cleavages=1, average=False, charge=0, mass_data=None, traversal_method='dfs')
Generate carbohydrate backbone fragments from this glycan by examining the disjoint subtrees created by removing one or more monosaccharide-monosaccharide bonds.
- Parameters:
kind (Iterable) – Any Iterable of characters corresponding to A/B/C/X/Y/Z as published by Domon and Costello.
max_cleavages (int) – The maximum number of bonds to break per fragment.
average (bool, optional, defaults to False) – Whether or not to use the average isotopic composition when calculating masses. When average == False, masses are calculated using monoisotopic mass.
charge (int, optional, defaults to 0) – If charge is non-zero, m/z is calculated, where m is the theoretical mass and z is charge.
mass_data (dict, optional, defaults to None) – If mass_data is None, standard NIST mass and isotopic abundance data are used. Otherwise the contents of mass_data are assumed to contain elemental mass and isotopic abundance information.
- Yields:
GlycanFragment
See also
glypy.composition.composition.calculate_mass(), subtrees(), crossring_subtrees(), Subtree.to_fragments()
- Glycan.name_fragment(fragment, link_index=None)
Attempt to assign a full name to a fragment based on the branch and position relative to the reducing end, alongside A/B/C/X/Y/Z, according to Domon and Costello.
The formal grammar for fragment names in Backus-Naur Form:
<full-name> ::= <fragment-name>|<fragment-name-list>
<fragment-name> ::= <glycosidic-fragment-name>|<crossring-fragment-name>
<fragment-name-list> ::= <fragment-name>"-"<fragment-name-list>|<fragment-name>
<glycosidic-fragment-name> ::= <branch-identifier><fragment-type><index>
<crossring-fragment-name> ::= <ring-coordinates><fragment-type><branch-identifier><index>
<fragment-type> ::= "A" | "B" | "C" | "X" | "Y" | "Z"
<ring-coordinates> ::= <integer>,<integer>
<index> ::= <integer>
<integer> ::= <digit>|<integer><digit>
<digit> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
<branch-identifier> ::= <letter>|<letter><digit>|""
<letter> ::= "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
Note
There are also helper methods which modify the called object iteratively,
restoring the original state after the generator is complete. They should
not be used directly; use Glycan.fragments() and
Glycan.substructures() instead.
- Glycan.break_links_subtrees(n_links)
Iteratively generate all subtrees from glycosidic bond cleavages, creating all \(2{L \choose n}\) subtrees.
- Parameters:
n_links (int) – Number of links to break simultaneously.
- Yields:
Subtree
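For the single-cleavage case, the subtree count can be checked on a toy tree. The sketch below assumes L in the expression above denotes the number of glycosidic links and covers only n = 1; PARENT and split_on_link are hypothetical names, and the code does not reproduce glypy's in-place link breaking.

```python
from math import comb

# Toy tree stored as child -> parent; each entry is one link.
PARENT = {2: 1, 3: 1, 4: 2, 5: 2}


def split_on_link(parent_of, child):
    """Break the link above `child`.

    Returns (parent-side node set, child-side node set).
    """
    nodes = set(parent_of) | set(parent_of.values())
    child_side = {child}
    changed = True
    # Grow the child side until no more descendants are found.
    while changed:
        changed = False
        for node, parent in parent_of.items():
            if parent in child_side and node not in child_side:
                child_side.add(node)
                changed = True
    return nodes - child_side, child_side


subtrees = []
for child in PARENT:  # one link per child node
    subtrees.extend(split_on_link(PARENT, child))

n_links = len(PARENT)
print(len(subtrees))         # 8
print(2 * comb(n_links, 1))  # 8
```

Each broken link separates the tree into a reducing-end side and a non-reducing-end side, which is where the factor of two comes from when one link is cleaved.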
- Glycan.crossring_subtrees(n_links)
Generate all combinations of cross ring fragments and glycosidic cleavages, cleaving between 1 and n_links monosaccharides paired with n_links - 1 to 0 glycosidic cleavages.
- Parameters:
n_links (int) – Total number of breaks to create, between cross ring cleavages and complementary glycosidic cleavages.
- Yields:
Subtree
Sub-Structures
- Glycan.substructures(max_cleavages=1, min_cleavages=1, inplace=False)
Generate disjoint subtrees from this glycan by removing one or more monosaccharide-monosaccharide bonds.
Miscellaneous
- Glycan.clone(index_method='dfs', visited=None, cls=None)
Create a copy of self, indexed using index_method, a traversal method or None.
- Glycan.set_reducing_end(value)
Sets the reducing end type, and configures the root Monosaccharide appropriately.
If the reducing_end is not None, then the following state changes are made to root:
self.root.ring_start = 0
self.root.ring_end = 0
self.root.anomer = "uncyclized"
Else, the correct state is unknown:
self.root.ring_start = UnknownPosition
self.root.ring_end = UnknownPosition
self.root.anomer = None
Note
This method is called automatically when setting reducing_end, and does not need to be used explicitly.
Glycan objects support root() and tree(), returning root
and the object itself, respectively.