Glycan Compositions and Residues
GlycanComposition, MonosaccharideResidue, and SubstituentResidue are
useful for working with a bag of residues, where topology and connections are not relevant but
the aggregate composition is known. These types work with a subset of the IUPAC three-letter code
for specifying compositions.
A Monosaccharide is meant to describe precisely where all of the bonds from
the carbon backbone are. A MonosaccharideResidue abstracts away the notion of
position, and automatically deducts a water molecule from its composition to
account for a single incoming and a single outgoing glycosidic bond. Because it does not try to
completely describe the physical configuration of the molecule, a MonosaccharideResidue
discards information about ring type, anomericity, configuration, and optionally stem type. The level
of detail discarded is customizable in the MonosaccharideResidue.from_monosaccharide() class method.
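The water deduction described above can be illustrated without glypy at all. The following is a minimal sketch using only the standard library; the helper names `residue_formula` and `monoisotopic_mass` are hypothetical and are not part of the glypy API.

```python
from collections import Counter

# Monoisotopic element masses (standard values).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}

WATER = Counter({"H": 2, "O": 1})


def residue_formula(monosaccharide_formula):
    """Return the elemental formula left after removing one water molecule."""
    residue = Counter(monosaccharide_formula)
    residue.subtract(WATER)
    # Drop elements whose count fell to zero.
    return {element: count for element, count in residue.items() if count}


def monoisotopic_mass(formula):
    """Sum element masses weighted by their counts."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


# A free hexose is C6H12O6; its residue form is C6H10O5.
hexose = {"C": 6, "H": 12, "O": 6}
hexose_residue = residue_formula(hexose)
hexose_residue_mass = monoisotopic_mass(hexose_residue)
```

Under these assumptions the hexose residue has a monoisotopic mass of about 162.0528, one water (about 18.0106) lighter than the free sugar.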
A GlycanComposition is just a bag of MonosaccharideResidue and SubstituentResidue objects,
similar to Composition. Its keys may be MonosaccharideResidue instances,
SubstituentResidue instances, or strings which can be parsed by from_iupac_lite(), and its values
are integers. A GlycanComposition may also be converted to and from a string using serialize() and
parse().
>>> from glypy.structure.glycan_composition import GlycanComposition, MonosaccharideResidue
>>> g = GlycanComposition(Hex=3, HexNAc=2)
>>> g["Hex"]
3
>>> r = MonosaccharideResidue.from_iupac_lite("Hex")
>>> r
MonosaccharideResidue(Hex)
>>> g[r]
3
>>> import glypy
>>> abs(g.mass() - glypy.motifs["N-Glycan core basic 1"].mass()) < 1e-5
True
>>> g2 = GlycanComposition(Hex=5)
>>> g["@n-acetyl"] = -2 # Remove two n-acetyl groups from the composition
>>> abs(g.mass() - g2.mass()) < 1e-5
True
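The mass bookkeeping behind this example can be checked by hand. The sketch below does not use glypy; it assumes the standard elemental formulas for Hex and HexNAc residues, adds one water back for the aggregate, and treats each removed N-acetyl group as the net difference between a HexNAc and a Hex residue (C2H3N). All names in it are hypothetical.

```python
# Monoisotopic element masses (standard values).
ELEMENT_MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}

# Residue formulas: each is the free monosaccharide minus one water.
RESIDUE_FORMULA = {
    "Hex": {"C": 6, "H": 10, "O": 5},
    "HexNAc": {"C": 8, "H": 13, "N": 1, "O": 5},
}
WATER = {"H": 2, "O": 1}
# Net elemental difference between a HexNAc residue and a Hex residue.
N_ACETYL_DELTA = {"C": 2, "H": 3, "N": 1}


def formula_mass(formula):
    """Sum element masses weighted by their counts."""
    return sum(ELEMENT_MASS[element] * count for element, count in formula.items())


def composition_mass(counts):
    """Sum the residue masses and add back one water for the aggregate."""
    total = formula_mass(WATER)
    for name, count in counts.items():
        total += count * formula_mass(RESIDUE_FORMULA[name])
    return total


core = composition_mass({"Hex": 3, "HexNAc": 2})
hex5 = composition_mass({"Hex": 5})
# Removing two N-acetyl groups from Hex3HexNAc2 should leave the mass of Hex5.
difference = core - 2 * formula_mass(N_ACETYL_DELTA) - hex5
```

With these formulas, `core` comes out near 910.3278 and `difference` is zero up to floating-point error, which mirrors the two comparisons in the doctest above.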
IUPAClite
IUPAClite is a dialect of IUPAC for describing monosaccharides while omitting some precise structural information from the grammar. It also includes a compact line notation for glycan compositions.
A monosaccharide is denoted using IUPAC notation, omitting ring shape, anomeric state, chirality,
and (optionally) modification positions. For example, a-D-Manp would be written Man, and
b-D-Glcp2NAc would be written GlcNAc.
You can also use generic base types such as Hex or Pen to denote a six- or five-carbon
monosaccharide, respectively. The notation is composable, so you can specify an arbitrarily modified
monosaccharide, such as HexNAc(S) for a sulfated HexNAc, using the parenthesized convention
that separates substituent groups, or dHexN for a deoxy-hexosamine.
You can also define “floating” substituent groups by prefixing their full lowercase
names with an @-sign, like @sulfate for sulfate or @acetyl for an acetyl group. Lastly,
it is also possible to denote an arbitrary named group using the notation #<name>#<chemical-formula>,
though this should be used only when no other option is available.
See from_iupac_lite() and to_iupac_lite() for the implementations of monomer reading and writing.
A glycan composition is written as one or more <monosaccharide>:<count> occurrences separated by
“; ” (semicolon plus space) and enclosed in “{ }”. See GlycanComposition.parse() and
GlycanComposition.serialize() for the implementation. A few examples are shown below:
{Hex:5; HexNAc:4; Neu5Ac:1}
{Hex:5; HexNAc:4; Neu5Ac:2}
{Fuc:1; Hex:5; HexNAc:4; Neu5Ac:2}
{Fuc:2; Hex:6; HexNAc:5; Neu5Ac:1}
{Fuc:1; Hex:6; HexNAc:5; Neu5Ac:2}
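To make the line notation concrete, here is a small stand-alone reader and writer for the format shown above. It is only a sketch of the grammar as described in this section, not the code behind GlycanComposition.parse() or GlycanComposition.serialize(); the function names are hypothetical, and the alphabetical key ordering on output is an assumption made for reproducibility.

```python
import re

# One "<name>:<count>" entry; names may contain "@", "#", or parentheses.
_ENTRY = re.compile(r"^(?P<name>[^:;{}]+):(?P<count>-?\d+)$")


def parse_composition(text):
    """Parse '{Hex:5; HexNAc:4}' into a dict of name -> integer count."""
    text = text.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("composition must be enclosed in braces: %r" % text)
    body = text[1:-1].strip()
    counts = {}
    if not body:
        return counts
    for entry in body.split(";"):
        match = _ENTRY.match(entry.strip())
        if match is None:
            raise ValueError("malformed entry: %r" % entry)
        name = match.group("name").strip()
        counts[name] = counts.get(name, 0) + int(match.group("count"))
    return counts


def serialize_composition(counts):
    """Write a dict of counts back out, sorting keys alphabetically."""
    body = "; ".join("%s:%d" % (name, counts[name]) for name in sorted(counts))
    return "{" + body + "}"


parsed = parse_composition("{Fuc:1; Hex:5; HexNAc:4; Neu5Ac:2}")
round_trip = serialize_composition(parsed)
```

Repeated names are summed rather than rejected, which is a design choice of this sketch and may differ from how the library treats duplicates.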
Residues
- class glypy.structure.glycan_composition.MonosaccharideResidue(*args, **kwargs)
Represents a Monosaccharide-like object, save that it does not connect to other Monosaccharide objects and does not have properties related to topology, specifically, anomer.
A single MonosaccharideResidue has lost a water molecule from its composition, reflecting its residual nature. This is accounted for when dealing with aggregates of residues. They also have altered carbon backbone occupancies.
MonosaccharideResidue objects are hashable and comparable on their iupac_lite representation, which is given by __str__() or name().
- clone(*args, **kwargs)
Copies just this Monosaccharide and its Substituent objects, creating a separate instance with the same data. All mutable data structures are duplicated and distinct from the original.
Does not copy any links as this would cause recursive duplication of the entire Glycan graph.
- Parameters:
prop_id (bool) – Whether to copy id from self to the new instance
fast (bool) – Whether to use the fast-path initialization process in MonosaccharideResidue.__init__()
monosaccharide_type (type) – A subclass of MonosaccharideResidue to use
- Return type:
- drop_configuration(force=False)
Drops the absolute stereochemical configuration of this monosaccharide.
Unless force is True, if resolve_special_base_type() returns a truthy value, this function will do nothing.
- Parameters:
residue (Monosaccharide) – The monosaccharide to change
force (bool, optional) – Whether or not to override known special case named monosaccharides
- Returns:
The mutated monosaccharide
- Return type:
- drop_positions(force=False)
Drops the position classifiers from all links and modifications attached to this monosaccharide.
Unless force is True, if resolve_special_base_type() returns a truthy value, this function will do nothing.
- Parameters:
residue (Monosaccharide) – The monosaccharide to change
force (bool, optional) – Whether or not to override known special case named monosaccharides
- Returns:
The mutated monosaccharide
- Return type:
- drop_stem(force=False)
Drops the stem, or the carbon ring stereochemical classification from this monosaccharide.
Unless force is True, if resolve_special_base_type() returns a truthy value, this function will do nothing.
- Parameters:
residue (Monosaccharide) – The monosaccharide to change
force (bool, optional) – Whether or not to override known special case named monosaccharides
- Returns:
The mutated monosaccharide
- Return type:
- classmethod from_monosaccharide(monosaccharide, configuration=False, stem=True, ring=False, position=True)
Construct an instance of MonosaccharideResidue from an instance of Monosaccharide. This function attempts to preserve derivatization if possible.
This function will create a deep copy of monosaccharide.
- Parameters:
monosaccharide (Monosaccharide) – The monosaccharide to be converted
configuration (bool, optional) – Whether or not to preserve Configuration. Defaults to False
stem (bool, optional) – Whether or not to preserve Stem. Defaults to True
ring (bool, optional) – Whether or not to preserve RingType. Defaults to False
- Return type:
- open_attachment_sites(max_occupancy=0)
When attaching Monosaccharide instances to other objects, bonds are formed between the carbohydrate backbone and the other object. If a site is already bound, the occupying object fills that space on the backbone and prevents other objects from binding there.
Currently only cares about the availability of the hydroxyl group. As there is not a hydroxyl attached to the ring-ending carbon, that should not be considered an open site.
If any existing attached units have unknown positions, we can’t provide any known positions, in which case the list of open positions will be a list of -1s of the length of open sites.
A MonosaccharideResidue has two fewer open attachment sites than the equivalent Monosaccharide
Frozen Residues
MonosaccharideResidue operations may require str conversions, which can be expensive.
Where that cost matters, use FrozenMonosaccharideResidue instead: it is immutable once created and substantially faster.
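The speed-up comes from doing the expensive string work once per distinct residue name and reusing the result. The sketch below shows that general caching idea with the standard library only; it is not how FrozenMonosaccharideResidue is implemented, and `cached_residue_key` is a hypothetical stand-in for an expensive parse.

```python
from functools import lru_cache

parse_calls = 0  # counts how often the "expensive" step actually runs


@lru_cache(maxsize=None)
def cached_residue_key(name):
    """Stand-in for an expensive parse; the result is cached per name."""
    global parse_calls
    parse_calls += 1
    # The returned tuple is immutable, so sharing one instance is safe.
    return ("residue", name)


names = ["Hex", "HexNAc", "Hex", "Hex", "HexNAc"]
keys = [cached_residue_key(name) for name in names]
```

Five lookups trigger only two parses here, and repeated names return the very same object. Sharing instances like this is only safe because the cached values are never mutated, which is why immutability and caching go together.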
- class glypy.structure.glycan_composition.FrozenMonosaccharideResidue(*args, **kwargs)
A subclass of MonosaccharideResidue which caches the result of to_iupac_lite() and instances returned by FrozenMonosaccharideResidue.clone() and FrozenMonosaccharideResidue.from_iupac_lite(). Also treated as immutable after initialization through FrozenMonosaccharideResidue.from_monosaccharide().
Note that directly calling FrozenMonosaccharideResidue.from_monosaccharide() will not retrieve instances from the cache directly, and direct initialization using normal instance creation will neither touch the cache nor freeze the instance.
This type is intended for use with FrozenGlycanComposition to minimize the number of times from_iupac_lite() is called.
- clone(*args, **kwargs)
Copies just this Monosaccharide and its Substituent objects, creating a separate instance with the same data. All mutable data structures are duplicated and distinct from the original.
Does not copy any links as this would cause recursive duplication of the entire Glycan graph.
- classmethod from_iupac_lite(string)
Parse a string of iupac_lite notation to produce a residue object
- Parameters:
string (str) – The string to parse
- Return type:
ResidueBase
- classmethod from_monosaccharide(monosaccharide, *args, **kwargs)
Construct an instance of MonosaccharideResidue from an instance of Monosaccharide. This function attempts to preserve derivatization if possible.
This function will create a deep copy of monosaccharide.
- Parameters:
monosaccharide (Monosaccharide) – The monosaccharide to be converted
configuration (bool, optional) – Whether or not to preserve Configuration. Defaults to False
stem (bool, optional) – Whether or not to preserve Stem. Defaults to True
ring (bool, optional) – Whether or not to preserve RingType. Defaults to False
- Return type:
- mass(average=False, charge=0, mass_data=None, substituents=True)
Calculates the total mass of self.
- Parameters:
average (bool, optional, defaults to False) – Whether or not to use the average isotopic composition when calculating masses. When average == False, masses are calculated using monoisotopic mass.
charge (int, optional, defaults to 0) – If charge is non-zero, m/z is calculated, where m is the theoretical mass, and z is charge
mass_data (dict, optional) – If mass_data is None, standard NIST mass and isotopic abundance data are used. Otherwise the contents of mass_data are assumed to contain elemental mass and isotopic abundance information. Defaults to None.
substituents (bool, optional, defaults to True) – Whether or not to include substituents’ masses.
- Return type:
- total_composition()
Computes the sum of the composition of self and each of its linked Substituents
- Return type:
Composition
Substituent Residues
- class glypy.structure.glycan_composition.SubstituentResidue(name, composition=None, id=None, links=None, can_nh_derivatize=None, is_nh_derivatizable=None, derivatize=False, attachment_composition=None)
Represent substituent molecules unassociated with a specific monosaccharide residue.
Note
SubstituentResidue’s composition value includes the losses for forming a bond between a monosaccharide residue and the substituent.
- Variables:
name (str) – As in Substituent, but with SubstituentResidue.sigil prepended.
composition (Composition)
links (OrderedMultiMap)
_order (int)
- classmethod from_iupac_lite(name)
Parse a string of iupac_lite notation to produce a residue object
- Parameters:
string (str) – The string to parse
- Return type:
ResidueBase
- sigil = '@'
All substituent string identifiers are prefixed with this character for the from_iupac_lite() parser
Glycan Composition
- class glypy.structure.glycan_composition.GlycanComposition(*args, **kwargs)
Describe a glycan as a collection of MonosaccharideResidue counts without explicit linkage information relating how each monosaccharide is connected to its neighbors.
This class subclasses dict, and assumes that keys will either be MonosaccharideResidue instances, SubstituentResidue instances, or strings in iupac_lite format which will be parsed into one of these types. While other types may be used, this is not recommended. All standard dict methods are supported.
GlycanComposition objects may be derivatized just as Glycan objects are, with glypy.composition.composition_transform.derivatize() and glypy.composition.composition_transform.strip_derivatization().
GlycanComposition objects also support composition arithmetic, and can be added to or subtracted from each other or multiplied by an integer.
As a GlycanComposition is not a complete structure, it cannot be translated into text formats as full Glycan objects are. It may instead be converted to a short-form text notation using GlycanComposition.serialize() and reconstructed from this format using GlycanComposition.parse().
- Variables:
reducing_end (ReducedEnd) – Describe the reducing end of the aggregate without binding it to a specific monosaccharide. This will contribute to composition and mass calculations.
_composition_offset (CComposition) – Account for the one water molecule’s worth of composition left over from applying the “residue” transformation to each monosaccharide in the aggregate.
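The composition arithmetic mentioned in the class description amounts to key-wise operations on the residue counts. A plain-dict sketch of that idea follows; the helper names are hypothetical, and dropping keys whose count reaches zero is an assumption of this sketch rather than documented library behavior. Negative counts are kept, since the introductory example relies on them.

```python
def add_counts(left, right):
    """Key-wise sum of two residue-count mappings."""
    result = dict(left)
    for name, count in right.items():
        result[name] = result.get(name, 0) + count
    # Assumption: keys that cancel out exactly are removed.
    return {name: count for name, count in result.items() if count != 0}


def subtract_counts(left, right):
    """Key-wise difference; negative counts are preserved."""
    return add_counts(left, {name: -count for name, count in right.items()})


def scale_counts(counts, factor):
    """Multiply every count by an integer factor."""
    return {name: count * factor for name, count in counts.items() if count * factor != 0}


a = {"Hex": 5, "HexNAc": 4}
b = {"Hex": 1, "Neu5Ac": 1}

total = add_counts(a, b)
delta = subtract_counts(a, b)
doubled = scale_counts(b, 2)
```

`collections.Counter` is deliberately not used for the arithmetic here, because its `+` and `-` operators discard non-positive counts.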
- __init__(*args, **kwargs)
Initialize a GlycanComposition using the provided objects or keyword arguments, imitating the dict initialization signature.
If a Mapping is provided as a positional argument, it will be used as a template. If arbitrary keyword arguments are provided, they will be interpreted using update(). As a special case, if another GlycanComposition is provided, its reducing_end attribute will also be copied.
- Parameters:
*args – Arbitrary positional arguments
**kwargs – Arbitrary keyword arguments
- collapse()
Merge redundant keys.
After performing a structure-detail removing operation like drop_positions(), drop_configurations(), or drop_stems(), monosaccharide keys may be redundant. collapse will merge keys which refer to the same type of molecule.
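The merging step can be pictured as re-keying a dictionary after normalization and summing the counts that land on the same key. The sketch below uses plain strings and a hypothetical stem-dropping table; it illustrates the idea only and is not the library's collapse() implementation.

```python
# Hypothetical lookup from stem-specific names to their generic base type.
STEM_TO_BASE = {
    "Man": "Hex",
    "Glc": "Hex",
    "Gal": "Hex",
    "GlcNAc": "HexNAc",
    "GalNAc": "HexNAc",
}


def drop_stem_name(name):
    """Map a stem-specific name to its generic form, if one is known."""
    return STEM_TO_BASE.get(name, name)


def collapse_counts(counts, normalize):
    """Re-key counts through normalize() and merge keys that now coincide."""
    merged = {}
    for name, count in counts.items():
        key = normalize(name)
        merged[key] = merged.get(key, 0) + count
    return merged


collapsed = collapse_counts({"Man": 3, "Gal": 2, "GlcNAc": 4, "Neu5Ac": 1}, drop_stem_name)
```

Here Man and Gal both become Hex, so their counts are summed into a single entry, while Neu5Ac passes through unchanged.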
- classmethod from_glycan(glycan)
Convert a Glycan into a GlycanComposition.
- Parameters:
glycan (Glycan) – The instance to be converted
- Return type:
- mass(average=False, charge=0, mass_data=None)
Calculates the total mass of self.
Note
The monoisotopic mass is cached on first computation in _mass.
- Parameters:
average (bool, optional, defaults to False) – Whether or not to use the average isotopic composition when calculating masses. When average == False, masses are calculated using monoisotopic mass.
charge (int, optional, defaults to 0) – If charge is non-zero, m/z is calculated, where m is the theoretical mass, and z is charge
mass_data (dict, optional) – If mass_data is None, standard NIST mass and isotopic abundance data are used. Otherwise the contents of mass_data are assumed to contain elemental mass and isotopic abundance information. Defaults to None.
- Return type:
- classmethod parse(string)
Parse a str into a GlycanComposition.
This will parse the format produced by serialize()
- Parameters:
string (str) – The string to parse
- Return type:
- query(query, exact=True, **kwargs)
Return the total count of all residues in self which match query using glypy.io.nomenclature.identity.is_a()
- Parameters:
query (MonosaccharideResidue or str) – A monosaccharide residue or a string which will be converted into one by from_iupac_lite() to test for an is-a relationship with.
exact (bool, optional) – Passed to is_a(). Explicitly True by default
**kwargs – Passed to is_a()
- Returns:
The total count of all residues which satisfy the is-a relationship
- Return type:
- reinterpret(references, exact=True, **kwargs)
Aggregate the counts of all residues in self for each monosaccharide in references satisfying an is-a relationship, collapsing multiple residues to a single key. Any residue not aggregated will be preserved as-is.
Note
The order of references matters as any residue matched by a reference will not be considered for later references.
- Parameters:
references (Iterable of MonosaccharideResidue) – The monosaccharides with which to test for an is-a relationship
exact (bool, optional) – Passed to is_a(). Explicitly True by default
**kwargs – Passed to is_a()
- Returns:
self after key collection and collapse
- Return type:
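The note about reference order is easiest to see with a toy version of the aggregation. The sketch below replaces is_a() with a hypothetical lookup table over plain strings, so it illustrates only the first-match-wins behavior described above and says nothing about how glypy decides identity.

```python
# Hypothetical is-a table: reference name -> names it is allowed to absorb.
IS_A = {
    "Hex": {"Hex", "Man", "Glc", "Gal"},
    "Man": {"Man"},
    "HexNAc": {"HexNAc", "GlcNAc", "GalNAc"},
}


def reinterpret_counts(counts, references):
    """Fold residues into the first reference that matches them."""
    remaining = dict(counts)
    result = {}
    for reference in references:
        accepted = IS_A.get(reference, {reference})
        matched = [name for name in remaining if name in accepted]
        for name in matched:
            # A matched residue is consumed and unavailable to later references.
            result[reference] = result.get(reference, 0) + remaining.pop(name)
    # Residues matched by no reference are preserved as-is.
    for name, count in remaining.items():
        result[name] = result.get(name, 0) + count
    return result


counts = {"Man": 3, "Gal": 2, "GlcNAc": 2, "Neu5Ac": 1}
specific_first = reinterpret_counts(counts, ["Man", "Hex"])
generic_first = reinterpret_counts(counts, ["Hex", "Man"])
```

Listing Man before Hex keeps the three mannose residues separate, whereas listing Hex first absorbs them, leaving nothing for the later Man reference to claim.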
- total_composition()
Computes the sum of the composition of all Monosaccharide objects in self
- Return type:
Composition
Frozen Composition
GlycanComposition objects automatically convert str arguments to MonosaccharideResidue
instances, which, as previously mentioned, can be slow. If the key objects will not be modified,
FrozenGlycanComposition is considerably faster for all operations. If neither the keys nor the
values will be modified after creation, HashableGlycanComposition is also available and is hashable.
- class glypy.structure.glycan_composition.FrozenGlycanComposition(*args, **kwargs)
A subclass of GlycanComposition which uses FrozenMonosaccharideResidue instead of MonosaccharideResidue, which reduces the number of times from_iupac_lite() is called.
Only use this type if residue names are pre-validated, residue types will not be transformed, and when creating many, many instances. from_iupac_lite() invokes expensive introspection algorithms which can be costly when repeatedly manipulating the same residue types.
- classmethod parse(string)
Parse a str into a GlycanComposition.
This will parse the format produced by serialize()
- Parameters:
string (str) – The string to parse
- Return type:
- thaw()
Convert this FrozenGlycanComposition into a GlycanComposition that is not frozen.
- Return type:
IUPAClite
- glypy.structure.glycan_composition.to_iupac_lite(residue)
- glypy.structure.glycan_composition.from_iupac_lite(string, residue_class=None)