
Identifying glycan motifs using a novel subtree mining approach

Abstract

Background: Glycans are complex sugar chains, crucial to many biological processes. By participating in binding interactions with proteins, glycans often play key roles in host–pathogen interactions. The specificities of glycan-binding proteins, such as lectins and antibodies, are governed by motifs within larger glycan structures, and improved characterisations of these determinants would aid research into human diseases. Identification of motifs has previously been approached as a frequent subtree mining problem, and we extend these approaches with a glycan notation that allows recognition of terminal motifs.

Results: In this work, we customised a frequent subtree mining approach by altering the glycan notation to include information on terminal connections. This allows specific identification of terminal residues as potential motifs, better capturing the complexity of glycan-binding interactions. We achieved this by including additional nodes in a graph representation of the glycan structure to indicate the presence or absence of a linkage at particular backbone carbon positions. Combining this frequent subtree mining approach with a state-of-the-art feature selection algorithm termed minimum-redundancy, maximum-relevance (mRMR), we have generated a classification pipeline that is trained on data from a glycan microarray. When applied to a set of commonly used lectins, the identified motifs were consistent with known binding determinants. Furthermore, logistic regression classifiers trained using these motifs performed well across most lectins examined, with a median AUC value of 0.89.

Conclusions: We present here a new subtree mining approach for the classification of glycan binding and identification of potential binding motifs. The Carbohydrate Classification Accounting for Restricted Linkages (CCARL) method will assist in the interpretation of glycan microarray experiments and will aid in the discovery of novel binding motifs for further experimental characterisation.
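The idea of marking unsubstituted backbone positions with extra nodes can be illustrated with a minimal sketch. This is not the CCARL implementation: the `Residue` class, the `FREE_POSITIONS` table, the `null` marker name, and the bracketed serialisation are all hypothetical choices made for illustration, and the listed positions are a simplified assumption about which hydroxyls can carry a linkage.

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Assumption for illustration: carbon positions on each residue that could
# carry a glycosidic linkage. The real notation may differ.
FREE_POSITIONS = {
    "Gal": (2, 3, 4, 6),
    "GlcNAc": (3, 4, 6),
    "Neu5Ac": (4, 7, 8, 9),
}


@dataclass
class Residue:
    name: str
    # Maps a carbon position on this residue to the residue attached there.
    children: dict[int, Residue] = field(default_factory=dict)


def add_restricted_linkage_nodes(residue: Residue, marker: str = "null") -> Residue:
    """Return a copy of the tree with a marker leaf at every unsubstituted position."""
    out = Residue(residue.name)
    positions = sorted(set(FREE_POSITIONS.get(residue.name, ())) | set(residue.children))
    for pos in positions:
        child = residue.children.get(pos)
        if child is None:
            out.children[pos] = Residue(marker)  # explicit "no linkage here"
        else:
            out.children[pos] = add_restricted_linkage_nodes(child, marker)
    return out


def serialise(residue: Residue) -> str:
    """Compact text form of the tree, used here only to inspect the result."""
    if not residue.children:
        return residue.name
    inner = ",".join(
        f"{pos}:{serialise(child)}" for pos, child in sorted(residue.children.items())
    )
    return f"{residue.name}[{inner}]"


# Neu5Ac(a2-3)Gal(b1-4)GlcNAc, rooted at the reducing-end GlcNAc.
glycan = Residue("GlcNAc", {4: Residue("Gal", {3: Residue("Neu5Ac")})})
augmented = add_restricted_linkage_nodes(glycan)
print(serialise(augmented))
```

With the marker nodes in place, a mined subtree such as a Gal whose position 3 holds a `null` leaf can only match glycans in which that position is unsubstituted, so a terminal galactose becomes distinguishable from an internal one.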

Related Results

Glycan profiling of the gut microbiota by Glycan-seq
Abstract Bacterial glycans modulate the cross talk between the gut microbiota and its host. However, little is known about these glycans because of the lack of appro...
Glycomic profiling of the gut microbiota by Glycan-seq
Abstract Background: There has been immense interest in studying the relationship between the gut microbiota and human health. Bacterial glycans modulate the cross talk between the gu...
Glycoremodeling of monoclonal antibodies
Monoclonal antibodies (mAbs) are therapeutic glycoproteins mostly used in the areas of oncology and immunology. The structure of an mAb can be divided into the Fab fragment and the...
Optimisation of potash mining technology for cell and pillar mining method
The diverse demand for inorganic fertilizers has predetermined the intensification of potash mining, which is a raw material for their production. In this regard, it has become nec...
Characterization and statistical modeling of glycosylation changes in sickle cell disease
Abstract Sickle cell disease is an inherited genetic disorder that causes anemia, pain crises, organ infarction, and infections in 13 million people worldwide. Previous studies have...
Elucidation of the glycan structure of the b-type flagellin of Pseudomonas aeruginosa PAO1
Abstract Flagella are essential for motility and pathogenicity in many bacteria. The main component of the flagellar filament, flagellin, often undergoes post-translational modifica...
A Framework for Sampling-Based XML Data Pricing
While price and data quality should define the major tradeoff for consumers in data markets, prices are usually prescribed by vendors and data quality is not negotiable. In this pa...
Predicting regional somatic mutation rates using DNA motifs
Abstract How the locus-specificity of epigenetic modifications is regulated remains an unanswered question. A contributing mechanism is that epig...
