Protein feature engineering framework for AMPylation site prediction
Abstract: AMPylation is a biologically significant yet understudied post-translational modification in which an adenosine monophosphate (AMP) group is added primarily to tyrosine and threonine residues. While recent work has illuminated the prevalence and functional impact of AMPylation, experimental identification of AMPylation sites remains challenging, and computational prediction offers a faster alternative. The predictive performance of machine learning models depends strongly on the features used to represent the raw amino acid sequences. In this work, we introduce a novel feature extraction pipeline that encodes the key properties relevant to AMPylation site prediction. We use a recently published dataset of curated AMPylation sites to develop our feature generation framework. We demonstrate the utility of the extracted features by training various machine learning classifiers on several numerical representations of the raw sequences produced by the framework. Tenfold cross-validation is used to evaluate the models' ability to distinguish between AMPylated and non-AMPylated sites. The top-performing feature set achieved an MCC of 0.58, an accuracy of 0.8, an AUC-ROC of 0.85 and an F1 score of 0.73. Further, we use SHapley Additive exPlanations (SHAP) to elucidate the behaviour of the model on the feature set consisting of monogram and bigram counts for various representations.
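To make the monogram and bigram count features and the reported MCC metric concrete, the following is a minimal Python sketch. It is not the authors' pipeline: the function names, the 20-letter alphabet constant, and the example peptide window are all assumptions chosen for illustration, and the abstract does not specify the window size or the alternative sequence representations used.

```python
from collections import Counter
from itertools import product
from math import sqrt

# Assumption: the standard 20-letter amino acid alphabet. The paper's other
# "representations" (not described in the abstract) could be passed instead.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ngram_counts(window, alphabet=AMINO_ACIDS):
    """Return monogram counts followed by bigram counts as one flat list.

    Characters outside `alphabet` (e.g. padding symbols) are ignored.
    """
    mono = Counter(window)
    # Overlapping bigrams: positions (0,1), (1,2), ...
    bi = Counter(window[i:i + 2] for i in range(len(window) - 1))
    features = [mono[a] for a in alphabet]
    features += [bi[a + b] for a, b in product(alphabet, repeat=2)]
    return features


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when any marginal is empty (denominator is zero).
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical peptide window around a candidate site (illustrative only).
example_window = "MKTAYIAKQRT"
example_features = ngram_counts(example_window)
```

With the 20-letter alphabet this yields 20 monogram and 400 bigram features per window; a feature matrix built this way could then be passed to any classifier and scored under tenfold cross-validation.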
Springer Science and Business Media LLC
Related Results
FICD acts bi-functionally to AMPylate and de-AMPylate the endoplasmic reticulum chaperone BiP
Significance statement: Some 25 years ago it was discovered that the activity of the ER chaperone BiP is regulated by covalent modification, the nature of which, AMPylation (not ADPr...
Structure and function of bacterial FIC-domain toxins (original title: Structure et fonction des toxines bactériennes à domaine FIC)
FIC-domain (Filamentation induced by cAMP) proteins are widespread in bacteria, where they catalyse the addition of a post-translational modification containing a ...
Endothelial Protein C Receptor
Introduction: The protein C anticoagulant pathway plays a critical role in the negative regulation of the blood clotting response. The pathway is triggered by thrombin, which allows ...
Fic and non-Fic AMPylases: protein AMPylation in metazoans
Protein AMPylation refers to the covalent attachment of an AMP moiety to the amino acid side chains of target proteins using ATP as nucleotide donor. This process is catalysed by d...
Amino acid features: a missing compartment of prediction of protein function
Abstract: Enormous computational efforts have been carried out to predict structure and function of protein. However, nearly all of these efforts have been focused on prediction of f...
Dietary protein and lysine levels in relation to lysine efficiency and net protein utilization in 12-week-old kampung chickens (original title: TINGKAT PROTEIN DAN LISIN DALAM RANSUM TERHADAP EFISIENSI LISIN DAN PROTEIN NETTO PADA AYAM KAMPUNG UMUR 12 MINGGU)
This study investigated the effect of dietary protein and lysine levels on lysine efficiency and net protein utilization in kampung chickens reared until the age of ...
From features to functions: leveraging protein feature architectures in comparative genomics
When analyzing genomic data, one of the key challenges is the annotation of new genes. The toolkit for incorporating newly discovered proteins into a comprehensive evolutionary and...
Relationship Between Prediction Accuracy and Feature Importance Reliability: an Empirical and Theoretical Study
Abstract: There is significant interest in using neuroimaging data to predict behavior. The predictive models are often interpreted by the computation of feature imp...

