
Protein feature engineering framework for AMPylation site prediction

Abstract

AMPylation is a biologically significant yet understudied post-translational modification in which an adenosine monophosphate (AMP) group is added, primarily to tyrosine and threonine residues. While recent work has illuminated the prevalence and functional impacts of AMPylation, experimental identification of AMPylation sites remains challenging. Computational prediction offers a faster alternative. The predictive performance of machine learning models depends heavily on the features used to represent the raw amino acid sequences. In this work, we introduce a novel feature extraction pipeline that encodes the key properties relevant to AMPylation site prediction. We use a recently published dataset of curated AMPylation sites to develop our feature generation framework. We demonstrate the utility of the extracted features by training several machine learning classifiers on various numerical representations of the raw sequences generated with our framework. Tenfold cross-validation is used to evaluate the models' ability to distinguish AMPylated from non-AMPylated sites. The top-performing feature set achieved a Matthews correlation coefficient (MCC) of 0.58, an accuracy of 0.8, an AUC-ROC of 0.85 and an F1 score of 0.73. Further, we use SHapley Additive exPlanations (SHAP) to elucidate the behaviour of the model on the feature set consisting of monogram and bigram counts for various representations.
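The abstract names monogram and bigram count features, tenfold cross-validation, and MCC, accuracy and F1 without giving formulas. The following is a minimal Python sketch under stated assumptions, not the authors' code: a fixed-length residue window around the candidate site, a hypothetical three-class reduced alphabet (`HYPOTHETICAL_CLASSES`) standing in for one of the "various representations", and the standard definitions of the metrics. All function names (`ngram_features`, `stratified_folds`, `binary_metrics`) are illustrative.

```python
import math
import random
from collections import Counter
from itertools import product

# Standard 20-letter amino-acid alphabet (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# HYPOTHETICAL reduced alphabet, for illustration only: a coarse three-class
# grouping (h = hydrophobic, p = polar/other, c = charged). It is an assumption
# and not one of the representations used in the paper.
HYPOTHETICAL_CLASSES = {
    **{aa: "h" for aa in "AVLIMFWC"},
    **{aa: "p" for aa in "STNQGPY"},
    **{aa: "c" for aa in "DEKRH"},
}


def ngram_features(window, alphabet, mapping=None):
    """Monogram and bigram counts of a residue window as a fixed-length vector.

    If `mapping` is given, residues are first re-encoded into `alphabet`
    (e.g. a reduced alphabet). The vector lists counts for every symbol of
    `alphabet`, then for every ordered pair of symbols, so all windows yield
    vectors of length len(alphabet) + len(alphabet) ** 2.
    """
    seq = window if mapping is None else "".join(mapping[aa] for aa in window)
    mono = Counter(seq)
    bi = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    bigram_keys = ["".join(pair) for pair in product(alphabet, repeat=2)]
    return [mono[s] for s in alphabet] + [bi[k] for k in bigram_keys]


def stratified_folds(labels, k=10, seed=0):
    """Assign each sample a fold id in 0..k-1, spreading each class evenly."""
    rng = random.Random(seed)
    folds = [0] * len(labels)
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[i] = j % k
    return folds


def binary_metrics(y_true, y_pred):
    """Accuracy, F1 and MCC for binary labels (1 = AMPylated, 0 = not)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    f1_denom = 2 * tp + fp + fn
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / len(pairs),
        "f1": 2 * tp / f1_denom if f1_denom else 0.0,
        "mcc": (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical 7-residue window centred on a candidate threonine.
    window = "KLATSGY"
    print(len(ngram_features(window, AMINO_ACIDS)))  # 20 + 400 = 420
    print(ngram_features(window, "hpc", HYPOTHETICAL_CLASSES))
    # accuracy 2/3, F1 2/3, MCC 1/3
    print(binary_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))
```

In a full pipeline, the vectors from `ngram_features` would feed a classifier trained on nine folds and scored on the held-out fold with `binary_metrics`, repeated for each of the ten folds. AUC-ROC needs predicted scores rather than hard labels, so it is left out of this sketch.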

Springer Science and Business Media LLC

Hardik Prabhu, Hrushikesh Bhosale, Aamod Sane, Renu Dhadwal, Vigneshwar Ramakrishnan, Jayaraman Valadi

Scientific Reports

2024

