Protein-peptide Interaction Representation Learning with Pretrained Language Models
Abstract
Protein-peptide interactions (PpIs) play essential roles in diverse cellular processes, yet their systematic identification remains challenging because experimentally annotated protein-peptide interaction data are scarce. To address this challenge, we present PepInter, a sequence-based deep learning (DL) framework that leverages large-scale pretraining on structurally derived pseudo protein-peptide pairs to learn interaction-relevant representations. Specifically, energy-dominant peptide fragments are extracted from protein complexes curated from non-redundant Protein Data Bank (PDB) structures. This yields pseudo protein-peptide interaction pairs whose interfaces share interaction patterns with canonical protein-protein interactions, allowing the model to acquire interaction-aware priors without large-scale annotated protein-peptide complex datasets. Built upon the ESM-Cambrian (ESMC) architecture, PepInter adopts a two-stage pretraining strategy. In the first stage, masked language modeling is used to learn general protein sequence representations. In the second stage, the model is further trained to predict Rosetta-derived energetic scores, explicitly incorporating structural interaction signals into the learned embeddings. Following pretraining, PepInter is fine-tuned for both protein-peptide interaction classification and peptide bioactivity regression tasks. Across multiple benchmarks, including protein-peptide binding affinity prediction, PepInter consistently outperforms existing baseline methods and generalizes well in identifying biologically meaningful PpIs. Case studies further highlight its ability to recover known interaction patterns and predict novel protein-peptide interactions.
Together, these results establish PepInter as a scalable and effective framework for protein-peptide interaction prediction, with strong potential to accelerate peptide-based drug discovery.
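The abstract does not say exactly how "energy-dominant" fragments are selected. A minimal sketch of one plausible reading follows. It assumes per-residue interface energies are already available for one chain of a complex (for example, per-residue Rosetta contributions, where more negative means more favorable). Under that assumption, a search over contiguous windows picks the stretch with the lowest summed energy and pairs it with the partner chain as a pseudo protein-peptide pair. The names `best_window`, `make_pseudo_pair` and `PseudoPair`, the window-length bounds, and the toy sequences and energies are all hypothetical and are not taken from PepInter.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class PseudoPair:
    """Hypothetical pseudo protein-peptide pair built from one complex."""
    receptor: str   # full sequence of the receptor (partner) chain
    peptide: str    # energy-dominant fragment cut from the other chain
    start: int      # 0-based start index of the fragment in its chain
    energy: float   # summed per-residue energy of the fragment


def best_window(energies: Sequence[float], min_len: int, max_len: int) -> Tuple[int, int, float]:
    """Return (start, end_exclusive, total) for the contiguous window with the
    lowest (most favorable) summed energy, where min_len <= length <= max_len.

    Prefix sums make scoring each candidate window O(1).
    """
    n = len(energies)
    if min_len < 1 or max_len < min_len:
        raise ValueError("require 1 <= min_len <= max_len")
    if n < min_len:
        raise ValueError("chain is shorter than the minimum fragment length")
    prefix = [0.0]
    for e in energies:
        prefix.append(prefix[-1] + e)
    best = (0, min_len, prefix[min_len])
    # Exhaustively score every window length and start position.
    for length in range(min_len, min(max_len, n) + 1):
        for start in range(n - length + 1):
            total = prefix[start + length] - prefix[start]
            if total < best[2]:
                best = (start, start + length, total)
    return best


def make_pseudo_pair(receptor_seq: str, partner_seq: str,
                     partner_energies: Sequence[float],
                     min_len: int = 5, max_len: int = 15) -> PseudoPair:
    """Cut the energy-dominant fragment from partner_seq and pair it with receptor_seq."""
    if len(partner_seq) != len(partner_energies):
        raise ValueError("need exactly one energy value per partner residue")
    start, end, total = best_window(partner_energies, min_len, max_len)
    return PseudoPair(receptor=receptor_seq, peptide=partner_seq[start:end],
                      start=start, energy=total)


if __name__ == "__main__":
    # Toy, made-up inputs: residues 2-7 (0-based) of the partner carry most of the energy.
    receptor = "MKTAYIAKQRQISFVKSHFSRQ"
    partner = "GSAWLLFRKEEPQ"
    energies = [0.1, 0.0, -0.2, -1.5, -2.0, -1.8, -1.2, -0.9, 0.3, 0.2, 0.0, 0.1, 0.0]
    pair = make_pseudo_pair(receptor, partner, energies, min_len=4, max_len=6)
    print(pair.peptide, pair.start, round(pair.energy, 2))  # prints: AWLLFR 2 -7.6
```

The exhaustive search is quadratic in chain length. That cost is acceptable for single chains, and it keeps the selection rule explicit. The real pipeline may use different fragment-length limits, energy terms, or filtering.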