Protein secondary structure and remote homology detection
Cold Spring Harbor Laboratory
Abstract

A protein can be represented by its primary, secondary, or tertiary structure. With recent advances in AI, there is now as much tertiary as primary structural data available. Fast and accurate search methods exist for both types of data, and searches over either representation are highly precise. However, primary structure data can sometimes be incomplete. As a result, tertiary structure has become the gold standard for remote homology detection.

How does secondary structure perform in remote homology detection? Secondary structure represents a protein as a sequence over an alphabet of helices, strands, and loops. It shares its sequential nature with primary structure while retaining topological information similar to that of tertiary structure.
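
As an illustration of the three-letter alphabet described above, the sketch below collapses an 8-state DSSP secondary structure string into helix (H), strand (E), and loop (L) symbols. The mapping used here (H, G, I to helix; E, B to strand; everything else to loop) is one common convention and is an assumption; the paper may use a different reduction, and the name `to_three_state` is hypothetical.

```python
# Assumed 8-to-3 state reduction (one common convention; some variants
# map G and I to loop instead of helix).
DSSP_TO_HEL = {
    "H": "H", "G": "H", "I": "H",  # alpha, 3-10 and pi helices -> helix
    "E": "E", "B": "E",            # extended strand and isolated bridge -> strand
}


def to_three_state(dssp: str) -> str:
    """Map a DSSP string (H, G, I, E, B, T, S, '-', ' ', ...) to the H/E/L alphabet.

    Any symbol not listed in DSSP_TO_HEL (turns, bends, coil) becomes 'L'.
    """
    return "".join(DSSP_TO_HEL.get(symbol, "L") for symbol in dssp)


if __name__ == "__main__":
    print(to_three_state("--HHHHGGG-TT-EEEEB-SS-"))  # LLHHHHHHHLLLLEEEEELLLL
```

The resulting string has the same length as the input, so positions still correspond one-to-one with residues.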

To assess the effectiveness of secondary structure in remote homology detection, we devised a challenging classification task: determining the superfamily membership of very distantly related protein domains. We used benchmarks from the CATH and SCOP databases and evaluated sequence and structure alignment algorithms on primary, secondary, and tertiary structures.
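
One way to score such a superfamily classification task, sketched here under assumptions the abstract does not confirm, is leave-one-out nearest-neighbor assignment: each query domain receives the superfamily label of its highest-scoring other domain in the benchmark. The function `nearest_neighbor_superfamily`, the toy H/E/L strings, and the labels `SF_A`/`SF_B` are hypothetical, and difflib's `SequenceMatcher.ratio` is only a placeholder scorer; any primary, secondary, or tertiary structure comparison could be passed in as `score`.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict


def nearest_neighbor_superfamily(
    query_id: str,
    sequences: Dict[str, str],
    superfamily: Dict[str, str],
    score: Callable[[str, str], float],
) -> str:
    """Predict the superfamily of query_id as that of its best-scoring other domain.

    The query itself is excluded (leave-one-out); raises ValueError if it is
    the only domain in the set.
    """
    query = sequences[query_id]
    best_id = max(
        (other for other in sequences if other != query_id),
        key=lambda other: score(query, sequences[other]),
    )
    return superfamily[best_id]


def ratio(a: str, b: str) -> float:
    # Placeholder scorer for the demo: difflib's similarity ratio in [0, 1].
    return SequenceMatcher(None, a, b).ratio()


if __name__ == "__main__":
    # Hypothetical toy data: H/E/L strings and made-up superfamily labels.
    ss = {"d1": "LLHHHHHHLLEEEELL", "d2": "LHHHHHHHLLEEEEL", "d3": "LEEEELLEEEELLEEEL"}
    sf = {"d1": "SF_A", "d2": "SF_A", "d3": "SF_B"}
    print(nearest_neighbor_superfamily("d1", ss, sf, ratio))  # SF_A
```

Passing the scorer as a parameter keeps the classification protocol fixed while the representation and comparison method vary, which is the kind of comparison the paragraph above describes.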

As expected, both basic and advanced sequence alignment algorithms applied to primary structure achieved high precision, but their overall area under the curve (AUC) was lower than that of the gold standard, structural alignment of tertiary structure.
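
The abstract does not say which curve the area is computed under. The sketch below assumes an ROC curve over scored domain pairs, where a pair is positive when both domains share a superfamily, and shows how a method can have perfect precision among its top hits yet a lower AUC if it scores some remote homologs below unrelated pairs. The names `roc_auc` and `precision_at_k` and the toy scores are hypothetical.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Counts, over all (positive, negative) pairs, how often the positive
    scores higher; ties contribute 0.5. O(P * N), fine for small examples.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_at_k(scores: Sequence[float], labels: Sequence[bool], k: int) -> float:
    """Fraction of true (same-superfamily) pairs among the k highest-scoring pairs."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    return sum(y for _, y in ranked[:k]) / k


if __name__ == "__main__":
    # Toy pair scores: two easy homologs scored high, one remote homolog scored low.
    scores = [0.95, 0.90, 0.40, 0.30, 0.20, 0.15]
    labels = [True, True, False, False, False, True]
    print(precision_at_k(scores, labels, 2))  # 1.0
    print(roc_auc(scores, labels))            # 0.6666666666666666
```

Here the two top-ranked pairs are both correct, but the remote homolog scored 0.15 loses to every negative pair, pulling the AUC down to 2/3.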

Surprisingly, a simple string comparison algorithm applied to secondary structure performed close to the gold standard. This result supports the hypothesis that key structural information is already encoded in secondary structure, and it suggests that secondary structure may be a promising representation when high-confidence structural data is unavailable, for example in cases involving protein flexibility and disorder.
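
The abstract does not name the string comparison algorithm. As one hypothetical example of a simple comparison over H/E/L strings, the sketch below scores two secondary structure strings by normalized Levenshtein (edit) distance, so that 1.0 means identical and 0.0 means maximally different. This is a stand-in chosen for illustration, not the method evaluated in the paper; `levenshtein` and `ss_similarity` are hypothetical names.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # edit distance is symmetric; keep the DP row short
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def ss_similarity(a: str, b: str) -> float:
    """Edit-distance similarity in [0, 1] between two H/E/L strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


if __name__ == "__main__":
    print(levenshtein("HHHEEL", "HHEEEL"))  # 1
    print(ss_similarity("HHHEEL", "LLLLLL"))
```

A scorer like `ss_similarity` ignores residue identity entirely and compares only the order and lengths of helices, strands, and loops.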
Related Results
Reflexive homology
Reflexive homology is the homology theory associated to the reflexive crossed simplicial group; one of the fundamental crossed simplicial groups. It is the most general way to exte...
Endothelial Protein C Receptor
The protein C anticoagulant pathway plays a critical role in the negative regulation of the blood clotting response. The pathway is triggered by thrombin, which allows ...
Homology Modelling: A Computational Tool in Drug Design and Discovery
A drug takes many years to develop and reach the market using the conventional drug discovery procedure. Computer-aided drug design (CADD) is an emerging technology that accelerate...
Remote homology search with hidden Potts models
Most methods for biological sequence homology search and alignment work with primary sequence alone, neglecting higher-order correlations. Recently, statistical physics mod...
EFFECT OF DIETARY PROTEIN AND LYSINE LEVELS ON LYSINE EFFICIENCY AND NET PROTEIN UTILIZATION IN 12-WEEK-OLD KAMPUNG CHICKENS
This study was conducted to determine the effect of protein and lysine levels on lysine efficiency and net protein utilization in kampung chickens reared up to the age of ...
A note on Khovanov–Rozansky sl2-homology and ordinary Khovanov homology
In this paper we present an explicit isomorphism between Khovanov–Rozansky sl2-homology and ordinary Khovanov homology. This result was originally claimed in Khovanov and Rozansky'...
Non-Homology-Based Prediction of Gene Functions
Advances in genome sequencing and annotation have eased the difficulty of identifying new gene sequences. Predicting the functions of these newly identifie...

