TEDLH: Domain HMMs for sensitive detection of remote homologues

Abstract

Motivation
The Encyclopedia of Domains (TED) provides domain annotations for proteins in the AlphaFold Protein Structure Database (AFDB) using a consensus of three state-of-the-art structure-based methods. We used these TED domain annotations to construct profile Hidden Markov models (HMMs), collectively forming the TED Library of HMMs (TEDLH). TEDLH enables sensitive sequence and profile searches, supporting systematic exploration of protein domain families and their evolutionary relationships.

Results
TEDLH links domain HMMs to experimentally determined CATH-PDB structures through direct (primary) and transitive (secondary and tertiary) relationships. Fewer than half of TEDLH HMMs are directly linked to a CATH-PDB domain; the remaining models are connected through transitive relationships. These transitive links extend coverage into more divergent regions of sequence space and better represent CATH superfamily diversity.

HMM–HMM comparisons within CATH superfamily 3.30.70.100 illustrate how transitive relationships expand sequence coverage in TEDLH. In this superfamily, 4,813 TEDLH HMMs are connected to 212 CATH-PDB representatives. Primary, secondary, and tertiary relationships progressively capture more divergent sequences (pairwise sequence identity <20%) that retain structural similarity (TM-score >0.6) and a conserved two-layer α/β sandwich core fold.

All-against-all HMM–HMM comparisons across TEDLH also reveal sequence similarities across the CATH hierarchy (cross-hits). At low query coverage (<50%), cross-hits are more frequent between CATH classes, whereas at higher coverage thresholds (>70%) they predominantly occur between superfamilies. These cross-hits are not driven by superfamily size or sequence diversity and can provide guidance for CATH curation. As an example, analysis of cross-hits between superfamilies 2.170.130.30 and 3.10.20.30 reveals evolutionary relationships between these groups.
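The cross-hit analysis described above depends on two pieces of bookkeeping: the level of the CATH hierarchy (Class, Architecture, Topology, Homologous superfamily) at which two superfamily codes first differ, and the query-coverage band a hit falls into. The Python sketch below illustrates both. It is not the authors' pipeline: the function names are hypothetical, and the "intermediate" label for coverage between 50% and 70% is an assumption, since the abstract only names the <50% and >70% bands.

```python
from typing import Optional

# The four levels encoded by a CATH superfamily code such as 3.30.70.100.
CATH_LEVELS = ("class", "architecture", "topology", "superfamily")


def cath_divergence_level(code_a: str, code_b: str) -> Optional[str]:
    """Return the first CATH level at which two codes differ, or None if identical."""
    parts_a = code_a.split(".")
    parts_b = code_b.split(".")
    if len(parts_a) != 4 or len(parts_b) != 4:
        raise ValueError("expected four-part CATH codes such as '3.30.70.100'")
    for level, a, b in zip(CATH_LEVELS, parts_a, parts_b):
        if a != b:
            return level
    return None


def coverage_band(query_coverage_pct: float) -> str:
    """Bin a query coverage percentage using the thresholds quoted in the abstract."""
    if query_coverage_pct < 50:
        return "low"
    if query_coverage_pct > 70:
        return "high"
    # Assumption: the abstract does not name the 50-70% range.
    return "intermediate"


if __name__ == "__main__":
    print(cath_divergence_level("2.170.130.30", "3.10.20.30"))
    print(coverage_band(85.0))
```

Applied to the two superfamilies named above, 2.170.130.30 and 3.10.20.30, the first function reports that the codes already differ at the class level.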
Availability and Implementation
TEDLH is compatible with HH-suite3 and is available from FigShare (https://doi.org/10.6084/m9.figshare.28531754) for local use.

Contact: c.carreno@ucl.ac.uk
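Because TEDLH is distributed for local use with HH-suite3, a search against it would typically go through `hhsearch`. The sketch below only assembles such a command line in Python. The flags `-i`, `-d`, `-o`, `-e`, and `-cpu` are standard HH-suite3 options, but the database basename `tedlh/TEDLH`, the file names, and the helper function are placeholders: the actual layout of the FigShare download is not described here.

```python
import shlex
from typing import List


def build_hhsearch_command(
    query: str,
    db_basename: str,
    output: str,
    evalue: float = 1e-3,
    cpus: int = 4,
) -> List[str]:
    """Assemble an hhsearch invocation as an argument list (nothing is executed)."""
    return [
        "hhsearch",
        "-i", query,        # query: single sequence, alignment (A3M) or HMM (HHM)
        "-d", db_basename,  # HH-suite3 database basename, without file suffixes
        "-o", output,       # results file in .hhr format
        "-e", str(evalue),  # E-value cutoff for reported hits
        "-cpu", str(cpus),  # number of threads
    ]


if __name__ == "__main__":
    # Hypothetical paths: adjust to wherever the TEDLH download was unpacked.
    cmd = build_hhsearch_command("query.a3m", "tedlh/TEDLH", "query.hhr")
    print(shlex.join(cmd))
    # To run it: subprocess.run(cmd, check=True), with HH-suite3 on the PATH.
```

Building the command as a list rather than a shell string avoids quoting problems when paths contain spaces.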
