
TEDLH: Domain HMMs for sensitive detection of remote homologues

Abstract

Motivation: The Encyclopedia of Domains (TED) provides domain annotations for proteins in the AlphaFold Protein Structure Database (AFDB) using a consensus of three state-of-the-art structure-based methods. We used these TED domain annotations to construct profile hidden Markov models (HMMs), collectively forming the TED Library of HMMs (TEDLH). TEDLH enables sensitive sequence and profile searches, supporting systematic exploration of protein domain families and their evolutionary relationships.

Results: TEDLH links domain HMMs to experimentally determined CATH-PDB structures through direct (primary) and transitive (secondary and tertiary) relationships. Fewer than half of TEDLH HMMs are directly linked to a CATH-PDB domain; the remaining models are connected through transitive relationships. These transitive links extend coverage into more divergent regions of sequence space and better represent CATH superfamily diversity. HMM–HMM comparisons within CATH superfamily 3.30.70.100 illustrate how transitive relationships expand sequence coverage in TEDLH. In this superfamily, 4,813 TEDLH HMMs are connected to 212 CATH-PDB representatives. Primary, secondary, and tertiary relationships progressively capture more divergent sequences (pairwise sequence identity <20%) that retain structural similarity (TM-score >0.6) and a conserved two-layer α/β sandwich core fold.

All-against-all HMM–HMM comparisons across TEDLH also reveal sequence similarities across the CATH hierarchy (cross-hits). At low query coverage (<50%), cross-hits are more frequent between CATH classes, whereas at higher coverage thresholds (>70%) they predominantly occur between superfamilies. These cross-hits are not driven by superfamily size or sequence diversity and can provide guidance for CATH curation. As an example, analysis of cross-hits between superfamilies 2.170.130.30 and 3.10.20.30 reveals evolutionary relationships between these groups.
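The cross-hit analysis groups hits by query coverage and by where two domains sit in the CATH hierarchy. The following is a minimal sketch of that bookkeeping, not the authors' implementation. It assumes that query coverage is the aligned query span divided by the query length, and that the four fields of a CATH superfamily ID are class, architecture, topology and homologous superfamily. The function names and the ID 3.30.70.999 are hypothetical and exist only for illustration.

```python
# Hypothetical helpers for labelling HMM-HMM cross-hits; names are illustrative only.
CATH_LEVELS = ("class", "architecture", "topology", "superfamily")


def divergence_level(cath_a: str, cath_b: str):
    """Return the first CATH level at which two four-field IDs differ, or None if equal."""
    parts_a = cath_a.split(".")
    parts_b = cath_b.split(".")
    if len(parts_a) != 4 or len(parts_b) != 4:
        raise ValueError("expected four-field CATH superfamily IDs such as 3.30.70.100")
    for level, a, b in zip(CATH_LEVELS, parts_a, parts_b):
        if a != b:
            return level
    return None


def query_coverage(aligned_start: int, aligned_end: int, query_length: int) -> float:
    """Percentage of the query covered by the alignment (assumed definition, 1-based inclusive)."""
    if query_length <= 0 or aligned_end < aligned_start:
        raise ValueError("invalid alignment coordinates or query length")
    return 100.0 * (aligned_end - aligned_start + 1) / query_length


def coverage_bin(coverage_pct: float) -> str:
    """Bin coverage using the thresholds quoted in the abstract (<50% and >70%)."""
    if coverage_pct < 50.0:
        return "low"
    if coverage_pct > 70.0:
        return "high"
    return "intermediate"


if __name__ == "__main__":
    # The two superfamilies named in the abstract differ in their first field.
    print(divergence_level("2.170.130.30", "3.10.20.30"))
    # A hit aligned over query positions 1-80 of a 100-residue query.
    print(coverage_bin(query_coverage(1, 80, 100)))
```

Under these assumptions, a hit between 2.170.130.30 and 3.10.20.30 is labelled as crossing at the class level because the first field of the two IDs differs.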
Availability and Implementation: TEDLH is compatible with HH-suite3 and is available from FigShare (https://doi.org/10.6084/m9.figshare.28531754) for local use.

Contact: c.carreno@ucl.ac.uk
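Because TEDLH is distributed for local use with HH-suite3, a search against it could be scripted roughly as follows. This is a sketch under stated assumptions: the query file name, the database prefix `tedlh_db/TEDLH` and the output file name are hypothetical placeholders, since the layout of the FigShare download is not described here. Only the standard `hhsearch` options `-i`, `-d`, `-o` and `-cpu` are used.

```python
import subprocess


def build_hhsearch_command(query: str, db_prefix: str, output: str, cpus: int = 4) -> list:
    """Assemble an hhsearch invocation as an argument list (no shell quoting needed)."""
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return [
        "hhsearch",
        "-i", query,        # query alignment or HMM
        "-d", db_prefix,    # database prefix; the TEDLH prefix below is a placeholder
        "-o", output,       # results file in .hhr format
        "-cpu", str(cpus),  # number of threads
    ]


def run_search(query: str, db_prefix: str, output: str, cpus: int = 4) -> None:
    """Run the search; requires HH-suite3 to be installed and on PATH."""
    subprocess.run(build_hhsearch_command(query, db_prefix, output, cpus), check=True)


if __name__ == "__main__":
    # Hypothetical paths; replace with the actual location of the unpacked library.
    print(" ".join(build_hhsearch_command("query.a3m", "tedlh_db/TEDLH", "query.hhr")))
```

Building the command as a list keeps paths with spaces intact and makes the invocation easy to inspect before running it.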

