Evaluating the relevance of sequence conservation in the prediction of pathogenic missense variants
Abstract
Evolutionary information is the primary tool for detecting functional conservation in nucleic acids and proteins. This information has been used extensively to predict the structure, interactions, and functions of macromolecules. Pathogenicity prediction models rely on multiple sequence alignment information at different levels. However, the most accurate genome-wide algorithms for ranking variant deleteriousness consider a variety of features to assess the impact of variants. Here, we analyze three different ways of extracting evolutionary information from sequence alignments for pathogenicity prediction at the DNA and protein levels. We show that protein sequence-based information is slightly more informative for annotating ClinVar missense variants than information obtained at the DNA level. Furthermore, matching the performance of state-of-the-art methods such as CADD requires including the conservation of both the reference and the variant, encoded as the frequencies of the reference/alternate alleles or of the wild-type/mutant residues. Our results on a large set of missense variants show that a basic method based on three input features derived from the protein sequence profile performs similarly to the CADD algorithm, which uses hundreds of genomic features. This observation indicates that, for missense variants, evolutionary information, when properly encoded, plays the primary role in ranking pathogenicity.
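To make the encoding idea concrete, the following is a minimal sketch of how wild-type and mutant residue frequencies could be read from one column of a protein multiple sequence alignment. It is not the authors' implementation. The abstract does not specify the three input features, so the third value here (an entropy-based conservation index) is an assumption, and the function name `profile_features` and its parameters are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def profile_features(column, wild_type, mutant):
    """Return (f_wt, f_mut, conservation) for one alignment column.

    column    : string of residues observed at one alignment position
                (gaps and non-standard symbols are ignored)
    wild_type : reference residue, one-letter code
    mutant    : substituted residue, one-letter code

    The conservation index (1 - normalized Shannon entropy) is an
    assumed third feature, chosen only for illustration.
    """
    counts = Counter(res for res in column.upper() if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        # No usable residues at this position: no evolutionary signal.
        return 0.0, 0.0, 0.0

    freqs = {aa: counts[aa] / total for aa in AMINO_ACIDS}

    # Shannon entropy in base 20, so it ranges from 0 (fully conserved)
    # to 1 (all 20 residues equally frequent).
    entropy = sum(-p * math.log(p, 20) for p in freqs.values() if p > 0)
    conservation = 1.0 - entropy

    return freqs.get(wild_type, 0.0), freqs.get(mutant, 0.0), conservation


if __name__ == "__main__":
    # Toy column: leucine dominates, proline is never observed.
    f_wt, f_mut, cons = profile_features("LLLLLLLLIV", "L", "P")
    print(f"f_wt={f_wt:.2f} f_mut={f_mut:.2f} conservation={cons:.2f}")
```

In this toy column a substitution to a residue never seen in the alignment gets a mutant frequency of zero, which is the kind of signal that the reference/variant frequency encoding is meant to capture. Any downstream classifier that consumes these values is outside the scope of this sketch.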
Related Results
Leveraging cancer mutation data to predict the pathogenicity of germline missense variants
Innovative and easy-to-implement strategies are needed to improve the pathogenicity assessment of rare germline missense variants. Somatic cancer driver mutations identifie...
Leveraging cancer mutation data to inform the pathogenicity classification of germline missense variants
Innovative and easy-to-implement strategies are needed to improve the pathogenicity assessment of rare germline missense variants. Somatic cancer driver mutations identified throug...
Computational Analysis of Missense Variants of G Protein-Coupled Receptors Involved in the Neuroendocrine Regulation of Reproduction
Introduction: Many missense variants in G protein-coupled receptors (GPCRs) involved in the neuroendocrine regulation of reproduction have bee...
SLC38A8 mutation spectrum in foveal hypoplasia
Purpose: Significant phenotypic overlap exists between ocular albinism and SLC38A8-related foveal hypoplasia (FH), which hinders differential diagnosis. To facilitate molecula...
Clinical Implications of Germline Predisposition Gene Variants in Patients with Refractory or Relapsed B Acute Lymphoblastic Leukemia
Objectives: Gene variants are important factors in the prognosis of patients with hematological malignancies. In the current study, our team investigates the relationship between blood a...
Variant analysis of RNA sequences in severe equine asthma
Background. Severe equine asthma is a chronic inflammatory disease of the lung in horses similar to low-Th2 late-onset asthma in humans. This study aimed to determine the utility o...
MFRP variations cause nanophthalmos in five Chinese families with distinct phenotypic diversity
Purpose: Nanophthalmos is a congenital ocular structural anomaly that can cause significant visual loss in children. The early diagnosis and then taking appropriate clinical and surg...
PRESCOTT: a population aware, epistatic and structural model accurately predicts missense effect
Predicting the functional impact of point mutations is a complex yet vital task in genomics. PRESCOTT stands at the forefront of this challenge and reconstructs complete mu...

