Leveraging cancer mutation data to predict the pathogenicity of germline missense variants
ABSTRACT
Innovative and easy-to-implement strategies are needed to improve the pathogenicity assessment of rare germline missense variants. Somatic cancer driver mutations identified through large-scale tumor sequencing studies often affect genes that are also associated with rare Mendelian disorders. The use of cancer mutation data to aid in the interpretation of germline missense variants, regardless of whether the gene is associated with a hereditary cancer predisposition syndrome or a non-cancer-related developmental disorder, has not been systematically assessed. We extracted putative cancer driver missense mutations from the Cancer Hotspots database and annotated them as germline variants, including presence/absence and classification in ClinVar. We trained two supervised learning models (logistic regression and random forest) to predict variant classifications of germline missense variants in ClinVar using Cancer Hotspots data (training dataset). The performance of each model was evaluated with an independent test dataset generated in part from searching public and private genome-wide sequencing datasets from ∼1.5 million individuals. Of the 2,447 cancer mutations, 691 corresponding germline variants had been previously classified in ClinVar: 426 (61.6%) as likely pathogenic/pathogenic, 261 (37.8%) as uncertain significance, and 4 (0.6%) as likely benign/benign. The odds ratio for a likely pathogenic/pathogenic classification in ClinVar was 28.3 (95% confidence interval: 24.2-33.1, p < 0.001), compared with all other germline missense variants in the same 216 genes. Both supervised learning models showed high correlation with pathogenicity assessments in the training dataset. When applied to the test dataset, the models achieved high area under the precision-recall curve values: 0.847 for logistic regression and 0.829 for random forest.
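As an illustration of the odds ratio reported above, the following minimal sketch computes an odds ratio and a Wald-type 95% confidence interval from a 2x2 table. The abstract does not state which interval method was used, so the Wald interval is an assumption. The counts for the cancer hotspot group (426 likely pathogenic/pathogenic out of 691) come from the abstract; the counts for the comparison group are not reported there and are hypothetical placeholders, so the result is not expected to reproduce the published value of 28.3.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: group 1 with outcome      b: group 1 without outcome
    c: group 2 with outcome      d: group 2 without outcome
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# From the abstract: 691 hotspot-matched germline variants in ClinVar,
# 426 likely pathogenic/pathogenic, 261 uncertain, 4 likely benign/benign.
hotspot_plp = 426
hotspot_other = 261 + 4

# HYPOTHETICAL placeholders: counts for "all other germline missense
# variants in the same 216 genes" are not given in the abstract.
other_plp = 1_000
other_not_plp = 20_000

estimate, lower, upper = odds_ratio_wald(
    hotspot_plp, hotspot_other, other_plp, other_not_plp
)
print(f"OR = {estimate:.1f} (95% CI {lower:.1f}-{upper:.1f})")
```

Replacing the placeholder counts with the actual comparison-group counts from the full study would be required before comparing the output with the published estimate.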
With the use of cancer and germline datasets and supervised learning techniques, our study shows that cancer mutation data can be leveraged to improve the interpretation of germline missense variation potentially causing rare Mendelian disorders.
AUTHOR SUMMARY
Our study introduces an approach to improve the interpretation of rare genetic variation, specifically missense variants that can alter proteins and cause disease. We found that published evidence from somatic cancer sequencing studies may be relevant to understanding the impact of the same variant in the context of rare inherited (Mendelian) disorders. Using widely available datasets, we noted that many cancer driver mutations have also been observed as rare germline variants associated with inherited disorders. This intersection led us to employ machine learning techniques to assess how well cancer mutation data can predict the pathogenicity of germline variants. We trained machine learning models and tested them on a separate dataset curated by searching public and private genome-wide sequencing data from over a million participants. Our models successfully identified pathogenic genetic changes and performed strongly in predicting disease-causing variants. This study highlights that cancer mutation data can enhance the interpretation of rare missense variants, aiding in the diagnosis and understanding of rare diseases. Integrating this approach into current genetic classification frameworks could be beneficial and could open new avenues for leveraging existing cancer research to benefit broader genetic research and diagnostics for rare genetic conditions.
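To make the evaluation metric concrete, the sketch below computes average precision, one common estimator of the area under the precision-recall curve, from a list of true labels and model scores. The source does not state which estimator or software was used, so this choice is an assumption. The labels and scores are toy values invented for illustration, not study data, and the function name `average_precision` is hypothetical.

```python
def average_precision(labels, scores):
    """Average precision for binary labels (1 = pathogenic, 0 = not).

    Variants are ranked by descending score; the result is the mean of
    the precision values observed at the rank of each positive label.
    Tied scores are kept in input order, which is a simplification.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_positive = sum(labels)
    if n_positive == 0:
        raise ValueError("at least one positive label is required")

    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / n_positive


# TOY example (invented values): scores could be predicted probabilities
# from any classifier, such as logistic regression or a random forest.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.95, 0.80, 0.70, 0.60, 0.30, 0.10]

ap = average_precision(toy_labels, toy_scores)
print(f"average precision = {ap:.3f}")
```

Unlike the area under the ROC curve, this metric depends on the proportion of positive labels, which makes it a natural choice when pathogenic variants are a minority of the test set.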


