Leveraging cancer mutation data to inform the pathogenicity classification of germline missense variants
Innovative and easy-to-implement strategies are needed to improve the pathogenicity assessment of rare germline missense variants. Somatic cancer driver mutations identified through large-scale tumor sequencing studies often affect genes that are also associated with rare Mendelian disorders. The use of cancer mutation data to aid the interpretation of germline missense variants, regardless of whether the gene is associated with a hereditary cancer predisposition syndrome or a non-cancer-related developmental disorder, has not been systematically assessed. We extracted putative cancer driver missense mutations from the Cancer Hotspots database and annotated them as germline variants, including their presence or absence and their classification in ClinVar. We trained two supervised learning models (logistic regression and random forest) to predict the ClinVar classifications of germline missense variants using Cancer Hotspots data (training dataset). The performance of each model was evaluated with an independent test dataset generated in part by searching public and private genome-wide sequencing datasets from ~1.5 million individuals. Of the 2,447 cancer mutations, 691 corresponding germline variants had been previously classified in ClinVar: 426 (61.6%) as likely pathogenic/pathogenic, 261 (37.8%) as uncertain significance, and 4 (0.6%) as likely benign/benign. The odds ratio for a likely pathogenic/pathogenic classification in ClinVar was 28.3 (95% confidence interval: 24.2–33.1, p < 0.001), compared with all other germline missense variants in the same 216 genes. Both supervised learning models showed high correlation with pathogenicity assessments in the training dataset. When applied to the test dataset, the logistic regression and random forest models achieved high area under the precision-recall curve (0.847 and 0.829, respectively) and high area under the receiver operating characteristic curve (0.821 and 0.774, respectively).
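The class proportions and the odds ratio reported above can be reproduced in form with a few lines of Python. The sketch below is illustrative only: the three ClinVar counts come from the abstract, but the abstract does not give the counts for the comparison group (all other missense variants in the same 216 genes), nor does it say how the confidence interval was computed. The 2x2 table passed to `odds_ratio_wald_ci` is therefore hypothetical, and the Wald (log-normal) interval is an assumed method, not necessarily the one the authors used. Both function names are made up for this example.

```python
import math


def class_percentages(counts):
    """Return each class's share of the total, in percent, rounded to 1 decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: hotspot variants classified likely pathogenic/pathogenic
    b: hotspot variants with any other classification
    c: other variants classified likely pathogenic/pathogenic
    d: other variants with any other classification
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)


# Counts reported in the abstract for the 691 hotspot variants found in ClinVar.
clinvar_counts = {
    "likely pathogenic/pathogenic": 426,
    "uncertain significance": 261,
    "likely benign/benign": 4,
}
percentages = class_percentages(clinvar_counts)

# HYPOTHETICAL table, chosen only to exercise the function; these are not
# the study's numbers.
example_or, example_low, example_high = odds_ratio_wald_ci(10, 5, 4, 8)
```

With the real comparison-group counts in place of the hypothetical table, the same function would give an estimate comparable to the reported 28.3, subject to the interval method actually used in the study.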
With the use of cancer and germline datasets and supervised learning techniques, our study shows that cancer mutation data can be leveraged to improve the interpretation of germline missense variation potentially causing rare Mendelian disorders.
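For readers unfamiliar with the two evaluation metrics named in the abstract, the following is a minimal pure-Python sketch of how they can be computed from binary labels and model scores. It is an assumption-laden illustration, not the authors' evaluation code: the area under the ROC curve is computed as the fraction of correctly ordered positive/negative pairs, and the area under the precision-recall curve is approximated by average precision, which is one common estimator among several. The labels and scores are invented toy values, and the function names are hypothetical.

```python
def auroc(labels, scores):
    """Area under the ROC curve as the probability that a random positive
    outscores a random negative (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive example.

    Tied scores are ranked in input order; this sketch does not treat
    ties specially.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    if not precisions:
        raise ValueError("need at least one positive label")
    return sum(precisions) / len(precisions)


# Toy data: 1 = likely pathogenic/pathogenic, 0 = any other classification.
toy_labels = [1, 0, 1, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.2]

toy_auroc = auroc(toy_labels, toy_scores)
toy_ap = average_precision(toy_labels, toy_scores)
```

The pairwise loop is quadratic in the number of variants, which is fine for a sketch; a library implementation would normally be preferred for real datasets.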
Public Library of Science (PLoS)
Related Results
Leveraging cancer mutation data to predict the pathogenicity of germline missense variants
Abstract
Innovative and easy-to-implement strategies are needed to improve the pathogenicity assessment of rare germline missense variants. Somatic cancer driver mutations identifie...
Evaluation of BoostDM, a somatic variant prediction tool, for the interpretation of germline variants in hereditary cancer genes
Abstract
Classifying germline variants in hereditary cancer genes remains challenging and requires integrating diverse lines of evidence. Boo...
Marfan syndrome: genetic variant determinants of cardiovascular outcomes
Abstract
Background
Marfan syndrome is a systemic connective tissue disorder caused by genetic variants in the fibrillin-1 (FBN1...
Distinct mutational landscapes when comparing germline and somatic cancer variants in forty tumor suppressor genes
Abstract
Germline and somatic cancer variants in tumor suppressor genes (TSGs) share loss of function mechanisms with studies of a few genes (...
Narrowing of the neonatal region in the FBN1 gene
Abstract
Background
Neonatal Marfan syndrome (MFS) is considered the most severe form of MFS and is characterized by early child...
FGF independent MEK1/2 signalling is essential for male fetal germline development in mice
Abstract
Background
Germline development provides the founding cells for spermatogenesis and oogenesis in ...
Abstract OI-1: OI-1 Decoding breast cancer predisposition genes
Abstract
Women with one or more first-degree female relatives with a history of breast cancer have a two-fold increased risk of developing breast cancer. This risk i...
Abstract ED02-04: Returning germline results to cancer patients
Abstract
Patient-centered communication, defined as providing clear, understandable information with active patient participation, can be especially challenging in t...

