Adaptive Somatic Mutations Calls with Deep Learning and Semi-Simulated Data

Abstract
A number of approaches have been developed to call somatic variation in high-throughput sequencing data. Here, we present an adaptive approach to calling somatic variations. Our approach trains a deep feed-forward neural network with semi-simulated data. Semi-simulated datasets are constructed by planting somatic mutations in real datasets where no mutations are expected. Using semi-simulated data makes it possible to train the models with millions of examples, which successful training of deep learning models usually requires. We initially focus on calling variations in RNA-Seq data. We derive semi-simulated datasets from real RNA-Seq data, which offer a good representation of the data the models will be applied to. We test the models on independent semi-simulated data as well as on pure simulations. On independent semi-simulated data, the models achieve an AUC of 0.973. When tested on semi-simulated exome DNA datasets, the models trained on RNA-Seq data remain predictive (sensitivity 0.4 and specificity 0.9 at a cutoff of P ≥ 0.9), albeit with lower overall performance (AUC = 0.737). Interestingly, while the models generalize across assays, training on RNA-Seq data lowers the confidence for a group of mutations. Haloplex exome-specific training was also performed, demonstrating that the approach can produce probabilistic models tuned for specific assays and protocols. We found that the method adapts to the characteristics of the experimental protocol. We further illustrate these points by training a model for a trio somatic experimental design, in which germline DNA from both parents is available in addition to data about the individual. These models are distributed with Goby (http://goby.campagnelab.org).
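The planting idea described above can be illustrated with a minimal sketch. It assumes each genomic position is summarized by per-base read counts for a pair of samples in which no somatic differences are expected (for example, two samples from the same individual), and it plants a mutation by moving a fraction of one sample's reads to a base the other sample never shows. The `Site` layout, the functions `plant_mutation` and `semi_simulate`, the planting rate, and the allele fraction are all hypothetical illustrations, not Goby's implementation.

```python
import random
from dataclasses import dataclass

BASES = "ACGT"


@dataclass
class Site:
    """Per-base read counts at one position for a pair of samples (hypothetical layout)."""
    germline: dict[str, int]
    somatic: dict[str, int]
    label: int = 0  # 1 = planted somatic mutation, 0 = left unmodified


def plant_mutation(site: Site, fraction: float, rng: random.Random) -> Site:
    """Return a copy of `site` in which `fraction` of the somatic sample's reads
    supporting its most frequent base are moved to a base absent from the germline sample."""
    somatic = dict(site.somatic)
    ref = max(somatic, key=somatic.get)
    novel = [b for b in BASES if site.germline.get(b, 0) == 0 and b != ref]
    moved = round(somatic[ref] * fraction)
    if not novel or moved == 0:
        # Nothing can be planted here; keep the site as a negative example.
        return Site(dict(site.germline), somatic, label=0)
    alt = rng.choice(novel)
    somatic[ref] -= moved
    somatic[alt] = somatic.get(alt, 0) + moved
    return Site(dict(site.germline), somatic, label=1)


def semi_simulate(sites: list[Site], plant_rate: float, fraction: float, seed: int = 0) -> list[Site]:
    """Plant mutations at a random subset of sites; all other sites become negatives."""
    rng = random.Random(seed)
    out = []
    for site in sites:
        if rng.random() < plant_rate:
            out.append(plant_mutation(site, fraction, rng))
        else:
            out.append(Site(dict(site.germline), dict(site.somatic), label=0))
    return out
```

Because the labels come from the planting step itself, every site in real data yields a labeled example, which is what makes training sets of millions of examples feasible. A real pipeline would more likely edit individual aligned reads (and vary the allele fraction) rather than aggregate counts; the count-level version only shows the labeling logic.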
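The performance figures in the abstract (AUC, and sensitivity and specificity when a mutation is called at P ≥ 0.9) are standard binary-classification metrics. The sketch below shows one way to compute them from model probabilities and planted-mutation labels; the function names `auc` and `sens_spec` are hypothetical, and this is not the evaluation code used in the paper.

```python
def _split(scores: list[float], labels: list[int]) -> tuple[list[float], list[float]]:
    """Separate scores of positive (label 1) and negative (label 0) examples."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    return pos, neg


def auc(scores: list[float], labels: list[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic (ties count 0.5)."""
    pos, neg = _split(scores, labels)
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sens_spec(scores: list[float], labels: list[int], cutoff: float = 0.9) -> tuple[float, float]:
    """Sensitivity and specificity when a mutation is called iff P >= cutoff."""
    pos, neg = _split(scores, labels)
    tp = sum(1 for s in pos if s >= cutoff)
    tn = sum(1 for s in neg if s < cutoff)
    return tp / len(pos), tn / len(neg)
```

On toy inputs `scores = [0.95, 0.4, 0.92, 0.1, 0.85, 0.3]` with `labels = [1, 1, 1, 0, 0, 0]`, `auc` returns 8/9 and `sens_spec` returns a sensitivity of 2/3 and a specificity of 1.0. The pairwise comparison is quadratic in the number of examples; for test sets of the size discussed here, a sort-based rank computation of the same statistic is the practical choice.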

Related Results

Dynamics of Mutations in Patients with ET Treated with Imetelstat
Abstract Background: Imetelstat, a first in class specific telomerase inhibitor, induced hematologic responses in all patients (pts) with essential thrombocythemia (...
Small Subclones Harboring NOTCH1, SF3B1 or BIRC3 Mutations Are Clinically Irrelevant in Chronic Lymphocytic Leukemia
Abstract Introduction. Ultra-deep next generation sequencing (NGS) allows sensitive detection of mutations and estimation of their clonal abundance in tumor cell pop...
Animal Alarm Calls
Alarm calls are broadly defined as calls occurring in a predator context. Alarm calls have been the subject of intense scrutiny in animal communication research, as they are releva...
Distinct Profile of FLT3 Mutations in Brazil.
Abstract Mutations in the tyrosine kinase receptor FLT3 are the most common molecular abnormality in acute myeloid leukemia (AML) being detected in about 30% of AML ...
STAT3 Mutations in Large Granular Lymphocytic Leukemia
Abstract Abstract 1606 Introduction: Large granular lymphocytic leukemia (LGL leukemia) is a rare lymphoprolifera...
Nfkbiz 3′ UTR Mutations Confer Selective Growth Advantage and Affect Drug Response in Diffuse Large B-Cell Lymphoma
Introduction: The activated B-cell-like (ABC) molecular subgroup of diffuse large B-cell lymphoma (DLBCL) is characterized by activation of NF-κB signaling and increased mortality....
Deep convolutional neural network and IoT technology for healthcare
Background Deep Learning is an AI technology that trains computers to analyze data in an approach similar to the human brain. Deep learning algorithms can find complex patterns in ...