UMI-Gen: a UMI-based reads simulator for variant calling evaluation in paired-end sequencing NGS libraries
Abstract
Motivation
With Next Generation Sequencing (NGS) becoming more affordable every year, NGS technologies have established themselves as the fastest and most reliable way to detect Single Nucleotide Variants (SNV) and Copy Number Variations (CNV) in cancer patients. These technologies can sequence DNA at very high depths, making it possible to detect abnormalities present in tumor cells at very low frequencies. Many variant callers are publicly available, and they generally perform well at calling variants. However, when variant frequencies drop below 1%, the specificity of these tools suffers greatly, because true variants at very low frequencies are easily confused with sequencing or PCR artifacts. The recent use of Unique Molecular Identifiers (UMI) in NGS experiments offers a way to accurately separate true variants from artifacts, and UMI-based variant callers are gradually replacing raw-read-based variant callers as the standard method for accurately detecting variants at very low frequencies. However, the benchmarks reported in these tools' publications are usually performed on real biological data in which the true variants are unknown, making it difficult to assess their accuracy.
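To make the UMI argument concrete, here is a minimal sketch of majority-vote consensus over UMI families at a single genomic position. It illustrates the general idea only and is not the method of UMI-Gen or of any particular UMI-based caller; the helper name `consensus_by_umi` and the 50% agreement threshold are hypothetical. Reads sharing a UMI derive from the same original molecule, so an error introduced into one PCR duplicate is outvoted, while a true variant present in the molecule appears in every duplicate.

```python
from collections import Counter, defaultdict


def consensus_by_umi(observations, min_agreement=0.5):
    """Collapse (umi, base) observations at one position into one base per UMI family.

    Hypothetical helper: a family is kept only if its majority base is carried
    by strictly more than `min_agreement` of its reads, so ties are discarded.
    """
    families = defaultdict(list)
    for umi, base in observations:
        families[umi].append(base)

    consensus = {}
    for umi, bases in families.items():
        base, count = Counter(bases).most_common(1)[0]
        if count / len(bases) > min_agreement:
            consensus[umi] = base
    return consensus


# Toy pileup at one position where the reference base is C.
observations = [
    ("AAGT", "C"), ("AAGT", "C"), ("AAGT", "T"),  # one duplicate carries an error
    ("CCTA", "T"), ("CCTA", "T"), ("CCTA", "T"),  # true variant: all duplicates agree
    ("GGAC", "C"), ("GGAC", "C"),
]
print(consensus_by_umi(observations))  # {'AAGT': 'C', 'CCTA': 'T', 'GGAC': 'C'}
```

A raw-read caller sees four T reads here, one of which is an artifact; after collapsing, the lone T in family AAGT is discarded and only the molecule CCTA supports the variant.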
Results
We present UMI-Gen, a UMI-based read simulator for paired-end targeted sequencing data. UMI-Gen generates reference reads covering the targeted regions at a user-customizable depth. Then, using a number of control files, it estimates the background error rate at each position and modifies the generated reads to mimic real biological data. Finally, it inserts the variants from a user-provided list into the reads.
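The four steps described above (generate reference reads, estimate the per-position background error rate from controls, perturb the reads, insert variants) can be sketched as follows. This is a simplified illustration under stated assumptions, not UMI-Gen's implementation: it simulates single-end fragments from a toy reference string rather than paired-end reads, takes control data in a hypothetical `(mismatches, depth)`-per-position format, models errors as uniform base substitutions, and decides variant status once per UMI family so that every PCR duplicate of a mutated molecule carries the alternative allele. All function names and parameters are hypothetical.

```python
import random
from dataclasses import dataclass

BASES = "ACGT"


@dataclass
class Read:
    umi: str    # molecule barcode shared by all PCR duplicates
    start: int  # 0-based start position on the reference
    seq: str


def generate_reads(reference, read_len, depth, rng, umi_len=8, dup=3):
    """Step 1: error-free reads covering `reference` at roughly `depth`x."""
    n_mol = max(1, round(depth * len(reference) / (read_len * dup)))
    reads = []
    for _ in range(n_mol):
        umi = "".join(rng.choice(BASES) for _ in range(umi_len))
        start = rng.randrange(len(reference) - read_len + 1)
        seq = reference[start:start + read_len]
        reads.extend(Read(umi, start, seq) for _ in range(dup))  # PCR duplicates
    return reads


def estimate_error_rates(controls):
    """Step 2: pooled mismatch rate per position across control samples.

    Hypothetical input format: one list per control holding (mismatches, depth)
    for each reference position.
    """
    rates = []
    for per_pos in zip(*controls):
        mismatches = sum(m for m, _ in per_pos)
        depth = sum(d for _, d in per_pos)
        rates.append(mismatches / depth if depth else 0.0)
    return rates


def add_background_errors(reads, rates, rng):
    """Step 3: substitute each base with probability equal to its position's rate."""
    for read in reads:
        bases = list(read.seq)
        for i, base in enumerate(bases):
            if rng.random() < rates[read.start + i]:
                bases[i] = rng.choice([b for b in BASES if b != base])
        read.seq = "".join(bases)


def insert_variants(reads, variants, rng):
    """Step 4: each molecule carries `alt` at `pos` with probability `vaf`.

    The decision is made once per (umi, start) family, so all duplicates agree.
    """
    for pos, alt, vaf in variants:
        carries = {}
        for read in reads:
            if read.start <= pos < read.start + len(read.seq):
                key = (read.umi, read.start)
                if key not in carries:
                    carries[key] = rng.random() < vaf
                if carries[key]:
                    i = pos - read.start
                    read.seq = read.seq[:i] + alt + read.seq[i + 1:]


if __name__ == "__main__":
    rng = random.Random(42)
    reference = "ACGTTGCAAGGCTTACGATCGATCGGATCCATGCA"  # toy target; reference[17] == "A"
    reads = generate_reads(reference, read_len=20, depth=500, rng=rng)
    controls = [[(1, 1000)] * len(reference), [(3, 1000)] * len(reference)]
    add_background_errors(reads, estimate_error_rates(controls), rng)
    insert_variants(reads, [(17, "T", 0.05)], rng)  # A>T at 5% VAF
    covering = [r for r in reads if r.start <= 17 < r.start + len(r.seq)]
    alt = sum(r.seq[17 - r.start] == "T" for r in covering)
    print(f"{alt}/{len(covering)} reads show T at position 17")
```

Inserting variants per molecule rather than per read is what makes the simulated data useful for UMI-aware callers: a true variant is shared by a whole UMI family, whereas background errors are drawn independently for each duplicate.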
Availability
The entire pipeline is available at https://gitlab.com/vincent-sater/umigen-master under the MIT license.
Contact
vincent.sater@gmail.com