Negligible effects of read trimming on the accuracy of germline short variant calling in the human genome
Background
Next-generation sequencing (NGS) has become a standard tool in the molecular diagnostics of Mendelian disease, and the precision of such diagnostics is greatly affected by the accuracy of variant calling from sequencing data. Recently, we have comprehensively evaluated the performance of multiple variant calling pipelines. However, no systematic analysis of the effects of read trimming on variant discovery with modern variant calling software has yet been performed.
Methods
In this work, we systematically evaluated the effects of adapters on the performance of 8 variant calling and filtering methods using 14 standard reference Genome-in-a-Bottle (GIAB) samples. Variant calls were compared to the ground truth variant sets, and the effect of adapter trimming with different tools was assessed using major performance metrics (precision, recall, and F1 score).
Results
We show that adapter trimming has no effect on the accuracy of the best-performing variant callers (e.g., DeepVariant) on whole-genome sequencing (WGS) data. For whole-exome sequencing (WES) datasets, a subtle improvement in accuracy was observed in some of the samples. In high-coverage WES data (~200x mean coverage), adapter removal allowed for the discovery of 2-4 additional true positive variants in only two out of seven datasets tested. Moreover, this effect did not depend on the median insert size or the proportion of adapter sequences in reads. Surprisingly, the effect of trimming on variant calling was reversed when moderate-coverage (~80-100x) WES data was used. Finally, we show that some of the recently developed machine learning-based variant callers demonstrate greater dependence on the presence of adapters in reads.
Conclusions
Taken together, our results indicate that adapter removal is unnecessary when calling germline variants, but suggest that preprocessing methods should be carefully chosen when developing and using machine learning-based variant analysis methods.
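The performance metrics named in the Methods (precision, recall, and F1 score) follow their standard definitions in terms of true positive, false positive, and false negative calls. The Python sketch below illustrates those definitions only; the abstract does not say which benchmarking tool was used, and the `Counts` class, the function names, and the counts in the usage example are all hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Hypothetical container for one comparison against a truth set."""
    tp: int  # calls that match the truth set
    fp: int  # calls absent from the truth set
    fn: int  # truth-set variants that were not called


def precision(c: Counts) -> float:
    """Fraction of called variants that are true: TP / (TP + FP)."""
    called = c.tp + c.fp
    return c.tp / called if called else 0.0


def recall(c: Counts) -> float:
    """Fraction of true variants that were called: TP / (TP + FN)."""
    truth = c.tp + c.fn
    return c.tp / truth if truth else 0.0


def f1_score(c: Counts) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


if __name__ == "__main__":
    # Made-up counts, chosen only to show how a gain of a few true
    # positives moves the metrics; these are NOT results from the paper.
    untrimmed = Counts(tp=9990, fp=10, fn=10)
    trimmed = Counts(tp=9993, fp=10, fn=7)
    for label, c in (("untrimmed", untrimmed), ("trimmed", trimmed)):
        print(label, precision(c), recall(c), f1_score(c))
```

With truth sets containing many thousands of variants, a change of 2-4 true positives shifts recall and F1 only in the fourth decimal place or beyond, which is consistent with the effect being described as negligible.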
Related Results
FGF independent MEK1/2 signalling is essential for male fetal germline development in mice
Abstract
Background
Germline development provides the founding cells for spermatogenesis and oogenesis in ...
Genomic sequence characteristics and the empiric accuracy of short-read sequencing
Abstract
Background
Short-read whole genome sequencing (WGS) is a vital tool for clinical applications and basic research. Gene...
Study of Deformation and Fracture of High Strength Steel Sheet during Conventional and Robust Trimming by Conducting Partial Trimming Tests
Abstract
High-strength steels are used in the automotive industry for weight reduction and improved vehicle crashworthiness. In this work, an instrumented trimming die equi...
Mechanism of Tripeptide Trimming by γ-Secretase
Abstract
The membrane-embedded γ-secretase complex processively cleaves within the transmembrane domain of amyloid precursor protein (APP) to pro...
Leveraging cancer mutation data to predict the pathogenicity of germline missense variants
Abstract
Innovative and easy-to-implement strategies are needed to improve the pathogenicity assessment of rare germline missense variants. Somatic cancer driver mutations identifie...
SeqPurge: highly-sensitive adapter trimming for paired-end short read data
Trimming adapter sequences from short read data is a common preprocessing step in most DNA/RNA sequence analysis pipelines. For amplicon-based approaches, which are mostly used in ...

