Genomic sequence characteristics and the empiric accuracy of short-read sequencing

Abstract

Background: Short-read whole genome sequencing (WGS) is a vital tool for clinical applications and basic research. Genetic divergence from the reference genome, repetitive sequences, and sequencing bias reduce the performance of variant calling using short-read alignment, but the loss in recall and specificity has not been adequately characterized. For the clonal pathogen Mycobacterium tuberculosis (Mtb), researchers frequently exclude 10.7% of the genome believed to be repetitive and prone to erroneous variant calls. To benchmark short-read variant calling, we used 36 diverse clinical Mtb isolates dually sequenced with Illumina short reads and PacBio long reads. We systematically study short-read variant calling accuracy and the influence of sequence uniqueness, reference bias, and GC content.

Results: Reference-based Illumina variant calling had a recall ≥89.0% and precision ≥98.5% across parameters evaluated. The best balance between precision and recall was achieved by tuning the mapping quality (MQ) threshold, i.e., the confidence of the read mapping (recall 85.8%, precision 99.1% at MQ ≥ 40). Masking repetitive sequence content is an alternative, conservative approach to variant calling that maintains high precision (recall 70.2%, precision 99.6% at MQ ≥ 40). Of the genomic positions typically excluded for Mtb, 68% are accurately called using Illumina WGS, including 52 of the 168 PE/PPE genes (34.5%). We present a refined list of low-confidence regions and examine the largest sources of variant calling error.

Conclusions: Our improved approach to variant calling has broad implications for the use of WGS in the study of Mtb biology, inference of transmission in public health surveillance systems, and more generally for WGS applications in other organisms.
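The precision/recall trade-off described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Call` record, the `filter_by_mq` and `precision_recall` helpers, and the toy variant positions are all hypothetical and chosen only to show how raising a mapping quality (MQ) threshold can increase precision while lowering recall when calls are scored against a truth set (here standing in for long-read-derived variants).

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Call(NamedTuple):
    """Hypothetical short-read variant call: position, alternate allele, mapping quality."""
    pos: int
    alt: str
    mq: float


def filter_by_mq(calls: Iterable[Call], min_mq: float) -> List[Call]:
    """Keep only calls whose mapping quality is at least min_mq."""
    return [c for c in calls if c.mq >= min_mq]


def precision_recall(calls: Iterable[Call], truth: Set[Tuple[int, str]]) -> Tuple[float, float]:
    """Score calls against a truth set of (position, alt) pairs."""
    called = {(c.pos, c.alt) for c in calls}
    tp = len(called & truth)   # called and present in the truth set
    fp = len(called - truth)   # called but absent from the truth set
    fn = len(truth - called)   # in the truth set but not called
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data (invented for illustration, not taken from the study).
truth = {(100, "A"), (250, "T"), (400, "G"), (900, "C")}
calls = [
    Call(100, "A", 60.0),  # true variant, confidently mapped
    Call(250, "T", 55.0),  # true variant, confidently mapped
    Call(400, "G", 20.0),  # true variant in a poorly mapped region
    Call(700, "C", 10.0),  # false call in a poorly mapped region
]

unfiltered = precision_recall(calls, truth)
filtered = precision_recall(filter_by_mq(calls, 40.0), truth)
print("no MQ filter: precision=%.2f recall=%.2f" % unfiltered)
print("MQ >= 40:     precision=%.2f recall=%.2f" % filtered)
```

In this toy case the MQ ≥ 40 filter removes the one false call but also discards a true variant from a poorly mapped region, which is the same kind of trade-off the abstract reports at the genome scale.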

Related Results

Plant species-specific basecaller improves actual accuracy of nanopore sequencing
Abstract Background Long-read sequencing platforms offered by Oxford Nanopore Technologies (ONT) allow native DNA containing epigenetic modifications to be directly sequen...
Species-specific basecallers improve actual accuracy of nanopore sequencing in plants
Abstract Background Long-read sequencing platforms offered by Oxford Nanopore Technologies (ONT) allow native DNA containing epigenetic modification...
LoRTIS Software Suite: Transposon mutant analysis using long-read sequencing
Abstract To date transposon insertion sequencing (TIS) methodologies have used short-read nucleotide sequencing technology. However, short-read sequences are unlike...
Abstract 1360: Understanding genetic variation in cancer using targeted nanopore long read sequencing
Abstract Structural variations (SV), a hallmark of genomic instability in cancer can either activate oncogenes or inactivate tumor suppressor genes. SVs tend to be r...
Next Generation Sequencing Technologies and Their Applications
Abstract The advances in next generation sequencing (NGS) technologies have tremendous impacts on the studies of structural and f...
Pipeline for species-resolved full-length 16S rRNA amplicon nanopore sequencing analysis of low-complexity bacterial microbiota
Abstract 16S rRNA amplicon sequencing is a fundamental tool for characterizing prokaryotic microbial communities. While short-read 16S rRNA sequencing is a proven s...
Accuracy and computational efficiency of genomic selection with high-density SNP and whole-genome sequence data.
Abstract The prediction of complex or quantitative traits from single nucleotide polymorphism (SNP) genotypes has transformed livestock and plant breeding, and is...
