
How Error Correction Affects PCR Deduplication: A Survey Based on UMI Datasets of Short Reads

Abstract

Next-Generation Sequencing (NGS) data is widely utilised for various downstream applications in bioinformatics, and numerous techniques have been developed for PCR deduplication and error correction to eliminate the bias and errors introduced during sequencing. This study provides, for the first time, a joint overview of recent advances in PCR deduplication and error correction on short reads. In particular, we utilise UMI-based PCR-deduplication strategies and sequencing data to assess the performance of solely computational PCR-deduplication approaches and to investigate how error correction affects the performance of PCR deduplication. Our survey and comparative analysis reveal that the deduplicated reads generated by solely computational PCR-deduplication and error-correction methods differ substantially from the sets of reads obtained by UMI-based deduplication methods. The existing solely computational PCR-deduplication and error-correction tools can eliminate some errors but still leave hundreds of thousands of erroneous reads uncorrected. All the error-correction approaches introduce thousands or more new sequences after correction, which do not benefit the PCR-deduplication process. Based on these findings, we offer practical suggestions to enhance the existing computational approaches for improving the quality of short-read sequencing data.
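To make the contrast between the two deduplication strategies concrete, the following is a minimal sketch on toy data, not the method of any tool assessed in the survey. The reads, UMIs, and function names (`dedup_by_sequence`, `dedup_by_umi`) are hypothetical, and the UMI-based variant simply takes the most common sequence per UMI; real UMI-based pipelines typically also use mapping positions and tolerate errors in the UMI itself.

```python
from collections import defaultdict

# Toy reads as (umi, sequence) pairs; all values are made up for illustration.
reads = [
    ("AAGT", "ACGTACGT"),  # molecule 1
    ("AAGT", "ACGTACGT"),  # PCR duplicate of molecule 1
    ("AAGT", "ACGTACGA"),  # PCR duplicate of molecule 1 with a substitution error
    ("CCTA", "ACGTACGT"),  # molecule 2: same sequence, different UMI
]


def dedup_by_sequence(reads):
    """Solely computational view: reads with identical sequences collapse into one."""
    return sorted({seq for _, seq in reads})


def dedup_by_umi(reads):
    """UMI-based view: reads sharing a UMI collapse to their most common sequence."""
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)
    # Assumption: a simple majority vote stands in for consensus calling.
    return {umi: max(set(seqs), key=seqs.count) for umi, seqs in groups.items()}


by_sequence = dedup_by_sequence(reads)
by_umi = dedup_by_umi(reads)

print("sequence-only:", by_sequence)
print("UMI-based:    ", by_umi)
```

In this toy case both strategies report two reads, but they are different sets: the sequence-only result keeps the erroneous read as if it were a distinct molecule and merges the two genuine molecules, while the UMI-based result keeps both molecules and absorbs the error. This is the kind of divergence the abstract describes, shown here only under the stated simplifying assumptions.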

openRxiv

Pengyao Ping, Tian Lan, Shuquan Su, Wei Liu, Jinyan Li

2024



