How Error Correction Affects PCR Deduplication: A Survey Based on UMI Datasets of Short Reads
Abstract
Next-Generation Sequencing (NGS) data is widely utilised for various downstream applications in bioinformatics, and numerous techniques have been developed for PCR-deduplication and error-correction to eliminate bias and errors introduced during sequencing. This study provides, for the first time, a joint overview of recent advances in PCR-deduplication and error-correction on short reads. In particular, we utilise UMI-based PCR-deduplication strategies and sequencing data to assess the performance of the solely-computational PCR-deduplication approaches and to investigate how error correction affects the performance of PCR-deduplication. Our survey and comparative analysis reveal that the deduplicated reads generated by the solely-computational PCR-deduplication and error-correction methods diverge substantially from the sets of reads obtained by the UMI-based deduplication methods. The existing solely-computational PCR-deduplication and error-correction tools can eliminate some errors but still leave hundreds of thousands of erroneous reads uncorrected. All the error-correction approaches introduce thousands or more new sequences after correction, and these new sequences bring no benefit to the PCR-deduplication process. Based on these findings, we offer practical suggestions to enhance the existing computational approaches for improving the quality of short-read sequencing data.
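As a toy illustration of why a solely-computational read set can diverge from a UMI-based one, the sketch below deduplicates the same reads in two ways. It is a simplified assumption, not the algorithm of any surveyed tool: the function names and the example reads are hypothetical, and real UMI-based tools typically also use mapping position and tolerate errors in the UMI itself.

```python
from collections import defaultdict

def dedup_by_sequence(reads):
    """Sequence-only deduplication: keep one copy of each distinct read sequence."""
    seen = set()
    kept = []
    for _umi, seq in reads:
        if seq not in seen:
            seen.add(seq)
            kept.append(seq)
    return kept

def dedup_by_umi(reads):
    """UMI-based deduplication (simplified): group reads by UMI and keep the
    most frequent sequence in each group as that molecule's representative."""
    groups = defaultdict(lambda: defaultdict(int))
    for umi, seq in reads:
        groups[umi][seq] += 1
    return [max(counts, key=counts.get) for counts in groups.values()]

# Hypothetical (UMI, sequence) pairs, made up for illustration only.
reads = [
    ("AAGT", "ACGTACGT"),
    ("AAGT", "ACGTACGT"),  # PCR duplicate of the first read
    ("AAGT", "ACGTACGA"),  # duplicate carrying a single-base error
    ("CCTA", "ACGTACGT"),  # a different molecule with an identical sequence
    ("GGAC", "TTGCATGC"),
]

by_sequence = dedup_by_sequence(reads)
by_umi = dedup_by_umi(reads)
print(by_sequence)
print(by_umi)
```

In this toy input the sequence-only pass keeps the erroneous copy as if it were a new read and collapses two distinct molecules that share a sequence, while the UMI grouping does neither. That is the kind of divergence the comparison in the abstract refers to.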

