
How Error Correction Affects PCR Deduplication: A Survey Based on UMI Datasets of Short Reads

Abstract

Next-Generation Sequencing (NGS) data is widely utilised for various downstream applications in bioinformatics, and numerous techniques have been developed for PCR-deduplication and error-correction to eliminate bias and errors introduced during sequencing. This study provides, for the first time, a joint overview of recent advances in PCR-deduplication and error-correction on short reads. In particular, we utilise UMI-based PCR-deduplication strategies and sequencing data to assess the performance of the solely-computational PCR-deduplication approaches and to investigate how error correction affects the performance of PCR-deduplication. Our survey and comparative analysis reveal that the deduplicated reads generated by the solely-computational PCR-deduplication and error-correction methods differ substantially from the sets of reads obtained by the UMI-based deduplication methods. The existing solely-computational PCR-deduplication and error-correction tools can eliminate some errors but still leave hundreds of thousands of erroneous reads uncorrected. All the error-correction approaches introduce thousands or more new sequences after correction, which bring no benefit to the PCR-deduplication process. Based on these findings, we offer practical suggestions for enhancing the existing computational approaches to improve the quality of short-read sequencing data.

Cold Spring Harbor Laboratory

Pengyao Ping, Tian Lan, Shuquan Su, Wei Liu, Jinyan Li

2024



