Hybrid correction of highly noisy Oxford Nanopore long reads using a variable-order de Bruijn graph

Abstract

Motivation: The recent rise of long-read sequencing technologies such as Pacific Biosciences and Oxford Nanopore makes it possible to solve assembly problems for larger and more complex genomes than short-read technologies allowed. However, these long reads are very noisy, with error rates of around 10 to 15% for Pacific Biosciences and up to 30% for Oxford Nanopore. The error correction problem has been tackled either by self-correcting the long reads or by using complementary short reads in a hybrid approach, but most methods focus only on Pacific Biosciences data and do not apply to Oxford Nanopore reads. Moreover, even though recent chemistries from Oxford Nanopore promise to lower the error rate below 15%, it remains higher in practice, and correcting such noisy long reads is still an issue.

Results: We present HG-CoLoR, a hybrid error correction method built on a seed-and-extend approach: the short reads are first aligned to the long reads, and a variable-order de Bruijn graph, built from the short reads, is then traversed. Our experiments show that HG-CoLoR efficiently corrects Oxford Nanopore long reads with error rates as high as 44%. Compared with other state-of-the-art long-read error correction methods able to handle Oxford Nanopore data, HG-CoLoR also provides the best trade-off between runtime and quality of the results, and is the only method able to scale efficiently to eukaryotic genomes.

Availability and implementation: HG-CoLoR is implemented in C++, supported on Linux platforms, and freely available at https://github.com/morispi/HG-CoLoR

Contact: pierre.morisse2@univ-rouen.fr

Supplementary information: Supplementary data are available at Bioinformatics online.
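To make the extension idea in the Results concrete, here is a minimal Python sketch of linking two seeds by walking a variable-order de Bruijn graph built from short reads. It is a toy illustration under stated assumptions, not the HG-CoLoR implementation: the function names (`index_kmers`, `successors`, `link_seeds`), the plain set-based k-mer index, the depth-first search, and the example reads are all hypothetical, and the seeds are given directly rather than obtained by aligning short reads to a long read.

```python
from collections import defaultdict


def index_kmers(reads, min_k, max_k):
    """Map each order k (min_k..max_k) to the set of k-mers found in the short reads."""
    kmers = defaultdict(set)
    for read in reads:
        for k in range(min_k, max_k + 1):
            for i in range(len(read) - k + 1):
                kmers[k].add(read[i:i + k])
    return kmers


def successors(path, kmers, min_k, max_k):
    """Return (order, bases) that extend path, trying the highest order first.

    The order is lowered only when no k-mer of the current order
    continues the path, which is the "variable-order" part of the sketch.
    """
    assert min_k >= 2, "an edge needs an overlap of at least one base"
    for k in range(max_k, min_k - 1, -1):
        if len(path) < k - 1:
            continue
        suffix = path[-(k - 1):]
        bases = [b for b in "ACGT" if suffix + b in kmers[k]]
        if bases:
            return k, bases
    return None, []


def link_seeds(source, target, kmers, min_k, max_k, max_len):
    """Depth-first search for a sequence that starts with source and ends with target."""
    stack = [source]
    while stack:
        path = stack.pop()
        if len(path) > len(source) and path.endswith(target):
            return path
        if len(path) >= max_len:
            continue  # give up on this branch once it grows too long
        _, bases = successors(path, kmers, min_k, max_k)
        for base in reversed(bases):
            stack.append(path + base)
    return None


# Two hypothetical short reads that overlap by only 3 bases ("GTA").
reads = ["ATGGCGTA", "GTACCTGA"]
kmers = index_kmers(reads, min_k=3, max_k=5)

linked = link_seeds("ATGGC", "CCTGA", kmers, min_k=3, max_k=5, max_len=30)
fixed_order = link_seeds("ATGGC", "CCTGA", kmers, min_k=5, max_k=5, max_len=30)
print(linked)       # ATGGCGTACCTGA
print(fixed_order)  # None
```

In this toy case a fixed order of 5 reaches a dead end where the two reads overlap by only three bases, while allowing the order to drop to 4 at that point bridges the gap and the walk then continues at order 5. A set per order is used here only for readability; a practical tool would need a far more compact index.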

openRxiv

Pierre Morisse Thierry Lecroq Arnaud Lefebvre

2017

Abstract Third generation sequencing technologies Pacific Biosciences and Oxford Nanopore Technologies were respectively made available in 2011 and 2014. In contras...

MBG: Minimizer-based Sparse de Bruijn Graph Construction

Motivation De Bruijn graphs can be constructed from short reads efficiently and have been used for many purposes. Traditionally long read sequencing technologies ...

Building Large Updatable Colored de Bruijn Graphs via Merging

MOTIVATION: There exists several massive genomic and metagenomic data collection efforts, including GenomeTrakr and MetaSub, which are routinely updated with new data. To analyze s...

Disentangled Long-Read De Bruijn Graphs via Optical Maps

Abstract Pacific Biosciences (PacBio), the main third generation sequencing technology can produce scalable, high-throughput, unprecedented sequencing results throu...

Multi de Bruijn Sequences and the Cross-Join Method

We show a method to construct binary multi de Bruijn sequences using the cross-join method. We extend the proof given by Alhakim for ordinary de Bruijn sequences to the case of mul...

Buffering Updates Enables Efficient Dynamic de Bruijn Graphs

Abstract Motivation The de Bruijn graph has become a ubiquitous graph model for biological data ever since its initial introduc...

Phased Multi de Bruijn Sequences

We introduce phased multi de Bruijn sequences, a generalization of de Bruijn sequences. A phased string is a string whose positions sequentially rotate through several alphabets; e...

Graph convolutional neural networks for 3D data analysis

(English) Deep Learning allows the extraction of complex features directly from raw input data, eliminating the need for hand-crafted features from the classical Machine Learning p...

Related Results