Hybrid correction of highly noisy Oxford Nanopore long reads using a variable-order de Bruijn graph
Abstract
Motivation
The recent rise of long read sequencing technologies such as Pacific Biosciences and Oxford Nanopore makes it possible to solve assembly problems for larger and more complex genomes than short read technologies allowed. However, these long reads are very noisy, with error rates of around 10 to 15% for Pacific Biosciences and up to 30% for Oxford Nanopore. The error correction problem has been tackled either by self-correcting the long reads or by using complementary short reads in a hybrid approach, but most methods focus only on Pacific Biosciences data and do not apply to Oxford Nanopore reads. Moreover, even though recent Oxford Nanopore chemistries promise to lower the error rate below 15%, it remains higher in practice, and correcting such noisy long reads is still an open issue.
Results
We present HG-CoLoR, a hybrid error correction method based on a seed-and-extend approach: short reads are aligned to the long reads to obtain seeds, which are then extended by traversing a variable-order de Bruijn graph built from the short reads. Our experiments show that HG-CoLoR efficiently corrects Oxford Nanopore long reads with error rates as high as 44%. Compared with other state-of-the-art long read error correction methods able to handle Oxford Nanopore data, HG-CoLoR provides the best trade-off between runtime and quality of the results, and is the only method that scales efficiently to eukaryotic genomes.
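To make the "extend" step more concrete, here is a minimal sketch of how a gap between two seeds on a long read could be filled by walking a variable-order de Bruijn graph. It is an illustration under stated assumptions, not HG-CoLoR's implementation: the graph is approximated by plain sets of k-mers for every order between a hypothetical `k_min` and `k_max`, the seeds are passed in directly rather than obtained from short-read alignments, and the function names `build_kmer_index`, `successors` and `link_seeds` are hypothetical. The variable-order idea is that the longest available context is tried first, and shorter contexts are used only when the longer one has no successor.

```python
from typing import Dict, List, Optional, Set

ALPHABET = "ACGT"


def build_kmer_index(short_reads: List[str], k_min: int, k_max: int) -> Dict[int, Set[str]]:
    """Hypothetical helper: store every k-mer of each order k in [k_min, k_max]."""
    index: Dict[int, Set[str]] = {}
    for k in range(k_min, k_max + 1):
        index[k] = {read[i:i + k] for read in short_reads for i in range(len(read) - k + 1)}
    return index


def successors(path: str, index: Dict[int, Set[str]], k_min: int, k_max: int) -> List[str]:
    """Bases that extend `path`, taken from the highest order that offers any.

    The (k-1)-base suffix of the path is the context; a shorter context is
    only consulted when the longer one has no successor among the k-mers.
    """
    for k in range(k_max, k_min - 1, -1):
        if len(path) < k - 1:
            continue
        context = path[len(path) - (k - 1):]
        found = [c for c in ALPHABET if context + c in index[k]]
        if found:
            return found
    return []


def link_seeds(left_seed: str, right_seed: str, index: Dict[int, Set[str]],
               k_min: int, k_max: int, max_len: int) -> Optional[str]:
    """Depth-first search from the end of `left_seed` until the path ends with
    the first k_min bases of `right_seed`; returns the merged sequence or None."""
    target = right_seed[:k_min]

    def dfs(path: str, added: int) -> Optional[str]:
        if path.endswith(target):
            return path + right_seed[len(target):]
        if added >= max_len:  # bound the search so graph cycles cannot loop forever
            return None
        for base in successors(path, index, k_min, k_max):
            result = dfs(path + base, added + 1)
            if result is not None:
                return result
        return None

    return dfs(left_seed, 0)


# Toy data (invented for illustration): three overlapping short reads.
reads = ["ACGGTCAT", "TCATTGCA", "TTGCAGAT"]
index = build_kmer_index(reads, k_min=3, k_max=5)
print(link_seeds("ACGGT", "CAGAT", index, 3, 5, max_len=20))  # ACGGTCATTGCAGAT
print(successors("CAGAT", index, 3, 5))  # ['T'], found only at order 3
```

The second call shows the fallback: no 5-mer or 4-mer extends the context, so the search drops to order 3. Trying high orders first keeps the walk specific in repeated regions, while lower orders let it cross spots where short-read coverage is too thin for long k-mers.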
Availability and implementation
HG-CoLoR is implemented in C++, supported on Linux platforms and freely available at
https://github.com/morispi/HG-CoLoR
Contact
pierre.morisse2@univ-rouen.fr
Supplementary information
Supplementary data are available at Bioinformatics online.

