Search engine for discovering works of Art, research articles, and books related to Art and Culture
ShareThis
Javascript must be enabled to continue!

Building Large Updatable Colored de Bruijn Graphs via Merging

View through CrossRef
MOTIVATION: There exists several massive genomic and metagenomic data collection efforts, including GenomeTrakr and MetaSub, which are routinely updated with new data. To analyze such datasets, memory-efficient methods to construct and store the colored de Bruijn graph have been developed. Yet, a problem that has not been considered is constructing the colored de Bruijn graph in a scalable manner that allows new data to be added without reconstruction. This problem is important for large public datasets as scalability is needed but also the ability to update the construction is also needed. RESULTS: We create a method for constructing and updating the colored de Bruijn graph on a very-large dataset through partitioning the data into smaller subsets, building the colored de Bruijn graph using a FM-index based representation, and succinctly merging these representations to build a single graph. The last step, merging succinctly, is the algorithmic challenge which we solve in this paper. We refer to the resulting method as VariMerge. We validate our approach, and show it produces a three-fold reduction in working space when constructing a colored de Bruijn graph for 8,000 strains. Lastly, we compare VariMerge to other competing methods --- including Vari, Rainbowfish, Mantis, Bloom Filter Trie, the method by Almodaresi(2019) and Multi-BRWT --- and illustrate that VariMerge is the only method that is capable of building the colored de Bruijn graph for 16,000 strains in a manner that allows additional samples to be added. Competing methods either did not scale to this large of a dataset or cannot allow for additions without reconstruction. AVAILABILITY: Our software is available under GPLv3 at https://github.com/cosmo-team/cosmo/tree/VARI-merge.
Title: Building Large Updatable Colored de Bruijn Graphs via Merging
Description:
MOTIVATION: There exists several massive genomic and metagenomic data collection efforts, including GenomeTrakr and MetaSub, which are routinely updated with new data.
To analyze such datasets, memory-efficient methods to construct and store the colored de Bruijn graph have been developed.
Yet, a problem that has not been considered is constructing the colored de Bruijn graph in a scalable manner that allows new data to be added without reconstruction.
This problem is important for large public datasets as scalability is needed but also the ability to update the construction is also needed.
RESULTS: We create a method for constructing and updating the colored de Bruijn graph on a very-large dataset through partitioning the data into smaller subsets, building the colored de Bruijn graph using a FM-index based representation, and succinctly merging these representations to build a single graph.
The last step, merging succinctly, is the algorithmic challenge which we solve in this paper.
We refer to the resulting method as VariMerge.
We validate our approach, and show it produces a three-fold reduction in working space when constructing a colored de Bruijn graph for 8,000 strains.
Lastly, we compare VariMerge to other competing methods --- including Vari, Rainbowfish, Mantis, Bloom Filter Trie, the method by Almodaresi(2019) and Multi-BRWT --- and illustrate that VariMerge is the only method that is capable of building the colored de Bruijn graph for 16,000 strains in a manner that allows additional samples to be added.
Competing methods either did not scale to this large of a dataset or cannot allow for additions without reconstruction.
AVAILABILITY: Our software is available under GPLv3 at https://github.
com/cosmo-team/cosmo/tree/VARI-merge.

Related Results

Multi de Bruijn Sequences and the Cross-Join Method
Multi de Bruijn Sequences and the Cross-Join Method
We show a method to construct binary multi de Bruijn sequences using the cross-join method. We extend the proof given by Alhakim for ordinary de Bruijn sequences to the case of mul...
Der skal ikke lades sten på sten tilbage
Der skal ikke lades sten på sten tilbage
The Building by the Barbar TempleClose by the large temple at Barbar 1) lies a little tell, which was investigated in the spring of 1956. The tell was shown to cover a building of ...
De gevel – een intermediair element tussen buiten en binnen
De gevel – een intermediair element tussen buiten en binnen
This study is based on the fact that all people have a basic need for protection from other people (and animals) as well as from the elements (the exterior climate). People need a ...
Disentangled Long-Read De Bruijn Graphs via Optical Maps
Disentangled Long-Read De Bruijn Graphs via Optical Maps
Abstract Pacific Biosciences (PacBio), the main third generation sequencing technology can produce scalable, high-throughput, unprecedented sequencing results throu...
Computing the Energy of Certain Graphs based on Vertex Status
Computing the Energy of Certain Graphs based on Vertex Status
Background: The concept of Hückel molecular orbital theory is used to compute the graph energy numerically and graphically on the base of the status of a vertex. Objective: Our a...
Data Analytics on Graphs Part I: Graphs and Spectra on Graphs
Data Analytics on Graphs Part I: Graphs and Spectra on Graphs
The area of Data Analytics on graphs promises a paradigm shift, as we approach information processing of new classes of data which are typically acquired on irregular but structure...
Twilight graphs
Twilight graphs
AbstractThis paper deals primarily with countable, simple, connected graphs and the following two conditions which are trivially satisfied if the graphs are finite:(a) there is an ...
Eigenspectral Analysis of Pendant Vertex- and Pendant Edge-Weighted Graphs of Linear Chains, Cycles, and Stars
Eigenspectral Analysis of Pendant Vertex- and Pendant Edge-Weighted Graphs of Linear Chains, Cycles, and Stars
Abstract Three classes of pendent vertex- and pendant edge-weighted graphs of linear chains (class I), stars (class II), and cycles (class III) have been presented. ...

Back to Top