In-silico read normalization using set multi-cover optimization
Abstract
De Bruijn graphs are a common assembly data structure for large sequencing datasets. However, with the advances in sequencing technologies, assembling high-coverage datasets has become a computational challenge. Read normalization, which removes redundancy in large datasets, is widely applied to reduce resource requirements. Current normalization algorithms, though efficient, provide no guarantee that they preserve important k-mers that form connections between regions in the graph. Here, normalization is phrased as a set multi-cover problem on reads, and a heuristic algorithm, ORNA, is proposed. ORNA seeks the minimum number of reads required to retain all k-mers and their relative k-mer abundances from the original dataset. Hence, all connections and coverage information from the original graph are preserved. ORNA was tested on various RNA-seq datasets with different coverage values. It was compared to current normalization algorithms and was found to perform better. It is shown that combining read error correction and normalization yields more accurate and more resource-efficient RNA assemblies than assembling the original dataset. Further, an application is proposed in which multiple datasets are combined and normalized to predict novel transcripts that would otherwise have been missed. Overall, ORNA is a fast, general-purpose normalization algorithm that significantly reduces dataset size with little loss of assembly quality.
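The set multi-cover formulation can be made concrete with a small sketch. Each k-mer is an element with a demand (how many copies must survive normalization), each read is a set that covers the k-mers it contains, and the goal is to keep few reads while meeting every demand. The Python below is a minimal single-pass greedy sketch, not ORNA's actual implementation. It assumes a hypothetical logarithmic demand (the smallest d ≥ 1 with base^d ≥ abundance, capped at the abundance itself); the names `kmers`, `required_count`, `greedy_normalize` and the `base` parameter are all hypothetical and chosen for illustration.

```python
from collections import Counter
from typing import Dict, List, Sequence


def kmers(read: str, k: int) -> List[str]:
    """All overlapping k-mers of a read (empty if the read is shorter than k)."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def required_count(abundance: int, base: float) -> int:
    """Hypothetical demand: smallest d >= 1 with base**d >= abundance,
    capped at the abundance so the demand is always reachable.

    Roughly ceil(log_base(abundance)), computed without floating-point logs.
    It never decreases as abundance grows, so abundant k-mers keep at least
    as many copies as rare ones (relative abundance order is preserved)."""
    if base <= 1:
        raise ValueError("base must be greater than 1")
    d = 1
    while d < abundance and base ** d < abundance:
        d += 1
    return d


def greedy_normalize(reads: Sequence[str], k: int, base: float = 2.0) -> List[str]:
    """Single-pass greedy heuristic for set multi-cover on reads (illustrative only)."""
    # Pass 1: count every k-mer occurrence in the full dataset.
    abundance = Counter(km for r in reads for km in kmers(r, k))
    demand: Dict[str, int] = {km: required_count(c, base) for km, c in abundance.items()}

    # Pass 2: keep a read only if at least one of its k-mers is still below its demand.
    covered: Counter = Counter()
    kept: List[str] = []
    for r in reads:
        ks = kmers(r, k)
        if any(covered[km] < demand[km] for km in ks):
            kept.append(r)
            covered.update(ks)  # credit every k-mer occurrence in the kept read
    return kept


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTACGT", "ACGTACGT", "CGTACGTT"]
    kept = greedy_normalize(reads, k=4)
    print(len(kept))  # 3: the third copy adds no k-mer that is still under its demand
```

Because every demand is at least one, the first read containing a given k-mer is always kept, so no k-mer, and therefore none of the graph connections built from k-mers, is lost. The greedy pass is order-dependent and is not guaranteed to reach the true minimum number of reads, which is why a heuristic is needed for this problem in the first place.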
ORNA is available at https://github.com/SchulzLab/ORNA.
Related Results
Data Normalization Methods of Hybridized Multi-Stage Feature Selection Classification for 5G Base Station Antenna Health Effect Detection
It is essential to assess human exposure to Fifth Generation (5G) Radiofrequency Electromagnetic Field (RF-EMF) signal from Base Station (BS) sources operating at Low Band 5G at 70...
Cover Crop Response to Late‐Season Planting and Nitrogen Application
Cover crops aid in reducing precipitation runoff, soil erosion, and N losses in highly sloped, mountainous regions. Corn (Zea mays L.) producers in states with late spring warmup a...
A Correspondence Between Normalization Strategies in Artificial and Biological Neural Networks
A fundamental challenge at the interface of machine learning and neuroscience is to uncover computational principles that are shared bet...
A NEW MULTI-OBJECTIVE ARITHMETIC OPTIMIZATION ALGORITHM
Today, as engineering problems become more complex in terms of the effective variables in these problems and the range of their changes and their multidimensionality (in terms of n...
In-silico read normalization using Set Multi-Cover optimization
De Bruijn graph is a common assembly data structure. But, with the advances in deep sequencing technologies, assembling high coverage datasets has become a computational challenge....
MAFFIN: Metabolomics Sample Normalization Using Maximal Density Fold Change with High-Quality Metabolic Features and Corrected Signal Intensities
Sample normalization is a critical step in metabolomics to remove differences in total sample amount or concentration of metabolites between biological sam...

