Alignment of High-Throughput Sequencing Data Inside In-Memory Databases
In an era of high-throughput DNA sequencing, efficient analysis of DNA sequences is of high importance, yet computer-supported DNA analysis remains a time-consuming task. In this paper we explore the potential of in-memory database technology using SAP's High Performance Analytic Appliance (HANA). We focus on read alignment, one of the first steps in DNA sequence analysis. In particular, we examined the widely used Burrows-Wheeler Aligner (BWA) and implemented stored procedures in both HANA and the free database system MySQL to compare execution time and memory management. To keep the results comparable, MySQL was also run in memory, using its integrated memory engine for database table creation. The stored procedures implement exact and inexact searching of DNA reads within the reference genome GRCh37. Because of technical restrictions on recursion in SAP HANA, inexact matching could not be implemented on that platform. The performance comparison between HANA and MySQL is therefore based on the execution time of the exact search procedures. Here, HANA was approximately 27 times faster than MySQL, which suggests that in-memory database concepts hold considerable potential for the further development of DNA analysis procedures.
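To make the exact search step concrete, the following is a minimal Python sketch of backward search over a Burrows-Wheeler transform, the general technique that BWA's exact matching is built on. It is not the stored procedure from the paper, which is not reproduced here. The function names `build_fm_index` and `exact_search`, the naive suffix-array construction, and the toy reference string are all assumptions chosen for illustration, and the approach would not scale to a genome the size of GRCh37 as written.

```python
def build_fm_index(reference):
    """Build a toy FM-index (suffix array, C table, Occ table) for a short string."""
    text = reference + "$"  # "$" is a sentinel that sorts before A, C, G, T
    # Naive suffix array: fine for a demo, far too slow for a real genome.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps around to "$").
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # c_table[ch]: number of characters in text that are strictly smaller than ch.
    c_table = {}
    total = 0
    for ch in alphabet:
        c_table[ch] = total
        total += text.count(ch)
    # occ[ch][i]: number of occurrences of ch in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return sa, c_table, occ


def exact_search(read, sa, c_table, occ):
    """Return sorted 0-based start positions of read in the reference."""
    lo, hi = 0, len(sa)  # half-open suffix-array interval [lo, hi)
    # Process the read from its last character to its first (backward search).
    for ch in reversed(read):
        if ch not in c_table:
            return []  # character never occurs in the reference
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []  # interval is empty: no exact match
    return sorted(sa[i] for i in range(lo, hi))


if __name__ == "__main__":
    sa, c_table, occ = build_fm_index("GATTACAGATTACA")
    print(exact_search("ATTA", sa, c_table, occ))  # prints [1, 8]
```

Each loop iteration only narrows an interval using two table lookups, so the exact search needs no recursion. Inexact matching in this style typically branches over possible mismatches at each position, which is commonly written recursively, and this is consistent with the restriction the abstract reports for SAP HANA.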

