Efficient parallel implementation of the SHRiMP sequence alignment tool using MapReduce
With the advent of ultra-high-throughput DNA sequencing technologies in Next-Generation Sequencing (NGS) machines, bioinformatics faces a daunting new era of petabyte-scale data. The enormous volumes of data produced by NGS machines create storage, scalability, and performance challenges. At the same time, cloud computing architectures are rapidly emerging as robust and economical solutions for high-performance computing of all kinds. To date, these architectures have had limited impact on the sequence alignment problem, in which sequence reads must be compared against a reference genome.
In this research, we present a methodology for efficiently porting a recently developed NGS alignment tool, SHRiMP, to a cloud environment based on the MapReduce programming model. Critical to the function and performance of the methodology are several techniques and mechanisms that ease the task of porting SHRiMP to the cloud. They allow the "cloudified" SHRiMP to run as a black box within the MapReduce model, without building new parallel algorithms or recoding the tool from scratch.
The approach is based on the MapReduce parallel programming model, its open-source implementation, Hadoop, and Hadoop's underlying distributed file system (HDFS). The methodology is deployed on the cloud infrastructure installed at Qatar University.
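The abstract does not spell out the individual techniques, so the following is only a minimal local sketch of the general "black box" idea, under stated assumptions: reads are split into chunks on FASTA record boundaries, each chunk is handed to an unmodified command-line aligner (the map step), and the per-chunk outputs are concatenated (the reduce step). A thread pool stands in for Hadoop map tasks, and `aligner_cmd`, `split_fasta`, `map_chunk`, `reduce_outputs`, and `run_job` are hypothetical names introduced for illustration. No real SHRiMP flags or Hadoop APIs are shown.

```python
import subprocess
import sys
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def split_fasta(text, reads_per_chunk):
    """Split FASTA text into chunks that each contain whole records."""
    records, current = [], []
    for line in text.splitlines():
        # A header line starts a new record; flush the previous one first.
        if line.startswith(">") and current:
            records.append("\n".join(current) + "\n")
            current = []
        if line.strip():
            current.append(line)
    if current:
        records.append("\n".join(current) + "\n")
    return ["".join(records[i:i + reads_per_chunk])
            for i in range(0, len(records), reads_per_chunk)]


def map_chunk(aligner_cmd, chunk):
    """Map step: run the unmodified aligner on one chunk of reads.

    aligner_cmd is a placeholder argument list (an assumption); the path
    of the chunk file is appended as its last argument.
    """
    with tempfile.TemporaryDirectory() as tmp:
        reads_path = Path(tmp) / "reads.fa"
        reads_path.write_text(chunk)
        result = subprocess.run(
            aligner_cmd + [str(reads_path)],
            capture_output=True, text=True, check=True,
        )
    return result.stdout


def reduce_outputs(outputs):
    """Reduce step: concatenate per-chunk outputs in chunk order."""
    return "".join(outputs)


def run_job(aligner_cmd, reads_text, reads_per_chunk=2, workers=4):
    """Split the reads, align the chunks in parallel, merge the results."""
    chunks = split_fasta(reads_text, reads_per_chunk)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so outputs line up with chunks.
        outputs = list(pool.map(lambda c: map_chunk(aligner_cmd, c), chunks))
    return reduce_outputs(outputs)
```

Because the aligner is only ever invoked as an external process, none of its code has to change; on a real cluster the chunking and scheduling would be handled by Hadoop and HDFS rather than by a local thread pool.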
Experimental results demonstrate that running large-scale SHRiMP sequence alignment jobs in parallel under the MapReduce framework dramatically improves performance when the user takes advantage of the resources provided by the cloud.
In conclusion, using cloud computing for NGS data analysis is a viable and efficient alternative to analyzing data on in-house compute clusters. The efficiency and flexibility of cloud computing environments and the MapReduce programming model give the SHRiMP sequence alignment tool a considerable performance boost. Using this methodology, biologists can perform computationally demanding sequence alignment tasks without delving into server and database management, without the complexities of running jobs on grids and clusters, and without modifying the existing code to adapt it for parallel processing.
Hamad bin Khalifa University Press (HBKU Press)

