
BioWorkbench: a high-performance framework for managing and analyzing bioinformatics experiments

Advances in sequencing techniques have led to exponential growth in biological data, demanding the development of large-scale bioinformatics experiments. Because these experiments are computation- and data-intensive, they require high-performance computing techniques and can benefit from specialized technologies such as Scientific Workflow Management Systems and databases. In this work, we present BioWorkbench, a framework for managing and analyzing bioinformatics experiments. This framework automatically collects provenance data, including both performance data from workflow execution and data from the scientific domain of the workflow application. Provenance data can be analyzed through a web application that abstracts a set of queries to the provenance database, simplifying access to provenance information. We evaluate BioWorkbench using three case studies: SwiftPhylo, a phylogenetic tree assembly workflow; SwiftGECKO, a comparative genomics workflow; and RASflow, a RASopathy analysis workflow. We analyze each workflow from both computational and scientific domain perspectives by querying a provenance and annotation database. Some of these queries are available as a pre-built feature of the BioWorkbench web application. Through the provenance data, we show that the framework is scalable and achieves high performance, reducing the execution time of the case studies by up to 98%. We also show how the application of machine learning techniques can enrich the analysis process.
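To make the idea of querying a provenance database concrete, the following is a minimal sketch in Python using the standard `sqlite3` module. It is not BioWorkbench's schema or API: the table `app_exec`, its columns, the run identifiers, and the helper names are all hypothetical, and the durations are made-up illustrative values, not results from the paper. The sketch shows one query that summarizes execution time per workflow activity, and the arithmetic behind a statement such as "execution time reduced by up to 98%".

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical provenance table."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per executed activity of a workflow run.
    conn.execute(
        "CREATE TABLE app_exec (run_id TEXT, activity TEXT, duration_s REAL)"
    )
    # Made-up rows for illustration only.
    rows = [
        ("run_a", "align", 10.0),
        ("run_a", "align", 14.0),
        ("run_a", "build_tree", 30.0),
        ("run_b", "align", 6.0),
    ]
    conn.executemany("INSERT INTO app_exec VALUES (?, ?, ?)", rows)
    return conn


def activity_summary(conn, run_id):
    """Return (activity, executions, total seconds) for one run."""
    cur = conn.execute(
        "SELECT activity, COUNT(*), SUM(duration_s) "
        "FROM app_exec WHERE run_id = ? "
        "GROUP BY activity ORDER BY activity",
        (run_id,),
    )
    return cur.fetchall()


def percent_reduction(baseline_s, improved_s):
    """Percentage by which improved_s is lower than baseline_s."""
    if baseline_s <= 0:
        raise ValueError("baseline_s must be positive")
    return 100.0 * (baseline_s - improved_s) / baseline_s


if __name__ == "__main__":
    conn = build_demo_db()
    for activity, count, total in activity_summary(conn, "run_a"):
        print(f"{activity}: {count} executions, {total:.1f} s total")
    # Illustrative numbers: a 100 s baseline cut to 2 s is a 98% reduction.
    print(f"{percent_reduction(100.0, 2.0):.0f}% reduction")
```

A web front end of the kind the abstract describes would presumably wrap queries like `activity_summary` behind predefined views, so that users do not have to write SQL themselves.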

Related Results

Advancements in Biomedical and Bioinformatics Engineering
Abstract: The field of biomedical and bioinformatics engineering is witnessing rapid advancements that are revolutionizing healthcare and medical research. This chapter provides a...
A large-scale analysis of bioinformatics code on GitHub
Abstract: In recent years, the explosion of genomic data and bioinformatic tools has been accompanied by a growing conversation around reproducibility of results and usability of sof...
Bioinformatics tool and web server development focusing on structural bioinformatics applications
This thesis is divided into two main sections: Part 1 describes the design and accuracy evaluation of a new web server, PRotein Interactive MOdeling (PRIMO-Complexes), for ...
Cloud Computing in Bioinformatics: current solutions and challenges
Motivations: The availability of high-throughput technologies and the application of g...
The Role and Progress of Bioinformatics in Genomics Research
With the rapid development of high-throughput sequencing technology, genomics research has entered the era of big data. Bioinformatics, as a bridge connecting biology, computer sci...
CREDO: a friendly Customizable, REproducible, DOcker file generator for bioinformatics applications
Abstract: Background: The analysis of large and complex biological datasets in bioinformatics poses a significant challenge to achieving reproducible ...
