
BioWorkbench: a high-performance framework for managing and analyzing bioinformatics experiments

Advances in sequencing techniques have led to exponential growth in biological data, demanding the development of large-scale bioinformatics experiments. Because these experiments are computation- and data-intensive, they require high-performance computing techniques and can benefit from specialized technologies such as Scientific Workflow Management Systems and databases. In this work, we present BioWorkbench, a framework for managing and analyzing bioinformatics experiments. This framework automatically collects provenance data, including both performance data from workflow execution and data from the scientific domain of the workflow application. Provenance data can be analyzed through a web application that abstracts a set of queries to the provenance database, simplifying access to provenance information. We evaluate BioWorkbench using three case studies: SwiftPhylo, a phylogenetic tree assembly workflow; SwiftGECKO, a comparative genomics workflow; and RASflow, a RASopathy analysis workflow. We analyze each workflow from both the computational and the scientific domain perspectives, using queries to a provenance and annotation database. Some of these queries are available as a pre-built feature of the BioWorkbench web application. Through the provenance data, we show that the framework is scalable and achieves high performance, reducing the execution time of the case studies by up to 98%. We also show how the application of machine learning techniques can enrich the analysis process.
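To put the reported reduction of up to 98% in execution time into more familiar terms, the short sketch below converts a relative time reduction into a speedup factor. It is only an illustration of the arithmetic: the function names and the example timings are hypothetical and are not taken from BioWorkbench or from the measurements in the paper.

```python
def reduction_from_times(baseline_seconds: float, optimized_seconds: float) -> float:
    """Fraction of the baseline execution time that was eliminated."""
    if baseline_seconds <= 0:
        raise ValueError("baseline_seconds must be positive")
    return 1.0 - optimized_seconds / baseline_seconds


def speedup_from_reduction(reduction: float) -> float:
    """Speedup factor implied by a relative reduction in execution time."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in the interval [0, 1)")
    return 1.0 / (1.0 - reduction)


if __name__ == "__main__":
    # Hypothetical timings, chosen only to illustrate a 98% reduction.
    reduction = reduction_from_times(baseline_seconds=100.0, optimized_seconds=2.0)
    print(f"reduction: {reduction:.0%}")
    print(f"speedup:   {speedup_from_reduction(reduction):.1f}x")
```

Under these assumptions, a 98% reduction means the run takes one fiftieth of the baseline time, which is a speedup of about 50x.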

PeerJ

Maria Luiza Mondelli, Thiago Magalhães, Guilherme Loss, Michael Wilde, Ian Foster, Marta Mattoso, Daniel Katz, Helio Barbosa, Ana Tereza R. de Vasconcelos, Kary Ocaña, Luiz M.R. Gadelha


2018



