
RGMQL: scalable and interoperable computing of heterogeneous omics big data and metadata in R/Bioconductor

Abstract

Background
Heterogeneous omics data, increasingly collected through high-throughput technologies, may hold hidden answers to very important and still unsolved biomedical questions. Integrating and processing these data is crucial, especially for the tertiary analysis of Next Generation Sequencing data, yet existing big data strategies still mainly address primary and secondary analysis. Hence, there is a pressing need for algorithms specifically designed to explore big omics datasets. Such algorithms should ensure scalability and interoperability, possibly relying on high-performance computing infrastructures.

Results
We propose RGMQL, an R/Bioconductor package that provides a set of specialized functions to extract, combine, process and compare omics datasets and their metadata from different, and differently located, sources. RGMQL is built on the GenoMetric Query Language (GMQL) data management and computational engine. It can leverage GMQL's open curated repository and its cloud-based resources, and it can outsource computational tasks to GMQL remote services. Furthermore, it overcomes the limits of the GMQL declarative syntax by offering a procedural approach to handling omics data within the R/Bioconductor environment. Above all, it provides full interoperability with other packages of the R/Bioconductor framework and extensibility over the most used genomic data structures and processing functions.

Conclusions
RGMQL combines the query expressiveness and computational efficiency of GMQL with a complete processing flow in the R environment, as a fully integrated extension of the R/Bioconductor framework. Here we provide three fully reproducible example use cases of biological relevance that illustrate its flexibility of use and its interoperability with other R/Bioconductor packages. They show how RGMQL can easily scale up from local to parallel and cloud computing while combining and analyzing heterogeneous omics data from local or remote datasets, both public and private, in a way that is completely transparent to the user.
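The abstract describes "omics datasets and their metadata" being selected and combined. The following is a minimal Python sketch of that idea, and it does not use RGMQL or GMQL code. In GMQL, a dataset is a collection of samples, and each sample pairs genomic region data with attribute-value metadata. The sketch filters samples by metadata and then counts overlapping regions, loosely modeled on GMQL's MAP operator.

Several parts of the sketch are assumptions:
- The names `Region`, `Sample`, `select`, `map_count` and the metadata attributes `dataType` and `cell` are all hypothetical.
- Coordinates are assumed to be 0-based and half-open.
- Strand is ignored.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Region:
    chrom: str
    left: int          # assumed 0-based start, inclusive
    right: int         # assumed end, exclusive
    strand: str = "*"  # kept for completeness; ignored by overlaps() below

@dataclass
class Sample:
    sample_id: str
    regions: list[Region]
    metadata: dict[str, set[str]]  # attribute -> set of values (one attribute may have several)

def select(dataset, **criteria):
    """Keep samples whose metadata contain every requested attribute=value pair."""
    return [s for s in dataset
            if all(v in s.metadata.get(k, set()) for k, v in criteria.items())]

def overlaps(a, b):
    """True if two regions on the same chromosome share at least one base (strand ignored)."""
    return a.chrom == b.chrom and a.left < b.right and b.left < a.right

def map_count(reference, experiment):
    """MAP-like step: for each (reference, experiment) sample pair, count the
    experiment regions overlapping each reference region."""
    result = {}
    for ref in reference:
        for exp in experiment:
            result[(ref.sample_id, exp.sample_id)] = [
                sum(overlaps(r, e) for e in exp.regions) for r in ref.regions
            ]
    return result

# Hypothetical toy data: one annotation sample and two experiment samples.
genes = [Sample("genes", [Region("chr1", 100, 200), Region("chr1", 500, 600)],
                {"annotation_type": {"gene"}})]
peaks = [
    Sample("s1", [Region("chr1", 150, 160), Region("chr1", 190, 210), Region("chr2", 100, 200)],
           {"dataType": {"ChipSeq"}, "cell": {"HeLa-S3"}}),
    Sample("s2", [Region("chr1", 550, 560)],
           {"dataType": {"DnaseSeq"}, "cell": {"HeLa-S3"}}),
]

chip = select(peaks, dataType="ChipSeq")  # metadata-driven selection keeps only s1
print(map_count(genes, chip))             # {('genes', 's1'): [2, 0]}
```

The sketch keeps metadata and regions together in one sample object, so a filter on metadata automatically carries the matching regions along. That property is what lets region-level and metadata-level operations be combined in a single query.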

Springer Science and Business Media LLC

Simone Pallotta, Silvia Cascianelli, Marco Masseroli

BMC Bioinformatics

2022


Related Results

Abstract Purpose The purpose of the paper is to provide a framework for addressing the disconnect between metadata and data scie...

Why Pakistan Must Lead in Regional Multi-Omics Research for Precision Medicine

Precision medicine has emerged as one of the most transformative movements in global healthcare, shifting the clinical emphasis from generalized treatments to highly individualized...

Literature Review on Metadata Governance

The framework of metadata governance is a subset of the primary data governance framework implementation within an enterprise. Metadata management helps identify data provenance an...

FAIR Digital Objects in Official Statistics

Introduction*1 Statistical offices on national and international scale provide statistics on demography, labour, income, society, economy, environment and othe...

Ontomet

Proper description of data, or metadata, is important to facilitate data sharing among Geospatial Information Communities. To avoid the production of arbitrary metadata annotations...

Globally Findable Planetary Data: The Interdisciplinary TRR170-DB Repository

Introduction: The TRR170-DB data repository (https://planetary-data-portal.org/) manages the research data from the collaborative research center ‘Late Accretion onto Ter...

Large-scale Manual Curation and Harmonization of Metadata from Metagenomic and Cancer Genomic Repositories: Challenges and Solutions

Abstract Public omics repositories contain vast amounts of valuable data, but their metadata suffers from extreme heterogeneity, unstandardized t...

Metadata in the Digital Library

The range of metadata needed to run a digital library and preserve its collections in the long term is much more extensive and complicated than anything in its traditional counterp...