OMD Curation Toolkit: a workflow for in-house curation of public omics datasets
Abstract
Background
Major advances in sequencing technologies and the sharing of data and metadata in science have resulted in a wealth of publicly available datasets. However, working with, and especially curating, public omics datasets remains challenging despite these efforts. While a growing number of initiatives aim to re-use previous results, these initiatives have limitations that often make further in-house curation and processing necessary.
Results
Here, we present the Omics Dataset Curation Toolkit (OMD Curation Toolkit), a Python 3 package designed to accompany and guide researchers through the curation of metadata and FASTQ files from public omics datasets. This workflow provides a standardized framework with multiple capabilities (collection, control check, treatment and integration) to facilitate the arduous task of curating public sequencing data projects. Although it is centered on the European Nucleotide Archive (ENA), most of the provided tools are generic and can be used to curate datasets from other sources.
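To make the "control check" stage more concrete, the sketch below shows one way downloaded FASTQ files could be checked against archive metadata. It is not the OMD Curation Toolkit's API. All function names (`parse_filereport`, `md5sum`, `fastq_is_well_formed`, `control_check`) are hypothetical. The sketch assumes an ENA-style `read_run` file report in TSV form with the columns `run_accession`, `fastq_ftp` and `fastq_md5`, where the two files of a paired-end run are separated by `;`. For each expected file it checks whether the file is present, whether its MD5 matches the reported checksum, and whether it follows the four-line FASTQ record layout.

```python
import csv
import gzip
import hashlib
import io
from pathlib import Path


def parse_filereport(tsv_text):
    """Map each run accession to a list of (file name, expected MD5) pairs.

    Assumption: ENA-style read_run report with run_accession, fastq_ftp and
    fastq_md5 columns; paired-end values are separated by ';'.
    """
    runs = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        urls = [u for u in row["fastq_ftp"].split(";") if u]
        md5s = [m for m in row["fastq_md5"].split(";") if m]
        if len(urls) != len(md5s):
            raise ValueError(f"{row['run_accession']}: URL/MD5 count mismatch")
        # Keep only the file name from each URL; files are looked up locally.
        runs[row["run_accession"]] = [
            (url.rsplit("/", 1)[-1], md5) for url, md5 in zip(urls, md5s)
        ]
    return runs


def md5sum(path, chunk_size=1 << 20):
    """MD5 of the raw file bytes (compressed, if the file is gzipped), read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fastq_is_well_formed(path):
    """Stream the file and check every record: '@' header, sequence, '+' line,
    and a quality string with the same length as the sequence."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline()
            if not header:
                return True  # clean end of file after a complete record
            seq, sep, qual = (handle.readline().rstrip("\r\n") for _ in range(3))
            if not header.startswith("@") or not sep.startswith("+"):
                return False
            if len(seq) != len(qual):
                return False


def control_check(tsv_text, data_dir):
    """Return {file name: status}, where status is one of
    'missing', 'md5 mismatch', 'malformed' or 'ok'."""
    report = {}
    for files in parse_filereport(tsv_text).values():
        for name, expected_md5 in files:
            path = Path(data_dir) / name
            if not path.exists():
                report[name] = "missing"
            elif md5sum(path) != expected_md5:
                report[name] = "md5 mismatch"
            elif not fastq_is_well_formed(path):
                report[name] = "malformed"
            else:
                report[name] = "ok"
    return report
```

The checks run from cheapest to most expensive and stop at the first failure, so a corrupted download is reported as an MD5 mismatch before any parsing is attempted. A real curation workflow would probably also record run-level metadata problems, such as missing fields in the report, alongside these file-level statuses.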
Conclusions
It therefore offers valuable tools for the in-house curation that is often required before public omics data can be re-used. Owing to its workflow structure and capabilities, it is easy to use and can benefit investigators developing novel omics meta-analyses based on sequencing data.
Springer Science and Business Media LLC