Processing genome-wide association studies within a repository of heterogeneous genomic datasets
Abstract
Background
Genome-Wide Association Studies (GWAS) are based on the observation of genome-wide sets of genetic variants – typically single-nucleotide polymorphisms (SNPs) – in different individuals that are associated with phenotypic traits. Research efforts have so far been directed to improving GWAS techniques rather than to making GWAS results interoperable with other genomic signals; such interoperability is currently hindered by the use of heterogeneous formats and uncoordinated experiment descriptions.
Results
To facilitate integrative use in practice, we propose to include GWAS datasets within the META-BASE repository, exploiting an integration pipeline previously studied for other genomic datasets; the repository holds several heterogeneous data types in the same format, queryable from the same systems. We represent GWAS SNPs and metadata by means of the Genomic Data Model and include metadata within a relational representation by extending the Genomic Conceptual Model with a dedicated view. To further reduce the gap with the descriptions of other signals in the repository of genomic datasets, we perform a semantic annotation of phenotypic traits. Our pipeline is demonstrated using two important data sources, initially organized according to different data models: the NHGRI-EBI GWAS Catalog and FinnGen (University of Helsinki). The integration effort finally allows us to use these datasets within multi-sample processing queries that respond to important biological questions. The integrated GWAS datasets are thereby made usable for multi-omic studies together with, e.g., somatic and reference mutation data, genomic annotations, and epigenetic signals.
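As a rough illustration of the idea described above, the following minimal Python sketch represents a GWAS dataset as a sample made of genomic regions (coordinates plus attributes) with attribute-value metadata, and pairs SNPs with overlapping annotation regions. It is not the META-BASE pipeline or the GenoMetric Query Language: the class names `Region` and `Sample`, the function `snps_in_annotations`, the coordinate convention, and all identifiers and values in the example data are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Region:
    """One genomic region: coordinates plus free-form attribute pairs."""
    chrom: str
    start: int          # 0-based, inclusive (assumed convention)
    stop: int           # exclusive
    strand: str = "*"   # "*" means strand is unknown or irrelevant
    values: tuple = ()  # e.g. (("snp_id", "..."), ("p_value", 1e-9))


@dataclass
class Sample:
    """A set of regions described by attribute-value metadata."""
    regions: list
    metadata: dict = field(default_factory=dict)


def overlaps(a: Region, b: Region) -> bool:
    """True when two regions share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.stop and b.start < a.stop


def snps_in_annotations(gwas: Sample, annotations: Sample) -> list:
    """Pair each GWAS SNP with every annotation region it overlaps."""
    return [
        (snp, ann)
        for snp in gwas.regions
        for ann in annotations.regions
        if overlaps(snp, ann)
    ]


# Hypothetical example data; none of these identifiers or values are real.
gwas = Sample(
    regions=[
        Region("chr1", 100, 101, "*", (("snp_id", "rs_example_1"), ("p_value", 1e-9))),
        Region("chr2", 500, 501, "*", (("snp_id", "rs_example_2"), ("p_value", 4e-8))),
    ],
    metadata={"source": "example_source", "trait": "example trait"},
)
annotations = Sample(
    regions=[
        Region("chr1", 50, 150, "+", (("gene", "GENE_A"),)),
        Region("chr2", 600, 700, "-", (("gene", "GENE_B"),)),
    ],
    metadata={"annotation_type": "gene"},
)

hits = snps_in_annotations(gwas, annotations)
for snp, ann in hits:
    print(dict(snp.values)["snp_id"], "overlaps", dict(ann.values)["gene"])
```

Storing SNPs and annotations in the same region-plus-metadata shape is what lets a single overlap operation apply to both; the nested loop is quadratic and only suitable for a toy example, whereas a real system would rely on indexed or distributed joins.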
Conclusions
As a result of our work on GWAS datasets, we enable 1) their interoperable use with several other homogenized and processed genomic datasets in the context of the META-BASE repository; and 2) their big data processing by means of the GenoMetric Query Language and its associated system. Future large-scale tertiary data analysis may benefit extensively from the addition of GWAS results, which can inform several different downstream analysis workflows.
Springer Science and Business Media LLC

