Multi-omics network-based functional annotation of unknown Arabidopsis genes
Summary
Unraveling gene functions is pivotal to understand the signaling cascades controlling plant development and stress responses. Given that experimental profiling is costly and labor intensive, the need for high-confidence computational annotations is evident. In contrast to detailed gene-specific functional information, transcriptomics data is widely available in both model and crop species. Here, we developed a novel automated function prediction (AFP) algorithm, leveraging complementary information present in multiple expression datasets through the analysis of study-specific gene co-expression networks. Benchmarking the prediction performance on recently characterized Arabidopsis thaliana genes, we showed that our method outperforms state-of-the-art expression-based approaches. Next, we predicted biological process annotations for known (n=15,790) and unknown (n=11,865) genes in A. thaliana and validated our predictions using experimental protein-DNA and protein-protein interaction data (covering >220 thousand interactions in total), obtaining a set of high-confidence functional annotations. 5,054 (42.6%) unknown genes were assigned at least one validated annotation, and 3,408 (53.0%) genes with only computational annotations gained at least one novel validated function. These omics-supported functional annotations shed light on a variety of developmental processes and molecular responses, such as flower and root development, defense responses to fungi and bacteria, and phytohormone signaling, and help alleviate the knowledge gap of biological process annotations in Arabidopsis. An in-depth analysis of two context-specific networks, modeling seed development and response to water deprivation, shows how previously uncharacterized genes function within the respective networks. Moreover, our AFP approach can be applied in future studies to facilitate gene discovery for crop improvement.
Significance statement
For the majority of plant genes, it is unknown in which processes they are involved. Using a multi-omics approach, leveraging transcriptome, protein-DNA and protein-protein interaction data, we functionally annotated 42.6% of unknown Arabidopsis thaliana genes, providing insight into a variety of developmental processes and molecular responses, as well as a resource of annotations which can be explored by the community to facilitate future research.
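The summary does not spell out how the AFP algorithm combines study-specific co-expression networks, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a plain guilt-by-association scheme: in each study, an unknown gene borrows the annotations shared by its most strongly co-expressed neighbors, and a term is kept only if enough studies agree. All gene names, expression values, function names and thresholds (`k`, `min_fraction`, `min_studies`) are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def top_neighbors(expr, gene, k):
    """Return the k genes most strongly co-expressed with `gene` in one study."""
    scores = [(pearson(expr[gene], expr[other]), other)
              for other in expr if other != gene]
    scores.sort(reverse=True)
    return [g for _, g in scores[:k]]


def predict_terms(studies, annotations, gene, k=2, min_fraction=0.5, min_studies=2):
    """Assumed voting rule: keep terms carried by at least `min_fraction` of the
    top-k neighbors in at least `min_studies` study-specific networks."""
    support = {}
    for expr in studies.values():
        if gene not in expr:
            continue
        neighbors = top_neighbors(expr, gene, k)
        counts = {}
        for nb in neighbors:
            for term in annotations.get(nb, ()):
                counts[term] = counts.get(term, 0) + 1
        for term, c in counts.items():
            if c / len(neighbors) >= min_fraction:
                support[term] = support.get(term, 0) + 1
    return sorted(t for t, s in support.items() if s >= min_studies)


# Hypothetical toy data: two studies, four annotated genes, one unknown gene.
studies = {
    "study_A": {"g_unknown": [1, 2, 3, 4], "g1": [2, 4, 6, 8],
                "g2": [1, 2, 3, 5], "g3": [4, 3, 2, 1], "g4": [1, 3, 2, 4]},
    "study_B": {"g_unknown": [5, 1, 4, 2], "g1": [10, 2, 8, 4],
                "g2": [6, 1, 5, 2], "g3": [1, 5, 2, 4], "g4": [2, 2, 3, 1]},
}
annotations = {
    "g1": {"seed development"},
    "g2": {"seed development"},
    "g3": {"response to water deprivation"},
    "g4": {"root development"},
}

predicted = predict_terms(studies, annotations, "g_unknown")
print(predicted)
```

In this toy setup the unknown gene tracks `g1` and `g2` in both studies, so only their shared term survives the vote. The work summarized above additionally validates predictions against protein-DNA and protein-protein interaction data, a step this sketch leaves out.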
Related Results
Benchmarking multi-omics integrative clustering methods for subtype identification in colorectal cancer
Abstract
Background and objectives
Colorectal cancer (CRC) represents a heterogeneous malignancy that has concerned global burden of incidence and mortality. The tradition...
Enhanced Tolerance to Oxidative Stress in Transgenic Arabidopsis Plants Expressing Proteins of Unknown Function
Abstract
Over one-quarter of all plant genes encode proteins of unknown function that can be further classified as proteins with obscure features (POFs), which lack ...
Multi-omics Data Integration by Generative Adversarial Network
Accurate disease phenotype prediction plays an important role in the treatment of heterogeneous diseases like cancer in the era of precision medicine. With the advent of high throu...
A benchmark study of deep learning-based multi-omics data fusion methods for cancer
Abstract
Background
A fused method using a combination of multi-omics data enables a comprehensive study of complex biological processes and highlig...
Muon: multimodal omics analysis framework
Abstract
Advances in multi-omics technologies have led to an explosion of multimodal datasets to address questions ranging from basic biology to translation. While these rich data p...
Exploring the classification of cancer cell lines from multiple omic views
Background
Cancer classification is of great importance to understanding its pathogenesis, making diagnosis and developing treatment. The accumulation of extensive o...
Integrated multi-omics analysis to investigate the pathogenesis of intrauterine adhesion
Abstract
Background
Intrauterine adhesion (IUA) represents a prevalent cause of infertility and reproductive dysfunction; however, the underlying molecular mechanisms contr...
Functional analysis of the Theobroma cacao NPR1 gene in Arabidopsis
Abstract
Background
The Arabidopsis thaliana NPR1 gene encodes a transcription coactivator (NPR1) that plays a major role in the mechanisms regul...

