XA4C: eXplainable representation learning via Autoencoders revealing Critical genes

ABSTRACT
Machine learning models are frequently used in transcriptome analyses. In particular, Representation Learning (RL) models, such as autoencoders, are effective at learning critical representations from noisy data. However, learned representations, e.g., the “latent variables” of an autoencoder, are difficult to interpret, let alone to use for prioritizing essential genes for functional follow-up. In contrast, traditional analyses identify important genes such as Differentially Expressed (DiffEx), Differentially Co-Expressed (DiffCoEx), and Hub genes. Intuitively, complex gene-gene interactions may lie beyond what marginal effects (DiffEx) or correlations (DiffCoEx and Hub) can capture, indicating the need for powerful RL models. However, the lack of interpretability and of individual target genes is an obstacle to the broad practical use of RL. To facilitate interpretable analysis and gene identification using RL, we propose “Critical genes”, defined as genes that contribute highly to learned representations (e.g., the latent variables of an autoencoder). As a proof of concept, supported by eXplainable Artificial Intelligence (XAI), we implemented the eXplainable Autoencoder for Critical genes (XA4C), which quantifies each gene’s contribution to the latent variables and prioritizes Critical genes on that basis. Applying XA4C to gene expression data from six cancers showed that Critical genes capture essential pathways underlying cancers. Remarkably, Critical genes have little overlap with Hub or DiffEx genes yet show higher enrichment in a comprehensive disease gene database (DisGeNET), indicating their potential to reveal much previously unknown biology. As an example, we discovered five Critical genes at the center of the Lysine degradation (hsa00310) pathway that display distinct interaction patterns in tumor and normal tissues.
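The abstract does not specify which XAI attribution method XA4C applies to its autoencoder. As a minimal sketch under stated assumptions, the code below assumes a single linear encoder layer with given weights W (genes x latent variables) and uses Linear SHAP, for which the exact contribution of gene g to latent k in sample s is W[g][k] * (x[s][g] - mean_g) when genes are treated as independent. A gene's importance is its mean absolute contribution, and the top-ranked genes play the role of Critical genes. The function names, the toy data, and the choice of Linear SHAP are illustrative assumptions, not XA4C's implementation; a real (nonlinear) autoencoder would need a model-agnostic explainer.

```python
# Hypothetical sketch, not XA4C's published method: score genes by their
# contribution to the latent variables of an assumed LINEAR encoder.
from statistics import fmean


def linear_shap(X, W):
    """Exact SHAP values for a linear encoder z_k = sum_g W[g][k] * x_g + b_k,
    treating genes as independent: phi[s][g][k] = W[g][k] * (X[s][g] - mean_g).
    X is samples x genes; W is genes x latent variables."""
    n_genes, n_latent = len(W), len(W[0])
    means = [fmean(row[g] for row in X) for g in range(n_genes)]
    return [
        [[W[g][k] * (x[g] - means[g]) for k in range(n_latent)]
         for g in range(n_genes)]
        for x in X
    ]


def gene_importance(X, W):
    """Mean absolute contribution of each gene over all samples and latents."""
    phi = linear_shap(X, W)
    n_latent = len(W[0])
    return [
        fmean(abs(phi_s[g][k]) for phi_s in phi for k in range(n_latent))
        for g in range(len(W))
    ]


def critical_genes(genes, X, W, top_n):
    """Return the top_n genes ranked by importance (hypothetical 'Critical genes')."""
    scores = gene_importance(X, W)
    ranked = sorted(zip(genes, scores), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:top_n]]


# Toy data: 3 samples x 3 genes, and a 3-gene x 2-latent encoder weight matrix.
genes = ["G1", "G2", "G3"]
X = [[1.0, 5.0, 2.0],
     [3.0, 5.0, 2.5],
     [5.0, 5.0, 3.0]]
W = [[0.9, 0.1],
     [2.0, 2.0],
     [0.1, 0.2]]
print(critical_genes(genes, X, W, top_n=2))  # ['G1', 'G3']
```

In this toy example, G2 has the largest encoder weights but constant expression, so its contribution relative to the mean is zero and it ranks last. This is one reason contribution-based scores can differ from simply inspecting encoder weights.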
In conclusion, XA4C facilitates explainable analysis using RL, and Critical genes discovered by explainable RL empower the study of complex interactions.

Author Summary
We propose XA4C, a gene expression data analysis tool that builds an eXplainable Autoencoder to reveal Critical genes. XA4C opens the black box of an autoencoder’s neural network by reporting each gene’s contribution to the autoencoder’s latent variables. A gene’s contribution to the latent variables then defines its importance, and XA4C uses this importance to prioritize “Critical genes”. Notably, we found that Critical genes have two properties: (1) Their overlap with traditional differentially expressed genes and hub genes is small, suggesting that they bring insights into transcriptome data that traditional analysis cannot capture. (2) Their enrichment in a comprehensive disease gene database (DisGeNET) is higher than that of differentially expressed or hub genes, evidencing their strong relevance to disease pathology. Therefore, we conclude that XA4C can reveal an additional landscape of gene expression data.
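Both properties in the Author Summary rest on comparing gene sets: the overlap between Critical genes and DiffEx or hub genes, and enrichment in a disease gene database. The abstract does not name the statistics used, so the sketch below assumes the Jaccard index for overlap and a one-sided hypergeometric test for enrichment. The function names and gene lists are hypothetical toy stand-ins, not DisGeNET data or XA4C output.

```python
# Hypothetical sketch: gene-set overlap and one-sided hypergeometric enrichment.
from math import comb


def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def enrichment_p(hits, universe, disease_genes):
    """P(X >= k) under a hypergeometric null: if len(hits) genes were drawn at
    random from the universe, how likely is at least the observed overlap k
    with the disease-gene list?"""
    universe = set(universe)
    hits = set(hits) & universe
    disease = set(disease_genes) & universe
    N, K, n = len(universe), len(disease), len(hits)
    k = len(hits & disease)
    # Sum the hypergeometric probabilities of every overlap from k upward.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


universe = [f"G{i}" for i in range(10)]
disease = ["G0", "G1", "G2"]  # toy stand-in for a disease-gene list
critical = ["G0", "G1"]
hub = ["G1", "G5", "G6"]
print(jaccard(critical, hub))  # 0.25
print(enrichment_p(critical, universe, disease))
print(enrichment_p(hub, universe, disease))
```

A smaller value from enrichment_p means the observed overlap with the disease list would be less likely by chance; comparing these p-values (or odds ratios) across Critical, DiffEx, and hub gene lists is one way to frame “higher enrichment”.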

Related Results

CREATING LEARNING MEDIA IN TEACHING ENGLISH AT SMP MUHAMMADIYAH 2 PAGELARAN ACADEMIC YEAR 2020/2021
The pandemic Covid-19 currently demands teachers to be able to use technology in teaching and learning process. But in reality there are still many teachers who have not been able ...
Landmark tracking in 4D ultrasound using generalized representation learning
Abstract Purpose In this study, we present and validate a novel concept for target tracking in 4D ultrasound. The key idea is to replace image patch sim...
Human-centric and Semantics-based Explainable Event Detection: A Survey
Abstract In recent years, there has been a surge in interest in artificial intelligent systems that can provide human-centric explanations for decisions or predictions. No ...
Natural representation of composite data with replicated autoencoders
ABSTRACT Generative processes in biology and other fields often produce data that can be regarded as resulting from a composition of basic features. Here we present...
Applying quantum autoencoders for time series anomaly detection
Abstract Anomaly detection is an important problem with applications in various domains such as fraud detection, pattern recognition, or medical diagnosis. Several algori...
Parametrization of Heliophysical Data Using Autoencoders
One of the most important steps in any AI/ML application is the pre-processing of the data. The objective of this step is to project the original data in a new basis, or in a new l...
Evaluating autoencoders for the dimensionality reduction of MRI-derived radiomics and classification of malignant brain tumors
Machine learning has immense potential to enhance diagnostic research in a wealth of medical applications. Advances in medical imaging have made machine learning applications in cl...
Molecular Analyses of Deletion of the Long Arm of Chromosome 20 in Myelodysplastic Syndromes
Abstract 3834 Del(20q), one of the common chromosome abnormalities in myeloid neoplasms, is observed in 5 to 10% of patients with myelodyspla...