XA4C: eXplainable representation learning via Autoencoders revealing Critical genes

ABSTRACT Machine Learning models are frequently used in transcriptome analyses. In particular, Representation Learning (RL) models, e.g., autoencoders, are effective at learning critical representations from noisy data. However, learned representations, e.g., the “latent variables” in an autoencoder, are difficult to interpret, and they do not by themselves prioritize essential genes for functional follow-up. In contrast, traditional analyses identify important genes directly, such as Differentially Expressed (DiffEx), Differentially Co-Expressed (DiffCoEx), and Hub genes. Intuitively, complex gene-gene interactions may lie beyond what marginal effects (DiffEx) or correlations (DiffCoEx and Hub) can capture, indicating the need for powerful RL models. However, the lack of interpretability and of individual target genes is an obstacle to RL’s broad use in practice. To facilitate interpretable analysis and gene identification using RL, we propose “Critical genes”, defined as genes that contribute highly to learned representations (e.g., latent variables in an autoencoder). As a proof-of-concept, supported by eXplainable Artificial Intelligence (XAI), we implemented eXplainable Autoencoder for Critical genes (XA4C), which quantifies each gene’s contribution to latent variables and prioritizes Critical genes on that basis. Applying XA4C to gene expression data in six cancers showed that Critical genes capture essential pathways underlying cancers. Remarkably, Critical genes have little overlap with Hub or DiffEx genes, yet they show higher enrichment in a comprehensive disease gene database (DisGeNET), evidencing their potential to disclose a large amount of unknown biology. As an example, we discovered five Critical genes sitting at the center of the Lysine degradation (hsa00310) pathway and displaying distinct interaction patterns in tumor and normal tissues. In conclusion, XA4C facilitates explainable analysis using RL, and the Critical genes discovered by explainable RL empower the study of complex interactions.

Author Summary We propose a gene expression data analysis tool, XA4C, which builds an eXplainable Autoencoder to reveal Critical genes. XA4C opens the black box of the autoencoder’s neural network by providing each gene’s contribution to the latent variables in the autoencoder. A gene’s ability to contribute to the latent variables is then used to define the importance of this gene, based on which XA4C prioritizes “Critical genes”. Notably, we discovered that Critical genes have two properties: (1) Their overlap with traditional differentially expressed genes and hub genes is low, suggesting that they bring novel insights into transcriptome data that cannot be captured by traditional analysis. (2) The enrichment of Critical genes in a comprehensive disease gene database (DisGeNET) is higher than that of differentially expressed or hub genes, evidencing their strong relevance to disease pathology. Therefore, we conclude that XA4C can reveal an additional landscape of gene expression data.
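The idea of ranking genes by their contribution to latent variables can be illustrated with a small sketch. This is not the XA4C implementation: it assumes a linear autoencoder with tied weights (whose optimal encoder coincides with PCA) in place of a neural network, and it uses the exact per-sample attribution of a linear encoder (input times weight) in place of whichever XAI method XA4C relies on. The function name `rank_critical_genes`, its parameters, and the toy data are hypothetical.

```python
import numpy as np


def rank_critical_genes(expr, n_latent=2, top_k=3):
    """Rank genes by their contribution to the latent variables.

    expr: (n_samples, n_genes) expression matrix.
    Returns (indices of the top_k genes, per-gene scores).
    Assumption: a linear, tied-weight autoencoder stands in for the real model.
    """
    # Center each gene across samples.
    centered = expr - expr.mean(axis=0)
    # The right singular vectors give the optimal linear encoder weights.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    encoder = vt[:n_latent]  # shape (n_latent, n_genes)
    # For a linear encoder, gene g contributes centered[i, g] * encoder[k, g]
    # to latent variable k in sample i. Average the magnitude over samples.
    contrib = np.abs(centered[:, None, :] * encoder[None, :, :]).mean(axis=0)
    # Normalize within each latent variable, then sum across latent variables.
    contrib = contrib / contrib.sum(axis=1, keepdims=True)
    scores = contrib.sum(axis=0)
    top = np.argsort(scores)[::-1][:top_k]
    return top, scores


# Toy data: 20 genes of low-level noise, three of them driven by a hidden factor.
rng = np.random.default_rng(0)
n_samples, n_genes = 200, 20
expr = rng.normal(scale=0.1, size=(n_samples, n_genes))
signal = rng.normal(size=n_samples)
expr[:, [2, 7, 11]] += signal[:, None] * np.array([3.0, -2.5, 2.0])

top, scores = rank_critical_genes(expr, n_latent=1, top_k=3)
print(top)
```

In this toy setting the three genes driven by the hidden factor receive the highest scores. With a nonlinear autoencoder the attribution step would need a dedicated XAI method rather than the input-times-weight shortcut used here.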

openRxiv

Qing Li, Yang Yu, Pathum Kossinna, Theodore Lun, Wenyuan Liao, Qingrun Zhang

2023
