ANDES: a novel best-match approach for enhancing gene set analysis in embedding spaces
Abstract
Embedding methods have emerged as a valuable class of approaches for distilling essential information from complex, high-dimensional data into more accessible lower-dimensional spaces. Applications of embedding methods to biological data have demonstrated that gene embeddings can effectively capture physical, structural, and functional relationships between genes. However, this utility has primarily been realized by using gene embeddings as inputs to downstream machine learning tasks. Much less work has examined the embeddings directly, and analyses of gene sets in embedding spaces are especially scarce. Here, we propose ANDES, a novel best-match approach that can be used with existing gene embeddings to compare gene sets while reconciling gene set diversity. This intuitive method can improve the utility of embedding spaces for a range of downstream tasks. Specifically, we show how ANDES, applied to different gene embeddings encoding protein-protein interactions, can serve as both an overrepresentation-based and a rank-based gene set enrichment analysis method that achieves state-of-the-art performance. Additionally, ANDES can use multi-organism joint gene embeddings to facilitate functional knowledge transfer across organisms, allowing phenotypes to be mapped across model systems. Our flexible, straightforward best-match methodology can be extended to other embedding spaces with diverse community structures among set elements.
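The abstract does not spell out the scoring rule, so the following is a minimal sketch of a generic symmetric best-match average between two gene sets in a shared embedding space. It assumes cosine similarity as the gene-to-gene metric and a permutation null of size-matched random gene sets for calibration. The function names (`best_match_score`, `empirical_z_score`), the null model, and the toy data are hypothetical illustrations, not the published ANDES implementation.

```python
import numpy as np


def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a (m x d) and b (n x d)."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T  # shape (m, n)


def best_match_score(set_a: np.ndarray, set_b: np.ndarray) -> float:
    """Symmetric best-match average (hypothetical, not the published ANDES code).

    Each gene is credited only with its closest counterpart in the other set,
    so a set spanning several modules is not penalized for internal diversity.
    """
    sim = cosine_similarity_matrix(set_a, set_b)
    a_to_b = sim.max(axis=1).mean()  # best match in B for each gene in A
    b_to_a = sim.max(axis=0).mean()  # best match in A for each gene in B
    return float((a_to_b + b_to_a) / 2.0)


def empirical_z_score(
    embeddings: np.ndarray,
    idx_a: np.ndarray,
    idx_b: np.ndarray,
    n_null: int = 1000,
    seed: int = 0,
) -> float:
    """Standardize the observed score against size-matched random gene sets.

    Assumed null model: gene sets of the same sizes drawn uniformly without
    replacement from all rows of `embeddings`.
    """
    rng = np.random.default_rng(seed)
    observed = best_match_score(embeddings[idx_a], embeddings[idx_b])
    n_genes = embeddings.shape[0]
    null = np.empty(n_null)
    for i in range(n_null):
        rand_a = rng.choice(n_genes, size=len(idx_a), replace=False)
        rand_b = rng.choice(n_genes, size=len(idx_b), replace=False)
        null[i] = best_match_score(embeddings[rand_a], embeddings[rand_b])
    return float((observed - null.mean()) / null.std())


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    emb = rng.normal(size=(200, 16))  # toy embedding: 200 genes, 16 dimensions
    emb[:20] += 5.0  # genes 0-19 share a direction, mimicking one module
    z = empirical_z_score(emb, np.arange(10), np.arange(10, 20), n_null=200)
    print(f"z = {z:.2f}")  # large and positive for two halves of one module
```

Scoring each gene by its single best counterpart, rather than averaging all pairwise similarities, is one plausible reading of how a best-match scheme reconciles gene set diversity: a set whose genes fall into several distinct clusters can still score highly against another set that covers the same clusters.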
Related Results
Blood Cross Matching Without Anti-Human Globulin (AHG) and Bovine Serum: A New Interest for an Old Idea
Abstract
Introduction
Transfusion medicine promotes the safety of blood transfusions through rigorous testing to eliminate the risks of infection and hemolysis. The efficacy (to correct ...
Parameterized Strings: Algorithms and Applications
The parameterized string (p-string), a generalization of the traditional string, is composed of constant and parameter symbols. A parameterized match (p-match) exists between two p...
Expression and polymorphism of genes in gallstones
Abstract
A clinical case-control study was used to explore the expression and genetic polymorphism of the KLF14 gene (rs4731702 and rs972283) and the SR-B1 gene (rs...
Privacy Attack on Multiple Dynamic Match-key based Privacy-Preserving Record Linkage
Introduction
Over the last decade, the demand for linking records about people across databases has increased in various domains. Privacy challenges associated with linking sensit...
An Efficient ZZW Construction Using Low-Density Generator-Matrix Embedding Techniques
A novel steganographic algorithm based on ZZW construction is proposed to improve the steganographic embedding efficiency. Low-density generator-matrix (LDGM) embedding is an effic...
Information-Theoretic Limits for Steganography in Multimedia
Steganography in multimedia aims to embed secret data into an innocent multimedia cover object. The embedding introduces some distortion to the cover object and produces...
Lithospheric Structure of the Central Andes Forearc from Gravity Data Modeling: Implication for Plate Coupling
Abstract
Geodetic and seismological data indicates that the Central Andes subduction zone is highly coupled. To understand the plate locking mechanism within the Cen...
Effective Attributed Network Embedding with Information Behavior Extraction
Abstract
Network embedding has shown its effectiveness in many tasks such as link prediction, node classification, and community detection. Most attributed network embeddin...

