Natural representation of composite data with replicated autoencoders
ABSTRACT
Generative processes in biology and other fields often produce data that can be regarded as resulting from a composition of basic features. Here we present an unsupervised method based on autoencoders for inferring these basic features of data. The main novelty in our approach is that the training is based on the optimization of the ‘local entropy’ rather than the standard loss, resulting in a more robust inference, and enhancing the performance on this type of data considerably. Algorithmically, this is realized by training an interacting system of replicated autoencoders. We apply this method to synthetic and protein sequence data, and show that it is able to infer a hidden representation that correlates well with the underlying generative process, without requiring any prior knowledge.
AUTHOR SUMMARY
Extracting compositional features from noisy data and identifying the corresponding generative models is a fundamental challenge across the sciences. The composition of elementary features can have highly non-linear effects, which makes these features very hard to identify from experimental data.
In biology, for instance, one challenge is to identify the key steps or components of molecular and cellular processes. Representative examples are the modeling of protein sequences as the composition of patterns influenced by phylogeny or the identification of gene clusters in which the presence of specific genes depends on the evolutionary history of the cell.
Here we present an unsupervised machine learning technique for the analysis of compositional data, based on entropic neural autoencoders. Our approach aims at finding deep autoencoders that are highly invariant with respect to perturbations in the inputs and in the parameters. The procedure is efficient to implement, and we have validated it on both synthetic and protein sequence data, where the latent variables of the autoencoders are shown to be non-trivially correlated with the true underlying generative processes. Our results suggest that the local entropy approach is a valuable general tool for extracting compositional features in hard unsupervised learning problems.
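To make the idea of "an interacting system of replicated autoencoders" more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a small one-hidden-layer tanh autoencoder, plain full-batch gradient descent, and an elastic coupling of strength gamma that pulls each replica toward the average of all replicas, which is one common way of approximating local entropy optimization. All function names and hyperparameters here are hypothetical.

```python
import numpy as np

def init_params(rng, n_in, n_hid, scale=0.1):
    # Small random weights, zero biases (illustrative choice).
    return {
        "W1": scale * rng.standard_normal((n_in, n_hid)),
        "b1": np.zeros(n_hid),
        "W2": scale * rng.standard_normal((n_hid, n_in)),
        "b2": np.zeros(n_in),
    }

def loss_and_grads(p, X):
    # Forward pass: tanh encoder, linear decoder.
    H = np.tanh(X @ p["W1"] + p["b1"])
    Y = H @ p["W2"] + p["b2"]
    err = Y - X
    n = X.shape[0]
    loss = 0.5 * np.sum(err ** 2) / n  # mean squared reconstruction error
    # Backward pass (manual backpropagation).
    dY = err / n
    dpre = (dY @ p["W2"].T) * (1.0 - H ** 2)
    grads = {
        "W1": X.T @ dpre,
        "b1": dpre.sum(axis=0),
        "W2": H.T @ dY,
        "b2": dY.sum(axis=0),
    }
    return loss, grads

def replica_center(replicas):
    # Parameter-wise average over all replicas.
    return {k: np.mean([r[k] for r in replicas], axis=0) for k in replicas[0]}

def train_replicated(X, n_hid=2, n_replicas=3, gamma=0.1,
                     lr=0.05, steps=500, seed=0):
    rng = np.random.default_rng(seed)
    replicas = [init_params(rng, X.shape[1], n_hid) for _ in range(n_replicas)]
    history = []
    for _ in range(steps):
        center = replica_center(replicas)
        losses = []
        for r in replicas:
            loss, g = loss_and_grads(r, X)
            losses.append(loss)
            for k in r:
                # Reconstruction gradient plus elastic pull toward the center.
                r[k] -= lr * (g[k] + gamma * (r[k] - center[k]))
        history.append(float(np.mean(losses)))
    return replica_center(replicas), replicas, history
```

In this sketch the averaged parameters returned as the first value play the role of the final model. The coupling term is what distinguishes the scheme from training independent copies: with gamma set to zero the replicas no longer interact.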