
Natural representation of composite data with replicated autoencoders

ABSTRACT Generative processes in biology and other fields often produce data that can be regarded as resulting from a composition of basic features. Here we present an unsupervised method based on autoencoders for inferring these basic features of data. The main novelty in our approach is that the training is based on the optimization of the ‘local entropy’ rather than the standard loss, resulting in a more robust inference and considerably enhancing the performance on this type of data. Algorithmically, this is realized by training an interacting system of replicated autoencoders. We apply this method to synthetic and protein sequence data, and show that it is able to infer a hidden representation that correlates well with the underlying generative process, without requiring any prior knowledge.

AUTHOR SUMMARY Extracting compositional features from noisy data and identifying the corresponding generative models is a fundamental challenge across the sciences. The composition of elementary features can have highly non-linear effects, which makes the features very hard to identify from experimental data. In biology, for instance, one challenge is to identify the key steps or components of molecular and cellular processes. Representative examples are the modeling of protein sequences as the composition of patterns influenced by phylogeny, or the identification of gene clusters in which the presence of specific genes depends on the evolutionary history of the cell. Here we present an unsupervised machine learning technique for the analysis of compositional data which is based on entropic neural autoencoders. Our approach aims at finding deep autoencoders that are highly invariant with respect to perturbations in the inputs and in the parameters. The procedure is efficient to implement, and we have validated it on both synthetic and protein sequence data, where it can be shown that the latent variables of the autoencoders are non-trivially correlated with the true underlying generative processes. Our results suggest that the local entropy approach represents a valuable general tool for the extraction of compositional features in hard unsupervised learning problems.
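The abstract does not give the training equations, so the following is only a minimal sketch of what "an interacting system of replicated autoencoders" could look like, under stated assumptions. It assumes a one-unit, tied-weight linear autoencoder, full-batch gradient descent, and an elastic coupling of strength `gamma` that pulls each replica toward the replica mean. All function names, the toy data and the hyperparameters are hypothetical and are not taken from the paper.

```python
import random


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def make_data(n=50, seed=1):
    """Toy data (assumption): one hidden feature s along direction u, plus noise."""
    rng = random.Random(seed)
    u = [0.6, 0.8, 0.0]
    data = []
    for _ in range(n):
        s = rng.gauss(0.0, 1.0)
        data.append([s * ui + rng.gauss(0.0, 0.1) for ui in u])
    return data


def loss_and_grad(w, data):
    """Mean reconstruction loss ||w (w.x) - x||^2 of a one-unit, tied-weight
    linear autoencoder, and its gradient with respect to w."""
    d = len(w)
    loss, grad = 0.0, [0.0] * d
    for x in data:
        h = dot(w, x)                               # hidden activation
        r = [wi * h - xi for wi, xi in zip(w, x)]   # reconstruction minus input
        wr = dot(w, r)
        loss += dot(r, r)
        for k in range(d):
            grad[k] += 2.0 * (h * r[k] + wr * x[k])
    n = len(data)
    return loss / n, [g / n for g in grad]


def train_replicas(data, n_replicas=4, gamma=0.5, lr=0.05, steps=300, seed=0):
    """Gradient descent on several replicas, each attracted to the replica mean.

    The coupling term gamma * (w - center) is an assumed, simplified stand-in
    for the interaction between replicas; the paper's actual scheme may differ.
    """
    rng = random.Random(seed)
    base = [0.3, 0.1, -0.2]  # hypothetical shared starting point (3-dim data)
    replicas = [[b + rng.gauss(0.0, 0.01) for b in base] for _ in range(n_replicas)]
    for _ in range(steps):
        center = [sum(col) / n_replicas for col in zip(*replicas)]
        updated = []
        for w in replicas:
            _, g = loss_and_grad(w, data)
            updated.append([wi - lr * (gi + gamma * (wi - ci))
                            for wi, gi, ci in zip(w, g, center)])
        replicas = updated
    center = [sum(col) / n_replicas for col in zip(*replicas)]
    return center, replicas


if __name__ == "__main__":
    data = make_data()
    center, _ = train_replicas(data)
    print("center weights:", center)
    print("loss at center:", loss_and_grad(center, data)[0])
```

In this toy setting the replicas start close together and end up agreeing on a single weight vector. The coupling only matters while they disagree, which is the intuition behind favoring wide, robust regions of parameter space over isolated minima.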

openRxiv

Matteo Negri, Davide Bergamini, Carlo Baldassi, Riccardo Zecchina, Christoph Feinauer

2019



