
Optimization algorithm for omic data subspace clustering

Subspace clustering identifies multiple feature subspaces embedded in a dataset, together with the underlying sample clusters. Applying it to omic data is challenging because several additional problems have to be addressed: the curse of dimensionality, imperfect data quality and cluster separation, the presence of multiple subspaces representing divergent views of the dataset, and the lack of consensus on the best clustering method. First, we propose a computational method (discover) that performs subspace clustering on tabular high-dimensional data by maximizing the internal clustering score (i.e., cluster compactness) of feature subspaces. Our algorithm can be used in both unsupervised and semi-supervised settings. Second, by applying our method to a large set of omic datasets (microarray, bulk RNA-seq, and scRNA-seq), we show that the subspace corresponding to the provided ground-truth annotations is rarely the most compact one, contrary to the assumption made by methods that maximize the internal quality of clusters. Our results highlight the difficulty of fully validating subspace clusters, a difficulty that stems from the lack of feature annotations. When tested on identifying the ground-truth subspace, our method compared favorably with competing techniques on all datasets. Finally, we propose a suite of techniques to interpret the clustering results biologically in the absence of annotations. We demonstrate that subspace clustering can provide biologically meaningful sample-wise and feature-wise information that traditional methods typically miss.

CCS Concepts: • Computing methodologies → Genetic algorithms; Mixture models; Cluster analysis.

ACM Reference Format: Madalina Ciortan and Matthieu Defrance. 2021. Optimization algorithm for omic data subspace clustering. 1, 1 (September 2021), 40 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn

openRxiv

Madalina Ciortan, Matthieu Defrance

2021
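The core idea in the abstract, scoring feature subspaces by an internal clustering measure and keeping the best one, can be illustrated with a small sketch. This is not the authors' discover method: the abstract lists genetic algorithms among its concepts, whereas the sketch below swaps in an exhaustive search over fixed-size feature subsets, uses the mean silhouette coefficient as the internal score, and clusters with a plain k-means. The function names (`euclid`, `kmeans`, `silhouette`, `best_subspace`) and the toy data are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def euclid(p, q):
    """Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def kmeans(points, k, iters=20):
    """Plain Lloyd k-means; the first k points serve as initial centers (an assumption
    made here to keep the sketch deterministic)."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: euclid(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient: higher means more compact, better separated clusters."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        return -1.0  # convention chosen for this sketch: one cluster scores worst

    def mean_dist(i, c):
        d = [euclid(points[i], points[j]) for j in range(len(points))
             if labels[j] == c and j != i]
        return sum(d) / len(d) if d else None

    total = 0.0
    for i in range(len(points)):
        a = mean_dist(i, labels[i])
        if a is None:  # a singleton cluster contributes 0
            continue
        b = min(mean_dist(i, c) for c in clusters if c != labels[i])
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def best_subspace(data, k, size):
    """Exhaustively score every feature subset of the given size and keep the best."""
    best = None
    for feats in combinations(range(len(data[0])), size):
        proj = [[row[f] for f in feats] for row in data]
        labels = kmeans(proj, k)
        score = silhouette(proj, labels)
        if best is None or score > best[0]:
            best = (score, feats, labels)
    return best


# Toy data: features 0 and 1 separate two groups, features 2 and 3 are noise.
data = [
    [0.0, 0.1, 5.0, 1.0],
    [10.0, 9.9, 1.0, 4.0],
    [0.2, 0.0, 2.0, 6.0],
    [9.8, 10.0, 6.0, 2.0],
    [0.1, 0.2, 3.0, 3.0],
    [10.1, 10.2, 4.0, 5.0],
]
score, feats, labels = best_subspace(data, k=2, size=2)
print(feats, labels, round(score, 3))
```

Exhaustive search is only feasible for a handful of features; with thousands of genes the number of subsets explodes, which is why a heuristic optimizer is needed in practice. The sketch also returns only the single most compact subspace, whereas the abstract reports that the annotated ground-truth subspace is rarely the most compact one.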



To partition transaction data values, clustering algorithms are used. To analyse the relationships between transactions, similarity measures are utilized. Similarity models based o...

The Kernel Rough K-Means Algorithm

Background: Clustering is one of the most important data mining methods. The k-means (c-means) and its derivative methods are the hotspot in the field of clustering research in re...

On Subspace-recurrent Operators

In this article, subspace-recurrent operators are presented and it is shown that the set of subspace-transitive operators is a strict subset of the set of subspace-recurrent opera...

MR-DBIFOA: a parallel Density-based Clustering Algorithm by Using Improve Fruit Fly Optimization

Clustering is an important technique for data analysis and knowledge discovery. In the context of big data, the density-based clustering algorithm faces three challenging ...

Parallel density clustering algorithm based on MapReduce and optimized cuckoo algorithm

In the process of parallel density clustering, the boundary points of clusters with different densities are blurred and there is data noise, which affects the clustering performanc...

A Hybrid K-means Method based on Modified Rat Swarm Optimization Algorithm for Data Clustering

The original K-means clustering algorithm is prone to local optima and sensitive to the initial clustering center, which have a great impact on accuracy and stabil...

Hierarchical Sparse Subspace Clustering (HESSC): An Automatic Approach for Hyperspectral Image Analysis

Hyperspectral imaging techniques are becoming one of the most important tools to remotely acquire fine spectral information on different objects. However, hyperspectral images (HSI...

Intelligent clustering using moth flame optimizer for vehicular ad hoc networks

Vehicular ad hoc networks consist of access points for communication, transmission, and collecting information of nodes and environment for managing traffic loads. Clustering can b...
