Optimization algorithm for omic data subspace clustering

Subspace clustering identifies multiple feature subspaces embedded in a dataset together with the underlying sample clusters. When applied to omic data, subspace clustering is a challenging task, as additional problems have to be addressed: the curse of dimensionality, the imperfect data quality and cluster separation, the presence of multiple subspaces representative of divergent views of the dataset, and the lack of consensus on the best clustering method.

First, we propose a computational method (discover) to perform subspace clustering on tabular high-dimensional data by maximizing the internal clustering score (i.e., cluster compactness) of feature subspaces. Our algorithm can be used in both unsupervised and semi-supervised settings. Secondly, by applying our method to a large set of omic datasets (i.e., microarray, bulk RNA-seq, scRNA-seq), we show that the subspace corresponding to the provided ground truth annotations is rarely the most compact one, as assumed by the methods maximizing the internal quality of clusters. Our results highlight the difficulty of fully validating subspace clusters (justified by the lack of feature annotations). Tested on identifying the ground-truth subspace, our method compared favorably with competing techniques on all datasets. Finally, we propose a suite of techniques to interpret the clustering results biologically in the absence of annotations. We demonstrate that subspace clustering can provide biologically meaningful sample-wise and feature-wise information, typically missed by traditional methods.

CCS Concepts: • Computing methodologies → Genetic algorithms; Mixture models; Cluster analysis.

ACM Reference Format: Madalina Ciortan and Matthieu Defrance. 2021. Optimization algorithm for omic data subspace clustering. 1, 1 (September 2021), 40 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn
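To make the idea of "maximizing the internal clustering score of feature subspaces" concrete, the following is a minimal sketch and not the authors' discover implementation. It assumes fixed sample labels (loosely mirroring the semi-supervised setting), uses a simple variance-ratio score as a stand-in for the internal clustering score, and swaps the genetic algorithm for a plain single-bit hill climb. The names `compactness` and `search_subspace` and the synthetic data are hypothetical.

```python
import random


def compactness(data, labels, features):
    """Return 1 - within_SS / total_SS over the selected feature indices.

    Values near 1 mean tight, well-separated clusters in that subspace.
    This is an assumed stand-in score, not the one used in the paper.
    """
    if not features:
        return 0.0
    within = 0.0
    total = 0.0
    for f in features:
        column = [row[f] for row in data]
        grand_mean = sum(column) / len(column)
        total += sum((x - grand_mean) ** 2 for x in column)
        for cluster in set(labels):
            members = [x for x, lab in zip(column, labels) if lab == cluster]
            mean = sum(members) / len(members)
            within += sum((x - mean) ** 2 for x in members)
    return 1.0 - within / total if total > 0 else 0.0


def search_subspace(data, labels, n_features, steps=200, seed=0):
    """Hill-climb over binary feature masks, keeping flips that raise the score."""
    rng = random.Random(seed)
    mask = [rng.random() < 0.5 for _ in range(n_features)]

    def score(m):
        return compactness(data, labels, [i for i, on in enumerate(m) if on])

    best = score(mask)
    for _ in range(steps):
        candidate = mask[:]
        i = rng.randrange(n_features)
        candidate[i] = not candidate[i]  # flip one feature in or out
        s = score(candidate)
        if s > best:
            mask, best = candidate, s
    return [i for i, on in enumerate(mask) if on], best


# Synthetic example: features 0-1 separate the two groups, features 2-4 are noise.
data_rng = random.Random(1)
labels = [0] * 10 + [1] * 10
data = []
for lab in labels:
    informative = [lab * 5 + data_rng.gauss(0, 0.5) for _ in range(2)]
    noise = [data_rng.gauss(0, 1) for _ in range(3)]
    data.append(informative + noise)

features, best = search_subspace(data, labels, n_features=5)
print("selected features:", features, "score:", round(best, 3))
```

In this toy setup the search tends to discard the noise features because each one lowers the pooled variance ratio. A population-based search such as a genetic algorithm explores many masks at once and is less prone to local optima than this single-flip climb.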

Cold Spring Harbor Laboratory

Madalina Ciortan Matthieu Defrance

2021

Related Results

The Kernel Rough K-Means Algorithm

Background: Clustering is one of the most important data mining methods. The k-means (c-means) algorithm and its derivative methods are a hotspot in the field of clustering research in re...

On Subspace-recurrent Operators

In this article, subspace-recurrent operators are presented and it is shown that the set of subspace-transitive operators is a strict subset of the set of subspace-recurrent opera...

MR-DBIFOA: a parallel Density-based Clustering Algorithm by Using Improve Fruit Fly Optimization

Clustering is an important technique for data analysis and knowledge discovery. In the context of big data, the density-based clustering algorithm faces three challenging ...

Parallel density clustering algorithm based on MapReduce and optimized cuckoo algorithm

In the process of parallel density clustering, the boundary points of clusters with different densities are blurred and there is data noise, which affects the clustering performanc...

Modeling Hybrid Metaheuristic Optimization Algorithm for Convergence Prediction

The project aims at the design and development of six hybrid nature inspired algorithms based on Grey Wolf Optimization algorithm with Artificial Bee Colony Optimization algorithm ...

An improved Coati Optimization Algorithm with multiple strategies for engineering design optimization problems

Abstract. Aiming at the problems of insufficient ability of artificial COA in the late optimization search period, loss of population diversity, easy to fall into local extreme value...

Research on a microseismic signal picking algorithm based on GTOA clustering

Abstract. Clustering is one of the challenging problems in machine learning. Adopting clustering methods for the picking of microseismic signals has emerged as a new approach. Howe...
