
CHOOSING SEEDS FOR SEMI-SUPERVISED GRAPH BASED CLUSTERING

Although clustering algorithms have a long history, clustering still attracts a lot of attention today because many applications, such as social networks, electronic commerce, and GIS, need efficient data analysis tools. Recently, semi-supervised clustering methods that use side information, for example semi-supervised K-Means, semi-supervised DBSCAN, and semi-supervised graph-based clustering (SSGC), have received a great deal of attention. Side information generally takes one of two forms: seeds (labeled data) or constraints (must-link and cannot-link). By integrating information provided by the user or a domain expert, semi-supervised clustering can produce the expected results. In fact, clustering results usually depend on the side information provided, so different side information produces different clusterings, and in some cases clustering performance may decrease if the side information is not carefully chosen. This paper addresses the problem of efficiently collecting seeds for semi-supervised clustering, especially for graph-based clustering by seeding (SSGC). Properly collected seeds can boost clustering quality while minimizing the number of queries posed to the user. For this purpose, we have developed an active learning algorithm, called SKMMM, for the seed collection task; it identifies the candidates to present to the user by using the K-Means and min-max algorithms. Experiments on real data sets from UCI and on a real collected document data set show the effectiveness of our approach compared with other methods.
Publishing House for Science and Technology, Vietnam Academy of Science and Technology (Publications)
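The abstract names the two ingredients of SKMMM (K-Means and min-max) without giving its steps, so the following is only a hypothetical sketch of how such a seed-collection loop could look, not the authors' algorithm. Assumptions made purely for illustration: K-Means over-partitions the data into `n_clusters` groups, the member nearest each centroid becomes that group's candidate, min-max (farthest-first) ordering spreads the queries across the data, and the user is simulated by an `oracle` function that returns ground-truth labels. The helper names `farthest_first`, `kmeans`, and `collect_seeds` are hypothetical.

```python
import math


def farthest_first(points, candidates, m):
    """Min-max selection: start from candidates[0], then repeatedly take the
    candidate whose minimum distance to the already-selected ones is largest."""
    chosen = [candidates[0]]
    remaining = list(candidates[1:])
    while remaining and len(chosen) < m:
        nxt = max(remaining,
                  key=lambda i: min(math.dist(points[i], points[s]) for s in chosen))
        chosen.append(nxt)
        remaining.remove(nxt)
    return chosen


def kmeans(points, k, iters=100):
    """Lloyd's K-Means; the initial centroids are picked by min-max selection
    (an assumption here) so the result is deterministic."""
    centroids = [points[i] for i in farthest_first(points, list(range(len(points))), k)]
    labels = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members;
        # an empty cluster keeps its previous centroid.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(tuple(sum(xs) / len(members) for xs in zip(*members))
                           if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def collect_seeds(points, oracle, n_clusters, budget):
    """Hypothetical SKMMM-style seed collection (a sketch, not the paper's code):
    over-partition with K-Means, take the member nearest each centroid as a
    candidate, order candidates by min-max, and query the oracle (the user)
    for at most `budget` labels. Returns {point index: label}."""
    centroids, labels = kmeans(points, n_clusters)
    candidates = []
    for j, c in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if members:
            candidates.append(min(members, key=lambda i: math.dist(points[i], c)))
    return {i: oracle(i) for i in farthest_first(points, candidates, budget)}


# Three well-separated toy blobs; the ground truth stands in for the user.
blobs = [(0, 0), (10, 0), (0, 10)]
points = [(cx + dx, cy + dy) for cx, cy in blobs for dx in (0, 1) for dy in (0, 1)]
truth = [b for b in range(3) for _ in range(4)]
seeds = collect_seeds(points, oracle=lambda i: truth[i], n_clusters=6, budget=3)
print(sorted(seeds.values()))  # [0, 1, 2]
```

With a budget of three queries, the min-max ordering sends one query to each blob instead of spending several on the same region, which is the kind of query saving the abstract attributes to properly collected seeds. The returned dictionary is side information in seed form (labeled data), the input that graph-based clustering by seeding expects.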

Related Results

The Kernel Rough K-Means Algorithm
Background: Clustering is one of the most important data mining methods. The k-means (c-means) and its derivative methods are the hotspot in the field of clustering research in re...
Image clustering using exponential discriminant analysis
Local learning based image clustering models are usually employed to deal with images sampled from the non‐linear manifold. Recently, linear discriminant analysis (LDA) based vario...
Adaptive Graph Convolution Using Heat Kernel for Attributed Graph Clustering
Attributed graphs contain a lot of node features and structural relationships, and how to utilize their inherent information sufficiently to improve graph clustering performance ha...
Abstract 902: Explainable AI: Graph machine learning for response prediction and biomarker discovery
Accurately predicting drug sensitivity and understanding what is driving it are major challenges in drug discovery. Graphs are a natural framework for captu...
Domination of Polynomial with Application
In this paper, we initiate the study of the domination polynomial. Consider G=(V,E), a simple, finite, and directed graph without isolated vertices. We present a study of the Ira...
GRACE: A General Graph Convolution Framework for Attributed Graph Clustering
Attributed graph clustering (AGC) is an important problem in graph mining as more and more complex data in real-world have been represented in graphs with attributed nodes. While i...
Uncertain data density peak clustering algorithm based on JS divergence
Aiming at the defects of traditional density-based uncertainty clustering algorithms, such as parameter sensitivity and poor clustering results for complex manifold uncertain data ...
Self-Supervised Based Multi-View Graph Presentation Learning for Drug-Drug Interaction Prediction
Article Self-Supervised Based Multi-View Graph Presentation Learning for Drug-Drug Interaction Prediction Kuang Du 1,  Jing Du 2 and Zhi Wei 1,* 1 Department of Computer Science...
