
CHOOSING SEEDS FOR SEMI-SUPERVISED GRAPH BASED CLUSTERING

Although clustering algorithms have a long history, clustering remains an active research topic because many applications, such as social networks, electronic commerce, and GIS, need efficient data analysis tools. Recently, semi-supervised clustering methods that use side information, such as semi-supervised K-Means, semi-supervised DBSCAN, and semi-supervised graph-based clustering (SSGC), have received a great deal of attention. Side information generally takes one of two forms: seeds (labeled data) or constraints (must-link and cannot-link). By integrating information provided by the user or a domain expert, semi-supervised clustering can produce results that better match the user's expectations. In fact, clustering results usually depend on the side information provided, so different side information produces different clustering results. In some cases, clustering performance may even decrease if the side information is not carefully chosen. This paper addresses the problem of efficiently collecting seeds for semi-supervised clustering, especially for graph-based clustering by seeding (SSGC). Properly collected seeds can improve clustering quality while minimizing the number of queries posed to the user. For this purpose, we have developed an active learning algorithm, called SKMMM, for the seed collection task; it uses the K-Means and min-max algorithms to identify candidate points whose labels are requested from the user. Experiments on real data sets from the UCI repository and on a collected real document data set show the effectiveness of our approach compared with other methods.
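The abstract says only that SKMMM uses the K-Means and min-max algorithms to pick candidates for user queries; it does not give the procedure. The Python sketch below is therefore one possible reading under our own assumptions, not the authors' algorithm: K-Means partitions the data, and within each cluster a min-max (farthest-first) traversal, started from the point nearest the centroid, picks a few diverse candidates whose labels are then requested. The function names `min_max_select`, `kmeans`, `candidate_seeds` and `collect_seeds`, the min-max initialization of K-Means, and the `ask_user` callback that stands in for the human expert are all hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Point = Sequence[float]


def min_max_select(points: List[Point], pool: List[int], n: int, start: int) -> List[int]:
    """Min-max (farthest-first) traversal over the indices in `pool`.

    Starting from `start` (which must be in `pool`), repeatedly add the index
    whose distance to its nearest already-chosen point is largest.
    """
    chosen = [start]
    nearest = {i: math.dist(points[i], points[start]) for i in pool}
    while len(chosen) < min(n, len(pool)):
        nxt = max((i for i in pool if i not in chosen), key=lambda i: nearest[i])
        chosen.append(nxt)
        # Update each point's distance to its nearest chosen point.
        for i in pool:
            nearest[i] = min(nearest[i], math.dist(points[i], points[nxt]))
    return chosen


def kmeans(points: List[Point], k: int, iters: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Plain Lloyd's K-Means. Centres are initialised by min-max traversal,
    an assumption made here only so the sketch is deterministic."""
    idx = list(range(len(points)))
    centers = [list(points[i]) for i in min_max_select(points, idx, k, start=0)]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its closest centre.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: each centre moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [points[i] for i in idx if labels[i] == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep an empty cluster's old centre
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def candidate_seeds(points: List[Point], k: int, per_cluster: int) -> List[int]:
    """Hypothetical candidate selection (not the published SKMMM): cluster with
    K-Means, then in each cluster start at the point nearest the centroid and
    add points by min-max traversal."""
    labels, centers = kmeans(points, k)
    candidates: List[int] = []
    for c in range(k):
        pool = [i for i, lab in enumerate(labels) if lab == c]
        if not pool:
            continue
        start = min(pool, key=lambda i: math.dist(points[i], centers[c]))
        candidates.extend(min_max_select(points, pool, per_cluster, start))
    return candidates


def collect_seeds(candidates: List[int], ask_user: Callable[[int], int]) -> Dict[int, int]:
    """Turn candidates into seeds (index -> label) by querying the user once each."""
    return {i: ask_user(i) for i in candidates}


if __name__ == "__main__":
    # Two well-separated toy blobs; `truth` simulates the user's answers.
    data = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.6), (0.4, 0.5),
            (9.0, 9.0), (9.5, 9.2), (9.1, 9.6), (9.4, 9.5)]
    truth = [0, 0, 0, 0, 1, 1, 1, 1]
    cands = candidate_seeds(data, k=2, per_cluster=2)
    print(collect_seeds(cands, lambda i: truth[i]))
```

Starting from the point nearest each centroid gives one representative seed per cluster, and the farthest-first steps then push further queries toward the cluster's periphery. Whether SKMMM makes the same trade-off is not stated in the abstract; the resulting seed dictionary is simply the labeled-data form of side information that a seeded method such as SSGC would consume.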

Publishing House for Science and Technology, Vietnam Academy of Science and Technology (Publications)

Cuong Le, Viet Vu Vu, Le Thi Kieu Oanh, Nguyen Thi Hai Yen

Journal of Computer Science and Cybernetics

2020

