IDCUP Algorithm to Classifying Arbitrary Shapes and Densities for Center-based Clustering Performance Analysis

Aim/Purpose: Clustering techniques are normally used to identify significant and meaningful subclasses in data sets. Clustering is an unsupervised form of Machine Learning (ML) whose objective is to group objects by similarity and to reveal implicit relationships between the features of the data. Cluster analysis is a significant problem area in data exploration, particularly when data sets contain clusters of arbitrary shape. Clustering on large data sets faces the following challenges: (1) clusters with arbitrary shapes; (2) limited knowledge for deciding the possible input features; (3) scalability to large data sizes. Density-based clustering is known as a dominant method for finding arbitrary-shape clusters.

Background: Existing density-based clustering methods commonly cited in the literature have been examined in terms of their behavior on data sets that contain nested clusters of varying density. The existing methods are not well suited to such data sets, because they typically partition the data into clusters that cannot be nested.

Methodology: A density-based approach to traditional center-based clustering is introduced that assigns a weight to each cluster. The weights are then used when calculating the distance from each data vector to each centroid: the distance is multiplied by the centroid weight.

Contribution: In this paper, we have examined different density-based clustering methods on data sets with nested clusters of varying density. Two such data sets were used to evaluate some of the commonly cited algorithms found in the literature. Nested clusters were found to be challenging for the existing algorithms. In most cases, the algorithms examined either did not detect the largest clusters or simply divided large clusters into non-overlapping regions. However, it may be possible to detect all clusters by running an algorithm several times with different inputs and then combining the results. This work considered three challenges of clustering methods.

Findings: As a result of the weighting, a centroid with a low weight attracts objects from further away than a centroid with a higher weight. This allows dense clusters inside larger clusters to be recognized. The methods are tested experimentally using the K-means, DBSCAN, TURN*, and IDCUP algorithms. The experimental results on different data sets showed that IDCUP is more robust and produces better clusters than DBSCAN, TURN*, and K-means. Finally, we compare K-means, DBSCAN, TURN*, and IDCUP on arbitrary-shape problems across different data sets. IDCUP shows better scalability than TURN*.

Future Research: Future work will explore further challenges of the knowledge discovery process in clustering, along with more complex data sets, given more time. A hybrid approach based on density-based and model-based clustering algorithms needs to be compared, with the aim of maximizing accuracy and avoiding arbitrary-shape problems, including optimization. It is anticipated that such an approach will attain improved performance with comparable precision in identifying cluster shapes.
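
The weighted-distance idea described under Methodology and Findings can be illustrated with a minimal sketch. It covers only the assignment step (each point goes to the centroid with the smallest weight times distance) and is not the IDCUP implementation. The abstract does not say how weights are derived, so the function name `assign_weighted`, the weight values, and the sample points are all assumptions chosen for illustration.

```python
import math

def assign_weighted(points, centroids, weights):
    """Assign each point to the centroid with the smallest weight * distance.

    Hypothetical helper: weights are passed in by hand here, because the
    abstract does not describe how IDCUP computes them.
    """
    labels = []
    for p in points:
        # Euclidean distance to each centroid, scaled by that centroid's weight.
        scores = [w * math.dist(p, c) for c, w in zip(centroids, weights)]
        labels.append(scores.index(min(scores)))
    return labels

# Assumed toy setup: centroid 0 stands for a small dense cluster,
# centroid 1 for a broad cluster that surrounds it.
centroids = [(0.0, 0.0), (1.0, 0.0)]
weights = [4.0, 1.0]  # assumed values: high weight = dense, low weight = broad
points = [(0.1, 0.0), (-1.0, 0.0), (3.0, 0.0), (0.0, 0.05)]

weighted = assign_weighted(points, centroids, weights)
unweighted = assign_weighted(points, centroids, [1.0, 1.0])
print(weighted)    # [0, 1, 1, 0]
print(unweighted)  # [0, 0, 1, 0]
```

The point (-1.0, 0.0) is closer to centroid 0 in plain Euclidean terms, yet with the weights applied it goes to the low-weight centroid 1. This is the effect the Findings describe: a low-weight centroid attracts objects from further away, so a dense inner cluster keeps only its nearby points while the surrounding cluster claims the rest.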

Informing Science Institute

Saud Altaf, Muhammad Waseem Waseem, Laila Kazmi

Interdisciplinary Journal of Information, Knowledge, and Management

2020

Related Results

The Kernel Rough K-Means Algorithm

Background: Clustering is one of the most important data mining methods. The k-means (c-means) and its derivative methods are the hotspot in the field of clustering research in re...

Parallel density clustering algorithm based on MapReduce and optimized cuckoo algorithm

In the process of parallel density clustering, the boundary points of clusters with different densities are blurred and there is data noise, which affects the clustering performanc...

MR-DBIFOA: a parallel Density-based Clustering Algorithm by Using Improve Fruit Fly Optimization

Clustering is an important technique for data analysis and knowledge discovery. In the context of big data, the density-based clustering algorithm faces three challenging ...

Image clustering using exponential discriminant analysis

Local learning based image clustering models are usually employed to deal with images sampled from the non‐linear manifold. Recently, linear discriminant analysis (LDA) based vario...

Research on a microseismic signal picking algorithm based on GTOA clustering

Abstract. Clustering is one of the challenging problems in machine learning. Adopting clustering methods for the picking of microseismic signals has emerged as a new approach. Howe...

Comment text clustering algorithm based on improved DEC

Aiming at the problem that the initial number of clusters and cluster centers obtained by the clustering layer in the original deep embedding clustering (DEC) algorithm are highly ...

Optimising tool wear and workpiece condition monitoring via cyber-physical systems for smart manufacturing

Smart manufacturing has been developed since the introduction of Industry 4.0. It consists of resource sharing and networking, predictive engineering, and material and data analyti...

Uncertain data density peak clustering algorithm based on JS divergence

Aiming at the defects of traditional density-based uncertainty clustering algorithms, such as parameter sensitivity and poor clustering results for complex manifold uncertain data ...
