Parallel density clustering algorithm based on MapReduce and optimized cuckoo algorithm
In parallel density clustering, the boundary points between clusters of different densities are blurred and the data contain noise, which degrades clustering performance and leaves the results prone to local optima. To address these problems, a parallel density clustering algorithm based on MapReduce and an optimized cuckoo algorithm is proposed. First, the algorithm combines the nearest neighbor and inverse nearest neighbor strategies in K-means (KDBSCAN) and redefines the cluster expansion conditions of the density-based spatial clustering of applications with noise (DBSCAN) algorithm by calculating the influence space of each data point, which avoids the fuzzy division of boundary points between clusters of different densities. Second, a feasible iterative noise-point processing strategy is proposed by combining the nearest neighbor idea in KDBSCAN density clustering, which reduces the impact of noise points on clustering performance. Third, MCS (Majorization cuckoo search), an improvement on the traditional cuckoo algorithm, is proposed; it attenuates the weight of the nest discovery probability as the number of search iterations increases and improves convergence speed, which addresses the problem of clustering results being constrained by local optima. Finally, a parallel density clustering strategy, MCS-KDBSCAN, is proposed in combination with MapReduce; parallelizing the density clustering operation reduces the communication burden of transmitting the local optimal solutions of the parallel clustering algorithm and improves performance. Experiments show that the proposed MCS-KDBSCAN parallel density clustering algorithm is superior in terms of clustering accuracy and clustering running time.
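The abstract does not define the influence space, so the following is only a minimal sketch under a stated assumption: it takes the definition common in the reverse-nearest-neighbor literature, where the influence space of a point is the intersection of its k nearest neighbors and its reverse k nearest neighbors. The function names `k_nearest` and `influence_spaces` and the parameter `k` are hypothetical and are not taken from the paper.

```python
from math import dist


def k_nearest(points, i, k):
    """Return the indices of the k points closest to points[i], excluding i."""
    others = [j for j in range(len(points)) if j != i]
    # Break distance ties by index so the result is deterministic.
    others.sort(key=lambda j: (dist(points[i], points[j]), j))
    return set(others[:k])


def influence_spaces(points, k):
    """Assumed definition: IS(i) = kNN(i) intersected with reverse-kNN(i).

    A neighbor j of i is kept only if i is also among the k nearest
    neighbors of j, i.e. the neighborhood relation is mutual.
    """
    knn = [k_nearest(points, i, k) for i in range(len(points))]
    return [{j for j in knn[i] if i in knn[j]} for i in range(len(points))]


if __name__ == "__main__":
    # Three tightly grouped points and one distant point.
    pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
    for idx, space in enumerate(influence_spaces(pts, k=2)):
        print(idx, sorted(space))
```

In this toy input the distant point has two nearest neighbors, but neither of them counts it among their own, so its influence space is empty. A mutual-neighborhood test of this kind is one plausible way to keep sparse or noisy points from expanding a dense cluster; how the paper actually uses the influence space in its expansion condition is not stated in the abstract.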
Related Results
MR-DBIFOA: a parallel Density-based Clustering Algorithm by Using Improve Fruit Fly Optimization
Clustering is an important technique for data analysis and knowledge discovery. In the context of big data, the density-based clustering algorithm faces three challenging ...
The Kernel Rough K-Means Algorithm
Background: Clustering is one of the most important data mining methods. The k-means (c-means) and its derivative methods are the hotspot in the field of clustering research in re...
Near-neighbor Propagation Clustering Algorithm Based on Cuckoo Search
In this paper, a nearest neighbor propagation clustering algorithm (CSB-AP) based on cuckoo search is proposed to solve the problem of poor parameter setting of the AP algorithm. A...
Image clustering using exponential discriminant analysis
Local learning based image clustering models are usually employed to deal with images sampled from the non-linear manifold. Recently, linear discriminant analysis (LDA) based vario...
Uncertain data density peak clustering algorithm based on JS divergence
Aiming at the defects of traditional density-based uncertainty clustering algorithms, such as parameter sensitivity and poor clustering results for complex manifold uncertain data ...
IDCUP Algorithm to Classifying Arbitrary Shapes and Densities for Center-based Clustering Performance Analysis
Aim/Purpose: The clustering techniques are normally considered to determine the significant and meaningful subclasses purposed in datasets. It is an unsupervised type of Machine Le...
Research on a microseismic signal picking algorithm based on GTOA clustering
Abstract. Clustering is one of the challenging problems in machine learning. Adopting clustering methods for the picking of microseismic signals has emerged as a new approach. Howe...
Fundamental Concepts and Methodology for the Analysis of Animal Population Dynamics, with Particular Reference to Univoltine Species
This paper presents some concepts and methodology essential for the analysis of population dynamics of univoltine species. Simple stochastic difference equations, comprised of endo...


