
Parallel density clustering algorithm based on MapReduce and optimized cuckoo algorithm

Parallel density clustering suffers from two problems: the boundary points of clusters with different densities are blurred, and the data contain noise. Both degrade clustering performance and leave the results prone to local optima. A parallel density clustering algorithm based on MapReduce and an optimized cuckoo algorithm is proposed to address these problems. First, the algorithm combines the nearest neighbor and inverse nearest neighbor strategies in K-means (KDBSCAN) and, by calculating the influence space of each data point, redefines the cluster expansion conditions of the density-based spatial clustering of applications with noise (DBSCAN) algorithm, which avoids the fuzzy division of boundary points between clusters of different densities. Second, a feasible iterative noise-point processing strategy is proposed by combining the nearest neighbor idea in KDBSCAN density clustering, which reduces the impact of noise points on clustering performance. Third, an improved strategy, MCS (Majorization cuckoo search), is proposed on the basis of the traditional cuckoo algorithm; it attenuates the weight of the nest-discovery probability as the number of search iterations increases, which speeds up convergence and keeps the clustering results from being constrained by local optima. Finally, a parallel density clustering strategy, MCS-KDBSCAN, is proposed in combination with MapReduce; parallelizing the density clustering operation reduces the communication burden of transmitting the local optimal solutions of the parallel clustering algorithm and improves performance. Experiments show that the proposed MCS-KDBSCAN parallel density clustering algorithm is superior in both clustering accuracy and running time.
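The influence-space idea mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the exact definition, so this assumes one common choice: a point's influence space is the intersection of its k nearest neighbors and its reverse k nearest neighbors. The function names `k_nearest` and `influence_space` are hypothetical and are not taken from the paper.

```python
import math


def k_nearest(points, i, k):
    """Return the indices of the k nearest neighbors of points[i], excluding i."""
    dists = sorted(
        (math.dist(points[i], points[j]), j)
        for j in range(len(points))
        if j != i
    )
    return {j for _, j in dists[:k]}


def influence_space(points, k):
    """Assumed definition: kNN(p) intersected with reverse-kNN(p) for every p."""
    knn = [k_nearest(points, i, k) for i in range(len(points))]
    # Reverse kNN of i: every j that lists i among its own k nearest neighbors.
    rknn = [set() for _ in points]
    for j, neighbors in enumerate(knn):
        for i in neighbors:
            rknn[i].add(j)
    return [knn[i] & rknn[i] for i in range(len(points))]


points = [(0, 0), (0, 1), (1, 0), (10, 10)]
spaces = influence_space(points, k=2)
```

Under this assumed definition, the isolated point `(10, 10)` has an empty influence space because no other point counts it among its nearest neighbors, which is the kind of signal a density method can use to separate boundary or noise points from cluster members.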
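The MCS idea of attenuating the nest-discovery probability over iterations can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: the linear decay schedule, the parameter names (`pa_start`, `pa_end`), and the function names are hypothetical, and plain Gaussian steps stand in for the Lévy flights of the traditional cuckoo search.

```python
import random


def discovery_probability(t, max_iter, pa_start=0.5, pa_end=0.05):
    """Assumed schedule: decay linearly from pa_start to pa_end over the run."""
    frac = t / max(1, max_iter - 1)
    return pa_start + (pa_end - pa_start) * frac


def cuckoo_minimize(f, dim, bounds, n_nests=15, max_iter=200, seed=0):
    """Simplified cuckoo search that minimizes f inside a box [lo, hi]^dim."""
    rng = random.Random(seed)
    lo, hi = bounds

    def clip(x):
        return min(hi, max(lo, x))

    def random_nest():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    nests = [random_nest() for _ in range(n_nests)]
    for t in range(max_iter):
        pa = discovery_probability(t, max_iter)
        # Propose a new egg near each nest (Gaussian step, not a Levy flight).
        for i in range(n_nests):
            cand = [clip(x + rng.gauss(0.0, 0.1 * (hi - lo))) for x in nests[i]]
            if f(cand) < f(nests[i]):
                nests[i] = cand
        best = min(nests, key=f)
        # Abandon discovered nests with probability pa, but keep the best one.
        for i in range(n_nests):
            if nests[i] is not best and rng.random() < pa:
                nests[i] = random_nest()
    return min(nests, key=f)


def sphere(x):
    return sum(v * v for v in x)


best = cuckoo_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
```

With a high discovery probability early on, many nests are replaced and the search explores widely; as the probability decays, more nests survive and the search refines existing solutions.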

Related Results

Multi-constraint scheduling of MapReduce workloads
In recent years there has been an extraordinary growth of large-scale data processing and related technologies in both industry and academic communities. This trend is mostly driv...
Optimizing data management for MapReduce applications on large-scale distributed infrastructures
Optimizing data management for MapReduce applications on large-scale distributed infrastructures. Data-intensive applications are ...
MR-DBIFOA: a parallel Density-based Clustering Algorithm by Using Improve Fruit Fly Optimization
Clustering is an important technique for data analysis and knowledge discovery. In the context of big data, the density-based clustering algorithm faces three challenging ...
Improving MapReduce Performance on Clusters
Improving MapReduce performance on compute clusters. Many scientific disciplines now rely on the analysis and mining of gig...
The Kernel Rough K-Means Algorithm
Background: Clustering is one of the most important data mining methods. K-means (c-means) and its derivative methods are a hotspot in the field of clustering research in re...
Efficient parallel implementation of the SHRiMP sequence alignment tool using MapReduce
With the advent of ultra high-throughput DNA sequencing technologies used in Next-Generation Sequencing (NGS) machines, we are facing a daunting new era in petabyte scale bioinform...
Near-neighbor Propagation Clustering Algorithm Based on Cuckoo Search
In this paper, a nearest neighbor propagation clustering algorithm (CSB-AP) based on cuckoo search is proposed to solve the problem of poor parameter setting of the AP algorithm. A...
