
Big data clustering techniques based on Spark: a literature review

Clustering, a popular unsupervised learning method, is used extensively in data mining, machine learning and pattern recognition. It groups individual data points so that points in the same cluster are similar to each other and dissimilar to points in other clusters. The recent massive growth of data poses serious challenges to traditional clustering methods. Consequently, several research works have proposed novel designs for clustering methods that leverage Big Data platforms such as Apache Spark, which is designed for fast, distributed processing of massive data. However, research on Spark-based clustering is still in its early days. In this systematic survey, we investigate existing Spark-based clustering methods in terms of how well they support the characteristics of Big Data. Moreover, we propose a new taxonomy for Spark-based clustering methods. To the best of our knowledge, no survey has previously been conducted on Spark-based clustering of Big Data. This survey therefore aims to present a comprehensive summary of previous studies on Big Data clustering using Apache Spark during 2010–2020. It also highlights new research directions in clustering massive data.
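The grouping criterion described in the abstract can be made concrete with k-means, one of the most widely used clustering methods. The sketch below is plain Python, not Spark code and not any method from the survey. `map_phase` and `reduce_by_key` are hypothetical helpers that imitate the map and reduceByKey steps a Spark job would distribute across a cluster. It assumes numeric points and caller-supplied initial centroids.

```python
from typing import Dict, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid nearest to `point` (ties go to the lower index)."""
    return min(range(len(centroids)), key=lambda i: squared_distance(point, centroids[i]))


def map_phase(points, centroids):
    """Hypothetical helper: emit (cluster_id, (point, 1)) pairs, like a map over an RDD."""
    return [(closest(p, centroids), (p, 1)) for p in points]


def reduce_by_key(pairs):
    """Hypothetical helper: sum coordinates and counts per cluster id, like reduceByKey."""
    acc: Dict[int, Tuple[Point, int]] = {}
    for key, (p, n) in pairs:
        if key in acc:
            s, c = acc[key]
            acc[key] = (tuple(a + b for a, b in zip(s, p)), c + n)
        else:
            acc[key] = (p, n)
    return acc


def kmeans(points, initial, max_iterations=20):
    """Lloyd's k-means: alternate assignment and centroid update until stable."""
    centroids = [tuple(c) for c in initial]
    for _ in range(max_iterations):
        totals = reduce_by_key(map_phase(points, centroids))
        # New centroid = mean of its assigned points; an empty cluster keeps its old centroid.
        updated = [
            tuple(s / totals[i][1] for s in totals[i][0]) if i in totals else centroids[i]
            for i in range(len(centroids))
        ]
        if updated == centroids:  # assignments no longer change the centroids
            break
        centroids = updated
    labels = [closest(p, centroids) for p in points]
    return centroids, labels
```

For example, `kmeans([(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)], [(0.0, 0.0), (10.0, 10.0)])` separates the two groups. Splitting each iteration into an independent per-point map and a per-cluster sum is what makes the algorithm easy to distribute. In a Spark setting, each partition can compute its own partial sums, and only k (sum, count) pairs per partition need to be shuffled. Spark MLlib ships its own `KMeans` implementation, which this sketch does not reproduce.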

Related Results

Evaluating the Science to Inform the Physical Activity Guidelines for Americans Midcourse Report
The Physical Activity Guidelines for Americans (Guidelines) advises older adults to be as active as possible. Yet, despite the well documented benefits of physical a...
Optical Measurement of Spark Deflection Inside a Pre-chamber for Spark-Ignition Engines
The start of combustion in a spark-ignited engine is highly dependent upon the conditions between the two ...
The Kernel Rough K-Means Algorithm
Background: Clustering is one of the most important data mining methods. The k-means (c-means) and its derivative methods are the hotspot in the field of clustering research in re...
Primerjalna književnost na prelomu tisočletja
In a comprehensive and at times critical manner, this volume seeks to shed light on the development of events in Western (i.e., European and North American) comparative literature ...
Optimizing machine learning techniques for genomics clustering
In the field of bioinformatics, clustering is an effective technique...
Image clustering using exponential discriminant analysis
Local learning based image clustering models are usually employed to deal with images sampled from the non‐linear manifold. Recently, linear discriminant analysis (LDA) based vario...
Digital Footprint as a Source of Big Data in Education
The purpose of this study is to consider the prospects and problems of using big data in education. Materials and methods. The research methods include analysis, systematization and...
