
Big data clustering techniques based on Spark: a literature review

Clustering is a popular unsupervised learning method that is used extensively in data mining, machine learning and pattern recognition. It groups individual data points into clusters such that points within a cluster are similar to each other and dissimilar to points in other clusters. Traditional clustering methods are greatly challenged by the recent massive growth of data. Several research works have therefore proposed novel designs for clustering methods that leverage Big Data platforms such as Apache Spark, which is designed for fast, distributed processing of massive data. However, Spark-based clustering research is still in its early days. In this systematic survey, we investigate the existing Spark-based clustering methods in terms of their support for the characteristics of Big Data. Moreover, we propose a new taxonomy for Spark-based clustering methods. To the best of our knowledge, no survey has been conducted on Spark-based clustering of Big Data. This survey therefore aims to present a comprehensive summary of previous studies in the field of Big Data clustering using Apache Spark over the period 2010–2020. It also highlights new research directions in the field of clustering massive data.
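As an illustration of the grouping idea described in the abstract, the following is a minimal sketch of k-means, one common clustering method. It is not taken from the survey. It is plain single-machine Python rather than Spark code, and the function names (`assign`, `update`, `kmeans`) and the caller-supplied initial centroids are assumptions made for this example. The comments note where a distributed platform could plausibly parallelize the work.

```python
import math


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid.

    Each point is handled independently, so on a distributed platform
    this step could be expressed as a map over partitions of the data.
    """
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]


def update(points, labels, centroids):
    """Recompute each centroid as the mean of the points assigned to it.

    This is a per-cluster aggregation (a reduce-by-key in distributed
    terms). A cluster that received no points keeps its old centroid.
    """
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if members:
            mean = tuple(sum(coord) / len(members) for coord in zip(*members))
            new_centroids.append(mean)
        else:
            new_centroids.append(old)
    return new_centroids


def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and update until the centroids stop changing.

    `points` and `centroids` are lists of equal-length numeric tuples;
    `max_iter` must be at least 1.
    """
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids
```

Calling `kmeans(points, initial_centroids)` on two well-separated groups of 2-D points returns one label per point and the final centroid of each group. The quality of the result depends on the initial centroids, which this sketch leaves to the caller.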

PeerJ

Mozamel M. Saeed, Zaher Al Aghbari, Mohammed Alsharidah

PeerJ Computer Science

2020


