Distributed Computing Engines for Big Data Analytics

Technologies such as cloud computing have paved the way for dealing with massive amounts of data. Before the cloud, this was not possible without large investments in computing resources. There is now an ecosystem conducive to storing and processing voluminous data that local computing resources cannot handle, and big data technology emerged from this ecosystem. Big data is characterized by volume, velocity, veracity and variety. It has enabled enterprises to extract more value from every piece of data, which in turn has increased the use of the cloud for both storage and processing. Processing big data requires efficient technologies. New programming paradigms such as MapReduce, implemented in the Hadoop distributed programming framework, are widely used. However, emerging frameworks such as Apache Spark and Apache Flink aim to handle big data more efficiently. In this paper, an empirical study is made of three frameworks, namely Hadoop, Apache Spark and Apache Flink, while varying parameters such as the type of network, the HDFS block size, the input data size and other configuration settings. The frameworks are evaluated with different benchmark big data workloads, and the experimental results reveal that Apache Spark and Apache Flink outperform Hadoop.
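To make the MapReduce paradigm mentioned above concrete, the following is a minimal single-process sketch of a word count split into map, shuffle and reduce phases. It is an illustration only, not Hadoop's Java API and not the benchmark code used in the paper. All function names are hypothetical, and `lines_per_split` is an assumed stand-in for the HDFS-block-based input splits that Hadoop would hand to individual map tasks.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def split_input(lines: List[str], lines_per_split: int) -> List[List[str]]:
    """Chop the input into fixed-size chunks (hypothetical stand-in for HDFS blocks)."""
    return [lines[i:i + lines_per_split] for i in range(0, len(lines), lines_per_split)]


def map_phase(split: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Mapper: emit a (word, 1) pair for every lower-cased word in the split."""
    for line in split:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group all emitted values by key, with keys in sorted order."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reducer: sum the counts collected for each word."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: List[str], lines_per_split: int = 2) -> Dict[str, int]:
    # On a cluster each split would be handled by an independent map task;
    # in this sketch the map tasks simply run one after another.
    intermediate: List[Tuple[str, int]] = []
    for split in split_input(lines, lines_per_split):
        intermediate.extend(map_phase(split))
    return reduce_phase(shuffle_phase(intermediate))


if __name__ == "__main__":
    docs = ["Spark and Flink", "Hadoop MapReduce", "Spark outperforms Hadoop"]
    print(word_count(docs))
    # prints {'and': 1, 'flink': 1, 'hadoop': 2, 'mapreduce': 1, 'outperforms': 1, 'spark': 2}
```

On a real cluster the map tasks run in parallel on different nodes and the shuffle moves intermediate pairs across the network, which is one reason settings such as the network type and the HDFS block size (and hence the number of input splits) can influence how long such a job takes.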

Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP)

Bh. Prashanthi, G. Sowjanya, D. Krishna Madhuri

International Journal of Recent Technology and Engineering (IJRTE)

2019

