
Distributed Computing Engines for Big Data Analytics

Technologies such as cloud computing have paved the way for dealing with massive amounts of data. Before the cloud, this was not possible without investing heavily in computing resources. There is now an ecosystem conducive to storing and processing voluminous data that local computing resources cannot handle, and big data technology emerged from this ecosystem. Big data is data characterized by volume, velocity, veracity, and variety. This has enabled enterprises to place more value on every piece of data, which in turn has increased the use of the cloud for both storage and processing. Processing big data requires efficient technologies. The MapReduce programming paradigm, implemented in the Hadoop distributed programming framework, is widely used. However, emerging frameworks such as Apache Spark and Apache Flink aim to handle big data more efficiently. This paper presents an empirical study of three frameworks, Hadoop, Apache Spark, and Apache Flink, under different parameters such as network type, HDFS block size, input data size, and other configuration changes. The frameworks are evaluated with different benchmark big data workloads, and the experimental results show that Apache Spark and Apache Flink outperform Hadoop.
Blue Eyes Intelligence Engineering and Sciences Publication - BEIESP
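The MapReduce paradigm named in the abstract can be illustrated with a minimal, single-process sketch. This is not the paper's benchmark code and uses no Hadoop API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. It assumes a word-count workload, a common introductory example of the paradigm.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group emitted values by key, as a framework would
    do between the map and reduce stages."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence within one process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data engines", "big data analytics"]))
```

In a distributed engine, the map and reduce steps would run in parallel on separate nodes and the shuffle would move data across the network, which is one reason parameters such as network type and HDFS block size can affect performance.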

Related Results

BIG DATA ANALYTICS: A REVIEW OF ITS TRANSFORMATIVE ROLE IN MODERN BUSINESS INTELLIGENCE
In the dynamic landscape of modern business intelligence, Big Data Analytics has emerged as a transformative force, reshaping the way organizations derive insights from vast and di...
Impacts of big data on accounting
Big data and data analytics are currently the buzzwords in both academia and industry to become data driven. Big data has been the trending topic in the accounting industry also. B...
The Building Blocks of Data Science: Computing Systems and Analytical Frameworks for Big Data
In the dynamic and evolving field of data science, the capacity to process and analyze big data stands as a cornerstone for innovation and insight. "The Building Blocks of Data Sci...
People Analytics
People analytics refers to the systematic and scientific process of applying quantitative or qualitative data analysis methods to derive insights that shape and inform employee-rel...
Service Quality Improvement in the Banking Sector: A Data Analytics Perspective
Service quality in the banking sector is a critical determinant of customer satisfaction, loyalty, and competitive advantage. As banks strive to meet the evolving expectations of c...
Distributed Systems for Data-Intensive Computing in Cloud Environments: A Review of Big Data Analytics and Data Management
Because of the rapid increase of data, which is frequently referred to as "big data," many different businesses have been severely impacted in recent years, necessitating the ...
Optimizing edge cloud deployments for video analytics
As our digital world and physical realities blend together, we, as users, are growing to expect real-time interaction wherever and whenever we want. Newer internet servic...
