
Scalability and Efficiency in Distributed Big Data Architectures: A Comparative Study

As data volumes grow rapidly, large-scale data processing requires scalable and efficient architectures. This study compares the performance, scalability, and efficiency of four big data frameworks: Apache Hadoop, Apache Spark, Apache Flink, and Google Bigtable. The experimental results indicate that Apache Spark executes 3.5× faster than Hadoop, and that Apache Flink achieves 40% lower latency than Spark on real-time analytics. Google Bigtable delivered good throughput at 5 million queries per second but was poorly suited to computationally intensive processing. The study also examined integrating machine learning and blockchain technologies into a unified backend for distributed systems: processing efficiency improved by 25%, and data integrity was provided at the cost of a 12% computational overhead. The results show that Flink is best suited to real-time data streams, Spark to iterative workloads, and Bigtable to structured, high-throughput storage. Open questions remain about scaling under extreme workloads and balancing security with performance. Future research will focus on hybrid architectures that combine high speed with strong security for next-generation big data applications.

TechnoFit Academic Publishers LLC

Manikandan K, Vamsee Pamisetty, Srinivas Rao Challa, Venkata Bhardwaj Komaragiri, Kishore Challa, Karthik Chava

Metallurgical and Materials Engineering

2025



