A System For Storing And Processing Big Data Based On The Apache Spark Platform
The primary objective of this paper is to investigate and implement the Apache Spark big data processing platform on a stock dataset, followed by the application of a machine learning technique for prediction and modeling. Specifically, PySpark, the Python API for Apache Spark, is used to interact with the Spark framework. The Spark MLlib library is employed for data transformation, whereas the GraphX library is used for data modeling. Multiple executions of the experimental program demonstrated significant performance improvements, with notably shorter runtimes on the Spark cluster than on a single machine. These results highlight the advantages of distributed and parallel processing in large-scale data analysis.
Viet Nam National University Ho Chi Minh City
Related Results
Distributed Computing Engines for Big Data Analytics
Technologies like cloud computing paved the way for dealing with massive amounts of data. Prior to cloud, it was not possible unless you invested large amounts in computing resources. N...
Software analysis of scientific texts: comparative study of distributed computing frameworks
The relevance of this study is related to the need for efficient analysis of scientific texts in the context of the growing amount of information. This study aims to conduct a stud...
Tools and techniques for real-time data processing: A review
Real-time data processing is an essential component in the modern data landscape, where vast amounts of data are generated continuously from various sources such as Internet of Thi...
Digital Footprint as a Source of Big Data in Education
The purpose of this study is to consider the prospects and problems of using big data in education. Materials and methods. The research methods include analysis, systematization and...
A comparative analysis of big data processing paradigms: MapReduce vs. Apache Spark
The paper addresses a highly relevant and contemporary topic in the field of data processing. Big data is a crucial aspect of modern computing, and the choice of processing framewo...
Scalability and Efficiency in Distributed Big Data Architectures: A Comparative Study
With the rapid expansion of the size of data, there is a need for the development of scalable and efficient architectures for large scale data processing. This research conducts a ...
Why Should Big Data-based Price Discrimination be Governed?
The e-commerce platform provides data services for resident merchants for precise marketing, but this also leads to frequent occurrence of big data-based price dis...
Compressive structural bioinformatics
We are developing compressed 3D molecular data representations and workflows (“Compressive Structural Bioinformatics”) to speed up mining and visualization of 3D structural data by...

