A Memory-Aware Spark Cache Replacement Strategy
<p>Spark is currently the most widely used distributed computing framework, and its key data abstraction, the Resilient Distributed Dataset (RDD), brings significant performance improvements to big data computing. In practice, Spark jobs often need to evict cached RDDs because memory is insufficient. By default, Spark uses the Least Recently Used (LRU) algorithm as its cache replacement strategy. LRU considers only the most recent access time of each RDD when choosing what to replace. As a result, RDDs that will be reused later may be evicted during cache replacement, which degrades Spark performance. To address this problem, this paper proposes a memory-aware Spark cache replacement strategy that selects the RDDs to evict by jointly considering cluster memory usage, RDD size, RDD dependencies, usage counts, and other information. The paper also designs extensive experiments to test and analyze the performance of the proposed strategy. The experimental data show that it improves performance by up to 13% compared with the LRU algorithm across different scenarios.</p>
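The contrast the abstract draws between LRU and a memory-aware policy can be illustrated with a small sketch. The abstract does not give the paper's scoring formula, so everything below is an assumption made for illustration: the `CachedRDD` record, its fields, the `keep_value` weighting, and the `needed_mb` parameter (standing in for memory pressure) are all hypothetical and are not part of Spark's API or the authors' method.

```python
from dataclasses import dataclass


@dataclass
class CachedRDD:
    name: str
    size_mb: float        # memory the cached RDD occupies
    last_used: int        # logical timestamp of the most recent access
    remaining_uses: int   # later stages that still read this RDD (hypothetical field)
    dependents: int       # child RDDs derived from this one (hypothetical field)


def lru_victim(cache):
    """Pick the RDD with the oldest access time, as plain LRU would."""
    return min(cache, key=lambda r: r.last_used)


def keep_value(rdd):
    """Hypothetical score: expected reuse per MB of memory held.

    Higher means more worth keeping. This weighting is an illustrative
    assumption, not the formula from the paper.
    """
    return (rdd.remaining_uses + rdd.dependents) / rdd.size_mb


def memory_aware_victims(cache, needed_mb):
    """Evict the lowest-value RDDs until at least needed_mb is freed."""
    victims, freed = [], 0.0
    for rdd in sorted(cache, key=keep_value):
        if freed >= needed_mb:
            break
        victims.append(rdd)
        freed += rdd.size_mb
    return victims


cache = [
    CachedRDD("A", size_mb=400, last_used=1, remaining_uses=3, dependents=2),
    CachedRDD("B", size_mb=300, last_used=5, remaining_uses=0, dependents=0),
    CachedRDD("C", size_mb=200, last_used=3, remaining_uses=1, dependents=1),
]

print(lru_victim(cache).name)
print([r.name for r in memory_aware_victims(cache, needed_mb=250)])
```

In this toy cache, LRU selects the least recently touched entry even though later stages still need it, while the score-based variant prefers an entry with no remaining uses. A real policy would also have to weigh recomputation cost, which this sketch leaves out.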
Journal of Internet Technology
Related Results
An Efficient Software-Managed Cache Based on Cell Broadband Engine Architecture
While the CBEA (Cell Broadband Engine Architecture) offers substantial computational power, its explicit multilevel memory hierarchy poses significant challenges to traditional pro...
VISUALISASI PENGARUH ELEMEN PERANCANGAN CACHE PADA SYMMETRIC MULTIPROCESSORS
Cache memory is one of the key topics in computer organization and architecture courses. However, the cache cannot be accessed during the learning process...
Pengaruh Penggunaan Busi Standar, Dan Busi Iridium Terhadap Daya Dan Torsi Pada Mesin Yamaha Force One
A spark plug is a part of an internal combustion engine with an electrode tip in the combustion chamber. Spar...
Adjustable block size coherent caches
Several studies have shown that the performance of coherent caches depends on the relationship between the granularity of sharing and locality exhibited by the program and the cach...
A Hierarchical Cache Architecture-Oriented Cache Management Scheme for Information-Centric Networking
Information-Centric Networking (ICN) typically utilizes DRAM (Dynamic Random Access Memory) to build in-network cache components due to its high data transfer rate and low latency....
Optical Measurement of Spark Deflection Inside a Pre-chamber for Spark-Ignition Engines
The start of combustion in a spark-ignited engine is highly dependent upon the conditions between the two ...
On models for performance evaluation and cache resources placement in multi-cache networks
In recent years, provider...
Concurrent Evaluation of Web Cache Replacement and Coherence Strategies
When studying Web cache replacement strategies, it is often assumed that documents are static. Such an assumption may not be realistic, especially when large-size caches are consid...

