
Survey on Resource Management Solutions to Speed up Processing Small Files in Hadoop Cluster

High-performance data analytics is a computing paradigm in which data, analytics, and other computational resources are placed so that superior performance is achieved with lower resource consumption. Resource allocation and scheduling are the two major functions that Hadoop clusters must address to satisfy users' service level agreements for high-performance data analytics applications. Although many solutions have been proposed for optimal resource allocation and scheduling, those schemes are designed for large Hadoop files. With the recent convergence of the Internet of Things (IoT) and big data, there is a need to process large volumes of small files, each smaller than the Hadoop block size. This creates substantial storage overhead and exhausts the computational resources of Hadoop clusters. This survey analyzes existing work on resource allocation and scheduling in Hadoop clusters and its suitability for small files. The aim is to identify the problems that existing resource allocation and scheduling approaches face when handling small files. Based on the problems identified, a prospective solution architecture is proposed.
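The overhead that small files create can be illustrated with a rough back-of-the-envelope estimate. The sketch below is not taken from the survey: it assumes a 128 MiB block size and a rule-of-thumb cost of about 150 bytes of NameNode heap per file or block object, and the function names are hypothetical. It compares one million 100 KiB files with the same data packed into block-sized files.

```python
import math

# Assumptions for illustration only (not values stated in the survey):
BLOCK_SIZE = 128 * 1024 * 1024   # assumed HDFS block size: 128 MiB
BYTES_PER_OBJECT = 150           # assumed rule-of-thumb NameNode heap per object


def namenode_objects(file_sizes):
    """Count namespace objects: one per file plus one per block it occupies."""
    blocks = sum(max(1, math.ceil(size / BLOCK_SIZE)) for size in file_sizes)
    return len(file_sizes) + blocks


def estimated_heap_bytes(file_sizes):
    """Rough NameNode heap estimate under the assumptions above."""
    return namenode_objects(file_sizes) * BYTES_PER_OBJECT


# One million small files of 100 KiB each.
small = [100 * 1024] * 1_000_000

# The same bytes packed into files of at most one block each.
total = sum(small)
packed_count = math.ceil(total / BLOCK_SIZE)
packed = [BLOCK_SIZE] * (packed_count - 1)
packed.append(total - BLOCK_SIZE * (packed_count - 1))

print("small files :", namenode_objects(small), "objects,",
      estimated_heap_bytes(small), "bytes of heap")
print("packed files:", namenode_objects(packed), "objects,",
      estimated_heap_bytes(packed), "bytes of heap")
```

Under these assumptions the unpacked layout needs roughly three orders of magnitude more metadata objects for the same data, which suggests why resource allocation and scheduling schemes tuned for large files may not carry over to small-file workloads.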
