Survey on Resource Management Solutions to Speed up Processing Small Files in Hadoop Cluster

High-performance data analytics is a computing paradigm in which data, analytics, and other computational resources are placed optimally so that superior performance is achieved with lower resource consumption. Resource allocation and scheduling are the two major functions that must be addressed in Hadoop clusters to satisfy users' service level agreements for high-performance data analytics applications. Although many solutions have been proposed for optimal resource allocation and scheduling, those schemes are designed for large Hadoop files. With the recent convergence of the Internet of Things (IoT) and big data, there is a need to process large volumes of small files, each smaller than the Hadoop block size. This creates substantial storage overhead and exhausts the computational resources of Hadoop clusters. This survey analyzes existing work on resource allocation and scheduling in Hadoop clusters and its suitability for small files. The aim is to identify the problems that existing resource allocation and scheduling approaches face when handling small files. Based on the problems identified, a prospective solution architecture is proposed.
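To make the small-file overhead concrete, the following is a rough back-of-the-envelope sketch, not a method taken from the survey. It assumes a 128 MiB HDFS block size and the commonly quoted rule of thumb of roughly 150 bytes of NameNode heap per file or block object; both constants, the function names, and the example workload are illustrative assumptions.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # assumed HDFS block size (128 MiB)
BYTES_PER_OBJECT = 150          # assumed rule-of-thumb NameNode heap per file or block object


def namespace_cost(file_sizes, block_size=BLOCK_SIZE):
    """Return (files, blocks, estimated NameNode heap bytes) for the given file sizes."""
    files = len(file_sizes)
    # Every file occupies at least one block per started block_size chunk,
    # so a 1 MiB file still costs a full block entry in the namespace.
    blocks = sum(math.ceil(size / block_size) for size in file_sizes)
    heap = (files + blocks) * BYTES_PER_OBJECT
    return files, blocks, heap


def merged_cost(file_sizes, block_size=BLOCK_SIZE):
    """Same estimate if all files were packed into one large container file."""
    return namespace_cost([sum(file_sizes)], block_size)


if __name__ == "__main__":
    # Hypothetical workload: 100,000 sensor files of 1 MiB each.
    small = [1024 * 1024] * 100_000
    print(namespace_cost(small))  # one file object and one block object per small file
    print(merged_cost(small))     # far fewer objects once the data is packed together
```

Under these assumptions the same data needs about 200,000 namespace objects when stored as individual small files, but fewer than 1,000 when packed into one file, which is the kind of gap that small-file handling schemes aim to close.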

Technoscience Academy

Prof. Shwetha K S, Dr. Chandramouli H

International Journal of Scientific Research in Science, Engineering and Technology

2024


