
A Scalable Near Line Storage Solution for Very Big Data

Managing huge volumes of data is already a problem, and it will only become worse with the advent of exascale computing and next-generation observational systems. A key requirement is that data can be migrated more easily between storage tiers. Here we present a new solution, the Near-Line Data store (NLDS), which manages data migration between user-facing storage systems and tape by using an object storage cache. NLDS builds on lessons learned from developing the ESIWACE-funded Joint Data Migration App (JDMA) and deploying it at the Centre for Environmental Data Analysis (CEDA). CEDA currently has over 50 PB of data stored on a range of disk-based storage systems. These systems are chosen on the basis of cost, power usage and network accessibility, and include three different types of POSIX disk and object storage. Tens of PB of additional data are also stored on tape. Each of these systems has different workflows, interfaces and latencies, which causes difficulties for users. NLDS, developed with ESIWACE2 and other funding, is a multi-tiered storage solution that uses object storage as a front end to a tape library. Users interact with NLDS via an HTTP API, with a Python library and command-line client provided to support both programmatic and interactive use. Files transferred to NLDS are first written to the object storage, and a backup is made to tape. When the object storage approaches capacity, a set of policies is consulted to determine which files will be removed from it. When a file is retrieved, NLDS may first have to transfer it from tape back to the object storage if the policies have removed it. This implements a multi-tier hierarchy of hot (disk), warm (object storage) and cold (tape) storage behind a single interface. While systems like this are not novel, NLDS is open source and is designed for ease of redeployment elsewhere and for use from both local storage and remote sites.
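
The write, eviction and recall behaviour described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions and is not the NLDS implementation: the class name `TieredStore`, the `high_water` threshold and the least-recently-used eviction rule are all hypothetical, since the abstract does not say which policies NLDS applies, and two dictionaries stand in for the object storage and the tape library.

```python
from collections import OrderedDict


class TieredStore:
    """Toy model of an object-storage cache in front of a tape archive.

    All names and the eviction rule are illustrative assumptions.
    """

    def __init__(self, cache_capacity, high_water=0.8):
        self.cache_capacity = cache_capacity  # bytes available in the object store
        self.high_water = high_water          # fraction of capacity that triggers eviction
        self.cache = OrderedDict()            # name -> bytes, least recently used first
        self.tape = {}                        # name -> bytes, stands in for the tape library

    def _cache_used(self):
        return sum(len(data) for data in self.cache.values())

    def _apply_policy(self):
        # Assumed policy: drop least recently used files that already have
        # a tape copy until usage is back at or under the high-water mark.
        limit = self.high_water * self.cache_capacity
        for name in list(self.cache):
            if self._cache_used() <= limit:
                break
            if name in self.tape:
                del self.cache[name]

    def put(self, name, data):
        # Write to the object store first, then make the tape backup.
        self.cache[name] = data
        self.cache.move_to_end(name)
        self.tape[name] = data
        self._apply_policy()

    def get(self, name):
        # Return the data and the tier it was served from.
        if name in self.cache:
            self.cache.move_to_end(name)
            return self.cache[name], "object-store"
        data = self.tape[name]   # raises KeyError if the file was never stored
        self.cache[name] = data  # recall from tape into the object store
        self._apply_policy()
        return data, "tape"
```

In this model a caller uses the same `put` and `get` calls whether a file is still in the object store or has to be recalled from tape, which is the single-interface property the abstract describes.
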
NLDS is based on a microservice architecture, with a message exchange brokering communication between the microservices, the HTTP API and the storage solutions. The system is deployed via Kubernetes, with each microservice in its own Docker container, so the number of services can be scaled up or down depending on the current load on NLDS. This provides a scalable, power-efficient system while ensuring that no messages between microservices are lost. OAuth is used to authenticate and authorise users via a pluggable authentication layer. Using object storage as the front end to the tape allows both local and remote cloud-based services to access the data via a URL, so long as the user has the required credentials. NLDS is a scalable solution for storing very large data for many users, with a user-friendly front end that is easily accessed via cloud computing. This talk will detail the architecture and discuss how the design meets the identified use cases.
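
The idea of a message exchange feeding a variable number of worker services can be sketched in-process using only the Python standard library. This is a hedged illustration, not the NLDS code: the abstract does not name the broker or the message format, so the `Exchange` class, the routing key `transfer.put` and the handler below are hypothetical, and threads stand in for containerised microservices. Redelivery after a worker failure is deliberately left out.

```python
import queue
import threading


class Exchange:
    """Toy in-process exchange that routes messages to named queues."""

    def __init__(self):
        self.queues = {}

    def bind(self, routing_key):
        # Create the queue for this key on first use and return it.
        return self.queues.setdefault(routing_key, queue.Queue())

    def publish(self, routing_key, body):
        self.bind(routing_key).put(body)


def worker(exchange, routing_key, handler, results):
    # Consume messages until a None sentinel arrives.
    q = exchange.bind(routing_key)
    while True:
        body = q.get()
        try:
            if body is None:
                break
            results.put(handler(body))
        finally:
            q.task_done()  # plays the role of an acknowledgement here


def run(paths, n_workers):
    exchange = Exchange()
    key = "transfer.put"   # hypothetical routing key
    q = exchange.bind(key)  # create the queue before any worker starts
    results = queue.Queue()
    threads = [
        threading.Thread(
            target=worker,
            args=(exchange, key, lambda p: f"stored {p}", results),
        )
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for path in paths:
        exchange.publish(key, path)
    q.join()               # wait until every message has been acknowledged
    for _ in threads:
        exchange.publish(key, None)  # one shutdown sentinel per worker
    for t in threads:
        t.join()
    out = []
    while not results.empty():
        out.append(results.get())
    return sorted(out)
```

Changing `n_workers` alters how many consumers share the queue without touching the publisher, which mirrors, in miniature, scaling the number of service containers with load.
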

Copernicus GmbH

Neil Massey, Jack Leland, Bryan Lawrence

2023

