A Scalable Near Line Storage Solution for Very Big Data
Managing huge volumes of data is already a problem, and it will only become worse with the advent of exascale computing and next-generation observational systems. A key recognition is that data needs to be migrated more easily between storage tiers. Here we present a new solution, the Near-Line Data store (NLDS), which manages data migration between user-facing storage systems and tape by using an object storage cache. NLDS builds on lessons learned from developing the ESIWACE-funded Joint Data Migration App (JDMA) and deploying it at the Centre for Environmental Data Analysis (CEDA). CEDA currently holds over 50 PB of data on a range of disk-based storage systems. These systems are chosen on cost, power usage and accessibility via a network, and include three different types of POSIX disk and object storage. Tens of PB of additional data are also stored on tape. Each of these systems has different workflows, interfaces and latencies, which causes difficulties for users.
NLDS, developed with ESIWACE2 and other funding, is a multi-tiered storage solution that uses object storage as a front end to a tape library. Users interact with NLDS via an HTTP API, with a Python library and a command-line client provided to support both programmatic and interactive use. Files transferred to NLDS are first written to the object storage, and a backup is made to tape. When the object storage approaches capacity, a set of policies is consulted to determine which files will be removed from it. When a file is retrieved, NLDS may first have to transfer it from tape back to the object storage, if the policies have removed it from there. This provides hot (disk), warm (object storage) and cold (tape) storage tiers behind a single interface. While systems like this are not novel, NLDS is open source and is designed both for ease of redeployment elsewhere and for use from local storage and remote sites.
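The write, eviction and recall behaviour described above can be illustrated with a small in-memory model. This is a minimal sketch and not the NLDS implementation: the class name `TieredStore`, the `high_water` threshold and the least-recently-used eviction rule are all assumptions made for illustration, since the abstract does not say which policies NLDS applies.

```python
from collections import OrderedDict


class TieredStore:
    """Toy model of an object-storage cache in front of tape (hypothetical, not NLDS code)."""

    def __init__(self, cache_capacity_bytes, high_water=0.8):
        self.capacity = cache_capacity_bytes
        self.high_water = high_water   # assumed fraction of capacity that triggers eviction
        self.cache = OrderedDict()     # warm tier: name -> bytes, least recently used first
        self.tape = {}                 # cold tier: name -> bytes
        self.recalls = 0               # number of tape-to-cache restores

    def _used(self):
        return sum(len(data) for data in self.cache.values())

    def _evict(self):
        # Assumed policy: drop least recently used files until usage is
        # back at or below the high-water mark. Every cached file already
        # has a tape copy, so nothing is lost.
        while self.cache and self._used() > self.high_water * self.capacity:
            self.cache.popitem(last=False)

    def put(self, name, data):
        self.cache[name] = data        # write to object storage first
        self.cache.move_to_end(name)
        self.tape[name] = data         # then back up to tape
        self._evict()

    def get(self, name):
        if name not in self.cache:
            # File was removed by the policy: recall it from tape.
            self.cache[name] = self.tape[name]
            self.recalls += 1
        self.cache.move_to_end(name)
        data = self.cache[name]
        self._evict()
        return data


store = TieredStore(cache_capacity_bytes=100, high_water=0.8)
store.put("a", b"a" * 40)
store.put("b", b"b" * 40)
store.put("c", b"c" * 40)              # pushes usage over 80 bytes, so "a" is evicted
restored = store.get("a")              # served after a recall from tape
```

In this model the caller uses the same `get` call whether the file is still in the cache or only on tape, which is the single-interface property the abstract describes; only the latency would differ in a real system.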
NLDS is based on a microservice architecture, with a message exchange brokering communication between the microservices, the HTTP API and the storage solutions. The system is deployed via Kubernetes, with each microservice in its own Docker container, so that the number of services can be scaled up or down depending on the current load on NLDS. This provides a scalable, power-efficient system while ensuring that no messages between microservices are lost. OAuth is used to authenticate and authorise users via a pluggable authentication layer. Using object storage as the front end to the tape allows both local and remote cloud-based services to access the data via a URL, so long as the user has the required credentials.
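The abstract does not say how message loss is avoided. One common approach in brokered systems is to keep each delivered message until the consumer acknowledges it, and to requeue it if the consumer stops before acknowledging. The sketch below models that idea in memory; `AckQueue` and its methods are hypothetical names for illustration and do not correspond to any NLDS or broker API.

```python
from collections import deque


class AckQueue:
    """Toy queue with acknowledgements and redelivery (illustrative only)."""

    def __init__(self):
        self.ready = deque()    # messages waiting for a consumer
        self.unacked = {}       # delivery tag -> message handed out but not yet confirmed
        self._next_tag = 0

    def publish(self, body):
        self.ready.append(body)

    def deliver(self):
        """Hand the next message to a consumer, or return None if the queue is empty."""
        if not self.ready:
            return None
        body = self.ready.popleft()
        self._next_tag += 1
        self.unacked[self._next_tag] = body
        return self._next_tag, body

    def ack(self, tag):
        """Consumer confirms the work is done; the message can now be forgotten."""
        del self.unacked[tag]

    def requeue_unacked(self):
        """Called when a consumer disappears: put its unconfirmed messages back."""
        for tag in sorted(self.unacked):
            self.ready.append(self.unacked.pop(tag))


queue = AckQueue()
queue.publish("transfer file1")

tag, body = queue.deliver()     # first worker takes the message, then is scaled down
queue.requeue_unacked()         # the unconfirmed message returns to the queue

tag, body = queue.deliver()     # a second worker receives the same message
queue.ack(tag)                  # and confirms it after finishing the work
```

Under this scheme a service instance can be stopped or scaled down mid-task without the request disappearing, at the cost that a message may be processed more than once.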
NLDS is a scalable solution for storing very large data volumes for many users, with a user-friendly front end that is easily accessed from cloud computing services. This talk will detail the architecture and discuss how the design meets the identified use cases.

