Non-equidistant checkpointing and quantitative resilience modeling
Software-intensive systems rely on checkpointing to prevent loss of computation by performing periodic backups. Non-equidistant checkpointing strategies have been proposed for specialized hardware and software applications as well as for specific failure distributions. However, a general method to identify a non-equidistant checkpointing strategy for an arbitrary combination of application and failure distribution would be beneficial. This thesis proposes an approach that uses a genetic algorithm to identify a near-optimal non-equidistant checkpointing strategy and requires only knowledge of the failure distribution. Experiments suggest that the approach consistently outperforms the traditional strategy of equidistant checkpoints under (i) a range of total processing times and (ii) different parameter values of distributions exhibiting increasing, constant, and decreasing failure rates.

Although many systems and processes are amenable to reliability modeling, researchers have also shown interest in restoring a system to its original performance after a deterioration. This is the concern of resilience engineering, where resilience is the ability of a system to respond to, absorb, adapt to, and recover from a disruptive event. Several metrics to quantify resilience have been proposed in the literature, but fewer studies have proposed models to predict those metrics. Hence, this thesis presents two alternative approaches, built on techniques from reliability engineering, to model and predict performance and resilience metrics: (i) bathtub-shaped hazard functions and (ii) mixture distributions. Historical data on job loss during recessions in the United States are used to assess the predictive accuracy of these approaches. The results suggest that both approaches can produce accurate predictions for several of the data sets. However, data sets that experience a sudden drop in performance, or that deviate from the assumption of a single decrease followed by a subsequent increase, cannot be fit by either class of proposed models, necessitating additional modeling efforts that can effectively characterize these more general scenarios.
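
To make the checkpointing comparison in the abstract concrete, the following is a minimal Monte Carlo sketch of how one candidate checkpoint schedule might be scored. It is not the thesis's method: the function names, the Weibull failure model, the renewal assumption (the system is treated as good as new after each failure), and the cost parameters are all illustrative assumptions. A search procedure such as a genetic algorithm could, in principle, use a score like this as its fitness function.

```python
import random


def simulate_completion_time(intervals, checkpoint_cost, restart_cost,
                             shape, scale, rng):
    """Simulate one run of a job split into checkpointed work intervals.

    Assumptions (illustrative, not taken from the thesis):
    - times to failure are Weibull(shape, scale) distributed,
    - the failure clock restarts after every failure (renewal process),
    - a failure loses all work since the last completed checkpoint.
    """
    elapsed = 0.0
    time_to_failure = rng.weibullvariate(scale, shape)  # (scale, shape) order
    for work in intervals:
        needed = work + checkpoint_cost  # work plus the time to save it
        # Retry the interval until it completes without a failure.
        while time_to_failure < needed:
            elapsed += time_to_failure + restart_cost
            time_to_failure = rng.weibullvariate(scale, shape)
        elapsed += needed
        time_to_failure -= needed
    return elapsed


def mean_completion_time(intervals, checkpoint_cost, restart_cost,
                         shape, scale, runs=1000, seed=0):
    """Average the simulated completion time over several independent runs."""
    rng = random.Random(seed)
    total = sum(
        simulate_completion_time(intervals, checkpoint_cost, restart_cost,
                                 shape, scale, rng)
        for _ in range(runs)
    )
    return total / runs


if __name__ == "__main__":
    # Two hypothetical schedules covering the same 100 units of work.
    equidistant = [10.0] * 10
    non_equidistant = [4.0, 6.0, 8.0, 10.0, 10.0, 10.0, 12.0, 12.0, 14.0, 14.0]
    for name, schedule in [("equidistant", equidistant),
                           ("non-equidistant", non_equidistant)]:
        avg = mean_completion_time(schedule, checkpoint_cost=0.5,
                                   restart_cost=1.0, shape=2.0, scale=60.0)
        print(name, round(avg, 2))
```

Which schedule scores better depends on the assumed failure distribution and costs, so the sketch illustrates only the evaluation step, not any result reported in the thesis.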

