Sound Policy Iteration

Abstract: Reachability probabilities and expected rewards are two main classes of properties used in reinforcement learning and formal verification. Value iteration and policy iteration are two iterative numerical methods for computing these properties. One of their drawbacks is that they provide only lower bounds for the computed values. To address this, sound variants of value iteration have been proposed that also maintain upper bounds on the computed values and thereby guarantee soundness. In this paper, we focus on policy iteration and explain how it can be extended to provide sound values. For maximal expected rewards, we use policy iteration to update the lower bounds and apply action elimination to update the upper bounds. For minimal expected rewards, we apply policy iteration to the upper bounds and action elimination to the lower bounds. We employ several improved techniques to reduce the running time of our extension of policy iteration. We experimentally show that our proposed techniques outperform available sound value iteration techniques on most standard case study models.
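The interplay described in the abstract for maximal expected rewards can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the toy MDP, the helper names (`q_value`, `evaluate`, `policy_iteration`, `eliminable`), and the hand-chosen upper bounds are all assumptions made for illustration, and the abstract does not say how the paper actually computes its upper bounds. The sketch relies on two standard facts: the exact value of any fixed policy is a lower bound on the maximal value, and an action whose Q-value under the upper bounds lies below a state's lower bound cannot be optimal.

```python
# Hypothetical toy MDP: mdp[state][action] = (reward, [(prob, next_state), ...]).
# GOAL is absorbing with value 0; every policy reaches it with probability 1.
GOAL = 2
mdp = {
    0: {"a": (1.0, [(1.0, 1)]), "b": (0.0, [(1.0, GOAL)])},
    1: {"c": (2.0, [(1.0, GOAL)]), "d": (0.5, [(0.5, 1), (0.5, GOAL)])},
}

def q_value(action_data, values):
    """One-step lookahead: reward plus expected value of the successors."""
    reward, successors = action_data
    return reward + sum(p * values[t] for p, t in successors)

def evaluate(policy):
    """Exact policy evaluation: solve (I - P) v = r by Gauss-Jordan elimination."""
    states = sorted(mdp)
    idx = {s: i for i, s in enumerate(states)}
    n = len(states)
    rows = []
    for s in states:
        reward, successors = mdp[s][policy[s]]
        row = [0.0] * n + [reward]          # augmented row [coefficients | reward]
        row[idx[s]] += 1.0
        for p, t in successors:
            if t != GOAL:                   # the goal contributes value 0
                row[idx[t]] -= p
        rows.append(row)
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(rows[r][c]))
        rows[c], rows[pivot] = rows[pivot], rows[c]
        rows[c] = [x / rows[c][c] for x in rows[c]]
        for r in range(n):
            if r != c:
                factor = rows[r][c]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[c])]
    values = {s: rows[idx[s]][n] for s in states}
    values[GOAL] = 0.0
    return values

def policy_iteration(policy):
    """Alternate evaluation and greedy improvement until the policy is stable."""
    while True:
        values = evaluate(policy)           # lower bound on the maximal values
        improved = dict(policy)
        for s, actions in mdp.items():
            best = max(actions, key=lambda a: q_value(actions[a], values))
            if q_value(actions[best], values) > values[s] + 1e-12:
                improved[s] = best
        if improved == policy:
            return policy, values
        policy = improved

def eliminable(lower, upper):
    """Actions whose Q-value under the upper bounds is below the lower bound."""
    return {(s, a) for s, actions in mdp.items() for a in actions
            if q_value(actions[a], upper) < lower[s]}

policy, lower = policy_iteration({0: "b", 1: "d"})
upper = {0: 4.0, 1: 2.5, GOAL: 0.0}         # assumed valid upper bounds
removed = eliminable(lower, upper)
```

On this toy model, policy iteration starts from the policy that picks `b` and `d`, switches to `a` and `c`, and stops with values 3 and 2 for states 0 and 1. With the assumed upper bounds, the elimination test discards `b` in state 0 and `d` in state 1, since neither can reach the lower bound even under optimistic successor values.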

Springer Science and Business Media LLC

Mohammadsadegh Mohagheghi, Anahita Khademi

2025

