
Sound Policy Iteration

Abstract: Reachability probabilities and expected rewards are two main classes of properties needed in reinforcement learning and formal verification. Value iteration and policy iteration are two iterative numerical methods for computing these properties. One of their drawbacks is that they only provide lower bounds for the computed values. To address this, sound variants of value iteration have been proposed that also maintain upper bounds on the computed values and thereby guarantee soundness. In this paper, we focus on policy iteration and explain how it can be extended to provide sound values. For maximal expected rewards, we use policy iteration to update the lower bounds and apply action elimination to update the upper bounds. For minimal expected rewards, we apply policy iteration to the upper bounds and action elimination to the lower bounds. We employ several improved techniques to reduce the running time of our extension of policy iteration. We experimentally show that the proposed techniques outperform available sound value iteration techniques on most of the standard case-study models.
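To make the lower-bound half of this idea concrete, here is a minimal sketch of ordinary policy iteration for maximal expected total reward. It is not the sound extension described in the abstract: the action-elimination step for upper bounds is not shown, and the three-state model, the names `MDP`, `GOAL`, `evaluate`, `q_value` and `policy_iteration`, and the use of exact rational arithmetic are all assumptions made for illustration. What it does show is that every intermediate policy has an exactly computable value, and that for a maximization objective this value can never exceed the optimum, so it acts as a lower bound.

```python
from fractions import Fraction as F

GOAL = 2  # hypothetical absorbing goal state with value 0

# Hypothetical toy model: MDP[state] is a list of actions,
# each action is (reward, {successor: probability}).
MDP = {
    0: [(F(1), {1: F(1)}),
        (F(1), {0: F(1, 2), GOAL: F(1, 2)})],
    1: [(F(3), {GOAL: F(1)}),
        (F(2), {0: F(1, 2), GOAL: F(1, 2)})],
}


def evaluate(policy):
    """Solve (I - P_pi) v = r_pi exactly by Gauss-Jordan elimination.

    Assumes every policy reaches GOAL with probability 1, so the
    system has a unique solution.
    """
    states = sorted(MDP)
    n = len(states)
    idx = {s: i for i, s in enumerate(states)}
    rows = []
    for s in states:
        reward, succ = MDP[s][policy[s]]
        row = [F(0)] * (n + 1)          # last column holds the reward
        row[idx[s]] += 1
        for t, p in succ.items():
            if t != GOAL:
                row[idx[t]] -= p
        row[n] = reward
        rows.append(row)
    for c in range(n):
        pivot = next(r for r in range(c, n) if rows[r][c] != 0)
        rows[c], rows[pivot] = rows[pivot], rows[c]
        rows[c] = [x / rows[c][c] for x in rows[c]]
        for r in range(n):
            if r != c and rows[r][c] != 0:
                factor = rows[r][c]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[c])]
    return {s: rows[idx[s]][n] for s in states}


def q_value(state, action, values):
    """One-step lookahead: reward plus expected value of the successor."""
    reward, succ = MDP[state][action]
    return reward + sum(p * values.get(t, F(0)) for t, p in succ.items())


def policy_iteration(policy):
    """Return the final policy, its values, and the value of every
    intermediate policy (each one a lower bound on the maximum)."""
    history = []
    while True:
        values = evaluate(policy)
        history.append(values)
        improved = {}
        for s in MDP:
            best, best_q = policy[s], q_value(s, policy[s], values)
            for a in range(len(MDP[s])):
                q = q_value(s, a, values)
                if q > best_q:          # switch only on strict improvement
                    best, best_q = a, q
            improved[s] = best
        if improved == policy:
            return policy, values, history
        policy = improved


policy, values, history = policy_iteration({0: 1, 1: 0})
for step in history:
    print({s: str(v) for s, v in step.items()})
```

On this toy model the value at state 0 rises from 2 to 4 to 6 over three evaluations, and each intermediate value stays at or below the final one. Without a matching upper bound, however, there is no way to tell from an intermediate value alone how far it still is from the optimum, which is the gap the sound variants aim to close.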
Springer Science and Business Media LLC
