
Model-free inverse reinforcement learning with multi-intention, unlabeled, and overlapping demonstrations

Abstract
In this paper, we define a novel inverse reinforcement learning (IRL) problem in which the demonstrations are multi-intention (collected from experts with multiple intentions), unlabeled (without intention labels), and partially overlapping (shared between multiple intentions). When demonstrations overlap, current IRL methods developed for multi-intention, unlabeled demonstrations fail to learn the underlying reward functions. To address this limitation, we propose a novel clustering-based approach that disentangles the observed demonstrations, and we validate its advantages experimentally. Traditional clustering-based approaches to multi-intention IRL are built on model-based reinforcement learning (RL) and formulate the problem as parametric density estimation. However, in high-dimensional environments with unknown system dynamics, i.e., model-free RL, parametric density estimation is tractable only up to the density's normalization constant. We therefore formulate the problem as a mixture of logistic regressions, which handles the unnormalized density directly. To study the challenges posed by overlapping demonstrations, we introduce two concepts: a shared pair, which is a state-action pair shared by more than one intention, and separability, which reflects how well the multiple intentions can be separated in the joint state-action space. We provide theoretical analyses under the global optimality condition and in the presence of shared pairs. Furthermore, we conduct extensive experiments on four simulated robotics tasks, extended to accept different intentions with specific levels of separability, and on a synthetic driver task designed to control separability directly.
We evaluate the existing baselines on the defined problem and demonstrate, theoretically and experimentally, the advantages of our clustering-based solution, especially as the separability of the demonstrations decreases.
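The key modelling step in the abstract is that a logistic-regression classifier can work with an unnormalized log-density, because the unknown normalization constant is absorbed into the classifier's bias. The sketch below illustrates that standard density-ratio (noise-contrastive) argument; it is not the authors' method. A single logistic regression separates samples of a target density from samples of a known reference density, and its logit recovers the target's log-density up to an additive constant. The Gaussian target, the uniform reference on [-4, 4], the quadratic features, and the helper names (`features`, `solve3`, `fit_logistic`) are illustrative assumptions; the paper's mixture over intentions and its IRL training loop are not reproduced here.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def features(x):
    # Hypothetical feature map: [bias, x, x^2], enough to express a Gaussian log-density ratio.
    return [1.0, x, x * x]


def solve3(a, b):
    # Solve the 3x3 linear system a @ w = b by Gaussian elimination with partial pivoting.
    n = 3
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = s / m[r][r]
    return w


def fit_logistic(xs, ys, iters=25):
    # Newton's method (IRLS) for logistic regression: w <- w + H^{-1} g, where
    # g is the log-likelihood gradient and H the negative Hessian.
    w = [0.0, 0.0, 0.0]
    for _ in range(iters):
        grad = [0.0] * 3
        hess = [[0.0] * 3 for _ in range(3)]
        for x, y in zip(xs, ys):
            phi = features(x)
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, phi)))
            for i in range(3):
                grad[i] += (y - p) * phi[i]
                for j in range(3):
                    hess[i][j] += p * (1.0 - p) * phi[i] * phi[j]
        step = solve3(hess, grad)
        w = [wi + si for wi, si in zip(w, step)]
    return w


# Illustrative data: label 1 = samples of the target density N(0, 1),
# label 0 = samples of a known reference density Uniform(-4, 4).
rng = random.Random(0)
n = 3000
target = [rng.gauss(0.0, 1.0) for _ in range(n)]
reference = [rng.uniform(-4.0, 4.0) for _ in range(n)]
w = fit_logistic(target + reference, [1] * n + [0] * n)
print("learned logit weights [bias, x, x^2]:", w)
```

With equal sample counts, the optimal logit is log p(x) - log q(x) = -x²/2 - ½ log 2π + log 8, so `w[2]` should land near -1/2 and `w[1]` near 0, while the bias `w[0]` absorbs both the Gaussian's normalization term and the reference density's log 8. At no point is the integral of the model density computed.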

Related Results

The Demonstration Society
Today, as in the past, public demonstrations are not only tools to prove, persuade, and promote, but also fundamental forms of social interaction and exchange. YouTu...
The Effect of Compression Reinforcement on the Shear Behavior of Concrete Beams with Hybrid Reinforcement
Abstract This study examines the impact of steel compression reinforcement on the shear behavior of concrete beams reinforced with glass fiber reinforced polymer (GFRP) bar...
ASAP-CORPS: A Semi-Autonomous Platform for COntact-Rich Precision Surgery
ABSTRACT Introduction Remote military operations require rapid response times for effective relief and critical care. Yet, the m...
Pengaruh E-WOM terhadap Purchase Intention yang Dimoderasi oleh Perceived Quality (The Effect of E-WOM on Purchase Intention, Moderated by Perceived Quality)
Abstract. This study explores the effect of E-WOM and perceived quality on purchase intention. It aims to analyze E-WOM, perceived quality, and purchase intention, as well as the d...
Study on Scheme Optimization of bridge reinforcement increasing ratio
Abstract The bridge reinforcement methods, each method has its advantages and disadvantages. The load-bearing capacity of bridge members is controlled by the ultimat...
Establishment and Application of the Multi-Peak Forecasting Model
Abstract After the development of the oil field, it is an important task to predict the production and the recoverable reserve opportunely by the production data....
Robust treatment planning for small animal radio‐neuromodulation using focused kV x‐ray beams
Abstract Background In preclinical radio‐neuromodulation research, small animal experiments are pivotal for unraveling radiobiological mechanism, investigating prescription and plann...
Dopamine regulates decision thresholds in human reinforcement learning
Abstract Dopamine fundamentally contributes to reinforcement learning by encoding prediction errors, deviations of an outcome from expectation. Prediction error coding in dopaminerg...
