MOSAIC for Multiple-Reward Environments

Reinforcement learning (RL) can provide a basic framework for autonomous robots to learn control that maximizes future cumulative rewards in complex environments. To achieve high performance, RL controllers must account for both the complex external dynamics of movement and the task (reward function) when optimizing control commands. For example, a robot that plays both tennis and squash must cope with the different dynamics of a tennis or squash racket and with dynamic environmental factors such as the wind. At the same time, this robot has to tailor its tactics to the rules of whichever game it is playing. This double complexity of external dynamics and reward function can become even greater when both the dynamics and the reward functions switch implicitly, as in a real (multi-agent) game of tennis in which a player cannot observe the intentions of her opponents or her partner. The robot must consider its opponents' and its partner's unobservable behavioral goals (reward functions). In this article, we address how an RL agent should be designed to handle such double complexity of dynamics and reward. We previously proposed modular selection and identification for control (MOSAIC) to cope with nonstationary dynamics: appropriate controllers are selected and learned among many candidates based on the error of each controller's paired dynamics predictor, the forward model. Here we extend this framework to RL and propose the MOSAIC-MR architecture. It resembles MOSAIC in spirit and selects and learns an appropriate RL controller based on the RL controller's temporal-difference (TD) error, using the errors of the dynamics predictors (the forward models) and the reward predictors. Furthermore, unlike other MOSAIC variants for RL, RL controllers are not paired a priori with fixed predictors of dynamics and rewards.
The simulation results demonstrate that MOSAIC-MR outperforms its counterparts because of this flexible association among RL controllers, forward models, and reward predictors.
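
The abstract gives no equations, so the following is only a minimal sketch of the general idea of selecting modules by prediction error, not the MOSAIC-MR algorithm itself. It assumes that each predictor's error is turned into a normalized weight (a "responsibility") through a softmax over negative squared errors. The function name `responsibilities`, the width parameter `sigma`, and the error values are all hypothetical. How MOSAIC-MR combines such signals with the TD error of each RL controller is not specified in the abstract and is not shown here.

```python
import math


def responsibilities(errors, sigma=1.0):
    """Turn prediction errors into normalized weights.

    Assumption for illustration: a softmax over negative scaled squared
    errors, so the predictor with the smallest error gets the largest weight.
    """
    scores = [-(e * e) / (2.0 * sigma * sigma) for e in errors]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical one-step prediction errors (not taken from the paper).
forward_errors = [0.1, 2.0]  # two forward models (dynamics predictors)
reward_errors = [1.5, 0.2]   # two reward predictors

# Dynamics and reward responsibilities are computed separately, reflecting
# the point that controllers are not paired a priori with fixed predictors.
dyn_resp = responsibilities(forward_errors)
rew_resp = responsibilities(reward_errors)

print("dynamics responsibilities:", dyn_resp)
print("reward responsibilities:", rew_resp)
```

In this sketch the first forward model and the second reward predictor receive most of the weight, which illustrates how the best dynamics predictor and the best reward predictor can be identified independently of each other.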

MIT Press - Journals

Norikazu Sugimoto Masahiko Haruno Kenji Doya Mitsuo Kawato

Neural Computation

2011

