
Reward Bases: Instantaneous reward revaluation with temporal difference learning

Abstract
An influential theory posits that dopaminergic neurons in the midbrain implement a model-free reinforcement learning algorithm based on temporal difference (TD) learning. A fundamental assumption of this model is that the reward function being optimized is fixed. However, for biological creatures the ‘reward function’ can fluctuate substantially over time depending on the internal physiological state of the animal. For instance, food is rewarding when you are hungry, but not when you are satiated. A variety of experiments have demonstrated that animals can instantly adapt their behaviour when their internal physiological state changes, yet under current thinking this requires model-based planning, since standard TD learning must be retrained from scratch whenever the reward function changes. Here, we propose a novel and simple extension to TD learning that allows zero-shot (instantaneous) generalization to changing reward functions. Mathematically, we show that if the reward function is a linear combination of reward basis vectors, and if we learn a value function for each reward basis using TD learning, then the true value function can be recovered as the same linear combination of these value-function bases. This representational scheme allows instant and perfect generalization to any reward function in the span of the reward basis vectors, and it admits a straightforward implementation in neural circuitry that simply parallelizes the standard circuitry required for TD learning. We demonstrate that our algorithm reproduces behavioural data on reward revaluation tasks, predicts dopamine responses in the nucleus accumbens, and learns as quickly as successor representations while requiring much less memory.
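The linearity claim in the abstract follows directly from the linearity of expectation over discounted returns; a one-line sketch of the argument, written in standard RL notation (my notation, not necessarily the paper's), is:

\[
r(s) \;=\; \sum_i w_i\, r_i(s)
\quad\Longrightarrow\quad
V^{\pi}(s) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t) \,\middle|\, s_0 = s\right]
\;=\; \sum_i w_i\, \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_i(s_t) \,\middle|\, s_0 = s\right]
\;=\; \sum_i w_i\, V_i^{\pi}(s).
\]

Each basis value function \(V_i^{\pi}\) can therefore be learned with its own TD error \(\delta_i = r_i(s_{t+1}) + \gamma V_i(s_{t+1}) - V_i(s_t)\), and any re-weighting of the bases (a new internal state) yields the corresponding value function immediately, with no retraining.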
Cold Spring Harbor Laboratory
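To make the mechanism concrete, below is a minimal runnable sketch in Python. It is not the authors' code: the chain environment, random policy, and hyperparameters are illustrative assumptions. Tabular TD(0) is run in parallel for each reward basis, and a new reward weighting is then evaluated instantly by linearly recombining the learned value functions.

```python
# Minimal sketch (not the authors' code) of the "reward bases" idea:
# learn one value function per reward basis with tabular TD(0), then
# recover the value of any reward in the span of the bases by taking
# the same linear combination of the learned value functions.
import numpy as np

rng = np.random.default_rng(0)

n_states = 6   # toy chain MDP: states 0..5, random left/right moves
gamma = 0.9    # discount factor
alpha = 0.1    # TD learning rate

# Two reward basis vectors r_1, r_2 (e.g. "food" and "water"), one value per state.
R = np.array([
    [0., 0., 0., 0., 0., 1.],   # basis 1: reward only in the last state
    [1., 0., 0., 0., 0., 0.],   # basis 2: reward only in the first state
])

def step(s):
    """Random policy: move left or right with equal probability, clipped to the chain."""
    return int(np.clip(s + rng.choice([-1, 1]), 0, n_states - 1))

# Learn one value function per reward basis with TD(0).
V = np.zeros((2, n_states))
for episode in range(2000):
    s = int(rng.integers(n_states))
    for t in range(20):
        s_next = step(s)
        for i in range(2):   # parallel TD circuits, one per reward basis
            td_error = R[i, s_next] + gamma * V[i, s_next] - V[i, s]
            V[i, s] += alpha * td_error
        s = s_next

# Zero-shot revaluation: a new "physiological state" re-weights the bases
# (e.g. hungry -> food matters, water less so). No retraining is needed.
w = np.array([1.0, 0.2])
V_combined = w @ V   # value function for the reward r(s) = w . [r_1(s), r_2(s)]
print("recombined value function:", np.round(V_combined, 2))
```

The key design point the sketch illustrates is that only the fixed, state-dependent value functions are learned; the weighting vector is applied at read-out time, which is what makes the revaluation instantaneous.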

Related Results

Reward does not facilitate visual perceptual learning until sleep occurs
A growing body of evidence indicates that visual perceptual learning (VPL) is enhanced by reward provided during training. Another line of studies has shown that sleep foll...
Examining the effects of reward and punishment on incidental learning
Reward has been shown to improve multiple forms of learning. However, many of these studies do not distinguish whether reward directly benefits learning or if learning is ...
Role of the Frontal Lobes in the Propagation of Mesial Temporal Lobe Seizures
Summary: The depth ictal electroencephalographic (EEG) propagation sequence accompanying 78 complex partial seizures of mesial temporal origin was reviewed in 24 patients (15 from...
Seismic attribute benchmarking on instantaneous frequency
Complex seismic trace analysis is a widely applied and versatile method for computing seismic attributes. Instantaneous frequency is an important complex trace attribute, and i...
A Neuroimaging Study of the Effort-Reward Imbalance Framework for Cognitive Fatigue in Individuals with Multiple Sclerosis
Background: Cognitive fatigue is one of the most pervasive yet least understood symptoms in persons with multiple sclerosis (PwMS). The current study examined whether the effort-re...
Initial Experience with Pediatrics Online Learning for Nonclinical Medical Students During the COVID-19 Pandemic
Background: To minimize the risk of infection during the COVID-19 pandemic, the learning mode of universities in China has been adjusted, and the online learning o...
Pleasure, reward value and prediction error in anhedonia
In order to develop effective treatments for anhedonia we need to understand its underlying neurobiological mechanisms. Anhedonia is conceptually strongly linked to reward processi...
Reward processing deficits: weakened self-reward association in individuals with methamphetamine addiction undergoing abstinence
This research primarily investigates whether both reward processing and self-processing are aberrant in individuals with methamphetamine use disorder. It also explores whether init...
