
Multi-armed Bandit Algorithms for Cournot Games

Abstract: We investigate using a multi-armed bandit (MAB) setting to model repeated Cournot oligopoly games. Each agent interacts with its own separate bandit problem. An agent chooses from a set of arms/actions representing discrete production quantities; here, the action space is ordered. Agents are independent and autonomous and cannot observe anything from the environment; they see only their own rewards after taking an action and work only towards maximizing these rewards. We first study Cournot models with stationary market demand where random entry or exit from the market is not allowed. We propose two novel approaches that take advantage of the ordered action space: ϵ-greedy+HL and ϵ-greedy+EL. Both use the ϵ-greedy approach as the underlying mechanism because, unlike other popular methods such as UCB or Thompson sampling, ϵ-greedy requires no knowledge of the reward distributions, not even their priors. Our proposed approaches help firms focus on more profitable actions by eliminating less profitable choices and are designed to optimize exploration. However, in real-world scenarios, market demand evolves over a product’s lifetime for a myriad of reasons. Therefore, we also investigate repeated Cournot games with non-stationary demand, in which firms/agents face independent instances of the non-stationary multi-armed bandit problem. We propose a novel algorithm, Adaptive with Weighted Exploration (AWE) ϵ-greedy, that is loosely based on the ϵ-greedy approach. We use computer simulations to study the emergence of various equilibria in the outcomes and empirically analyze joint cumulative regrets. Using our proposed method, agents are able to swiftly change their course of action in response to changes in demand. In most of the simulations, firms overall produce collusive outcomes, i.e., outcomes better than the Nash equilibrium.
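The setting described in the abstract can be illustrated with a minimal sketch of plain ϵ-greedy firms in a repeated Cournot game. This is not the paper's ϵ-greedy+HL, ϵ-greedy+EL, or AWE ϵ-greedy method, whose details the abstract does not give. The linear inverse demand P = max(a - bQ, 0), the constant unit cost c, the parameter values, the quantity grid, and all function and class names are assumptions made here for illustration only.

```python
import random


def cournot_profits(quantities, a=100.0, b=1.0, c=10.0):
    """Per-firm profits under the assumed demand P = max(a - b*Q, 0) and unit cost c."""
    price = max(a - b * sum(quantities), 0.0)
    return [(price - c) * q for q in quantities]


def nash_quantity(n_firms, a=100.0, b=1.0, c=10.0):
    """Per-firm Cournot-Nash quantity for the assumed linear model."""
    return (a - c) / (b * (n_firms + 1))


def collusive_quantity(n_firms, a=100.0, b=1.0, c=10.0):
    """Per-firm share of the joint-profit-maximizing (monopoly) output."""
    return (a - c) / (2.0 * b * n_firms)


class EpsilonGreedyFirm:
    """Plain epsilon-greedy learner that sees only its own profit."""

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.arms = list(arms)              # ordered discrete production quantities
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = [0] * len(self.arms)
        self.means = [0.0] * len(self.arms)  # running mean profit per arm

    def choose(self):
        # Explore a uniformly random arm with probability epsilon.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.arms))
        # Otherwise exploit the best estimate, breaking ties at random.
        best = max(self.means)
        return self.rng.choice([i for i, m in enumerate(self.means) if m == best])

    def update(self, arm, reward):
        # Incremental sample-mean update for the chosen arm.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def simulate(n_firms=2, rounds=5000, epsilon=0.1, seed=0):
    """Return each firm's average quantity over the last 10% of rounds."""
    arms = [float(q) for q in range(0, 61, 5)]  # hypothetical quantity grid
    firms = [EpsilonGreedyFirm(arms, epsilon, seed + i) for i in range(n_firms)]
    tail_start = rounds - rounds // 10
    tail_totals = [0.0] * n_firms
    for t in range(rounds):
        picks = [firm.choose() for firm in firms]
        quantities = [arms[k] for k in picks]
        profits = cournot_profits(quantities)
        # Each firm updates using only its own reward, never its rivals' actions.
        for firm, k, profit in zip(firms, picks, profits):
            firm.update(k, profit)
        if t >= tail_start:
            for i, q in enumerate(quantities):
                tail_totals[i] += q
    tail_len = rounds - tail_start
    return [total / tail_len for total in tail_totals]


if __name__ == "__main__":
    print("Nash quantity per firm:     ", nash_quantity(2))
    print("Collusive quantity per firm:", collusive_quantity(2))
    print("Learned average quantities: ", simulate())
```

Comparing the learned average quantities against `nash_quantity` and `collusive_quantity` gives a rough sense of where the play settles under these assumed parameters; per-firm quantities below the Nash level correspond to the collusive direction the abstract refers to.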

Related Results

DBA: Dynamic Multi-Armed Bandit Algorithm
We introduce Dynamic Bandit Algorithm (DBA), a practical solution to improve the shortcoming of the pervasively employed reinforcement learning algorithm called Multi-Arm Bandit, a...
Schule und Spiel – mehr als reine Wissensvermittlung
The public school Quest to learn in New York City is a model school whose teaching methods rely on game-based learning, game design, and the game design process. I...
Playing Pregnancy: The Ludification and Gamification of Expectant Motherhood in Smartphone Apps
Introduction: Like other forms of embodiment, pregnancy has increasingly become subject to representation and interpretation via digital technologies. Pregnancy and the unborn entity...
Serious games for environmental education
Abstract: Serious games are increasingly popular in multiple fields, including education and environmental engagement. We conducted a systematic review to examine the reasons for thi...
Multi-armed bandit games
Abstract: A sequential optimization model, known as the multi-armed bandit problem, is concerned with optimal allocation of resources between competing activities, in order to genera...
ARMED EXTORTION IN LIGHT OF THE PRINCIPLE OF CRIMINAL LEGALITY
Furthermore, the DRC's military courts and tribunals fail to respect the principle of legality of offenses and penalties, in that they conflate the offense of armed robbery with th...
Ethnography in Play: Didactic Games of Russian Germans
This article presents the case of creating educational games with linguistic, ethnic and cultural components. Games are viewed as a means of conveying important cultural informatio...
Federated Bandit: A Gossiping Approach
We study Federated Bandit, a decentralized Multi-Armed Bandit (MAB) problem with a set of N agents, who can only communicate their local data with neighbors described by a connecte...
