Multi-armed Bandit Algorithms for Cournot Games

Abstract: We investigate using a multi-armed bandit (MAB) setting for modeling repeated Cournot oligopoly games. Agents interact with separate bandit problems. An agent can choose from a set of arms/actions representing discrete production quantities; here, the action space is ordered. Agents are independent and autonomous and cannot observe anything from the environment; they can only see their own rewards after taking an action and only work towards maximizing these rewards. We first study Cournot models with stationary market demand where random entry or exit from the market is not allowed. We propose two novel approaches that take advantage of the fact that the action space is ordered: ϵ-greedy+HL and ϵ-greedy+EL. These are based on the ϵ-greedy approach as an underlying mechanism because the ϵ-greedy method does not require any knowledge of even the priors of the reward distributions, unlike other popular methods like UCB or Thompson sampling. Our proposed approaches help firms focus on more profitable actions by eliminating less profitable choices and are designed to optimize exploration. However, in real-world scenarios, market demands evolve over a product’s lifetime for a myriad of reasons. Therefore, we also investigate repeated Cournot games with non-stationary demand such that firms/agents face independent instances of the non-stationary multi-armed bandit problem. We propose a novel algorithm, Adaptive with Weighted Exploration (AWE) ϵ-greedy, that is loosely based on the ϵ-greedy approach. We use computer simulations to study the emergence of various equilibria in the outcomes and empirically analyze joint cumulative regrets. Using our proposed method, agents are able to swiftly change their course of action according to the changes in demand. In most of the simulations, firms overall produce collusive outcomes, i.e., outcomes better than the Nash equilibrium.
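
To make the setting concrete, here is a minimal sketch of independent ϵ-greedy agents in a repeated Cournot game. It is not the authors' ϵ-greedy+HL, ϵ-greedy+EL, or AWE ϵ-greedy, whose details the abstract does not give; it is only the plain ϵ-greedy baseline those methods build on. Everything specific is an assumption made for illustration: the linear inverse demand P = max(0, a - b*Q), the constant marginal cost c, the quantity grid, and all names (`cournot_profit`, `EpsilonGreedyAgent`, `simulate`, `benchmarks`). The `benchmarks` helper returns the textbook per-firm Nash and equal-split collusive quantities for this assumed linear model, which gives a reference point for what "better than the Nash equilibrium" means.

```python
import random


def cournot_profit(q_own, q_others, a=100.0, b=1.0, c=10.0):
    """Profit of one firm under an ASSUMED linear inverse demand P = max(0, a - b*Q)."""
    price = max(0.0, a - b * (q_own + q_others))
    return (price - c) * q_own


def benchmarks(n_firms, a=100.0, b=1.0, c=10.0):
    """Per-firm Nash and equal-split collusive quantities for the assumed linear model."""
    nash_q = (a - c) / (b * (n_firms + 1))
    collusive_q = (a - c) / (2 * b * n_firms)
    return nash_q, collusive_q


class EpsilonGreedyAgent:
    """Plain epsilon-greedy learner; it sees only its own reward after each action."""

    def __init__(self, n_arms, epsilon, rng):
        self.epsilon = epsilon
        self.rng = rng
        self.counts = [0] * n_arms    # number of pulls per arm
        self.values = [0.0] * n_arms  # running mean reward per arm

    def select_arm(self):
        # Explore uniformly with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        return self.values.index(max(self.values))  # ties go to the lowest index

    def update(self, arm, reward):
        # Incremental update of the sample mean for the chosen arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def simulate(n_firms=2, quantities=None, rounds=5000, epsilon=0.1, seed=0):
    """Run the repeated game; each firm faces its own bandit over the quantity grid."""
    quantities = quantities or list(range(0, 50, 5))  # hypothetical ordered action grid
    master = random.Random(seed)
    agents = [
        EpsilonGreedyAgent(len(quantities), epsilon, random.Random(master.randrange(2**32)))
        for _ in range(n_firms)
    ]
    history = []
    for _ in range(rounds):
        arms = [agent.select_arm() for agent in agents]
        chosen = [quantities[i] for i in arms]
        total = sum(chosen)
        for agent, arm, q in zip(agents, arms, chosen):
            # Each agent observes only its own profit, never the rivals' actions.
            agent.update(arm, cournot_profit(q, total - q))
        history.append(chosen)
    return agents, history


if __name__ == "__main__":
    agents, history = simulate()
    print("benchmarks (Nash, collusive) per firm:", benchmarks(2))
    print("last joint action:", history[-1])
```

Comparing the quantities that the agents settle on against the two benchmark values indicates whether a run ended closer to the Nash or to the collusive outcome. This sketch makes no claim about which one plain ϵ-greedy reaches.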

Springer Science and Business Media LLC

Kshitija Taywade, Judy Goldsmith, Brent Harrison, Adib Bagh

2023
