Towards an Improved Strategy for Solving Multi-Armed Bandit Problem

The Multi-Armed Bandit (MAB) problem is one of the classical reinforcement learning problems; it describes the tension between an agent's exploration and exploitation. This study explores metaheuristics as optimization strategies to support Epsilon greedy in achieving an improved reward maximization strategy for the MAB. To this end, an Annealing Epsilon greedy strategy is adapted and a PSO Epsilon greedy strategy is newly introduced. These two metaheuristic-based MAB strategies are implemented with input parameters such as the number of slot machines, the number of iterations, and the epsilon value, to investigate the maximized rewards under different conditions. The study found that the maximized rewards increase as the number of iterations increases, except for PSO Epsilon greedy, which exhibits non-linear behavior. The Annealing Epsilon greedy strategy performed better than Epsilon greedy when the number of slot machines is 10, but Epsilon greedy did better when the number of slot machines is 5. At the optimal epsilon value, found to be 0.06, Annealing Epsilon greedy performed better than Epsilon greedy when the number of iterations is 1000; at numbers of iterations ≥ 1000, Epsilon greedy performed better than Annealing Epsilon greedy. Stable reward maximization values are observed for the Epsilon greedy strategy at epsilon values between 0.02 and 0.1, with a drastic decline at epsilon > 0.1.
Blue Eyes Intelligence Engineering and Sciences Publication - BEIESP
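The two baseline strategies compared in the abstract can be sketched as follows. This is a minimal illustration on a Bernoulli bandit, not the authors' implementation: the arm probabilities, the incremental-mean value update, and the 1/t annealing schedule are all assumptions (the abstract does not specify the paper's annealing schedule or the PSO variant); only the epsilon value of 0.06 comes from the abstract.

```python
import random

def pull(p):
    """Simulate one pull of a slot machine that pays 1 with probability p."""
    return 1 if random.random() < p else 0

def epsilon_greedy(probs, iterations, epsilon=0.06, anneal=False):
    """Run (annealing) epsilon-greedy on a Bernoulli bandit.

    probs      -- true win probability of each slot machine (unknown to the agent)
    iterations -- number of pulls
    epsilon    -- exploration rate; 0.06 is the optimum reported in the abstract
    anneal     -- if True, decay the exploration rate over time with an
                  illustrative 1/t schedule (the paper's schedule is not given)
    Returns the total reward accumulated over all pulls.
    """
    n = len(probs)
    counts = [0] * n      # pulls per arm
    values = [0.0] * n    # running mean reward per arm
    total = 0
    for t in range(1, iterations + 1):
        eps = 1.0 / t if anneal else epsilon
        if random.random() < eps:
            arm = random.randrange(n)                     # explore
        else:
            arm = max(range(n), key=lambda a: values[a])  # exploit
        r = pull(probs[arm])
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]    # incremental mean
        total += r
    return total
```

For example, `epsilon_greedy([0.2, 0.5, 0.8], 1000)` plays 1000 rounds on three machines; as the abstract reports for these strategies, total reward grows with the number of iterations, while the epsilon value trades off how often the agent samples non-greedy arms.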
