An Open-Ended Learning Framework for Opponent Modeling
Opponent Modeling (OM) aims to enhance decision-making by modeling other agents in multi-agent environments. Existing works typically learn opponent models against a pre-designated, fixed set of opponents during training. However, this leads to poor generalization against unknown opponents at test time, because previously unseen opponents can exhibit out-of-distribution (OOD) behaviors that the learned opponent models cannot handle. To tackle this problem, we introduce a novel Open-Ended Opponent Modeling (OEOM) framework, which continuously generates opponents with diverse strengths and styles to reduce the likelihood of OOD situations during testing. Founded on population-based training and information-theoretic trajectory-space diversity regularization, OEOM generates a dynamic set of opponents. This set can then be fed to any OM approach to train a potentially generalizable opponent model. Building on this, we further propose a simple yet effective OM approach that fits naturally within the OEOM framework. This approach is based on in-context reinforcement learning and learns a Transformer that dynamically recognizes and responds to opponents based on their trajectories. Extensive experiments in cooperative, competitive, and mixed environments demonstrate that OEOM is an approach-agnostic framework that improves generalizability compared to training against a fixed set of opponents, regardless of the OM approach or the testing opponent setting. The results also indicate that our proposed approach generally outperforms existing OM baselines.
Association for the Advancement of Artificial Intelligence (AAAI)
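The abstract does not spell out how the trajectory-space diversity regularizer is computed. As a rough illustration only, the sketch below scores an opponent population by the mean pairwise Jensen-Shannon divergence between empirical action distributions and adds that score to a return term. The divergence choice, the function names, and the `weight` parameter are all assumptions made for this example; they are not taken from the paper and should not be read as the OEOM objective.

```python
import math
from collections import Counter
from itertools import combinations


def action_distribution(trajectory, num_actions):
    """Empirical action frequencies for one opponent's trajectory (a list of action ids)."""
    counts = Counter(trajectory)
    total = len(trajectory)
    return [counts.get(a, 0) / total for a in range(num_actions)]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing; b_i > 0 whenever a_i > 0.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def population_diversity(trajectories, num_actions):
    """Mean pairwise JS divergence over a population of at least two opponents."""
    dists = [action_distribution(t, num_actions) for t in trajectories]
    pairs = list(combinations(dists, 2))
    return sum(js_divergence(p, q) for p, q in pairs) / len(pairs)


def regularized_objective(mean_return, diversity, weight=0.1):
    """Hypothetical training signal: task return plus a weighted diversity bonus."""
    return mean_return + weight * diversity
```

Under these assumptions, a population whose members all act alike scores near zero diversity, so the bonus pushes newly generated opponents toward behaviors not yet represented in the set.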

