Multi-Agent Reinforcement Learning in Minecraft
The Multi-Agent Reinforcement Learning in MalmÖ (MARLÖ) competition is a new challenge that proposes research on Multi-Agent Reinforcement Learning using multiple games. Participants will create learning agents that are able to play multiple 3D games as defined in the MalmÖ platform, built on top of Minecraft. The aim of the competition is to encourage AI research on more general approaches via multi-player games. To this end, the challenge consists of not one but several games, each of them with several tasks of varying difficulty and settings. Some of these tasks will be public, and participants will be able to train on them. Others, however, will be private and only used to determine the final rankings of the competition.
A framework will be provided with easy instructions to install it, create a first agent and submit it to the competition server. Documentation, tutorials and sample controllers for the development of entries will also be accessible to the participants of this challenge. The competition will be hosted on CrowdAI.org, which will determine the preliminary rankings of the competition. Recurring tournaments at regular intervals will determine which agents perform best in the games proposed. This competition is sponsored by Microsoft Research, covering framework development and competition awards.
Games and Tasks
One of the main features of this competition is that agents play in multiple games. Therefore, several tasks are proposed for this contest. For the purpose of this document and the competition itself, we define:
Game: each one of the different scenarios in which agents play.
Task: each instance of a game. Tasks, within a game, may differ from each other in terms of level layout, size, difficulty and other game-dependent settings.
Figure 1 sketches how games and tasks are organized in the challenge. As can be seen, some tasks are public and accessible to the participants, while others are secret and will be used to evaluate the submitted entries at the end of the competition. Tasks are distributed across sets (training, validation and test).
Figure 2: On the left: Build Battle, where players need to recreate a structure (in this case, the structure is shown on the ground). On the right: Pig Chase, where players collaborate to corner the pig.
The participants will be provided with an extensive starter kit, containing all necessary instructions to download the framework, and to develop and execute their agents locally for testing in the games included in the benchmark. The starter kit will also include a set of simple tasks, and the code for default challenge agents (to compete against, but also as examples for participants to develop their own entries). Agent code needs to be packaged in a Docker container before it is submitted for evaluation, and needs to run reliably within a given set of resource constraints. The participants will also be provided with the Binder tools ecosystem to help them test the conversion of their submissions locally.
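As a rough sketch of what an entry's control loop might look like, the snippet below runs one episode of an agent against an environment. Note that `StubEnv`, its action strings, its observation format and its toy reward are all placeholders invented for illustration; the actual MARLÖ API shipped in the starter kit may differ.

```python
import random


class StubEnv:
    """Placeholder standing in for a MARLÖ task environment.

    The real platform exposes a broadly similar reset/step interface,
    but the exact API, observations and action space are assumptions.
    """

    ACTIONS = ["move 1", "turn 1", "turn -1", "attack 1"]  # hypothetical action set

    def __init__(self, episode_length=10):
        self._episode_length = episode_length
        self._steps_left = episode_length

    def reset(self):
        self._steps_left = self._episode_length
        return {"position": (0, 0)}  # hypothetical observation

    def step(self, action):
        self._steps_left -= 1
        reward = 1.0 if action == "attack 1" else 0.0  # toy reward for illustration
        done = self._steps_left <= 0
        return {"position": (0, 0)}, reward, done


def run_episode(env, policy):
    """Run one episode, feeding observations to `policy`, and return the total reward."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total


random.seed(0)
score = run_episode(StubEnv(), lambda obs: random.choice(StubEnv.ACTIONS))
```

A submitted entry would wrap a loop like this (with a trained policy instead of the random one) inside the Docker container that the grading server executes.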
- Qualifying Round
Evaluation on the multi-agent tasks is performed against a set of qualifying tasks and qualifying challenger agents. Participants run their agent code on their own machines and interact with the game; the agent is evaluated by a remote grading server. The top 32 evaluated teams are invited to the Final Round.
Both the preliminary (training and validation tasks) and the final (test) rankings are computed by means of a play-off tournament. Each tournament features N games and at least 1 task for each of them. The agents must show proficiency in multiple games, not only one, to move to the next round of the play-off. Figure 3 shows the main structure of the tournament, which works as follows. All agents are randomly assigned to groups of P players for a first knock-out round. The players in each group (here called a league) play among themselves to determine a ranking, and the top proportion (α) of these players progresses to the next round. The rankings of the grand final determine the winner and runner-ups of the tournament.
Figure 3: MARLÖ Tournament
- Each league (P players in a group) is played across the same N games, with T repetitions on the private task of each game.
- Each game has its own leaderboard, ranking entries and awarding points: 25 points for 1st, 18 for 2nd, then 15, 12, 10, 8, 6, 4, 2 and 1 for 3rd to 10th, and 0 from the 11th place onwards.
- The winner of each league is determined by summing points across all games.
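The scoring above can be sketched as a simple placement lookup. The point values are taken from the rules; the handling of ties is not specified there, so this sketch assumes a strict ranking, and the game names in the example are illustrative only.

```python
# Points awarded per leaderboard placement (1st through 10th);
# 11th place onwards scores 0, per the competition rules.
PLACEMENT_POINTS = [25, 18, 15, 12, 10, 8, 6, 4, 2, 1]


def points_for_rank(rank):
    """Points for a 1-indexed rank on a single game's leaderboard."""
    return PLACEMENT_POINTS[rank - 1] if rank <= len(PLACEMENT_POINTS) else 0


def league_winner(rankings):
    """Sum each agent's points across all games and return the top scorer.

    `rankings` maps game name -> list of agents ordered 1st to last.
    """
    totals = {}
    for order in rankings.values():
        for rank, agent in enumerate(order, start=1):
            totals[agent] = totals.get(agent, 0) + points_for_rank(rank)
    return max(totals, key=totals.get)


# Example league of 3 agents across 3 hypothetical games:
# "B" wins one game and comes 2nd in the other two (18 + 25 + 18 = 61 points).
example = {
    "build_battle":  ["A", "B", "C"],
    "pig_chase":     ["B", "C", "A"],
    "treasure_hunt": ["C", "B", "A"],
}
```

Summing across games rewards consistency: an agent that places well everywhere can beat one that wins a single game but does poorly in the rest.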
- July 10th, 2018 : Competition Open
- October 7th, 2018 : Qualifying Round
- TBD, November, 2018 : Final Round
The organizing team comes from multiple groups — Queen Mary University of London, EPFL and Microsoft Research.
The Team consists of:
- Diego Perez-Liebana
- Sharada Prasanna Mohanty
- Katja Hofmann
- Sam Devlin
- Sean Kuno
- Queen Mary University of London
To be announced
The original resources of the Project Malmo platform are available on GitHub.
- GitHub : https://github.com/Microsoft/malmo
The starter kit, with all necessary instructions to download the framework and to develop and test agents locally on the games included in the benchmark, will be released soon. Stay tuned!
- Gitter Channel : https://gitter.im/Microsoft/malmo
- Discussion Forum : https://www.crowdai.org/challenges/marlo-2018/topics
We strongly encourage you to use the public channels mentioned above for communication between the participants and the organizers. In extreme cases, if there are any queries or comments that you would like to make through a private communication channel, you can send us an email at: