
MARLÖ 2018

Multi-Agent Reinforcement Learning in Minecraft

Starting soon


The Multi-Agent Reinforcement Learning in MalmÖ (MARLÖ) competition is a new challenge that proposes research on multi-agent reinforcement learning using multiple games. Participants will create learning agents able to play multiple 3D games defined in the MalmÖ platform, built on top of Minecraft. The aim of the competition is to encourage AI research on more general approaches via multi-player games. To this end, the challenge consists of not one but several games, each with several tasks of varying difficulty and settings. Some of these tasks are public, and participants will be able to train on them. Others, however, are private and will only be used to determine the final rankings of the competition.

A framework will be provided with easy instructions to install it, create a first agent and submit it to the competition server. Documentation, tutorials and sample controllers for the development of entries will also be accessible to the participants of this challenge. The competition will be hosted on CrowdAI.org, which will determine the preliminary rankings of the competition. Recurring tournaments at regular intervals will determine which agents perform better in the proposed games. This competition is sponsored by Microsoft Research, which supports framework development and competition awards.

Games and Tasks

One of the main features of this competition is that agents play in multiple games. Therefore, several tasks are proposed for this contest. For the purpose of this document and the competition itself, we define:

  • Game: each one of the different scenarios in which agents play.

  • Task: each instance of a game. Tasks, within a game, may be different to each other in terms of level layout, size, difficulty and other game-dependent settings.
    Figure 1 sketches how games and tasks are organized in the challenge. Some tasks are public and accessible to the participants, while others are secret and will be used to evaluate the submitted entries at the end of the competition. Tasks are distributed across sets:

Figure 2: On the left: Build Battle, where players need to recreate a structure (in this case, the structure is shown on the ground). On the right: Pig Chase, where players collaborate to corner the pig.


  1. Protocol
    The participants will be provided an extensive starter kit, with all necessary instructions to download the framework and develop and execute their agents locally for testing in the games included in the benchmark. The starter kit will also include a set of simple tasks, and the code for default challenge agents (to compete against, but also as examples for participants to develop their own entries). Agent code needs to be packaged in a Docker container before it is submitted for evaluation, and needs to run reliably within a given set of resource constraints. The participants will also be provided with the Binder tools ecosystem to help them locally test the conversion of their submissions.

  2. Qualifying Round
    Evaluation in the Qualifying Round is performed against a set of qualifying tasks and qualifying challenger agents. Participants run their agent code on their own machines and interact with the game; each agent is evaluated by a remote grading server. The top 32 teams are invited to the Final Round.

  3. Tournament
    Both the preliminary (training and validation tasks) and the final (test) rankings are computed by means of a play-off tournament. Each tournament features N games and at least one task for each of them. Agents must show proficiency in multiple games, not just one, to move to the next round of the play-off. Figure 3 shows the main structure of the tournament, which works as follows. All agents are randomly assigned to groups of P players for a first knock-out round. The players in each group (which we call a league) play among themselves to determine a ranking, and the top proportion (α) of these players progresses to the next round. The rankings of the grand final determine the winner and runner-ups of the tournament.
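The play-off structure above can be sketched in a few lines of Python. This is not part of the official framework; the function and parameter names (`play_off`, `rank_league`, `group_size`, `alpha`) are hypothetical, and `rank_league` stands in for actually playing the N games within a league:

```python
import math
import random

def play_off(agents, group_size, alpha, rank_league):
    """Knock-out play-off sketch (hypothetical helper names).

    rank_league(group) -> agents of that league ordered best-first;
    in the real competition this would come from playing the N games.
    Rounds repeat until only one league (the grand final) remains.
    """
    pool = list(agents)
    while len(pool) > group_size:
        random.shuffle(pool)  # random assignment to groups of P players
        groups = [pool[i:i + group_size]
                  for i in range(0, len(pool), group_size)]
        keep = max(1, math.ceil(alpha * group_size))  # top proportion α
        pool = [a for g in groups for a in rank_league(g)[:keep]]
    return rank_league(pool)  # grand final ranking, best-first

# Toy usage: leagues ranked by a fixed per-agent "skill" score.
skill = {name: i for i, name in enumerate("ABCDEFGH")}
final = play_off(list(skill), group_size=4, alpha=0.5,
                 rank_league=lambda g: sorted(g, key=skill.get, reverse=True))
# With deterministic rankings, the strongest agent "H" always wins.
```

With 8 agents, groups of P = 4 and α = 0.5, two leagues of four each send their top two players to a four-player grand final.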

Figure 3: MARLÖ Tournament

Evaluation criterion

Each league (P players in a group) is played across the same N games, with T repetitions on the private task of each game.
Each game has its own leaderboard, ranking entries and awarding points: 25 points for 1st place, 18 for 2nd, then 15, 12, 10, 8, 6, 4, 2 and 1 for 3rd through 10th, and 0 from 11th place onwards.
The winner of each league is determined by summing points across all games.
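As an illustration, the scoring rule can be written out directly. This is only a sketch of the rule as stated above, not official competition code; `league_totals` and its input format are assumptions:

```python
# Points for 1st..10th place; 0 for 11th place onwards.
POINTS = [25, 18, 15, 12, 10, 8, 6, 4, 2, 1]

def league_totals(game_rankings):
    """game_rankings: one best-first list of entries per game.
    Returns each entry's total points summed across all games."""
    totals = {}
    for ranking in game_rankings:
        for place, entry in enumerate(ranking):
            pts = POINTS[place] if place < len(POINTS) else 0
            totals[entry] = totals.get(entry, 0) + pts
    return totals

# Two games, three entries: "a" wins both games.
totals = league_totals([["a", "b", "c"],
                        ["a", "c", "b"]])
# a: 25 + 25 = 50, b: 18 + 15 = 33, c: 15 + 18 = 33
```

Summing points across games (rather than ranking per game in isolation) rewards agents that are proficient in every game of the league, matching the competition's emphasis on generality.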



  • July 10th, 2018 : Competition Open
  • October 7th, 2018 : Qualifying Round
  • TBD, November, 2018 : Final Round

Organizing Team

The organizing team comes from multiple groups — Queen Mary University of London, EPFL and Microsoft Research.
The Team consists of:

  • Diego Perez-Liebana
  • Sharada Prasanna Mohanty
  • Katja Hofmann
  • Sam Devlin
  • Sean Kuno


  • Microsoft


  • Queen Mary University of London
  • EPFL


To be announced


The source of the Project Malmo platform is available on GitHub.

The starter kit, with all necessary instructions to download the framework and develop and test agents locally, will be released soon. Stay tuned!

Contact Us

We strongly encourage you to use the public channels mentioned above for communication between participants and organizers. If you have queries or comments that you would like to raise through a private channel, you can email us at: