

AI-generated music challenge

Music generated by AI



Can AI generate music that we humans find beautiful, perhaps even moving? Let’s find out!

In this challenge, participants are tasked with building an AI model that learns from a large dataset of music (in the form of MIDI files) and is then capable of producing its own music. Concretely, the model must produce a piece of music in response to a short “seed” MIDI file given as input.

There are two special aspects of this challenge, apart from the extremely interesting application. First, the results of the models will be evaluated by humans, using an Elo-style system where volunteers are given two randomly paired pieces of generated music and choose the one they like better. Second, at the end of the challenge the top five models will each generate a piece of music that will be performed live on stage at the Applied Machine Learning Days!


The grader expects a MIDI file with a total length of 3600 seconds (when played at 120 bpm). The file has to be a type 0 MIDI file (at most one track); if multiple tracks are present, only the first track will be considered. There are no challenge-specific restrictions on the number of channels used in the MIDI file. The grader splits the MIDI file into 120 chunks of approximately 30 seconds each, and each submission is represented by this pool of 120 chunks. During this post-processing step, all meta events are removed from the MIDI file except the PPQ (ticks per beat) meta event; hence the only officially supported MIDI events are note_on and note_off, where a note_off event may optionally be replaced by a note_on event with a velocity of 0. All MIDI parsing is done using the MIDO library, and you are requested to ensure that your submitted file is estimated to be 3600 +/- 10 seconds long by mido.MidiFile('your_file_path').length.
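A minimal sanity check for the constraints above can be sketched as follows. The loader uses the MIDO library named in the text; `load_stats` and `check_submission` are hypothetical convenience helpers, not part of the official grader.

```python
def load_stats(path):
    """Return (MIDI type, estimated playback length in seconds) of a file."""
    import mido  # third-party library used by the grader: pip install mido
    midi = mido.MidiFile(path)
    return midi.type, midi.length

def check_submission(midi_type, length_seconds):
    """Apply the grader's stated constraints to the loaded stats."""
    # The grader requires a type 0 file (a single track).
    assert midi_type == 0, "file must be type 0 (a single track)"
    # Estimated playback length must be within 3600 +/- 10 seconds.
    assert abs(length_seconds - 3600) <= 10, (
        "estimated length %.1f s is outside 3600 +/- 10 s" % length_seconds)
```

Usage: `check_submission(*load_stats('your_file_path'))` before submitting.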

A separate evaluation interface is made available, where all the participants (and other external volunteers) can hear two randomly sampled chunks and then vote for the one they like better (more details on the sampling mechanism are provided in the following sections). These randomly sampled chunks will be played with the SoundFont of an acoustic grand piano at 120 bpm.

These binary comparisons will be used to compute an individual score for every submission, which evolves over time as the submission accumulates evaluations in the evaluation interface. The scoring mechanism follows the TrueSkill ranking system, and hence is modeled by $ \mu $ (a quantitative estimate of the preference of a general population towards a particular song) and $ \sigma $ (the confidence of the system in this estimate). The actual score on the leaderboard is a conservative estimate of the modeled score, computed as $ \mu - k \sigma $, where $k$ is the ratio of the default $\mu$ and $\sigma$ values: $k = (\mu=25) / (\sigma=8.334)$.

The submissions tab will report the values for $\mu$, $\sigma$ and the number of evaluations completed for every submission; and the leaderboard will use the conservative estimate of $\mu - k * \sigma$ as the primary score, and $\mu$ as the secondary score.
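Under the stated defaults, the conservative leaderboard score can be sketched in a few lines; the function name is illustrative, not part of the platform's API.

```python
# TrueSkill defaults as stated above: mu = 25, sigma = 8.334.
MU0, SIGMA0 = 25.0, 8.334
K = MU0 / SIGMA0  # roughly 3

def leaderboard_score(mu, sigma):
    """Conservative estimate mu - k*sigma used as the primary score."""
    return mu - K * sigma
```

Note that a brand-new submission at the default $\mu$ and $\sigma$ scores exactly zero; the score rises as evaluations shrink $\sigma$ and, ideally, raise $\mu$.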

To ensure that the top-10 selected participants are not overfitting on the training set, the top-10 submissions at the end of the challenge will be divided into quantized chunks of $\tau(=5)$ seconds each (at 120 bpm) with a sliding window of stride $s$, and a normalised dynamic time warping (DTW) distance will be computed against $\tau(=5)$-second chunks from all the MIDI files listed in the Datasets. With $DTW(x, y)$ representing the DTW distance between two $\tau(=5)$-second quantized chunks, the normalised DTW is computed as:

$NDTW(x,y) = \frac{127 \times T(\tau=5) - DTW(x,y) }{127 \times T(\tau=5)}$

where, $T(\tau)$ represents the number of ticks in a time period of $\tau$ seconds.

All matching chunk pairs with $NDTW < 0.3$ will be manually verified, and if the chunks are found to be similar, the submission will be disqualified. Given the subjective nature of the evaluation, the organisers reserve the right both to adjust the threshold of $0.3$ and to decide whether flagged chunks are similar because the model overfit, or because the participant in question tried to cheat by stitching together MIDI snippets from the training data.
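The normalisation above can be sketched with a textbook DTW implementation. This is a hypothetical illustration only: it assumes each chunk is represented as a sequence of per-tick values bounded by 127 (the MIDI value range), so that $127 \times T(\tau)$ upper-bounds the distance; the organisers' exact chunk representation and DTW variant are not specified here.

```python
def dtw(x, y):
    """Classic dynamic-time-warping distance with |a - b| as the local cost."""
    n, m = len(x), len(y)
    INF = float("inf")
    # D[i][j] = DTW distance between x[:i] and y[:j].
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]

def ndtw(x, y, ticks):
    """Normalised DTW as defined above; `ticks` is T(tau)."""
    denom = 127 * ticks
    return (denom - dtw(x, y)) / denom
```

As a sanity check on the scale: at 120 bpm one beat lasts 0.5 seconds, so a $\tau = 5$ second chunk spans 10 beats, i.e. $T(\tau) = 10 \times \mathrm{PPQ}$ ticks.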

Starter Kit: A starter kit to help you get started on the submission procedure is made available at: https://github.com/crowdAI/crowdai-ai-generate-music-starter-kit.

Coming Soon: A Getting Started guide on music generation from MIDI files using LSTMs.


  • Participants are allowed at most 2 submissions per day.
  • By uploading a submission, participants grant crowdAI the right to host and publicly play short clips of the submitted MIDI files to human evaluators, who may or may not be affiliated with crowdAI.
  • Participants are not allowed to make submissions that are hand-written, generated using custom rules, or recorded.
  • Participants are expected to release their final code under any Open Source license of their choice to be eligible for the prizes.
  • Organizers reserve the right to make changes to the rules.


Some other projects to help you quickly get started on MIDI composition:


We strongly encourage you to use the public channels mentioned above for communication between the participants and the organizers. In extreme cases, if there are any queries or comments that you would like to make through a private communication channel, you can send us an email at :