
WWW 2018 Challenge: Learning to Recognize Musical Genre

Learning to Recognize Musical Genre from Audio on the Web

Overview

Like never before, the web has become a place for sharing creative work - such as music - among a global community of artists and art lovers. While music and music collections predate the web, the web enabled much larger-scale collections. Whereas people used to own a handful of vinyl records or CDs, they nowadays have instant access to the whole of published musical content via online platforms such as Spotify, iTunes, the Free Music Archive (FMA), Jamendo, and Bandcamp. Such a dramatic increase in the size of music collections created two challenges: (i) the need to automatically organize a collection (as users and publishers can no longer manage it manually), and (ii) the need to automatically recommend new songs to users given their listening habits. An underlying task in both challenges is the ability to group songs into semantic categories.

Music genres are categories that have arisen through a complex interplay of cultures, artists, and market forces to characterize similarities between compositions and organize music collections. Yet the boundaries between genres remain fuzzy, making music genre recognition (MGR) a nontrivial task (Scaringella 2006). While the utility of genre has been debated, mostly because of its ambiguity and cultural definition, it is widely used and understood by end-users, who find it useful to discuss musical categories (McKay 2006). As such, MGR is one of the most researched areas in the Music Information Retrieval (MIR) field (Sturm 2012).

The task of this challenge, one of the four official challenges of the Web Conference (WWW2018) challenges track, is to recognize the musical genre of a piece of music for which only a recording is available. Genres are broad, e.g. pop or rock, and each song has exactly one target genre. The data for this challenge is the recently published FMA dataset (Defferrard 2017), a dump of the Free Music Archive (FMA), an interactive library of high-quality, curated audio which is freely and openly available to the public. The training set (the medium subset of the FMA dataset) is composed of 25,000 clips of 30 seconds each, categorized into 16 genres. The categorization is unbalanced, with 21 to 7,103 clips per genre, but only one genre per track. Please look at the GitHub repository and the ISMIR’17 paper for more information about the data.
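
For quick orientation, here is a minimal sketch of loading the training labels, assuming the utils module shipped with the FMA GitHub repository and the extracted fma_metadata archive; the paths are illustrative:

```python
# A minimal sketch, assuming the utils module from the FMA GitHub
# repository (mdeff/fma) and the extracted fma_metadata archive.
import utils

tracks = utils.load('fma_metadata/tracks.csv')

# Restrict to the medium subset used in this challenge.
medium = tracks[tracks['set', 'subset'] <= 'medium']

# Each track carries exactly one top-level genre: the prediction target.
labels = medium['track', 'genre_top']
print(labels.value_counts())  # 16 genres, from 21 to 7,103 clips each
```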

Evaluation

The challenge will happen in two rounds. In the first round, participants will be provided a test set of 35,000 clips of at most 30 seconds each, and they have to submit their predictions for all 35,000 clips. In the second round, participants will be expected to wrap their models in a Docker container, which will have to predict on a new, unseen test set of clips of at most 30 seconds each. These clips will be sampled from all new contributions to the Free Music Archive between January 12 and February 14. Detailed instructions, along with a sample submission Docker container for round 2, will be released in a few weeks.
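
Since the round-2 interface is not yet published, the following Python sketch only illustrates the general shape of such a containerized prediction step, under the assumption that the container is handed a directory of mp3 clips; predict_clip and all paths are hypothetical placeholders:

```python
import os

def predict_directory(input_dir, predict_clip):
    """Collect (file_id, per-genre probabilities) for every mp3 clip.

    predict_clip is a placeholder for your trained model's prediction
    function; the actual round-2 interface may differ.
    """
    results = []
    for filename in sorted(os.listdir(input_dir)):
        if filename.endswith('.mp3'):
            file_id = filename[:-len('.mp3')]
            path = os.path.join(input_dir, filename)
            results.append((file_id, predict_clip(path)))
    return results
```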

The primary metric for evaluation will be the Mean Log Loss, and the secondary metric will be the Mean F1-Score.

The Mean Log Loss is defined by

$ L = - \frac{1}{N} \sum_{n=1}^N \sum_{c=1}^{C} y_{nc} \ln(p_{nc}), $

where

  • $N=35000$ is the number of examples in the test set,
  • $C=16$ is the number of class labels, i.e. genres,
  • $y_{nc}$ is a binary value indicating whether the $n$-th instance belongs to the $c$-th label,
  • $p_{nc}$ is the probability, according to your submission, that the $n$-th instance belongs to the $c$-th label,
  • $\ln$ is the natural logarithm.
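
For reference, a minimal NumPy sketch of this metric follows; the clipping constant is our own choice to keep the logarithm finite, and the organizers' exact implementation may differ:

```python
import numpy as np

def mean_log_loss(y_true, y_pred, eps=1e-15):
    """Mean log loss over the test set.

    y_true: (N, C) one-hot array of target genres.
    y_pred: (N, C) array of predicted probabilities.
    """
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid ln(0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))
```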

The $F_1$ score for a particular class $c$ is given by

$ F_1^c = 2\frac{p^c r^c}{p^c + r^c}, $

where

  • $p^c = \frac{tp^c}{tp^c + fp^c}$ is the precision for class $c$,
  • $r^c = \frac{tp^c}{tp^c + fn^c}$ is the recall for class $c$,
  • $tp^c$ refers to the number of True Positives for class $c$,
  • $fp^c$ refers to the number of False Positives for class $c$,
  • $fn^c$ refers to the number of False Negatives for class $c$.

The final Mean $F_1$ Score is then defined as

$ F_1 = \frac{1}{C} \sum_{c=1}^{C} F_1^c. $
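
A matching NumPy sketch follows; it assumes the predicted label is taken as the most probable genre (how the organizers derive hard labels from the submitted probabilities is our assumption here):

```python
import numpy as np

def mean_f1_score(y_true, y_pred):
    """Macro-averaged F1 score.

    y_true: (N,) array of true class indices.
    y_pred: (N, C) array of predicted probabilities.
    """
    predicted = np.argmax(y_pred, axis=1)  # assumed hard-label rule
    f1_scores = []
    for c in range(y_pred.shape[1]):
        tp = np.sum((predicted == c) & (y_true == c))
        fp = np.sum((predicted == c) & (y_true != c))
        fn = np.sum((predicted != c) & (y_true == c))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1_scores.append(2 * p * r / (p + r) if p + r else 0.0)
    return np.mean(f1_scores)
```

Under that assumption, this is equivalent to scikit-learn's f1_score(y_true, predicted, average='macro').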

The participants have to submit a CSV file with the following header:

file_id,Blues,Classical,Country,Easy Listening,Electronic,Experimental,Folk,Hip-Hop,Instrumental,International,Jazz,Old-Time / Historic,Pop,Rock,Soul-RnB,Spoken

Each row is then an entry for one file in the test set, in the sorted order of the file_ids. The first column in every row is the file_id (the name of the test file without its .mp3 extension), and the remaining $C=16$ columns are the predicted probabilities for each class, in the order given in the CSV header above.
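
As an illustration, a short pandas sketch of writing such a file is given below; the function name and arguments are ours, not part of the challenge API:

```python
import pandas as pd

# Column order must match the header above exactly.
GENRES = ['Blues', 'Classical', 'Country', 'Easy Listening', 'Electronic',
          'Experimental', 'Folk', 'Hip-Hop', 'Instrumental', 'International',
          'Jazz', 'Old-Time / Historic', 'Pop', 'Rock', 'Soul-RnB', 'Spoken']

def write_submission(file_ids, probabilities, path='submission.csv'):
    """file_ids: file names without the .mp3 extension.
    probabilities: (N, 16) array of per-genre probabilities."""
    df = pd.DataFrame(probabilities, columns=GENRES)
    df.insert(0, 'file_id', file_ids)
    df.sort_values('file_id').to_csv(path, index=False)  # rows sorted by file_id
```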

Rules

The following rules have to be observed by all participants:

  • Participants are allowed at most 5 submissions per day.
  • Participants have to release their solution under an Open Source License of their choice to be eligible for prizes. We encourage all participants to open-source their code!
  • While submissions by Admins and Organizers can serve as baselines, they won’t be considered in the final leaderboard.
  • Training must be done with the audio in the FMA medium subset only. In particular, the large and full subsets must not be used. Neither should fma_features.csv (from fma_metadata.zip) be used, as it was computed on the full set.
  • Metadata, e.g. the song title or artist name, cannot be used for the prediction. The submitted algorithms shall learn to map an audio signal, i.e. a time series, to one of the 16 target genres. In particular, no information (audio features or metadata) from external websites or APIs can be used. This will be enforced in the second round, when the submitted systems will only have access to a set of mp3s to make predictions.
  • In case of conflicts, the decision of the Organizers will be final and binding.
  • Organizers reserve the right to make changes to the rules.

Prizes

The winner will be invited to present their solution at the 3rd Applied Machine Learning Days at EPFL in Switzerland in January 2019, with travel and accommodation covered (up to $2000).

Moreover, all participants are invited to submit a paper to the Web Conference (previously WWW) challenges track. The paper should describe the proposed solution and include a self-assessment against the defined evaluation criteria. These papers will be peer-reviewed and published in the official satellite proceedings of the conference. The submission deadline for challenge papers is 12 January 2018. More information on the submission process will follow.

Resources

The starter kit includes code to handle the data and make a submission. Moreover, it features some examples and baselines.

You are encouraged to check out the FMA dataset GitHub repository for Jupyter notebooks showing how to use the data, explore it, and train baseline models. This challenge uses the rc1 version of the data; make sure to check out that version of the code. The associated paper describes the data.
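
To give a feel for a typical pipeline, here is a small sketch of loading a clip and computing baseline features with librosa; the feature choice and parameters are illustrative, not the official baseline:

```python
import numpy as np
import librosa

def extract_features(mp3_path):
    """Load a 30-second clip and summarize it with MFCC statistics,
    a common baseline representation for genre recognition."""
    x, sr = librosa.load(mp3_path, sr=22050, mono=True)
    mfcc = librosa.feature.mfcc(y=x, sr=sr, n_mfcc=20)
    # Collapse the time axis into per-coefficient means and deviations.
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
```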

Additional resources:

Public contact channels:

We strongly encourage you to use the public channels mentioned above for communication between participants and organizers. In exceptional cases, if you have queries or comments that you would like to raise through a private channel, you can send us an email at: