AI-generated music challenge

Music generated by AI

Completed
111
Submissions
182
Participants
18735
Views

How is the competition going to guard against overfitting?

Posted by eatnow over 1 year ago

A model which simply memorizes snippets of its dataset will do very well against human evaluators (who might not be familiar with most songs in the dataset).

Posted by Marcin_Mucha  over 1 year ago |  Quote

Most likely the initial MIDI sample is supposed to force the generator into a certain area in the solution space, but if the solution is then to be cut into 120 pieces, then it really doesn’t matter and memorization of the dataset should work great.

And even if, somehow, one can come up with a tune better than the dataset, there really is no incentive to not always return the same tune (you will only get a poor score in one chunk).

Posted by spMohanty  over 1 year ago |  Quote

Hi @eatnow,

Good point. I have added a section explaining one suggested approach to guard against this: to ensure that the top-10 selected participants are not overfitting on the training set, the top-10 submissions at the end of the challenge will be divided into quantized chunks of $\tau(=5)$ seconds each (at 120 bpm) with a sliding window of stride $s$, and a normalised dynamic time warp (DTW) distance will be computed against $\tau(=5)$ second chunks from all the MIDI files listed in the Datasets section. With $DTW(x, y)$ representing the DTW between two $\tau(=5)$ second quantized chunks, the normalized DTW will be computed as: $NDTW(x,y) = \frac{127 \times T(\tau=5) - DTW(x,y)}{127 \times T(\tau=5)}$

where, $T(\tau)$ represents the number of ticks in a time period of $\tau$ seconds.
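To make the proposal concrete, here is a minimal sketch of the check in Python. It assumes chunks are represented as per-tick pitch sequences with values in $[0, 127]$, that the local DTW cost is the absolute pitch difference, and that `ticks` is $T(\tau=5)$; the actual tick resolution and cost function used by the organisers may differ.

```python
def dtw(x, y):
    """Classic dynamic-programming DTW with |a - b| as the local cost."""
    n, m = len(x), len(y)
    INF = float("inf")
    # D[i][j] = minimal warp cost aligning x[:i] with y[:j]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]


def ndtw(x, y, ticks):
    """Normalised DTW as in the formula above:
    1.0 for identical chunks, lower for dissimilar ones."""
    return (127 * ticks - dtw(x, y)) / (127 * ticks)
```

With this normalisation, an exact copy of a training chunk scores $NDTW = 1.0$, e.g. `ndtw([60, 62, 64], [60, 62, 64], 3)` returns `1.0`, while transposed or different material scores lower.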

All matching chunk pairs with $NDTW < 0.3$ will be manually verified, and if the chunks are found to be similar, the submissions will be disqualified. Given the subjective nature of the evaluation, the organisers reserve the right both to adjust the threshold of $0.3$ and to decide whether the flagged chunks are indeed similar because of model overfitting, or because the participant in question tried to cheat by stitching together MIDI snippets from the training data.

If you guys have any suggestions, we would be happy to incorporate them.

Cheers, Mohanty

Posted by eatnow  over 1 year ago |  Quote

Sounds as good as we can wish for. Marcin raises a valid point about how we are cutting up the solution, though. If someone ignores the seed and submits a bunch of memorized songs (overfitted to datasets outside the suggested ones), most human graders would not even know that the generated song has nothing to do with the seed. My suggestion is therefore to make the seed available in the grading interface.