NIPS '17 Workshop: Criteo Ad Placement Challenge

Counterfactual policy learning for display advertising


Format of the predictions

Posted by ololo 4 months ago

The challenge description says that “Your task is to build a function f which takes M candidates, each represented by a 74000-dimensional vector, and outputs a candidate identifier between 1 and M.”

However, when I look at the generated random submission, the output looks like this:


Clearly, it’s not a number between 1 and M. So what is this number? The score, the higher the better?


Posted by spMohanty 4 months ago

Hi @ololo,

Good catch. I had fixed this description in the starter kit, but missed the challenge overview page. Fixing it right away.

Say you have M candidates. You first print the impression id, which is “896678244” in the line you posted, followed by a “;”, followed by blocks of “candidate_index:score”. In the example you posted above, this reads: 896678244 ; 0:7.07899504322, 1:5.19543172446, 2:4.06505845612, 3:2.09058551442, 4:3.3119464864, 5:5.82059230136, 6:9.46619061675, 7:8.1351206121, 8:4.81271759758, 9:0.676237539483, 10:7.69203903033
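For illustration, here is a minimal Python sketch of writing and reading one such line. The function names `format_prediction` and `parse_prediction` are hypothetical and not part of the starter kit. The exact whitespace the evaluator expects around “;” and “,” is an assumption here: the writer emits no spaces, and the reader tolerates them.

```python
def format_prediction(impression_id, scores):
    """Build one line: '<impression_id>;<index>:<score>,<index>:<score>,...'.

    Candidate indices are zero-based and follow the order of `scores`.
    Assumption: no spaces around ';' and ','.
    """
    blocks = ",".join(f"{index}:{score}" for index, score in enumerate(scores))
    return f"{impression_id};{blocks}"


def parse_prediction(line):
    """Read one line back into (impression_id, [(index, score), ...]).

    int() and float() ignore surrounding whitespace, so lines written
    with spaces around the separators are accepted as well.
    """
    id_part, blocks_part = line.split(";", 1)
    scores = []
    for block in blocks_part.split(","):
        index, score = block.split(":")
        scores.append((int(index), float(score)))
    return int(id_part), scores
```

Calling `format_prediction(896678244, [7.0, 5.5])` builds a line for an impression with two candidates, and `parse_prediction` recovers the id and the (index, score) pairs from it.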

So there are 11 candidates for this impression. In each block, the first number is the candidate index, which is simply the zero-based index of the candidate as it appears in the test set, and the second number is the score that you predict for that candidate. This score can be an arbitrary positive number; during evaluation, we take a softmax over all the scores to determine the probability of choosing a particular candidate.
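To make the softmax step concrete, here is a rough Python sketch of how scores turn into selection probabilities. It uses the textbook softmax; whether the evaluator applies any extra scaling is not stated in this thread, so treat the function below as an assumption rather than the actual evaluation code.

```python
import math


def softmax(scores):
    """Standard softmax: exp(score) normalized over all candidates.

    Subtracting the maximum first avoids overflow for large scores
    and does not change the result.
    """
    top = max(scores)
    exps = [math.exp(score - top) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


# Scores from the example line for impression 896678244.
scores = [
    7.07899504322, 5.19543172446, 4.06505845612, 2.09058551442,
    3.3119464864, 5.82059230136, 9.46619061675, 8.1351206121,
    4.81271759758, 0.676237539483, 7.69203903033,
]
probs = softmax(scores)

# Index of the candidate most likely to be chosen.
best = max(range(len(probs)), key=probs.__getitem__)
```

The probabilities sum to 1, and candidate 6, which has the highest score in the example, gets the largest probability. So a higher score does mean the candidate is more likely to be chosen.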