NIPS '17 Workshop: Criteo Ad Placement Challenge

Counterfactual policy learning for display advertising

Overview

Welcome to the Criteo Ad Placement challenge, accompanying the NIPS’17 Workshop on Causal Inference and Machine Learning.

Consider a display advertising scenario: a user arrives at a website where we have an advertisement slot (“an impression”). We have a pool of potential products to advertise during that impression (“a candidate set”). A “policy” chooses which product to display so as to maximize the number of clicks by the users.

The goal of the challenge is to find a good policy that, knowing the candidate set and impression context, chooses products to display such that the aggregate click-through-rate (“CTR”) from deploying the policy is maximized.

To enable you to find a good policy, Criteo has generously donated several gigabytes of <user impression, candidate set, selected product, click/no-click> logs from a randomized version of their in-production policy! Go to the Dataset section of the challenge to access the dataset and visit the GitHub repo to access a starter-kit for counterfactual policy learning with the dataset.

Prizes worth $4500 sponsored by Criteo!

This policy learning challenge is subtly different from the 2014 Criteo challenge, which compared CTR prediction algorithms. CTR prediction is a standard supervised regression problem: for any ad in the training data, the correct regression target is given (click/no-click). For policy learning, when the in-production policy chose a product that was not clicked, we do not know what would have happened (the counterfactual) had we chosen a different product to display. These differences and some baseline approaches are detailed in the dataset companion paper.

The objective of this challenge is to spur participants to explore several issues on a real-world benchmark dataset and to share their findings during the NIPS’17 workshop. Examples of such issues:

  • Discover new training objectives, learning algorithms and regularization mechanisms for counterfactual learning scenarios.
  • Find appropriate ways to perform model selection (analogous to cross-validation for supervised learning) using large amounts of logged interaction data.
  • Develop algorithms that can scale to massive datasets (typically orders of magnitude larger than labeled datasets).

Evaluation

In the dataset, each impression is represented by M lines, where M is the number of candidate ads. Each line holds the feature information for one candidate ad. The first line corresponds to the candidate that was displayed by the logging policy; it additionally carries an indicator of whether the displayed product was clicked by the user (“click” encoded as 0.001, “no-click” encoded as 0.999) and the inverse propensity of the stochastic logging policy to pick that specific candidate (see the companion paper for details). Each <user context-candidate product> pair is described using 33 categorical (multi-set) features and 2 numeric features. Of these 35 features, 10 depend only on the context, while the remaining 25 depend on both the context and the candidate product. The categorical feature representations have been post-processed into a 74000-dimensional vector with a sparse one-hot encoding for each categorical feature. The semantics behind the features will not be disclosed.
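
As a rough illustration of the record described above, here is a minimal Python sketch of one impression held in memory. The class name Impression, its field names, the helper decode_click and the dict-based sparse vectors are all assumptions made for this example; they do not reflect the actual file layout or the starter-kit API. Only the click encoding (0.001 for click, 0.999 for no-click), the 74000-dimensional feature space and the position of the displayed candidate (first) are taken from the description.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical sparse vector: feature index -> value, indices in [0, 74000).
SparseVector = Dict[int, float]

CLICK_CODE = 0.001     # "click" as encoded in the dataset
NO_CLICK_CODE = 0.999  # "no-click" as encoded in the dataset


def decode_click(code: float) -> int:
    """Map the dataset's click encoding to a 0/1 reward."""
    if abs(code - CLICK_CODE) < 1e-9:
        return 1
    if abs(code - NO_CLICK_CODE) < 1e-9:
        return 0
    raise ValueError(f"unexpected click code: {code}")


@dataclass
class Impression:
    """One logged impression (illustrative layout, not the file format)."""
    candidates: List[SparseVector]  # M candidates; index 0 was displayed
    click_code: float               # 0.001 or 0.999, for the displayed one
    inverse_propensity: float       # 1 / P(logging policy picks index 0)

    @property
    def reward(self) -> int:
        return decode_click(self.click_code)


# A made-up impression with M = 3 candidates; the numbers are not real data.
example = Impression(
    candidates=[{3: 1.0, 120: 1.0}, {3: 1.0, 4521: 1.0}, {3: 1.0, 73999: 1.0}],
    click_code=0.001,
    inverse_propensity=12.5,
)
print(len(example.candidates), example.reward)
```

Decoding the click indicator to 0/1 up front keeps the unusual 0.001/0.999 encoding from leaking into any later reward computation.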

These post-processed dataset files are available from the Dataset section of the challenge.

Your task is to build a function f (the policy, _policy) which takes the M candidates of an impression, each represented by a 74000-dimensional sparse vector, and outputs one score for each of the candidates 1 to M.
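
To make the expected shape of such a function concrete, the sketch below scores each candidate with a sparse dot product against a weight vector. This is only one possible policy, chosen for brevity; the name linear_policy, the dict-based sparse vectors and the weight values are hypothetical and are not part of the starter kit.

```python
from typing import Dict, List

# Hypothetical sparse vector: feature index -> value, indices in [0, 74000).
SparseVector = Dict[int, float]


def linear_policy(candidates: List[SparseVector],
                  weights: Dict[int, float]) -> List[float]:
    """Return one score per candidate: its sparse dot product with `weights`.

    Features missing from `weights` contribute nothing to the score.
    """
    return [
        sum(weights.get(index, 0.0) * value for index, value in features.items())
        for features in candidates
    ]


# Made-up weights and candidates, purely for illustration.
weights = {3: 0.1, 120: -0.4, 4521: 0.7}
candidates = [{3: 1.0, 120: 1.0}, {3: 1.0, 4521: 1.0}, {3: 1.0, 73999: 1.0}]

scores = linear_policy(candidates, weights)
# Index of the highest-scoring candidate (first one wins on ties).
best = max(range(len(scores)), key=scores.__getitem__)
print(scores, best)
```

The function returns exactly M scores for M candidates, so the candidate to display can be read off as the index with the highest score.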

The reward for an individual impression is 1 if the selected candidate was clicked and 0 otherwise. The reward for function f is the aggregate reward over all impressions in a held-out test set.

We will estimate the aggregate reward with an unbiased estimator based on inverse propensity scoring (see the companion paper for details).
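
The idea behind inverse propensity scoring can be sketched as follows, assuming a deterministic policy that displays its highest-scoring candidate. A logged impression contributes only when the policy would have displayed the same candidate as the logging policy, and that contribution is the observed reward weighted by the logged inverse propensity. The function name ips_estimate, the tuple layout and the toy numbers are assumptions for this example; the exact metric used on the leaderboard (including any scaling or treatment of scores as a stochastic policy) is defined in the companion paper and the starter kit, not here.

```python
from typing import List, Sequence, Tuple

# One logged impression, in a hypothetical layout:
#   (policy scores for the M candidates,
#    0/1 reward observed for the displayed candidate,
#    inverse propensity of the displayed candidate).
# The displayed candidate is assumed to sit at index 0.
LoggedImpression = Tuple[Sequence[float], int, float]


def ips_estimate(logged: List[LoggedImpression]) -> float:
    """Average of reward * inverse propensity over matching impressions."""
    total = 0.0
    for scores, reward, inverse_propensity in logged:
        # Deterministic policy: pick the top score (first one wins on ties).
        chosen = max(range(len(scores)), key=scores.__getitem__)
        if chosen == 0:  # policy agrees with the logged action
            total += reward * inverse_propensity
    return total / len(logged)


# Toy log with made-up numbers.
logged = [
    ([0.9, 0.1, 0.3], 1, 4.0),       # agrees, clicked: contributes 4.0
    ([0.2, 0.8], 1, 2.0),            # disagrees: contributes 0
    ([0.7, 0.5], 0, 3.0),            # agrees, not clicked: contributes 0
    ([0.1, 0.2, 0.6, 0.4], 0, 5.0),  # disagrees: contributes 0
]
estimate = ips_estimate(logged)
print(estimate)  # 1.0
```

Impressions where the policy disagrees with the logged action carry no information about the policy's reward, which is why they contribute zero rather than being dropped from the denominator.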

For further details on evaluation please refer to the Getting Started guide in the challenge starter kit.

Rules

  • You may use only the training dataset as outlined in the Dataset guide to develop your submission. In particular, please do not use external data sources or attempt to reverse-engineer the held-out testing set using external public sources.
  • The top 3 teams according to the leaderboard on December 1, 2017 12pm EST must perform the following additional steps to be eligible for prizes:

    • They must submit their model and training scripts to the organizers under an open source license. Their submissions will be checked and run offline to match their online submission to the leaderboard.

    • They must describe their algorithms and their development process during the NIPS’17 Causal Inference and Machine Learning workshop; either in person, or as a remote presentation. Part of the prize money is meant to fund travel and registration at NIPS’17 for the winning teams.

    • Only leaderboard scores above a minimum threshold will be eligible for prizes. The minimum threshold to beat is 55.0.

Prizes

  • 1st - $2000
  • 2nd - $1500
  • 3rd - $1000

Additionally:

Resources

Contact:

We strongly encourage you to use the public channels mentioned above for communications between the participants and the organizers. In extreme cases, if there are any queries or comments that you would like to make using a private communication channel, then you can send us an email at :