OpenFood Nutrition Table Challenge

Extracting structured nutritional data from food packaging images.


Status: Completed. 4 submissions, 46 participants, 5638 views.

Overview

In the OpenFood project, labels are being scanned in Swiss supermarkets for all foods available for sale. A database of images of food packages has been prepared, and now nutritional information needs to be extracted from the images into a CSV file.

Nutritional data is presented in tables on food packaging, and nutritional tables contain at a minimum:

  • Nutrient
  • Units per 100g

Additional optional columns may also be present:

  • Units per serving size
  • Percentage of recommended daily intake
  • Other optional columns

The nutritional data needs to be extracted into a CSV file which will then be loaded into the OpenFood database.

Nutrients Master List

Each nutrient will be one of the items in the list below, reported in the standard unit of measure indicated. The nutritional information may be in upper, lower or sentence case, and presented in:

  • German (de)
  • French (fr)
  • Italian (it)
  • English (en)

Note that information may be presented in multiple languages. The challenge submission should refer to the nutrients via an integer field as indicated in the nutrient_id field in the list below.

| nutrient_id | de | en | fr | it | Unit |
|---|---|---|---|---|---|
| 1 | Energie | energy | énergie | energia | kJ |
| 2 | Energie (kCal) | energy (kCal) | énergie (kCal) | energia (kCal) | kCal |
| 3 | Eiweiss | protein | protéines | proteine | g |
| 4 | Fett | fat | graisses | grassi | g |
| 5 | Kohlenhydrate | carbohydrates | glucides | carboidrati | g |
| 6 | Zucker | sugars | sucres | zuccheri | g |
| 7 | Salz | salt | sel | sale | g |
| 8 | Ballaststoffe | fibre | fibres alimentaires | fibre | g |
| 9 | Gesättigte Fettsäuren | saturated fat | graisses saturées | grassi saturi | g |
| 10 | Vitamin C | Vitamin C | Vitamine C | Vitamina C | mg |
| 11 | Vitamin B2 (Riboflavin) | Vitamin B2 (Riboflavin) | Vitamine B2 (Riboflavine) | Vitamina B2 (Riboflavin) | mg |
| 12 | Natrium | Sodium | Sodium | Sodio | g |
| 13 | Selen | Selenium | Selenium | Selenio | µg |
| 14 | Vitamin E | Vitamin E | Vitamine E | Vitamina E | mg |
| 15 | Kalzium | Calcium | Calcium | calcio | mg |
| 16 | Magnesium | Magnesium | Magnesium | Magnesio | mg |
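
As an illustration of how the master list might be used, the sketch below builds a case-insensitive lookup from a label string to its nutrient_id. The names `NUTRIENT_NAMES`, `LOOKUP` and `nutrient_id_for` are hypothetical and not part of the challenge kit; real OCR output would probably need fuzzier matching than the exact comparison assumed here.

```python
# Names per nutrient_id, copied from the master list (de, en, fr, it).
# NUTRIENT_NAMES, LOOKUP and nutrient_id_for are hypothetical helper names.
NUTRIENT_NAMES = {
    1: ["Energie", "energy", "énergie", "energia"],
    2: ["Energie (kCal)", "energy (kCal)", "énergie (kCal)", "energia (kCal)"],
    3: ["Eiweiss", "protein", "protéines", "proteine"],
    4: ["Fett", "fat", "graisses", "grassi"],
    5: ["Kohlenhydrate", "carbohydrates", "glucides", "carboidrati"],
    6: ["Zucker", "sugars", "sucres", "zuccheri"],
    7: ["Salz", "salt", "sel", "sale"],
    8: ["Ballaststoffe", "fibre", "fibres alimentaires", "fibre"],
    9: ["Gesättigte Fettsäuren", "saturated fat", "graisses saturées", "grassi saturi"],
    10: ["Vitamin C", "Vitamin C", "Vitamine C", "Vitamina C"],
    11: ["Vitamin B2 (Riboflavin)", "Vitamin B2 (Riboflavin)",
         "Vitamine B2 (Riboflavine)", "Vitamina B2 (Riboflavin)"],
    12: ["Natrium", "Sodium", "Sodium", "Sodio"],
    13: ["Selen", "Selenium", "Selenium", "Selenio"],
    14: ["Vitamin E", "Vitamin E", "Vitamine E", "Vitamina E"],
    15: ["Kalzium", "Calcium", "Calcium", "calcio"],
    16: ["Magnesium", "Magnesium", "Magnesium", "Magnesio"],
}

# casefold() makes the match independent of upper, lower or sentence case.
LOOKUP = {
    name.casefold(): nutrient_id
    for nutrient_id, names in NUTRIENT_NAMES.items()
    for name in names
}


def nutrient_id_for(label_text):
    """Return the nutrient_id for a label string, or None if it is not in the list."""
    normalized = " ".join(label_text.split()).casefold()  # collapse stray whitespace
    return LOOKUP.get(normalized)
```

For instance, `nutrient_id_for("ENERGIE")` resolves to 1, while a nutrient outside the list, such as "Zink", yields `None`.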

Processing

A set of images will be provided, with a corresponding CSV file containing all product information which has been manually extracted. The submitted code will process the images and write the values into a single CSV file called product_nutrients.csv.

Participants will submit both the results and the model. A set of additional images will be processed against the model for final scoring of the submission.

Submission

  • Submissions will be run by crowdAI against a Docker container. Details may be found in the Resources section.

  • The /project folder will hold the submission scripts, including the install and run scripts (see below).

  • Participants may optionally submit an installation script called install.sh. When executed in the container, it will download and install any necessary code and libraries.

  • If an external API is used it must be indicated on the submission page. A list of APIs may be found in the Resources section.

  • Participants must include a script called run.sh which is executed against a folder of images in the /project folder. It will produce a single product_nutrients.csv file as output, also in the /project folder.

  • Participant code must search for each nutrient type and record the value in the CSV file. If a nutrient is not specified on the package, the field is set to -1.0.
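
The output rule in the last point can be sketched in Python as follows. This is a minimal illustration, not the organisers' reference code: the helper names are hypothetical, the column names are taken from the example tables in this description, and both the header row and the ascending row order are assumptions.

```python
import csv

NUTRIENT_IDS = range(1, 17)  # nutrient_id 1..16 from the master list
MISSING = -1.0               # value for anything not specified on the package
# Column names as shown in the examples; writing a header row is an assumption.
FIELDS = ["product_id", "nutrition_id", "per_hundred", "per_portion", "percent"]


def product_rows(product_id, extracted):
    """Build one row per nutrient_id.

    `extracted` maps nutrient_id -> dict holding any of the keys
    per_hundred, per_portion, percent that were found on the label.
    """
    rows = []
    for nutrient_id in NUTRIENT_IDS:
        found = extracted.get(nutrient_id, {})
        rows.append({
            "product_id": product_id,
            "nutrition_id": nutrient_id,
            "per_hundred": float(found.get("per_hundred", MISSING)),
            "per_portion": float(found.get("per_portion", MISSING)),
            "percent": float(found.get("percent", MISSING)),
        })
    return rows


def write_product_nutrients(path, rows):
    """Write all rows for all products into a single CSV file."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

A run script could collect `product_rows(...)` for every image and then call `write_product_nutrients("/project/product_nutrients.csv", all_rows)` once at the end.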

Examples

Listed below are examples of raw nutritional label images and the data extracted from them (the data should be submitted as a single CSV file for all products, but is presented here as tables for illustration).

Product 7207

This product is an example of a label where only a single column of data is available. In this case the data falls into the per_hundred column, as in the example below.

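| product_id | nutrition_id | per_hundred | per_portion | percent |
|---|---|---|---|---|
| 7207 | 1 | 1180.0 | -1.0 | -1.0 |
| 7207 | 2 | 282.0 | -1.0 | -1.0 |
| 7207 | 3 | 14.0 | -1.0 | -1.0 |
| 7207 | 4 | 25.0 | -1.0 | -1.0 |
| 7207 | 5 | 1.0 | -1.0 | -1.0 |
| 7207 | 7 | 1.7 | -1.0 | -1.0 |
| 7207 | 6 | -1.0 | -1.0 | -1.0 |
| 7207 | 8 | -1.0 | -1.0 | -1.0 |
| 7207 | 9 | -1.0 | -1.0 | -1.0 |
| 7207 | 10 | -1.0 | -1.0 | -1.0 |
| 7207 | 11 | -1.0 | -1.0 | -1.0 |
| 7207 | 12 | -1.0 | -1.0 | -1.0 |
| 7207 | 13 | -1.0 | -1.0 | -1.0 |
| 7207 | 14 | -1.0 | -1.0 | -1.0 |
| 7207 | 15 | -1.0 | -1.0 | -1.0 |
| 7207 | 16 | -1.0 | -1.0 | -1.0 |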

Product 7276

This product has a serving size (45 g) as well as a percentage. These values go into the per_portion and percent columns in the example below. Sometimes there is an additional column of data, which may be ignored for the purposes of this challenge.

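| product_id | nutrition_id | per_hundred | per_portion | percent |
|---|---|---|---|---|
| 7276 | 1 | 1720.0 | 774.0 | 9.0 |
| 7276 | 2 | 410.0 | 185.0 | 9.0 |
| 7276 | 3 | 7.2 | 3.24 | 6.0 |
| 7276 | 4 | 19.8 | 8.91 | 12.0 |
| 7276 | 5 | 49.2 | 22.14 | 8.0 |
| 7276 | 6 | 14.7 | 6.62 | 7.0 |
| 7276 | 7 | 1.05 | 0.47 | 7.0 |
| 7276 | 9 | 10.8 | 4.86 | 24.0 |
| 7276 | 8 | -1.0 | -1.0 | -1.0 |
| 7276 | 10 | -1.0 | -1.0 | -1.0 |
| 7276 | 11 | -1.0 | -1.0 | -1.0 |
| 7276 | 12 | -1.0 | -1.0 | -1.0 |
| 7276 | 13 | -1.0 | -1.0 | -1.0 |
| 7276 | 14 | -1.0 | -1.0 | -1.0 |
| 7276 | 15 | -1.0 | -1.0 | -1.0 |
| 7276 | 16 | -1.0 | -1.0 | -1.0 |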
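
In the Product 7276 figures, each per_portion value equals the per_hundred value scaled by the 45 g serving (for example, 1720.0 × 45 / 100 = 774.0), up to rounding on the label. A participant might use this relationship as a sanity check on extracted values. The sketch below is one possible check; the function name and the 1% tolerance are assumptions, not challenge requirements.

```python
import math


def portion_matches(per_hundred, per_portion, serving_g, rel_tol=0.01):
    """Check that per_portion is per_hundred scaled to the serving size.

    rel_tol (assumed 1%) absorbs rounding done on the printed label.
    """
    expected = per_hundred * serving_g / 100.0
    return math.isclose(per_portion, expected, rel_tol=rel_tol)


# (per_hundred, per_portion) pairs for the nutrients present on product 7276.
PRODUCT_7276 = [
    (1720.0, 774.0), (410.0, 185.0), (7.2, 3.24), (19.8, 8.91),
    (49.2, 22.14), (14.7, 6.62), (1.05, 0.47), (10.8, 4.86),
]

ALL_CONSISTENT = all(portion_matches(h, p, 45) for h, p in PRODUCT_7276)
```

A pair that fails the check, such as a misread decimal point, is a candidate for re-reading the image rather than proof of an error, since labels may round differently.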

Evaluation

  • The score is the percentage of correct fields per product, averaged across all products.
  • A submission must have a grade of 80% or above to be eligible for prizes.
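
The metric above could be approximated locally with a sketch like the following. The exact grader is not published here, so several points are assumptions: a "field" is taken to be one of the three value columns of one row, a field counts as correct when it matches the answer within a small numeric tolerance, and a row missing from the submission counts as entirely wrong. The function name `score` is hypothetical.

```python
from collections import defaultdict


def score(truth, predicted, tol=1e-6):
    """Percentage of correct fields per product, averaged across products.

    Both arguments map (product_id, nutrient_id) to a tuple
    (per_hundred, per_portion, percent).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for (product_id, nutrient_id), true_values in truth.items():
        # A row absent from the submission scores zero for all its fields.
        pred_values = predicted.get((product_id, nutrient_id), (None, None, None))
        for true_value, pred_value in zip(true_values, pred_values):
            total[product_id] += 1
            if pred_value is not None and abs(true_value - pred_value) <= tol:
                correct[product_id] += 1
    if not total:
        return 0.0
    per_product = [correct[pid] / total[pid] for pid in total]
    return 100.0 * sum(per_product) / len(per_product)
```

Under these assumptions, a local score of 80.0 or more would correspond to the eligibility threshold in the second point.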

Rules

  • 500 labeled images will be provided to participants
  • Participants must provide runnable code, which is evaluated by crowdAI against a set of secret images
  • Runnable code may be submitted up to 5 times per day
  • The final submission will be graded after the close of the challenge against a set of images that will not be used during the running of the competition.

Prizes

The author of the most highly ranked submission above 80% will be invited to the crowdAI winner's symposium at EPFL in Switzerland on January 30/31, 2017. This symposium is part of the Applied Machine Learning Days, to which the winner will have full access. The educational award is given to the participant with either the most insightful submission posts or the best tutorial. The recipient of this award, chosen by the crowdAI team, will also be invited to the symposium. Expenses for travel and accommodation are covered by crowdAI.

In addition, there is a CHF 2,000 (~ USD 2,000) prize for the most highly ranked submission above 80%.

Resources

### Docker Container

The [Jupyter Notebook Scientific Python Stack](https://hub.docker.com/r/jupyter/scipy-notebook) Docker container will be the development and evaluation environment for this challenge.

The challenge [dataset](https://www.crowdai.org/challenges/3/dataset_files) comprises the source images **source_images.tar** and the pre-populated answer file **product_nutrients.csv**.

#### source_images.tar

The images archive will contain 500 images with the product id embedded in the name, e.g.:

    .
    image-2324.jpg
    image-2325.jpg
    image-2326.jpg
    image-2327.jpg
    .

### Installation of software

Participants may use any FOSS (free and open source) resources to produce the solution. Any such resources must be installed by a bash shell script called **install.sh**. For example:

```sh
#!/bin/sh
apt-get update -y
apt-get install curl -y
```

### External APIs

The following APIs may be used in this project. If they are used, the solution must work with the free tier.

  • [Google Vision](https://cloud.google.com/vision)
  • [IBM Watson](http://www.ibm.com/watson/developercloud/visual-recognition.html)
  • [Microsoft Vision API](https://www.microsoft.com/cognitive-services/en-us/computer-vision-api)
  • Clarifai

If you wish to use another API please [contact us.](https://www.crowdai.org/pages/contact)
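
Returning to the image naming convention described for source_images.tar: since the product id is embedded in each file name, a solution needs to recover it to fill the product_id column. A minimal sketch, assuming every file follows the `image-<id>.jpg` pattern shown in the listing (the function name is hypothetical):

```python
import re
from pathlib import Path

# Assumes names follow the image-<digits>.jpg pattern from the listing.
IMAGE_NAME = re.compile(r"^image-(\d+)\.jpg$", re.IGNORECASE)


def product_id_from_name(filename):
    """Return the product id embedded in an image file name, or None."""
    match = IMAGE_NAME.match(Path(filename).name)  # ignore any directory part
    return int(match.group(1)) if match else None
```

Files that do not match the pattern return `None`, so a run script can skip them instead of failing.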