Extracting structured nutritional data from food packaging images.
In the OpenFood project, labels are being scanned in Swiss supermarkets for all foods available for sale. A database of images of food packages has been prepared, and now nutritional information needs to be extracted from the images into a CSV file.
Nutritional data is presented in tables on food packaging, and nutritional tables contain at a minimum:
- Units per 100g
Additional columns may optionally be presented:
- Units per serving size
- Percentage of recommended daily intake
- Other optional columns
The nutritional data needs to be extracted into a CSV file which will then be loaded into the OpenFood database.
Nutrients Master List
Each nutrient will be one of the items in the list below and uses the standard unit of measure indicated. The nutritional information may be in upper, lower or sentence case, and presented in:
- German (de)
- French (fr)
- Italian (it)
- English (en)
Note that information may be presented in multiple languages. The challenge submission should refer to the nutrients via an integer field as indicated in the nutrient_id field in the list below.
|nutrient_id||Nutrient by language||Unit|
|1||de=>Energie en=>energy fr=>énergie it=>energia||kJ|
|2||de=>Energie (kCal) en=>energy (kCal) fr=>énergie (kCal) it=>energia (kCal)||kCal|
|3||de=>Eiweiss en=>protein fr=>protéines it=>proteine||g|
|4||de=>Fett en=>fat fr=>graisses it=>grassi||g|
|5||de=>Kohlenhydrate en=>carbohydrates fr=>glucides it=>carboidrati||g|
|6||de=>Zucker en=>sugars fr=>sucres it=>zuccheri||g|
|7||de=>Salz en=>salt fr=>sel it=>sale||g|
|8||de=>Ballaststoffe en=>fibre fr=>fibres alimentaires it=>fibre||g|
|9||de=>Gesättigte Fettsäuren en=>saturated fat fr=>graisses saturées it=>grassi saturi||g|
|10||de=>Vitamin C en=>Vitamin C fr=>Vitamine C it=>Vitamina C||mg|
|11||de=>Vitamin B2 (Riboflavin) en=>Vitamin B2 (Riboflavin) fr=>Vitamine B2 (Riboflavine) it=>Vitamina B2 (Riboflavin)||mg|
|12||de=>Natrium en=>Sodium fr=>Sodium it=>Sodio||g|
|13||de=>Selen en=>Selenium fr=>Selenium it=>Selenio||µg|
|14||de=>Vitamin E en=>Vitamin E fr=>Vitamine E it=>Vitamina E||mg|
|15||de=>Kalzium en=>Calcium fr=>Calcium it=>calcio||mg|
|16||de=>Magnesium en=>Magnesium fr=>Magnesium it=>Magnesio||mg|
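As an illustration of how the master list might be used, the following minimal sketch maps a label string in any of the four languages and in any letter case to its nutrient_id. The names `NUTRIENT_NAMES` and `lookup_nutrient_id` are hypothetical, and exact matching after case folding is an assumption; real OCR output would likely need fuzzier matching.

```python
import unicodedata
from typing import Optional

# Names copied from the master list above, keyed by nutrient_id.
NUTRIENT_NAMES = {
    1: ["Energie", "energy", "énergie", "energia"],
    2: ["Energie (kCal)", "energy (kCal)", "énergie (kCal)", "energia (kCal)"],
    3: ["Eiweiss", "protein", "protéines", "proteine"],
    4: ["Fett", "fat", "graisses", "grassi"],
    5: ["Kohlenhydrate", "carbohydrates", "glucides", "carboidrati"],
    6: ["Zucker", "sugars", "sucres", "zuccheri"],
    7: ["Salz", "salt", "sel", "sale"],
    8: ["Ballaststoffe", "fibre", "fibres alimentaires"],
    9: ["Gesättigte Fettsäuren", "saturated fat", "graisses saturées", "grassi saturi"],
    10: ["Vitamin C", "Vitamine C", "Vitamina C"],
    11: ["Vitamin B2 (Riboflavin)", "Vitamine B2 (Riboflavine)", "Vitamina B2 (Riboflavin)"],
    12: ["Natrium", "Sodium", "Sodio"],
    13: ["Selen", "Selenium", "Selenio"],
    14: ["Vitamin E", "Vitamine E", "Vitamina E"],
    15: ["Kalzium", "Calcium", "calcio"],
    16: ["Magnesium", "Magnesio"],
}


def _normalize(text: str) -> str:
    """Strip whitespace, apply Unicode NFC, and fold case for comparison."""
    return unicodedata.normalize("NFC", text.strip()).casefold()


# Reverse index: normalized name -> nutrient_id.
_NAME_TO_ID = {
    _normalize(name): nutrient_id
    for nutrient_id, names in NUTRIENT_NAMES.items()
    for name in names
}


def lookup_nutrient_id(label_text: str) -> Optional[int]:
    """Return the nutrient_id for a label string, or None if it is not listed."""
    return _NAME_TO_ID.get(_normalize(label_text))
```

For example, `lookup_nutrient_id("ZUCKER")` and `lookup_nutrient_id("sucres")` both resolve to nutrient 6, while a string that is not in the list returns `None`.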
A set of images will be provided, with a corresponding CSV file containing all product information which has been manually extracted. The submitted code will process the images and write the values into a single CSV file called product_nutrients.csv.
Participants will submit both the results and the model. A set of additional images will be processed against the model for final scoring of the submission.
Submissions will be run by crowdAI against a Docker container. Details may be found in the Resources section.
The /project folder will hold the submission scripts, including the install and run scripts (see below).
Participants may optionally submit an installation script, install.sh. When executed in the container, this will download and install any necessary code and libraries.
If an external API is used it must be indicated on the submission page. A list of APIs may be found in the Resources section.
Participants must include a script called run.sh, which is executed against a folder of images in the /project folder. It will produce a single product_nutrients.csv file as output, also in the /project folder.
Participant code must search for each nutrient type and record the value in the CSV file. If a nutrient is not specified on the package, the field is completed with '-1.0'.
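The missing-value rule can be sketched as follows. This is only an illustration under stated assumptions: the exact CSV layout is not specified here, so the column names `product_id`, `per_serving` and `per_day_percent` are hypothetical (only `per_hundred` and `nutrient_id` appear in this description), as are the helper names `build_rows` and `write_csv`.

```python
import csv
import io

MISSING = -1.0                 # value required for nutrients not on the package
NUTRIENT_IDS = range(1, 17)    # nutrient_id 1..16 from the master list

# Hypothetical column layout; the official header may differ.
COLUMNS = ["product_id", "nutrient_id", "per_hundred", "per_serving", "per_day_percent"]
VALUE_COLUMNS = COLUMNS[2:]


def build_rows(product_id, extracted):
    """Build one row per nutrient, filling unspecified fields with -1.0.

    `extracted` maps nutrient_id -> {column name: value} for whatever
    was actually read from the label.
    """
    rows = []
    for nutrient_id in NUTRIENT_IDS:
        found = extracted.get(nutrient_id, {})
        row = {"product_id": product_id, "nutrient_id": nutrient_id}
        for column in VALUE_COLUMNS:
            row[column] = found.get(column, MISSING)
        rows.append(row)
    return rows


def write_csv(rows, fileobj):
    """Write rows with a header line to an open text file object."""
    writer = csv.DictWriter(fileobj, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


# Usage: a label where only sugars per 100g could be read.
rows = build_rows("example_product", {6: {"per_hundred": 12.5}})
buffer = io.StringIO()
write_csv(rows, buffer)
```

In a real submission the file object would be an open handle on product_nutrients.csv in the /project folder rather than an in-memory buffer.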
Listed below are examples of raw nutritional data images and the extracted raw data (the data should be submitted as a single CSV file for all products, but has been presented as a table for illustration purposes).
This product is an example of a label where only a single column of data is available. In this case the data falls into the per_hundred column, as per the example below.
This product has a serving size (45g) as well as a percentage. These are columns 3 and 4 in the example below. Sometimes there is an additional column of data, which may be ignored for the purposes of this challenge.
- Submissions are scored by the percentage of correct fields per product, averaged across all products.
- A submission must have a grade of 80% or above to be eligible for prizes.
- 500 labeled images will be provided to participants.
- Participants must provide runnable code, which crowdAI evaluates against a set of secret images.
- Runnable code may be submitted up to 5 times per day.
- The final submission will be graded after the close of the challenge against a set of images that will not be used during the running of the competition.
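The scoring criterion (percentage of correct fields per product, averaged across all products) can be approximated locally with a sketch like the one below. The function names are hypothetical, and treating a field as correct when it matches the reference within a small tolerance is an assumption; the exact comparison used by crowdAI is not stated here.

```python
def product_accuracy(predicted, truth, tol=1e-6):
    """Fraction of reference fields in one product that were predicted correctly.

    Both arguments map a field key, e.g. (nutrient_id, column), to a number.
    A field missing from `predicted` counts as incorrect.
    """
    if not truth:
        raise ValueError("reference data for a product must not be empty")
    correct = sum(
        1
        for key, value in truth.items()
        if key in predicted and abs(predicted[key] - value) <= tol
    )
    return correct / len(truth)


def challenge_score(predicted_by_product, truth_by_product):
    """Mean per-product accuracy over all reference products, as a percentage."""
    if not truth_by_product:
        raise ValueError("no reference products supplied")
    accuracies = [
        product_accuracy(predicted_by_product.get(product_id, {}), truth)
        for product_id, truth in truth_by_product.items()
    ]
    return 100.0 * sum(accuracies) / len(accuracies)


# Usage: product "a" has one of two fields right, product "b" has its only field right.
truth = {
    "a": {(1, "per_hundred"): 100.0, (6, "per_hundred"): 5.0},
    "b": {(1, "per_hundred"): 200.0},
}
predicted = {
    "a": {(1, "per_hundred"): 100.0, (6, "per_hundred"): 4.0},
    "b": {(1, "per_hundred"): 200.0},
}
score = challenge_score(predicted, truth)  # (50% + 100%) / 2
```

Averaging per product rather than over all fields means a product with few fields weighs as much as one with many, which matches the wording of the criterion.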
The author of the most highly ranked submission above 80% will be invited to the crowdAI winner's symposium at EPFL in Switzerland on January 30/31, 2017. This symposium is part of the Applied Machine Learning Days, to which the winner will have full access. The educational award is given to the participant with either the most insightful submission posts or the best tutorial. The recipient of this award, chosen by the crowdAI team, will also be invited to the symposium. Expenses for travel and accommodation are covered by crowdAI.
In addition, there is a CHF 2,000 (~ USD 2,000) prize for the most highly ranked submission above 80%.