
ImageCLEF 2018 Caption - Concept Detection

Identifying relevant concepts in a large corpus of medical images



Important note:

The ImageCLEF Caption - Concept Detection challenge has officially ended and we would like to thank everybody for their participation. You can find the official results at http://imageclef.org/2018/caption.

Post-challenge submissions and the leaderboard will remain enabled for a few weeks, so you can still submit result files and have them evaluated during this limited period. To see the leaderboard with the post-challenge submissions included, turn on the “Show post-challenge submission” switch right below the leaderboard.

We would also like to encourage you to submit a CLEF working notes paper by the end of May.

Please also note that participants registering from now on will not be automatically registered with CLEF anymore.

Note: ImageCLEF Caption 2018 is divided into 2 subtasks (challenges). This challenge is about Concept Detection. For information on the Caption Prediction challenge click here. Both challenges share the same dataset, so registering for one of them automatically gives you access to the other.

Note: Do not forget to read the Rules section on this page.


Interpreting and summarizing the insights gained from medical images such as radiology output is a time-consuming task that involves highly trained experts and often represents a bottleneck in clinical diagnosis pipelines.

Consequently, there is a considerable need for automatic methods that can approximate this mapping from visual information to condensed textual descriptions. In this task, we cast the problem of image understanding as a cross-modality matching scenario in which visual content and textual descriptors need to be aligned and concise textual interpretations of medical images are generated. We work on the basis of a large-scale collection of figures from open access biomedical journal articles (PubMed Central). Each image is accompanied by its original caption, constituting a natural testbed for this image captioning task.

Lessons learned: In the first edition of this task, held at CLEF 2017, participants noted a broad topical variability among training images. This year, we will further group training data into image types (e.g., radiology vs. biopsy), and participants will build either cross-category models or category-specific ones. An additional source of uncertainty was noted in the use of external material. In this second edition of the task, we will clearly separate systems using exclusively the official training data from those that incorporate additional sources of evidence.

Challenge description

As a first step to automatic image captioning and scene understanding, participating systems are tasked with identifying the presence and location of relevant concepts in a large corpus of medical images. Based on the visual image content, this subtask provides the building blocks for the scene understanding step by identifying the individual components from which captions are composed. Evaluation is conducted in terms of set coverage metrics such as precision, recall and combinations thereof. This task will be run on a subset of the data as manual ground truthing is required.


The collection comprises a total of 4 million image-caption pairs that could potentially all be used for training, with a small subset being removed for testing. For this task, it is likely better to focus on useful radiology/clinical images and non-compound figures, which reduces the number of image-caption pairs to around 400,000. This is still significantly larger than in 2017.

Submission instructions

As soon as submissions are open, you will find a “Create Submission” button on this page (just next to the tabs).

For the submission we expect the following format:

[Figure-ID] [TAB] [Concept-ID-1];[Concept-ID-2];[Concept-ID-n]


1743-422X-4-12-1-4 C1;C6;C100
1743-422X-4-12-1-3 C89;C374
1743-422X-4-12-1-2 C8374

You need to respect the following constraints:

  • The separator between the figure ID and the concepts has to be a tab character
  • The separator between the UMLS concepts has to be a semicolon (;)
  • The same concept cannot be specified more than once for a given figure ID
  • Each figure ID of the testset must be included in the submission file exactly once (even if there are no concepts)
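The constraints above can be checked locally before uploading. The following is a minimal sketch, not an official tool: the function name `validate_submission` and its arguments are hypothetical, and it assumes that blank lines are ignored and that a figure without concepts is written as the figure ID alone (with or without a trailing tab). It cannot detect a wrong separator between concepts, because the concept ID format is not assumed here.

```python
def validate_submission(lines, test_figure_ids):
    """Return a list of problems found; an empty list means all checks passed.

    lines: iterable of submission lines (e.g. an open text file).
    test_figure_ids: iterable of all figure IDs in the test set.
    """
    expected = set(test_figure_ids)
    seen = set()
    problems = []

    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # assumption: blank lines are ignored

        # Everything before the first tab is the figure ID.
        figure_id, _, concept_field = line.partition("\t")
        if " " in figure_id:
            problems.append(
                f"line {number}: figure ID and concepts must be separated by a tab"
            )
            continue

        if figure_id in seen:
            problems.append(f"line {number}: figure ID {figure_id} appears more than once")
            continue
        seen.add(figure_id)

        if figure_id not in expected:
            problems.append(f"line {number}: unknown figure ID {figure_id}")

        # Concepts are separated by semicolons; an empty field means no concepts.
        concepts = [c for c in concept_field.split(";") if c]
        if len(concepts) != len(set(concepts)):
            problems.append(f"line {number}: duplicate concept for {figure_id}")

    # Every test figure must be present exactly once.
    for missing in sorted(expected - seen):
        problems.append(f"missing figure ID {missing}")

    return problems
```

A typical use would be to open the run file in text mode, pass the file object as `lines`, and print each returned problem.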


PubMed Central


Evaluation is conducted in terms of F1 scores between system predicted and ground truth concepts, using the following methodology and parameters:

  • The default implementation of the Python scikit-learn (v0.17.1-2) F1 scoring method is used. It is documented here.

  • A Python (3.x) script loads the candidate run file, as well as the ground truth (GT) file, and processes each pair of candidate and GT concept sets.

  • For each candidate-GT pair, the y_pred and y_true arrays are generated. They are binary arrays covering every concept that occurs in the candidate set or the GT set, indicating for each concept whether it is present (1) or not (0) in the respective set.

  • The F1 score is then calculated. The default ‘binary’ averaging method is used.

  • All F1 scores are summed and averaged over the number of elements in the test set (10’000), giving the final score.
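The steps above can be sketched in a few lines of Python. This is not the official evaluation script: it replaces the scikit-learn call with a hand-written binary F1 so that it runs without dependencies, the names `binary_f1` and `mean_f1` are hypothetical, and returning 0.0 when there are no true positives (including the case where both sets are empty) is an assumption of this sketch.

```python
def binary_f1(candidate, ground_truth):
    """F1 between one candidate concept set and one ground-truth concept set."""
    candidate, ground_truth = set(candidate), set(ground_truth)

    # Build the binary arrays over every concept seen in either set.
    vocabulary = sorted(candidate | ground_truth)
    y_pred = [int(c in candidate) for c in vocabulary]
    y_true = [int(c in ground_truth) for c in vocabulary]

    true_positives = sum(p and t for p, t in zip(y_pred, y_true))
    if true_positives == 0:
        return 0.0  # assumption: no overlap (or two empty sets) scores zero

    precision = true_positives / sum(y_pred)
    recall = true_positives / sum(y_true)
    return 2 * precision * recall / (precision + recall)


def mean_f1(candidate_run, ground_truth):
    """Average the per-figure F1 over all figures in the ground truth.

    Both arguments map figure IDs to collections of concept IDs.
    A figure missing from the candidate run is scored as an empty prediction.
    """
    total = sum(
        binary_f1(candidate_run.get(figure_id, ()), gt_concepts)
        for figure_id, gt_concepts in ground_truth.items()
    )
    return total / len(ground_truth)
```

For instance, a candidate of C1;C6;C100 against a ground truth of C1;C6 has two true positives and one false positive, giving precision 2/3, recall 1 and an F1 of 0.8.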

The ground truth for the test set was generated based on the UMLS Full Release 2016AB.

NOTE: The source code of the evaluation tool is available here. It must be executed using Python 3.x, on a system where the scikit-learn (>= v0.17.1-2) Python library is installed. The script should be run like this:

/path/to/python3 evaluate-f1.py /path/to/candidate/file /path/to/ground-truth/file

The leaderboard will be visible from 01.05.2018 (official deadline) on. The submission system will remain open for a few more days. Results submitted after the deadline will not be part of the official results.


Note: In order to participate in this challenge you have to sign an End User Agreement (EUA). You will find more information on the ‘Dataset’ tab. The ImageCLEF lab is part of the Conference and Labs of the Evaluation Forum: CLEF 2018. CLEF 2018 consists of independent peer-reviewed workshops on a broad range of challenges in the fields of multilingual and multimodal information access evaluation, and a set of benchmarking activities carried out in various labs designed to test different aspects of mono- and cross-language information retrieval systems. More details about the conference can be found here.

Submitting a working note with the full description of the methods used in each run is mandatory. Any run that cannot be reproduced from its description in the working notes might be removed from the official publication of the results. Working notes are published within CEUR-WS proceedings, resulting in an assignment of an individual DOI (URN) and an indexing by many bibliography systems including DBLP. According to the CEUR-WS policies, a light review of the working notes will be conducted by the ImageCLEF organizing committee to ensure quality. As an illustration, ImageCLEF 2017 working notes (task overviews and participant working notes) can be found within CLEF 2017 CEUR-WS proceedings.


Participants of this challenge will automatically be registered at CLEF 2018. In order to be compliant with the CLEF registration requirements, please edit your profile by providing the following additional information:

  • First name

  • Last name

  • Affiliation

  • Address

  • City

  • Country

  • Regarding the username, please choose a name that represents your team.

This information will not be publicly visible and will be used exclusively to contact you and to send the registration data to CLEF, which is the main organizer of all CLEF labs.

Participating as an individual (non-affiliated) researcher

We welcome individual researchers, i.e. those not affiliated with any institution, to participate. We kindly ask you to provide us with a motivation letter containing the following information:

  • the presentation of your most relevant research activities related to the task/tasks

  • your motivation for participating in the task/tasks and how you want to exploit the results

  • a list of your 5 most relevant publications (if applicable)

  • the link to your personal webpage

The motivation letter should be appended directly to the End User Agreement document or sent as a PDF file to bionescu at imag dot pub dot ro. The request will be analyzed by the ImageCLEF organizing committee. We reserve the right to refuse any applicant whose experience in the field is too narrow and would therefore most likely prevent them from finishing the task/tasks.


ImageCLEF 2018 is an evaluation campaign that is being organized as part of the CLEF initiative labs. The campaign offers several research tasks that welcome participation from teams around the world. The results of the campaign appear in the working notes proceedings, published by CEUR Workshop Proceedings (CEUR-WS.org). Selected contributions from the participants will be invited for publication in the following year in the Springer Lecture Notes in Computer Science (LNCS), together with the annual lab overviews.


Contact us

We strongly encourage you to use the public channels mentioned above for communications between the participants and the organizers. In exceptional cases, if you have queries or comments that you would like to raise through a private channel, you can send us an email at:

  • Sharada Prasanna Mohanty: sharada.mohanty@epfl.ch
  • Alba Garcia Seco de Herrera: alba[DOT]garcia[AT]essex[DOT]ac[DOT]uk
  • Henning Müller: henning[DOT]mueller[AT]hevs[DOT]ch
  • Vincent Andrearczyk: vincent[DOT]andrearczyk[AT]hevs[DOT]ch
  • Ivan Eggel: ivan[DOT]eggel[AT]hevs[DOT]ch

More information

You can find additional information on the challenge here: http://imageclef.org/2018/caption