Location-based species prediction
Note: Do not forget to read the Rules section on this page
Automatically predicting the list of species that are the most likely to be observed at a given location is useful for many scenarios in biodiversity informatics. First of all, it could improve species identification processes and tools (be they automated, semi-automated or based on classical field guides or floras) by reducing the list of candidate species observable at a given location. More generally, it could facilitate biodiversity inventories through the development of location-based recommendation services (typically on mobile phones) as well as the involvement of non-expert nature observers. Last but not least, it might serve educational purposes thanks to biodiversity discovery applications providing functionalities such as contextualized educational pathways.
The aim of the challenge is to predict the list of species that are the most likely to be observed at a given location. We will therefore provide a large training set of species occurrences, each occurrence being associated with a multi-channel image characterizing the local environment. Indeed, it is usually not possible to learn a species distribution model directly from spatial positions because of the limited number of occurrences and the sampling bias. What is usually done in ecology is to predict the distribution on the basis of a representation in environmental space, typically a feature vector composed of climatic variables (average temperature at that location, precipitation, etc.) and other variables such as soil type, land cover, distance to water, etc. The originality of GeoLifeCLEF is to generalize such a niche modeling approach to an image-based environmental representation space. Instead of learning a model from environmental feature vectors, the goal of the task will be to learn a model from k-dimensional image patches, each patch representing the value of an environmental variable in the neighborhood of the occurrence (see the figure below for an illustration). From a machine learning point of view, the challenge can thus be treated as an image classification task. Participants will learn their models on a train set consisting of valid and uncertain citizen science occurrences. They will then try to predict the most likely species on an independent test set made up of expert plant occurrences with accurate identifications and spatial locations, covering very diverse biotic areas in France: the Mediterranean and Alpine regions.
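To make the analogy with image classification concrete, an occurrence can be encoded as a k-channel tensor with the same layout as an RGB image. This is only an illustrative sketch: the patch size and channel count below are assumptions, not specifications of the official dataset format.

```python
import numpy as np

# Hypothetical shapes for illustration: k environmental variables, 64x64 patches.
k, patch_size = 33, 64

# One occurrence: k aligned environmental patches stacked channel-first,
# the same layout as a (3, H, W) RGB image but with k channels.
occurrence = np.stack([np.zeros((patch_size, patch_size)) for _ in range(k)])
print(occurrence.shape)   # (33, 64, 64)

# A mini-batch ready for a standard image classifier: (batch, channels, H, W).
batch = np.stack([occurrence] * 8)
print(batch.shape)        # (8, 33, 64, 64)
```

Any off-the-shelf convolutional architecture can then be applied, provided its first layer is configured for k input channels instead of 3.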
The training data can be downloaded from the “Dataset” tab.
Check out the Python scripts that simplify formatting of the dataset for the learning process (provided here: https://github.com/maximiliense/GLC19).
This year, the train dataset is larger than in the 2018 edition. In a nutshell, it first includes the train and test georeferenced occurrences of plant species from last year (file GLC_2018.csv). In addition, a large number of plant species occurrences with uncertain identifications are added (file PL_complete.csv). They come from automatic species identification of pictures produced in 2017-2018 by the smartphone application Pl@ntNet, whose users are mainly amateur botanists. A trusted extraction of this dataset is also provided (file PL_trusted.csv), ensuring a reasonable level of identification certainty. Finally, species occurrences from other kingdoms (such as mammals, birds, amphibians, insects, fungi, etc.) were selected from the GBIF database (file noPlant.csv).

33 environmental rasters (directory rasters GLC19/) covering the French territory are made available this year, so that each occurrence can be linked to an environmental tensor via participant-customizable Python code. These environmental rasters were constructed from various open datasets, including CHELSA climate data, ESDB soil pedology data [2,3,4], Corine Land Cover 2012 land cover data, CGIAR-CSI evapotranspiration data [5,6], USGS elevation data (data available from the U.S. Geological Survey) and BD Carthage hydrographic data.

The test occurrences come from independent datasets of the French National Botanical Conservatories. A detailed description of the protocol used to build the datasets will soon be available for download on this page.
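The official formatting scripts live in the GLC19 repository linked above. As a rough sketch of the kind of extraction they perform, the following hypothetical, numpy-only function crops a fixed-size window around an occurrence from a single raster (here faked as an in-memory array); stacking one such patch per raster would yield the k-channel environmental tensor. Function name, patch size and padding behavior are assumptions for illustration.

```python
import numpy as np

def extract_patch(raster, row, col, size=64):
    """Return a size x size window of `raster` centred at (row, col),
    padding with NaN where the window falls outside the raster bounds.
    Illustrative sketch only, not the official GLC19 extraction code."""
    half = size // 2
    patch = np.full((size, size), np.nan, dtype=raster.dtype)
    # Clip the requested window to the raster bounds.
    r0, r1 = max(row - half, 0), min(row + half, raster.shape[0])
    c0, c1 = max(col - half, 0), min(col + half, raster.shape[1])
    # Copy the valid part into the right position inside the padded patch.
    patch[r0 - (row - half):r0 - (row - half) + (r1 - r0),
          c0 - (col - half):c0 - (col - half) + (c1 - c0)] = raster[r0:r1, c0:c1]
    return patch

raster = np.arange(200 * 200, dtype=float).reshape(200, 200)  # stand-in raster
patch = extract_patch(raster, row=100, col=100)
print(patch.shape)  # (64, 64)
```

In practice the (row, col) indices would be derived from an occurrence's latitude/longitude using each raster's geographic extent and resolution.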
As soon as submissions open, you will find a “Create Submission” button on this page (just next to the tabs).
Further instructions on the submission format will be published soon.
Information will be posted after the challenge ends.
The main evaluation criterion will be the accuracy based on the first 30 answers, also called Top30: the mean, over all test set occurrences, of a function scoring 1 when the correct species is among the first 30 answers and 0 otherwise.
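Assuming a submission scores every candidate species for each test occurrence, the metric can be sketched as follows (function name and toy data are illustrative, not the official evaluation code):

```python
import numpy as np

def top30_accuracy(scores, true_species):
    """Fraction of occurrences whose true species is among the 30
    highest-scored candidates (illustrative, not the official script)."""
    hits = [
        1.0 if true_id in np.argsort(row)[::-1][:30] else 0.0
        for row, true_id in zip(scores, true_species)
    ]
    return float(np.mean(hits))

# Toy check: 5 occurrences, 100 candidate species each.
rng = np.random.default_rng(0)
scores = rng.random((5, 100))
labels = [int(np.argmax(row)) for row in scores]   # true species ranked first
print(top30_accuracy(scores, labels))              # 1.0
```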
LifeCLEF lab is part of the Conference and Labs of the Evaluation Forum: CLEF 2019. CLEF 2019 consists of independent peer-reviewed workshops on a broad range of challenges in the fields of multilingual and multimodal information access evaluation, and a set of benchmarking activities carried out in various labs designed to test different aspects of mono- and cross-language information retrieval systems. More details about the conference can be found here.
Submitting a working note with the full description of the methods used in each run is mandatory. Any run that could not be reproduced from its description in the working notes might be removed from the official publication of the results. Working notes are published within CEUR-WS proceedings, resulting in the assignment of an individual DOI (URN) and indexing by many bibliography systems, including DBLP. In accordance with CEUR-WS policies, a light review of the working notes will be conducted by the LifeCLEF organizing committee to ensure quality. As an illustration, the LifeCLEF 2018 working notes (task overviews and participant working notes) can be found within the CLEF 2018 CEUR-WS proceedings.
Participants of this challenge will automatically be registered at CLEF 2019. In order to be compliant with the CLEF registration requirements, please edit your profile by providing the following additional information:
Regarding the username, please choose a name that represents your team.
This information will not be publicly visible and will be exclusively used to contact you and to send the registration data to CLEF, which is the main organizer of all CLEF labs.
LifeCLEF 2019 is an evaluation campaign that is being organized as part of the CLEF initiative labs. The campaign offers several research tasks that welcome participation from teams around the world. The results of the campaign appear in the working notes proceedings, published by CEUR Workshop Proceedings (CEUR-WS.org). Selected contributions from the participants will be invited for publication the following year in the Springer Lecture Notes in Computer Science (LNCS) series, together with the annual lab overviews.
- Discussion Forum: https://www.crowdai.org/challenges/lifeclef-2019-geo/topics
We strongly encourage you to use the public channels mentioned above for communications between the participants and the organizers. In extreme cases, if there are any queries or comments that you would like to make using a private communication channel, then you can send us an email at:
Maximilien SERVAJEAN, Maximilien.Servajean@lirmm.fr
Christophe BOTELLA, email@example.com
Alexis JOLY, firstname.lastname@example.org
You can find additional information on the challenge here: https://www.imageclef.org/GeoLifeCLEF2019