CAcTUS is an Inria exploratory action led by Alexis Joly (Inria research director) with the participation of Joaquim Estopinan (Inria doctoral student), Pierre Bonnet (botanist at CIRAD), François Munoz (ecologist and modeler at LECA), Maximilien Servajean (machine learning researcher at LIRMM) and Joseph Salmon (professor of statistical learning at the University of Montpellier).
The acronym CAcTUS comes from “A predictive approach to determining the conservation status of species”. Determining the conservation status of species (“vulnerable”, “threatened”, “extirpated”, etc.) is a complex process that requires both reliable data and a careful, well-informed review of criteria defined by the scientific community (especially those of the IUCN). Because this process takes a long time, the list of assessed species has significant gaps and is not updated quickly enough given the pace of current environmental change. Our hypothesis is that the conservation status of species could be predicted much more efficiently by using artificial intelligence to automatically analyze the large volumes of available data. Recent cyberinfrastructures and new data sources make it possible to mobilize and integrate massive amounts of biological data that have never been analyzed jointly. Our approach will be to develop a generic machine learning workflow that can be applied to tens of thousands of species at once. In a nutshell, this workflow should carry out the following three steps: (1) learn species habitats and interaction patterns from existing occurrences paired with bioclimatic and environmental covariates (jointly for all species), (2) predict present and future populations according to available projections (e.g. climate change projections and land-use harmonization), (3) train a conservation status classifier based on the learned patterns and predicted populations.
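To make the three steps concrete, here is a deliberately tiny sketch in Python. It is not the project's method: the project plans to use deep learning models trained jointly on all species, whereas this toy swaps in a one-covariate Gaussian envelope per species for step (1) and a nearest-centroid rule for step (3). All species names, temperatures, status labels and function names are invented for illustration.

```python
from statistics import mean, stdev

# Step 1 (toy): learn a habitat model from presence-only occurrences.
# Each occurrence is the temperature recorded where the species was seen.
def fit_niche(occurrence_temps):
    return mean(occurrence_temps), stdev(occurrence_temps)

def is_suitable(niche, temp, width=2.0):
    center, spread = niche
    return abs(temp - center) <= width * spread

# Step 2 (toy): compare the suitable share of a fixed grid of cells
# under present conditions and under a projected climate.
def suitable_fraction(niche, grid_temps):
    return sum(is_suitable(niche, t) for t in grid_temps) / len(grid_temps)

def range_change(niche, present_grid, future_grid):
    now = suitable_fraction(niche, present_grid)
    later = suitable_fraction(niche, future_grid)
    return (later - now) / now if now > 0 else 0.0

# Step 3 (toy): nearest-centroid status classifier on the predicted change.
def fit_status_classifier(changes, statuses):
    by_status = {}
    for change, status in zip(changes, statuses):
        by_status.setdefault(status, []).append(change)
    return {status: mean(values) for status, values in by_status.items()}

def predict_status(centroids, change):
    return min(centroids, key=lambda status: abs(centroids[status] - change))

# Hypothetical data: 30 grid cells and a uniform +3 degree projection.
present_grid = [float(t) for t in range(30)]
future_grid = [t + 3.0 for t in present_grid]

occurrences = {
    "sp_cold": [2.0, 3.0, 4.0, 5.0, 6.0],
    "sp_mid": [12.0, 14.0, 15.0, 16.0, 18.0],
    "sp_warm": [24.0, 25.0, 26.0, 27.0, 28.0],
    "sp_new": [1.0, 2.0, 3.0, 4.0, 5.0],  # species with no assessed status
}
known_status = {
    "sp_cold": "threatened",
    "sp_mid": "not threatened",
    "sp_warm": "not threatened",
}

changes = {
    name: range_change(fit_niche(temps), present_grid, future_grid)
    for name, temps in occurrences.items()
}
centroids = fit_status_classifier(
    [changes[name] for name in known_status],
    list(known_status.values()),
)
predicted = predict_status(centroids, changes["sp_new"])
print(predicted)
```

With these made-up numbers, the unassessed cold-adapted species loses part of its suitable range under warming and is assigned the same status as the assessed cold-adapted species. A realistic workflow would replace each toy function with a learned model and many covariates, but the data flow between the three steps would look similar.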
To learn these complex relationships and deal with data heterogeneity, we will rely primarily on deep learning models and associated methods such as transfer learning, domain adaptation, multi-task learning, meta-learning and adversarial regularization. Several difficult problems remain, however, in particular the lack of absence data, observation biases, and the very strong imbalance in the amount of data available across species.
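The text does not say how the imbalance between species will be handled. One common option, shown here only as an assumption, is to weight each occurrence by the inverse frequency of its species so that rare species contribute as much to a training loss as common ones. The function name and the example labels are hypothetical.

```python
from collections import Counter

def balanced_weights(species_labels):
    """Return one weight per occurrence: total / (n_species * count)."""
    counts = Counter(species_labels)
    total = len(species_labels)
    n_species = len(counts)
    return [total / (n_species * counts[label]) for label in species_labels]

# Hypothetical dataset: 8 occurrences of a common species, 2 of a rare one.
labels = ["common"] * 8 + ["rare"] * 2
weights = balanced_weights(labels)

# Total weight carried by each species after reweighting.
per_species = {}
for label, weight in zip(labels, weights):
    per_species[label] = per_species.get(label, 0.0) + weight
print(per_species)
```

After reweighting, both species carry the same total weight even though one has four times as many occurrences. This does not address the other two problems mentioned above (missing absence data and observation biases), which need separate treatment.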