Dinizia (Data Science for the Natural Environment)

Dinizia is an associated team (“équipe associée”) between Iroko and five teams in the state of Rio de Janeiro (CEFET, Fiocruz, LNCC, UFF, UFRJ), active since January 2024. It is headed by Eduardo Ogasawara (CEFET) and Esther Pacitti (Iroko).

The overall objective of Dinizia* is to develop new data science solutions that will eventually contribute to findings in environmental and related sciences. These solutions will take the form of both methods and real systems. Our technical objective within data science is to help manage complex dataflows by organizing massive, heterogeneous data in connection with models, and by making the related artifacts (datasets, time series, models, metadata, dataflow components, etc.) easy to search, debug, and parallelize. In many ways, dataflows are to scientific data processing what queries are to business data processing: queries must be written, debugged, and optimized, and should work across distributed servers. Because scientific data processing is much more complex, dataflows take the place of queries. Thus, a technical goal of this project is to make dataflows work as seamlessly with data as queries do in business processing. The work program includes three major research topics: detecting events in large time series, model life-cycle management, and scalable execution of heterogeneous dataflows.

To validate our solutions, we will capitalize on our previous experience in developing major systems for scientific applications: Pl@ntNet and OpenAlea from Inria; Gypscie and Harbinger from Brazil. With our main application partners (CIRAD and INRAE in France, Fiocruz and Centro de Operações Rio in Brazil), we will validate our results using real datasets and models. We will also contribute real datasets, published as data papers. The main applications will be in agronomy (with CIRAD and INRAE), biodiversity informatics (with CIRAD, INRAE, ESALq and Fiocruz), and meteorology (with Centro de Operações Rio).

Keywords: 
A- Research themes on digital science: A1.1. Architectures, A3.3. Data and knowledge analysis, A3.4.4. Optimization and learning, A6.2.6. Optimization, A9.2. Machine learning
B- Other research themes and application areas: B1.1.11. Plant Biology, B3.5. Agronomy, B3.6. Ecology, B4. Energy, B6.5. Information systems

*Dinizia is a perfect example of data-intensive scientific discovery. This giant tree (full scientific name: Dinizia excelsa), almost 90 meters high and 10 meters in circumference, is the biggest ever identified in the Amazon forest in northern Brazil. It was detected by satellite in 2019 and, following analysis of the collected data, was eventually reached by a group of scientists in 2022.

Highlights

Participants

Scientific results

Publications

Meetings and seminars