Presentation

Data-intensive sciences such as agronomy, astronomy, biology and environmental science must deal with overwhelming amounts of experimental data produced through empirical observation and simulation. Such data must be processed (cleaned, transformed, analyzed) in all kinds of ways in order to draw new conclusions, prove scientific theories and produce knowledge. However, constant progress in scientific observational instruments (e.g. satellites, sensors, IoT) and simulation tools (that foster in silico experimentation) creates a huge data overload.

Scientific data is very complex, in particular because of the heterogeneous methods used for producing data, the uncertainty of captured data, the inherently multi-scale nature (spatial scale, temporal scale) of many sciences and the growing use of imaging (e.g. molecular imaging), resulting in data with hundreds of attributes, dimensions or descriptors. Despite this variety, we can identify common features of scientific data: it is big; manipulated through complex, distributed workflows; typically complex, e.g. multidimensional or graph-based; uncertain in its data values, e.g. to reflect data capture or observation; accompanied by important metadata about experiments and their provenance; and mostly append-only (with rare updates).

The three main challenges of scientific data management can be summarized as: (1) scale (big data, big applications); (2) complexity (uncertain, multi-scale data with lots of dimensions); and (3) heterogeneity (in particular, data semantics heterogeneity). These are also the challenges of data science, whose goal is to make sense of data by combining data management, machine learning, statistics and other disciplines. The overall goal of Zenith is to address these challenges by proposing innovative solutions with significant advantages in terms of scalability, functionality, ease of use, and performance. To produce generic results, we express these solutions as architectures, models and algorithms that can be implemented as components or services in specific computing environments, e.g. grid, cloud. We design and validate our solutions by working closely with our scientific application partners such as INRAe and CIRAD in France, or FIOCRUZ in Brazil. To further validate our solutions and extend the scope of our results, we also foster industrial collaborations, even in non-scientific applications, provided that they exhibit similar challenges.


Spotlight on PlantNet

Seminar by Patrick Valduriez on “Big Data Technologies”, Inria Paris, 23 November 2023

Talk on “Life Science Workflow Services (LifeSWS): motivations and architecture” by Patrick Valduriez, The Data Systems Seminar Series, University of Waterloo, 5 September 2023

Workshop “Émeritat de Patrick Valduriez”, 5 June 2023

Talk on “Data Science and Innovation” by Patrick Valduriez, COPPE/UFRJ, Rio de Janeiro, 5 May 2023

Seminar by Patrick Valduriez on “Data Science and Innovation”, CEFET, Rio de Janeiro, 3 May 2023

Inria-Brasil Workshop, 10-14 April 2023

Interview with Alexis Joly on France Inter, 12 April 2023
