HPDaSc (High Performance Data Science)

HPDaSc (High Performance Data Science) has been an associated team (“équipe associée”) since January 2020, between Zenith and four teams in the state of Rio de Janeiro (LNCC, COPPE/UFRJ, UFF and CEFET). HPDaSc is headed by Patrick Valduriez (Zenith) and Fabio Porto (LNCC).

Data-intensive science requires the integration of two fairly different paradigms: high-performance computing (HPC) and data science. HPC is compute-centric and focuses on the performance of simulation applications, typically using powerful yet expensive supercomputers. Data science, in contrast, is data-centric and focuses on the scalability and fault tolerance of web and cloud applications, using cost-effective clusters of commodity hardware.

In the context of the SciDISC project (associated team, 2016-2019) and the Inria Project Lab (IPL) HPC-BigData (2018-2022), we studied various architectures for integrating HPC and big data (post-processing, in situ, in transit) for applications in astronomy, life science and agronomy, and geoscience (oil & gas). We learned several major lessons, which form the basis of this new project:

  • Importance of realtime analytics for making critical, high-consequence decisions, e.g. preventing useless drilling based on a driller’s realtime data and realtime visualization of simulated data;
  • Effectiveness of machine learning (ML) to deal with scientific data, e.g. computing Probability Density Functions (PDFs) over simulated seismic data using Spark;
  • Effectiveness of the Human-In-the-Loop (HIL) paradigm in combination with provenance data in scientific workflows, e.g. to avoid useless, long-duration computations in a supercomputer;
  • Significance of working closely with domain experts in order to interpret scientific data.

This project addresses the grand challenge of High Performance Data Science (HPDaSc), by developing architectures and methods to combine simulation, ML and data analytics.


Permanent link to this article: https://team.inria.fr/zenith/hpdasc/

Achievements

Data analytics: SoftED metrics [Salles 2023a] are a new set of metrics designed for the soft evaluation of event detection methods. They enable the evaluation of both detection accuracy and the degree to which detections represent events. They improve event detection evaluation by associating events with their representative detections, incorporating temporal tolerance in over 36% of experiments …

Highlights

RISC2 European H2020 project (2021-2023) between Europe and Latin America in HPC: final review. The final review of the RISC2 project took place on 31 October 2023 (virtual) and was outstanding. All the reviewers congratulated the RISC2 participants on their excellent work and sustained collaboration despite the COVID pandemic. As mentioned by the project officer, the …

HPDaSc meetings

  • 19 December 2023: Inria-Brasil Workshop on Digital Sciences and Energy (hybrid mode), organized by Sergio Lifschitz (PUC-Rio), Patrick Valduriez (Inria) and Frederic Valentin (LNCC), PUC-Rio, Rio de Janeiro, Brazil.
  • 25 October 2023: BDA Conference, Montpellier, France: participation of Fabio Porto (LNCC) with the talk “A Data-Driven Model Selection Approach to Spatio-Temporal Prediction”.
  • 25 July 2023: …

HPDaSc objectives

Based on lessons learned from previous projects (SciDISC, HPCBD), we address the following requirements for high-performance data science (HPDaSc):

  • Support realtime analytics and visualization (in either in situ or in transit architectures) to help make high-impact online decisions;
  • Combine ML with analytics and simulation, which implies dealing with uncertainty in the data and models, leading …

HPDaSc participants

LNCC, Petrópolis, RJ
  • Fabio Porto (senior researcher), Kary Ocaña (researcher), Luiz Manoel Gadelha (researcher), Rafael Pereira (research engineer), Eduardo Pena (postdoc)
  • PhD students: Anderson Chaves, Gustavo Decarlo, Victor Ribeiro Dornellas
  • MSc students: Rafael de Souza Terra, Rafael Silva Pereira

COPPE/UFRJ, Rio de Janeiro, RJ
  • Alvaro Coutinho (professor), Marta Mattoso (professor), Fernando Rochinha (professor), Renan Souza (research engineer)
  • PhD students: Debora Pina, Liliane …

HPDaSc Publications

2023

[Akbarinia 2023] Reza Akbarinia, Christophe Botella, Alexis Joly, Florent Masseglia, Marta Mattoso, Eduardo Ogasawara, Daniel de Oliveira, Esther Pacitti, Fabio Porto, Christophe Pradal, Dennis Shasha, Patrick Valduriez. Life Science Workflow Services (LifeSWS): motivations and architecture. Transactions on Large-Scale Data- and Knowledge-Centered Systems, 25 pages, in press, 2023.

[Borges 2023] Heraldo Borges, Antonio Castro, Rafaelli …