HPDaSc (High Performance Data Science)

HPDaSc (High Performance Data Science) is an associated team (“équipe associée”) between Zenith and four teams in the state of Rio de Janeiro (LNCC, COPPE/UFRJ, UFF and CEFET), active since January 2020. HPDaSc is headed by Patrick Valduriez (Zenith) and Fabio Porto (LNCC).

Data-intensive science requires the integration of two fairly different paradigms: high-performance computing (HPC) and data science. HPC is compute-centric and focuses on the high performance of simulation applications, typically using powerful yet expensive supercomputers, whereas data science is data-centric and focuses on the scalability and fault tolerance of web and cloud applications, using cost-effective clusters of commodity hardware.

In the context of the SciDISC project (associated team 2016-2019) and the Inria Project Lab (IPL) HPC-BigData (2018-2022), we studied various architectures for integrating HPC and big data (post-processing, in-situ, in-transit) for applications in astronomy, life science and agronomy, and geoscience (oil & gas). We learned major lessons, which are the basis for this new project:

  • Importance of realtime analytics to make critical high-consequence decisions, e.g. preventing useless drilling based on a driller’s realtime data and realtime visualization of simulated data;
  • Effectiveness of machine learning (ML) to deal with scientific data, e.g. computing Probability Density Functions (PDFs) over simulated seismic data using Spark;
  • Effectiveness of the Human-In-the-Loop (HIL) paradigm in combination with provenance data in scientific workflows, e.g. to avoid useless, long-duration computations in a supercomputer;
  • Significance of working closely with domain experts in order to interpret scientific data.

This project addresses the grand challenge of High Performance Data Science (HPDaSc), by developing architectures and methods to combine simulation, ML and data analytics.

Permanent link to this article: https://team.inria.fr/zenith/hpdasc/

Achievements

Data analytics

  • A novel method for detecting events in nonstationary time series [Lima 2022]. The method, entitled Forward and Backward Inertial Anomaly Detector (FBIAD), analyzes inconsistencies in observations concerning surrounding temporal inertia (forward and backward).
  • A comprehensive review of the state of the art in learning-based analytics for the Edge-to-Cloud Continuum [Rosendo 2022a]. The main simulation, emulation, deployment systems, and testbeds …
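The idea of checking an observation against its forward and backward temporal neighbourhood, as mentioned for FBIAD above, can be illustrated with a minimal sketch. This is not the published FBIAD algorithm: the function name `inertia_anomalies`, the use of plain window means as a stand-in for "inertia", and the `window` and `threshold` parameters are all assumptions made for illustration only.

```python
from statistics import mean


def inertia_anomalies(series, window=3, threshold=2.0):
    """Return indices that deviate from BOTH neighbouring windows.

    Illustrative assumption, not the FBIAD method itself: the local
    "inertia" is approximated by the mean of the `window` points
    before (backward) and after (forward) each observation.
    """
    flagged = []
    # Only points with a full window on each side are examined.
    for i in range(window, len(series) - window):
        backward = mean(series[i - window:i])
        forward = mean(series[i + 1:i + 1 + window])
        # A point is flagged only if it is inconsistent with both sides;
        # a lasting level shift agrees with one side and is not flagged.
        if (abs(series[i] - backward) > threshold
                and abs(series[i] - forward) > threshold):
            flagged.append(i)
    return flagged


spike = [1.0, 1.1, 0.9, 1.0, 5.0, 1.0, 1.1, 0.9, 1.0]
level_shift = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]
print(inertia_anomalies(spike))        # isolated spike at index 4
print(inertia_anomalies(level_shift))  # no isolated anomaly
```

Because the comparison is local to each window, a slow drift in the series level does not by itself trigger a flag, which is one reason window-based checks are often used on nonstationary data.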

Highlights

SBBD 2022 Best Paper award nominee

The paper “A Data-Driven Model Selection Approach to Spatio-Temporal Prediction” by Rocío Zorrilla, Eduardo Ogasawara, Patrick Valduriez and Fabio Porto was nominated (top 3) for best paper and ranked second at SBBD 2022 – Brazilian Symposium on Databases, Buzios, Brazil, 2022.

CARLA 2022 Workshop on HPC and Data Sciences meet Scientific …

HPDaSc meetings

  • 8 November 2022: HPDaSc seminar @ Zenith, Inria, Montpellier
  • 15 August 2022: Sixth Workshop of the HPDaSc project, LNCC, Petropolis, Brazil
  • 25 May 2022: Zenith Seminar “ML Model Management in Gypscie”, Fabio Porto, LNCC, Petropolis, Brazil
  • 26 November 2021: Fifth (Virtual) Workshop of the HPDaSc project
  • 12 May 2021: Fourth (Virtual) Workshop of the HPDaSc …

HPDaSc objectives

Based on lessons learned with previous projects (SciDISC, HPCBD), we address the following requirements for high-performance data science (HPDaSc):

  • Support realtime analytics and visualization (in either in situ or in transit architectures) to help make high-impact online decisions;
  • Combine ML with analytics and simulation, which implies dealing with uncertainty in the data and models, leading …

HPDaSc participants

LNCC, Petrópolis, RJ

  • Fabio Porto (senior researcher), Kary Ocaña (researcher), Luiz Manoel Gadelha (researcher)
  • Rafael Pereira (research engineer), Eduardo Pena (postdoc)
  • PhD students: Anderson Chaves, Gustavo Decarlo, Victor Ribeiro Dornellas
  • MSc students: Rafael de Souza Terra, Rafael Silva Pereira

COPPE/UFRJ, Rio de Janeiro, RJ

  • Alvaro Coutinho (professor), Marta Mattoso (professor), Fernando Rochinha (professor)
  • Renan Souza (research engineer)
  • PhD students: Debora Pina, Liliane …

HPDaSc Publications

2022

[Chaves da Silva 2022] Anderson Chaves da Silva, Patrick Valduriez, Fabio Porto. Integrating Machine Learning Model Ensembles to the SAVIME Database System. SBBD 2022 – Brazilian Symposium on Databases, SBBD, Buzios, Brazil. pp. 231, 2022.

[Lima 2022] Janio Lima, Pedro Alpis, Rebecca Salles, Luciana Escobar, Fabio Porto, Esther Pacitti, Rafaelli Coutinho, Eduardo Ogasawara. Forward and Backward …