Return to HPDaSc meetings

Fourth (Virtual) Workshop of the HPDaSc project, 12 May 2021

Fourth (Virtual) Workshop of the HPDaSc project

12 May 2021

Workshop program

08:45-9:00 (BR) – 13:45-14:00(FR) –  Fabio Porto and Patrick Valduriez:

Opening: workshop overview

09:00-9:30 (BR) – 14:00-14:30(FR) – The Transformer model and relative positional encoding, Antoine Liutkus (Zenith, Inria)

In this presentation, I will explain the core ideas of the Transformer model, which is a new kind of neural networks that is getting increasingly popular in the  previous years and obtains achieved state of the art results. I will show how they can be understood from the perspective of a neural network that automatically detects the receptive field of each neuron, i.e. the actual inputs each neuron attends to. I will then shortly mention my recent research on the topic of positional encoding, which is one of the ingredients for these models.

9:30-10:00 (BR) – 14:30-15:00(FR) –  Workflow Provenance in Scientific Machine Learning, Renan Souza (IBM Research), Marta Mattoso (COPPE) and Patrick Valduriez

Machine Learning (ML) has already fundamentally changed several businesses. More recently, it has also been profoundly impacting the computational science and engineering domains, like geoscience, climate science, and health science. In these domains, users need to perform comprehensive data analyses combining scientific data and ML models to provide for critical requirements, such as reproducibility, model explainability, and experiment data understanding. However, scientific ML is multidisciplinary, heterogeneous, and affected by the physical constraints of the domain, making such analyses even more challenging. In this work, we leverage workflow provenance techniques to build a holistic view to support the lifecycle of scientific ML. We contribute with (i) characterization of the lifecycle and taxonomy for data analyses; (ii) design principles to build this view, with a W3C PROV compliant data representation and a reference system architecture; and (iii) lessons learned after an evaluation in an Oil & Gas case using an HPC cluster with 393 nodes and 946 GPUs. The experiments show that the principles enable queries that integrate domain semantics with ML models while keeping low overhead (<1%), high scalability, and an order of magnitude of query acceleration under workloads without our representation.

10:00-10:30 (BR) – 15:00-15:30 (FR)  Innovation : startup strategies, Patrick Valduriez (Zenith, Inria)

Technological innovation as driven by startups is hard to formalize (and manage) as the context may be unknown or quickly changing. To be successful, the innovation process involves not only inventions (new methods) but also context, e.g. user behavior, and timing, e.g. market readiness. In this talk, I illustrate various innovation strategies based on startup success stories, in particular LeanXcale, which delivers a new generation HTAP DBMS product. I also give hints to promote innovation within startups.

10:30-11:00 (BR)  15:30-16:00 (FR) –  Discussion on ML & Data Management, Fabio Porto and Patrick Valduriez, based on the paper “ML-In-Databases: Assessment and Prognosis” by Kraska et al., IEEE TCDE, 2021.

End of Workshop


Zenith : Esther Pacitti, Patrick Valduriez, Reza Akbarinia, Alexis Joly, Antoine Liutkus, Heraldo Borges; PhD students: Gaetan Heidsieck, Benjamin Deneu, Lamia Djebour, Daniel Rosendo (Kerdata team), Alena Shilova (Cepage team)

LNCC : Fabio Porto, Kary Ocaña, Luiz Gadelha; PhD Students: Anderson Chaves, Claudio Tenorio de Barros, Rafael Pereira

COPPE/UFRJ : Marta Mattoso, Alvaro Coutinho; PhD students: Debora Pina, Liliane Kunstmann Neves, Gabriel Barros, Romulo Silva

UFF: Daniel de Oliveira, Aline Paes,Yuri Frota ; PhD students: Marcello Willians Messina, Carlos Gracioli, Raama Costa

CEFET-RJ : Eduardo Ogasawara, Rafaelli Coutinho; PhD Students: Rebecca Salles, Lais Baroni, Janio Lima,Heraldo Borges; Ms students: Antonio Castro Jr, Janio Lima

Permanent link to this article: