
Fifth (Virtual) Workshop of the HPDaSc project, 26 November 2021

Workshop program

10:00-12:30 (Brazil time) / 14:00-16:30 (France time)

Opening and workshop overview: Fabio Porto and Patrick Valduriez

Short Presentations (30 min)

Benjamin Deneu (Inria), Adviser: Alexis Joly: Interpretability of distribution models of plant species communities learned through deep learning

Abstract: Convolutional Neural Networks (CNNs) are statistical models suited to learning complex visual patterns. In the context of Species Distribution Models (SDMs), and in line with predictions from landscape ecology, CNNs could capture how local landscape structure affects the predicted occurrence of species. Their predictions can thus reflect the signatures of entangled ecological processes. Although previous machine-learning-based SDMs can learn complex influences of environmental predictors, they cannot account for the influence of environmental structure in local landscapes. In addition, because CNNs can use image data, they make it possible to exploit high-resolution remote sensing data that is easily accessible over large areas (e.g., satellite images). In my thesis work, we investigate both the performance of these CNN-based distribution models and the interpretation of what they learn. The results highlight several benefits: better performance; the ability to learn many species simultaneously, which improves prediction for rare species; the capture of the spatial structure of the environment; and the ability to capture information at different scales (landscapes vs. ecoregions).

Rocio Milagros Zorrilla Coz (LNCC): A Data-Driven Model Selection Approach to Spatio-Temporal Prediction

Abstract: A spatio-temporal predictive query comprises a spatio-temporal constraint defining a region, a target variable, and an evaluation metric. Its output consists of the future values of the target variable, computed by predictive models at each point of the spatio-temporal region. In this work, we propose a data-driven approach for selecting the pre-trained temporal model to apply at each query point. The technique assigns a model to a point according to the similarity between the model's training time series and the input time series at that point. This avoids training a separate model for each point in the domain, saving model training time. Moreover, it provides a way to decide which trained model is best suited to predict at a given point, even a point that none of the available models saw during training.
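To make the selection idea concrete, here is a minimal sketch, not the authors' implementation. The abstract does not specify the similarity measure or the model interface, so this assumes each pre-trained model carries one representative series from its training data and that similarity is the Euclidean distance between z-normalized series. `PretrainedModel`, `select_model`, and `answer_query` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class PretrainedModel:
    """Hypothetical record: a pre-trained temporal model plus a
    representative series taken from the data it was trained on."""
    name: str
    training_series: List[float]
    predict: Callable[[Sequence[float]], float]


def z_normalize(series: Sequence[float]) -> List[float]:
    """Rescale to zero mean and unit variance so that similarity
    reflects the shape of a series rather than its absolute level."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in series]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed similarity measure: Euclidean distance between
    z-normalized series of equal length (smaller = more similar)."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    za, zb = z_normalize(a), z_normalize(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(za, zb)))


def select_model(input_series: Sequence[float],
                 models: List[PretrainedModel]) -> PretrainedModel:
    """Pick the model whose training series is closest to the input."""
    return min(models, key=lambda m: distance(input_series, m.training_series))


def answer_query(region_inputs: Dict[str, List[float]],
                 models: List[PretrainedModel]) -> Dict[str, Tuple[str, float]]:
    """For every point of the query region, reuse the most similar
    pre-trained model instead of training a new one, then forecast."""
    result = {}
    for point, series in region_inputs.items():
        model = select_model(series, models)
        result[point] = (model.name, model.predict(series))
    return result
```

For example, with one model trained on a linear trend (`[1, 2, 3, 4, 5, 6]`, forecasting `s[-1] + (s[-1] - s[-2])`) and one trained on an alternating pattern (`[0, 1, 0, 1, 0, 1]`, forecasting `s[-2]`), a point whose input is `[10, 12, 14, 16, 18, 20]` is routed to the trend model and a point whose input is `[5, 7, 5, 7, 5, 7]` to the alternating one, because z-normalization makes each input match its model's training shape. Other distances (e.g., dynamic time warping) could be swapped into `distance` without changing the rest of the sketch.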

Short Presentations (10 min)

Antonio Castro Jr (CEFET-RJ), Adviser: Eduardo Ogasawara: Generalized Discovery of Tight Space-Time Sequences

Abstract: Spatio-temporal patterns reveal sequences of events together with the places and times at which they occur. Finding such patterns is a complex task of great value in many domains. However, many patterns are not frequent across an entire dataset; they are often confined to particular regions of space and intervals of time. This paper proposes the G-STSM algorithm to discover frequent sequences constrained in space and time without prior restrictions. This allows sequences of different sizes, time ranges, and spatial extents to be found, and G-STSM is the first such approach to support three-dimensional space. G-STSM was evaluated on two real-world spatio-temporal datasets from the health and seismic domains, demonstrating its quality, performance, and generality.

Rodrigo Prado (UFF), Advisers: Esther Pacitti, Yuri Frota, Daniel de Oliveira: CYCLOPS: A Scheduling Heuristic for Big Data Workflows in Clouds with Confidentiality Restrictions

Abstract: Clouds provide an on-demand environment that allows scientists to migrate their local experiments to an elastic environment. Experiments are modelled as scientific workflows, many of which are compute- and data-intensive. Storing these data is a concern, as confidentiality can be compromised: malicious users may infer the results and structure of workflows. Data dispersion and encryption can be adopted to increase confidentiality, but these mechanisms cannot be applied independently of workflow scheduling without risking increased execution time and financial cost. In this paper, we present CYCLOPS, a scheduling heuristic that takes data confidentiality constraints into account.

Anderson Chaves da Silva (LNCC), Adviser: Fabio Porto: Fermata: A Platform for Monitoring and Analysis of Extreme Events

Abstract: Extreme events occur in nature and society and can have a massive impact on the environments in which they take place. For this reason, monitoring these environments and correctly understanding the circumstances in which such events arise is essential to predict, detect, and respond rapidly to them. However, monitoring extreme events is not an easy task: it demands a robust, heterogeneous hardware infrastructure and an efficient software system able to handle massive amounts of both historical and real-time data. We propose Fermata, a data management platform capable of managing high volumes of human- and sensor-generated data produced concurrently by heterogeneous sources, using asynchronous programming primitives and non-relational database techniques, and supporting machine-learning-based analysis.

End of Workshop

Participants:

Zenith: Esther Pacitti, Patrick Valduriez, Reza Akbarinia, Alexis Joly, Antoine Liutkus; PhD students: Benjamin Deneu, Lamia Djebour, Daniel Rosendo (Kerdata team), Alena Shilova (Cepage team)

LNCC: Fabio Porto, Kary Ocaña; PhD students: Anderson Chaves, Claudio de Barros, Klaus Wehmuth, Rocio Zorrilla, Raphael Saldanha; MSc student: Rafael S. Pereira

CEFET-RJ: Eduardo Ogasawara; PhD student: Rebecca Salles; MSc student: Antonio Castro

COPPE-UFRJ: Marta Mattoso, Alvaro Coutinho; PhD students: Debora Pina, Liliane Neves

UFF: Daniel de Oliveira; PhD student: Rodrigo Prado


Permanent link to this article: https://team.inria.fr/zenith/hpdasc/meetings/fifth-meeting/