SciDISC (2017-2019) with LNCC, UFRJ, UFF, CEFET (Brazil)

SciDISC (Scientific data analysis using Data-Intensive Scalable Computing) is an associated team (“équipe associée”) between Zenith and four teams in the state of Rio de Janeiro (LNCC, COPPE/UFRJ, UFF and CEFET), running since January 2017. SciDISC is headed by Marta Mattoso (COPPE/UFRJ) and Patrick Valduriez (Zenith).

Data-intensive science requires the integration of two fairly different paradigms: high-performance computing (HPC) and data-intensive scalable computing (DISC). HPC is compute-centric and focuses on the performance of simulation applications, typically using powerful yet expensive supercomputers. DISC, on the other hand, is data-centric and focuses on the fault tolerance and scalability of web and cloud applications, using cost-effective clusters of commodity hardware. Examples of DISC systems include big data processing frameworks such as Hadoop and Apache Spark, as well as NoSQL systems. To harness parallel processing, HPC uses a low-level programming model (such as MPI or OpenMP), while DISC relies on powerful data processing operators (Map, Reduce, Filter, …). Data storage is also quite different: supercomputers typically rely on a shared-disk infrastructure, and data must be loaded onto compute nodes before processing, while DISC systems rely on a shared-nothing cluster (of disk-based nodes) and data partitioning.
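To make the DISC programming style concrete, here is a minimal single-process sketch in plain Python of the Filter, Map and Reduce operators applied to partitioned data. It only imitates the pattern: the function names (`partition`, `local_sum_of_squares`, `run`), the round-robin partitioning and the sum-of-squares task are illustrative assumptions, not part of SciDISC, Hadoop or Spark.

```python
from functools import reduce


def partition(records, n_parts):
    """Split records round-robin into n_parts lists.

    Each list stands in for the data held by one node of a
    shared-nothing cluster (hypothetical; everything runs locally here).
    """
    parts = [[] for _ in range(n_parts)]
    for i, rec in enumerate(records):
        parts[i % n_parts].append(rec)
    return parts


def local_sum_of_squares(part, threshold):
    """Apply Filter, Map and Reduce to a single partition."""
    kept = filter(lambda x: x >= threshold, part)   # Filter: drop small values
    squared = map(lambda x: x * x, kept)            # Map: square each value
    return reduce(lambda a, b: a + b, squared, 0)   # Reduce: partial sum


def run(records, n_parts=3, threshold=0):
    """Process each partition independently, then combine the partial results."""
    parts = partition(records, n_parts)
    partials = [local_sum_of_squares(p, threshold) for p in parts]
    return reduce(lambda a, b: a + b, partials, 0)


if __name__ == "__main__":
    print(run([1, 2, 3, 4, 5, 6], n_parts=3, threshold=3))
```

Because each partition is processed without reference to the others, the per-partition step could in principle run on separate nodes, with only the small partial results exchanged in the final reduce. An MPI program would instead spell out the data distribution and communication explicitly.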

Spurred by the growing need to analyze big scientific data, the convergence between HPC and DISC has recently become a topic of interest. However, simply porting the Hadoop stack to a supercomputer is not cost-effective, and does not solve the scalability and fault-tolerance issues addressed by DISC. On the other hand, DISC systems have not been designed for scientific applications, which have different requirements in terms of data analysis and visualization. This project will address the grand challenge of scientific data analysis using DISC (SciDISC), by developing architectures and methods to combine simulation and data analysis.

Permanent link to this article: https://team.inria.fr/zenith/scidisc/

SciDISC achievements

2018

NB: a general presentation of the SciDISC project and results has been given at the LADaS workshop, held in conjunction with the VLDB 2018 conference [Valduriez 2018].

In situ and in transit data analysis

In situ analysis and visualization have been used successfully in large-scale computational simulations to visualize scientific data of interest, while …

SciDISC meetings

2018

31 Jan 2018: Zenith seminar, Montpellier: Vitor Silva (UFRJ), “A methodology for capturing and analyzing dataflow paths in computational simulations”.
5 June 2018: Zenith seminar, Montpellier: Daniel de Oliveira (UFF), “Parameter and Data Recommendation in Scientific Workflows based on Provenance”.
19 June 2018: Ph.D. defense of Vitor Silva “Analysis of raw data from multiple data sources during …

SciDISC objectives

The research challenge is to develop new architectures and methods to combine simulation and data analysis. We can distinguish between three main approaches depending on where analysis is done [Oldfield 2014]: postprocessing, in situ and in transit. Postprocessing performs the analysis after the simulation, e.g. by loosely coupling a supercomputer and a SciDISC cluster (possibly in the cloud). This …

SciDISC participants

LNCC, Petrópolis, RJ
Fabio Porto (senior researcher)
Kary Ocaña (researcher)
Daniel Gaspar (PhD student)
Hermano Lustosa (PhD student)
Noel Lemus (postdoc)
Rafael Pereira (Master student)
João N. Rittmeyer (Master student)

COPPE/UFRJ, Rio de Janeiro, RJ
Alvaro Coutinho (professor)
Marta Mattoso (professor)
José Camata (postdoc)
Vitor Silva (PhD student), until June 2018
Renan Souza (PhD student) …

SciDISC publications

2018

[Bazaz 2018] A. Bazaz, H. Borges, E. Ogasawara. STMotif: Discovery of Motifs in Spatial-Time Series. CRAN Repository: https://cran.r-project.org/web/packages/STMotif/index.html, 2018.
[Camata 2018] J. Camata, V. Sousa, P. Valduriez, M. Mattoso, A. Coutinho. In situ visualization and data analysis for turbidity currents simulation. Computers & Geosciences 110: 23-31, 2018.
[Campisano 2018] R. Campisano, H. Borges, F. …