Zenith seminar: 15/01/19, 15h – BAT5-02.124
Providing Online Data Analytical Support for Humans in the Loop of Computational Science and Engineering Applications
Renan Souza (IBM Research Brazil and UFRJ, Rio de Janeiro)
Abstract. Computational scientists and engineers analyze large, complex data during the execution of long-running data processing workflows on parallel machines. Depending on the results, they may need to steer the workflows by adapting predefined input data or settings. Being able to analyze the resulting data online, knowing that certain results may have been directly influenced by specific actions they took, is of paramount importance for result interpretability, reuse, and reproducibility. However, three major challenges hinder such analysis: online analytical support, user steering tracking, and efficient performance. In this talk, I will focus on online analytical support, particularly for problems that require integrated data analysis across multi-workflows. Multi-workflows are distributed and parallel workflows that process data in heterogeneous data stores (e.g., DBMSs with various data models or raw data files) and share data dependencies. Such heterogeneity makes online analytical support even more challenging. We propose a solution that captures workflow provenance and domain data online to provide an integrated view over the data stores. We explore a real case study composed of four workflows that preprocess data for a Deep Learning classifier for Oil and Gas exploration. We show that our solution allows users to run online integrated data analysis of the multi-workflow data. Also, in certain scenarios, our solution is two orders of magnitude faster than a state-of-the-art solution.