IBC seminar: Marta Mattoso, “Big Data Workflows – how provenance can help”, March 25, 2pm.

IBC Seminar

Monday, March 25, 2 pm

Room 127, La Galera building

Organized by the Zenith team

Big Data Workflows – how provenance can help
Marta Mattoso
UFRJ, Rio de Janeiro

Big data analyses are critical for decision support in business data processing. These analyses involve many activities: programs that explore data from the web, databases, data warehouses and files; data cleaning procedures; programs that aggregate data; core programs that perform the analyses; and tools to visualize and interpret the results. Each step (activity) of the analysis is performed in isolation from the others, and analysts must manually manage the larger life cycle of the big data analysis. Big data analyses have started to be represented as pipelines or dataflows. However, current approaches lack features that provide a consistent view of the many different explorations and activities as parts of a broader analysis, like a computational experiment. Scientific workflows have long provided such features for scientific experiments and, although originally designed for science, they may be useful to support the life cycle of big data analysis. Scientific analyses typically involve experimenting with several steps using different datasets and computer programs. Scientists need to manage the composition, execution and analysis of their experiments carefully, so that the results can be trusted and the experiments reproduced. To help manage experiments, scientific workflow management systems (SWfMS) have been proposed to let scientists design workflows of varying complexity and manage their execution, including high performance computing (HPC) in cloud environments. Most SWfMS also support provenance data. Provenance tracks how the results of an experiment were produced, which is essential to make an experiment (or a big data analysis) reproducible and trustworthy. Business process workflows, by contrast, focus on modeling the process rather than on managing big data flows with provenance and HPC. In this talk we discuss provenance support along the big data analysis workflow as a way to improve the results of big data analysis, especially in the long term.

Permanent link to this article: https://team.inria.fr/zenith/ibc-seminar-marta-mattoso-big-data-workflows-how-provenance-can-help-march-25-2pm/