Data analysis: do not start without provenance data
COPPE/UFRJ, Rio de Janeiro
Abstract: This talk presents the history and current status of provenance and its role in scientific data analysis. Provenance aims to record the dataflow produced by computer simulations, which is essential for making an experiment reproducible and reliable. Scientific data analysis can be improved when provenance data acts as an index that relates data repositories and represents domain data, giving access to its content elements. More specifically, the talk discusses the challenges of providing provenance data that supports the scientist in designing and reconfiguring a workflow while it is executing in a high-performance computing environment. These challenges are illustrated with real use cases: bioinformatics applications running in parallel on clouds and geophysics applications running on supercomputers.