13:45, Room 455 at PCRI
Computing has been an enormous accelerator of science, and it has led to an information explosion in many different fields. The unprecedented volume of data acquired by sensors, derived by simulations and analysis processes, and shared on the Web opens up new opportunities, but it also creates many challenges when it comes to managing and making sense of these data. In this talk, I will discuss the importance of maintaining detailed provenance (also referred to as lineage and pedigree) for digital data. Provenance provides important documentation that is key to preserving data, to determining the data’s quality and authorship, and to understanding, reproducing, and validating results. I will review some of the state-of-the-art techniques, as well as research challenges and open problems involved in managing provenance throughout the data life cycle. I will also discuss benefits of provenance that go beyond reproducibility, and present, in a live demo, techniques and tools we have developed that leverage provenance information to support reflective reasoning and collaborative data exploration and visualization. I will conclude with a discussion of new applications that are enabled by provenance. In particular, I will show how provenance can be used to aid in teaching, to create reproducible publications, and as the basis for social data analysis.