Valda Seminar: Yann Ramusat

12 April 2022, 10:30-11:30.

ENS, S16

The Semiring-Based Provenance Framework for Graph Databases

The growing amount of data collected by sensors or generated by human interaction has led to an increasing use of graph databases, an efficient model for representing intricate data.
Techniques to keep track of the history of computations applied to the data inside classical relational database systems are also topical because of their application to enforce Data Protection Regulations (e.g., GDPR).
Our research work mixes the two by considering a semiring-based provenance model for navigational queries over graph databases.
In the first part, we will focus on the model itself by introducing a toolkit of provenance-aware algorithms, each targeting specific properties of the semiring of use.
We notably introduce a new method based on lattice theory permitting an efficient provenance computation for complex graph queries.
We propose an open-source implementation of the above-mentioned algorithms, and we conduct an experimental study over real transportation networks of large size, witnessing the practical efficiency of our approach in practical scenarios.
From the richness of the literature, we notably obtain a lower bound for the complexity of the full provenance computation in our setting.
We finally consider how this framework is positioned compared to other provenance models such as the semiring-based Datalog provenance model.
We make explicit how the methods we applied for graph databases can be extended to Datalog queries, and we show how they can be seen as an extension of the semi-naïve evaluation strategy.
To leverage this fact, we extend the capabilities of \textsc{Soufflé}, a state-of-the-art Datalog solver, to design an efficient provenance-aware Datalog evaluator.
Experimental results based on our open-source implementation entail the fact this approach stays competitive with dedicated graph solutions, despite being more general.
In a final round, we discuss on some research ideas for improving the model, and state open questions raised by our work.

Comments are closed.