Team presentation
Valda’s focus is on both foundational and systems aspects of complex data management, especially human-centric data. The data we are interested in is typically heterogeneous, massively distributed, rapidly evolving, intensional, often subjective, and possibly erroneous, imprecise, or incomplete. In this setting, Valda is in particular concerned with the optimization of complex resources such as computation time and space, communication, and monetary and privacy budgets. The goal is to extract value from data, beyond simple query answering.
Research themes
- Foundations of data management. This axis covers the theory of data management, broadly taken, and in particular the fields of database theory, knowledge representation, and some symbolic aspects of artificial intelligence (especially reasoning on data). The goal is to define solid, high-level foundations for data management tasks (evaluation and optimization of various forms of queries, counting, reasoning, verification of data-centric processes, etc.) through formal tools such as logics (especially finite model theory), automata theory, and complexity theory; we occasionally contribute to these areas as well, though most of our work is motivated by data applications. We are especially interested in clean specifications of key aspects of database systems and data management tasks (e.g., confidentiality, access control, robustness), whether they are properties of the data or appropriate (query) languages for these tasks. We study the expressive power of languages, the computability and complexity of deciding or computing results, as well as the design of appropriate structures (e.g., indexes) to optimize these tasks.
- Uncertainty, provenance, and explainability in data management. This research axis deals with the modeling and efficient management of data that come with some uncertainty (probability distributions, logical incompleteness, missing values, inconsistencies, the open-world assumption, etc.) and with provenance information (indicating where the data originates from), as well as with the extraction of uncertainty and provenance annotations from real-world data. Provenance is also linked to explainability: determining where the result of a data management task comes from, and how and why it was produced, helps explain it. Interestingly, the foundations and tools for uncertainty management often rely on provenance annotations. For example, a typical way to compute the probability of query results in probabilistic databases is the so-called intensional approach: first generate the provenance of these query results (in some appropriate framework, e.g., that of Boolean functions or of provenance semirings), and then compute the probability of the resulting provenance annotation (see the sketch after this list). For this reason, we deal with uncertainty and provenance in a unified manner, and with explainability as an application thereof.
- Knowledge discovery at scale. The goal of this final axis is to use techniques such as data mining, information extraction, data cleaning, information integration, and machine learning to derive knowledge from raw, dirty, inconsistent, heterogeneous, rapidly changing data in real-world application scenarios. We intend to leverage our expertise in data management to focus on the scalability of the approaches and tools developed. This is also in some sense an application axis for techniques developed in the other two axes; in particular, we focus on the intensionality of data (i.e., the cost of data access), on the trade-off between data uncertainty and its cost, and on data provenance and explanations.
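As a toy illustration of the intensional approach mentioned in the second axis, the following minimal Python sketch (all relation names, tuple identifiers, and probabilities are hypothetical, not drawn from our systems) annotates the answer of a small join query over a tuple-independent probabilistic database with Boolean provenance, then computes its probability by brute-force enumeration of possible worlds; practical systems replace this exponential last step, which is #P-hard in general, with knowledge compilation or approximation.

```python
from itertools import product

# A tiny tuple-independent probabilistic database: each base tuple is an
# independent Boolean event with its own marginal probability.
# Hypothetical relations R(a, b) and S(b, c); names are illustrative only.
prob = {          # tuple id -> marginal probability
    "r1": 0.8,    # R(a, b1)
    "r2": 0.5,    # R(a, b2)
    "s1": 0.9,    # S(b1, c)
    "s2": 0.4,    # S(b2, c)
}

# Boolean provenance of the query answer "exists y: R(a, y) and S(y, c)",
# i.e. a positive DNF over tuple ids: (r1 AND s1) OR (r2 AND s2).
provenance = [{"r1", "s1"}, {"r2", "s2"}]   # list of conjunctive clauses

def probability(dnf, prob):
    """Probability of a Boolean provenance formula under tuple independence,
    computed by naive enumeration of all possible worlds (exponential)."""
    ids = sorted(prob)
    total = 0.0
    for world in product([False, True], repeat=len(ids)):
        assignment = dict(zip(ids, world))
        # Does this possible world satisfy the provenance formula?
        if any(all(assignment[t] for t in clause) for clause in dnf):
            weight = 1.0
            for t, present in assignment.items():
                weight *= prob[t] if present else 1.0 - prob[t]
            total += weight
    return total

# 0.8*0.9 + 0.5*0.4 - 0.8*0.9*0.5*0.4 = 0.776 by inclusion-exclusion
print(probability(provenance, prob))
```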