Our research is organized along two main themes:
1. Scalable tools for parallel data processing
We exploit parallel data processing infrastructures to devise highly scalable data storage and processing tools. Ongoing work seeks to:
- automatically exploit a variety of data management architectures to provide efficient query processing on highly heterogeneous hardware, whether centralized or distributed;
- devise highly efficient algorithms for answering queries on data in the presence of semantics, expressed through an ontology;
- develop novel tools for fast, scalable data analytics in highly distributed architectures, applied in particular to genomics.
2. New user-data interaction paradigms
To enhance the usefulness of Big Data, we will work to devise new paradigms of interaction with the data:
- Relying on machine learning to leverage user feedback, we will investigate new paradigms for data exploration through interactive querying, and revise the classical database server and optimization framework to suit this style of exploration.
- To help users grasp complex data with rich semantics, we will work to automatically identify interesting categories of resources in a Semantic Web graph, based on the structure, semantics, and statistical properties of the graph. Also in the presence of ontologies, we will investigate novel means of categorizing and presenting semantic answers to user queries, exploiting the reasoning involved in query answering.
- Novel data-intensive applications often involve highly heterogeneous datasets, typically produced at a rate that precludes “static” warehouse architectures. We will devise new flexible querying paradigms to extract answers from heterogeneous sources despite the diversity of the systems hosting them. We consider in particular applications to data journalism and fact-checking.