PhD defense of Abdoul MACINA

December 17th


SPARQL Distributed Query Processing over Linked Data



Driven by the Semantic Web standards, an increasing number of RDF data sources are
published and connected over the Web by data providers, leading to a large distributed
linked data network. However, exploiting the wealth of these data sources is very
challenging for data consumers, given the distribution of the data, its growing volume,
and the autonomy of the data sources. In the Linked Data context, federation engines allow
querying these distributed data sources by relying on Distributed Query Processing
(DQP) techniques. Nevertheless, a naive implementation of the DQP approach may
generate a tremendous number of remote requests towards data sources and numerous
intermediate results, thus leading to costly network communications. Furthermore, the
distributed query semantics is often overlooked. Query expressiveness, data partitioning,
and data replication are other challenges to be taken into account. To address these
challenges, we first proposed in this thesis a SPARQL- and RDF-compliant Distributed
Query Processing semantics that preserves the expressiveness of the SPARQL language.
Afterwards, we presented several strategies for a federated query engine that
transparently addresses distributed data sources, while managing data partitioning,
query results completeness, data replication, and query processing performance. We
implemented and evaluated our approach and optimization strategies in a federated
query engine to prove their effectiveness.


Thesis committee:

  • Esther Pacitti, Reviewer, University of Montpellier
  • Hala Skaf-Molli, Reviewer, University of Nantes
  • Andrea Tettamanzi, University of Nice Sophia Antipolis
  • Oscar Corcho, Examiner, University of Madrid
  • Olivier Corby, Adviser, Inria
  • Johan Montagnat, Adviser, CNRS
