Irini Fundulaki: Heuristic-based Query Optimisation for SPARQL

14:00, Room 445 at PCRI

Abstract
During the last decade we have witnessed an increase in the amount of semantic data. The so-called Web of Data extends the current Web to a global data space connecting data from diverse domains. A central issue in such a setting is efficient support for storing, querying, and manipulating semantic RDF data. In this work we focus on the problem of scalable processing and optimisation of semantic queries expressed in SPARQL using modern relational engines. Existing solutions rely heavily on statistics for the stored RDF graphs, and on cost-based planning algorithms. Extensive data statistics are quite expensive to compute and maintain for large-scale, constantly evolving semantic data. The main challenge is to devise heuristic-based query optimisation techniques that generate near-optimal execution plans without any knowledge of the underlying datasets.
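
To give a flavour of what a statistics-free heuristic can look like, here is a minimal sketch in Python. It is not the method presented in the talk: it only illustrates one widely known rule of thumb, namely evaluating first the triple patterns with the most bound (non-variable) positions, on the assumption that they tend to be more selective. The tuple representation of triple patterns and the helper names (`is_variable`, `bound_count`, `order_patterns`) are assumptions made for this example.

```python
from typing import List, Tuple

# A triple pattern is (subject, predicate, object); variables start with "?".
TriplePattern = Tuple[str, str, str]


def is_variable(term: str) -> bool:
    """Return True if the term is a SPARQL-style variable such as ?x."""
    return term.startswith("?")


def bound_count(pattern: TriplePattern) -> int:
    """Count the positions of the pattern that hold constants."""
    return sum(1 for term in pattern if not is_variable(term))


def order_patterns(patterns: List[TriplePattern]) -> List[TriplePattern]:
    """Order patterns from most to fewest bound positions.

    No dataset statistics are consulted: only the syntax of the query is used.
    sorted() is stable, so ties keep the order in which the query lists them.
    """
    return sorted(patterns, key=bound_count, reverse=True)


query = [
    ("?person", "?p", "?o"),
    ("?person", "rdf:type", "foaf:Person"),
    ("?person", "foaf:name", "?name"),
]
plan = order_patterns(query)
```

With this query, the `rdf:type` pattern (two constants) is placed first and the fully unbound pattern last. A real optimiser would combine several such rules and also take join structure into account.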

Permanent link to this article: https://team.inria.fr/oak/2011/10/21/irini-fundulaki-heuristic-based-query-optimisation-for-sparql/

Open Data presentation in the Digiteo forum 2011

The yearly Digiteo Forum was held on October 18. Ioana Manolescu gave a plenary talk on Open Data.

Permanent link to this article: https://team.inria.fr/oak/2011/10/18/open-data-presentation-in-the-digiteo-forum-2011/

Leo in the INRIA evaluation seminar

The team was only presented, not formally evaluated, because at this stage we are a team and not yet a project-team. The slides can be found here.

Permanent link to this article: https://team.inria.fr/oak/2011/10/12/leo-in-the-inria-evaluation-seminary/

Ioana Manolescu, Unithé ou café: “Toutes les données du monde en un seul clic” (“All the world's data in a single click”)

1st floor, building I (Inria/Microsoft Research lab), Parc Orsay Université

Abstract
Already thinking about your next ski holiday, and looking for a village that combines a moderate altitude, plenty of sunshine, and activities for children? You then find yourself juggling between various tables and information websites. Soon this will be ancient history! We will see that Excel spreadsheets are the prehistory of databases. The solutions? First XML, in which tags are defined up front to enable communication between computers. And also RDF, the foundation of the Semantic Web, which aims to have pieces of information cross-reference and consolidate one another on their own, making it easier for us to access all the world's knowledge.

You can find the slides of the talk here.

Permanent link to this article: https://team.inria.fr/oak/2011/10/07/ioana-manolescu-unithe-ou-cafe-toutes-les-donnees-du-monde-en-un-seul-clic/

Leo yearly meeting

15:30, room 455 at PCRI

The slides can be found here.

Permanent link to this article: https://team.inria.fr/oak/2011/10/07/leo-yearly-meeting/

Zoi Kaoudi: Distributed RDF Query Processing and Reasoning in Peer-to-Peer Networks

“Distributed RDF Query Processing and Reasoning in Peer-to-Peer Networks”
Friday, October 7, 2011
14:30, room 455 at PCRI

Abstract
With the interest in Semantic Web applications rising rapidly, the Resource Description Framework (RDF) and its accompanying vocabulary description language, RDF Schema (RDFS), have become among the most widely used data models for representing and integrating structured information on the Web. With the amount of available RDF data sources on the Web increasing rapidly, there is an urgent need for RDF data management. In this work, we focus on distributed RDF data management in peer-to-peer (P2P) networks. More specifically, we present results that advance the state of the art in distributed RDF query processing and reasoning in P2P networks. We fully design and implement a P2P system, called Atlas, for distributed query processing and reasoning over RDF and RDFS data. Atlas is built on top of distributed hash tables (DHTs), a commonly used class of P2P networks. Initially, we study RDFS reasoning algorithms on top of DHTs. We design and develop distributed forward and backward chaining algorithms, as well as an algorithm which works in a bottom-up fashion using the magic sets transformation technique. We study the correctness of our reasoning algorithms theoretically and prove that they are sound and complete. We also provide a comparative study of our algorithms, both analytically and experimentally. In the experimental part of our study, we obtain measurements in the realistic large-scale distributed environment of PlanetLab as well as in the more controlled environment of a local cluster. Moreover, we propose algorithms for SPARQL query processing and optimization over RDF(S) databases stored on top of distributed hash tables. We fully implement and evaluate a DHT-based optimizer. The goal of the optimizer is to minimize the time for answering a query as well as the bandwidth consumed during query evaluation. The optimization algorithms use selectivity estimates to determine the chosen query plan. Our algorithms and techniques have been extensively evaluated in a local cluster.
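
For readers unfamiliar with forward chaining, the idea can be sketched on a single machine as follows. This is not the Atlas implementation and is not distributed: it is a minimal in-memory illustration, assuming only two RDFS entailment rules (transitivity of `rdfs:subClassOf`, and propagation of `rdf:type` along `rdfs:subClassOf`) and hypothetical `ex:` example data. The function name `forward_chain` is likewise an assumption.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]

SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def forward_chain(triples: Iterable[Triple]) -> Set[Triple]:
    """Apply two RDFS rules repeatedly until no new triple is derived.

    Rule A: (a subClassOf b), (b subClassOf c)  =>  (a subClassOf c)
    Rule B: (x type a), (a subClassOf b)        =>  (x type b)
    """
    closure = set(triples)
    while True:
        derived = set()
        subclass_pairs = [(s, o) for (s, p, o) in closure if p == SUBCLASS]
        for (a, b) in subclass_pairs:
            # Rule A: chain two subclass edges.
            for (c, d) in subclass_pairs:
                if b == c:
                    derived.add((a, SUBCLASS, d))
            # Rule B: lift instances of a to its superclass b.
            for (s, p, o) in closure:
                if p == TYPE and o == a:
                    derived.add((s, TYPE, b))
        if derived <= closure:
            # Fixpoint reached: nothing new was inferred in this round.
            return closure
        closure |= derived


data = {
    ("ex:Dog", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
    ("ex:rex", TYPE, "ex:Dog"),
}
inferred = forward_chain(data)
```

On this input the closure additionally contains `ex:Dog rdfs:subClassOf ex:Animal`, `ex:rex rdf:type ex:Mammal` and `ex:rex rdf:type ex:Animal`. Backward chaining would instead derive such triples on demand at query time rather than materialising them in advance.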

You can find the slides of the talk here.

Permanent link to this article: https://team.inria.fr/oak/2011/10/07/zoi-kaoudi-distributed-rdf-query-processing-and-reasoning-in-peer-to-peer-networks/

Dario Colazzo HDR: Schemas for safe and efficient XML processing

Room 435, PCRI

Abstract
This Habilitation à Diriger des Recherches (HDR) manuscript presents results I have obtained through research carried out since 2005 as Maître de Conférences at Université Paris-Sud XI. At the start of this period, XML (eXtensible Markup Language) was already recognised as the standard for representing semi-structured data. At the same time, XML also established itself as a representation format for data integration and exchange. During this period my research interests lay at the confluence of database languages and programming languages, and focused on the use of type systems to ensure the safety and optimisation of programs manipulating XML data. More specifically, I have mainly worked on three research directions: i) optimisation of XML queries and updates via data projection, ii) verification of the correctness of mappings between two XML schemas, iii) efficient algorithms for checking inclusion between XML schemas (a property that underlies type systems for XML queries and updates). This manuscript is devoted to these three directions, and presents the context, the motivation and the results obtained for each of them.

Permanent link to this article: https://team.inria.fr/oak/2011/09/08/dario-colazzo-hdr-schemas-for-safe-and-efficient-xml-processing/

VLDB 2012: View Selection in Semantic Web Databases

View Selection in Semantic Web Databases, by Francois Goasdoué, Konstantinos Karanasos, Julien Leblay and Ioana Manolescu. PVLDB, Vol. 5, Oct. 2011

Permanent link to this article: https://team.inria.fr/oak/2011/08/18/paper-accepted-at-vldb-2012/

Jesus Camacho-Rodriguez and Danai Simeonidou will join as PhD students

They have both obtained PhD scholarships from Université Paris-Sud. Congratulations, and now all that's left is to get to work!

Permanent link to this article: https://team.inria.fr/oak/2011/07/11/jesus-camacho-rodriguez-and-danai-simeonidou-will-join-as-phd-students/

VLDS 2011: Growing Triples on Trees: an XML-RDF Hybrid Model for Annotated Documents

“Growing Triples on Trees: an XML-RDF Hybrid Model for Annotated Documents” (F. Goasdoue, K. Karanasos, Y. Katsis, J. Leblay, I. Manolescu and S. Zampetakis) has been accepted to the Very Large Data Search (VLDS) workshop.

Permanent link to this article: https://team.inria.fr/oak/2011/07/11/short-paper-accepted-at-vlds-2011/