Konstantinos Karanasos: View Selection for Efficient and Scalable RDF Data Management

joint work with Francois Goasdoue and Ioana Manolescu

Abstract

The view selection problem consists of choosing, given a set of queries, a set of views to materialize so as to minimize the total cost of evaluating these queries and of maintaining the views. Several variants of the problem exist: some impose a space constraint on the total size of the materialized views, others assign a weight to each query. The problem has been thoroughly studied for relational databases.

The increasing popularity of the RDF data model in Semantic Web and Web 2.0 applications has made the efficient evaluation of queries over large volumes of RDF data a matter of paramount importance.

In this work, we address the view selection problem over large amounts of RDF data, given a set of conjunctive queries over this data. We characterize the search space associated with this problem and define a set of transformation rules over its states. We then present several algorithms for exploring this search space.
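
As a rough illustration of the problem statement above (not of the algorithms presented in the talk), the following minimal sketch enumerates all subsets of candidate views under a space constraint and keeps the cheapest one. Every name and number in it (QUERY_COST_NO_VIEW, VIEWS, total_cost, select_views, the costs and sizes) is a hypothetical assumption made for the example; the abstract does not specify a cost model, and exhaustive enumeration is only practical for toy inputs.

```python
from itertools import combinations

# Hypothetical toy inputs, invented for illustration only.
# Cost of answering each query directly on the base data.
QUERY_COST_NO_VIEW = {"q1": 100, "q2": 80, "q3": 60}

# Candidate view -> (size, maintenance cost, {query: cost when answered from the view}).
VIEWS = {
    "v1": (40, 10, {"q1": 20, "q2": 30}),
    "v2": (30, 5, {"q2": 25}),
    "v3": (50, 15, {"q3": 10, "q1": 50}),
}


def total_cost(selected, weights=None):
    """Weighted query evaluation cost plus maintenance cost of the selected views."""
    weights = weights or {}
    cost = 0
    for query, base_cost in QUERY_COST_NO_VIEW.items():
        # Each query uses the cheapest option: base data or any selected view.
        options = [base_cost]
        options += [VIEWS[v][2][query] for v in selected if query in VIEWS[v][2]]
        cost += weights.get(query, 1) * min(options)
    cost += sum(VIEWS[v][1] for v in selected)
    return cost


def select_views(space_budget, weights=None):
    """Exhaustively search view subsets whose total size fits the space budget."""
    best_set, best_cost = (), total_cost((), weights)
    names = sorted(VIEWS)
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            if sum(VIEWS[v][0] for v in subset) > space_budget:
                continue  # violates the space constraint
            cost = total_cost(subset, weights)
            if cost < best_cost:
                best_set, best_cost = subset, cost
    return best_set, best_cost


if __name__ == "__main__":
    print(select_views(space_budget=100))
```

The number of subsets grows exponentially with the number of candidate views, which is why the problem is usually approached by searching a space of states with transformation rules and heuristics rather than by enumeration.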

Permanent link to this article: https://team.inria.fr/oak/2009/11/27/seminar-konstantinos-karanasos/

Dimitri Theodoratos: Processing and Efficient Evaluation of Generalized Tree-Pattern Queries

14:00, Room G008 (Parc Club)

Abstract

Current applications export and exchange XML data on the web. Usually, XML data are queried using keyword queries or the standard structured query language XQuery, whose core consists of the navigational query language XPath. In this context, one major challenge is querying the data when the structure of the data sources is complex or not fully known to the user.

Another challenge is the integration of multiple data sources that export data with structural differences and irregularities. In order to deal with these challenges, we consider a query language for XML which generalizes and strictly contains Tree-Pattern Queries (TPQs) and can express a broad structural fragment of XPath. Because of the expressive power and flexibility of this language, processing and evaluating queries pose new problems.

In this presentation, we will discuss recent results with respect to these issues.

Permanent link to this article: https://team.inria.fr/oak/2009/11/20/seminar-dimitri-theodoratos/

Paolo Papotti: Core Mappings: Schema Mapping Revolution

14:00, Room G008 (Parc Club)

Abstract

Schema mappings are high-level specifications that describe the relationship between database schemas. They are an important tool in several areas of database research and have a central role in data exchange and data integration.

Research has investigated mappings from two perspectives. On one side, there are studies of practical tools for schema mapping generation (e.g., Clio at IBM Almaden). These works focus on algorithms to generate mappings based on visual specifications provided by users. On the other side, there is theoretical research on data exchange, which studies how to generate a solution (i.e., a target instance) given a set of mappings. In this context, the notion of the core of a data exchange solution has been formally identified as an optimal solution. However, until recently, the only way to produce core solutions was to post-process an intermediate materialization, since a mapping system supporting core computation was lacking.

In this talk, I will start with a short history of schema mapping systems in recent times. I will then present algorithms that have contributed to bridging the gap between the practice of mapping generation and the theory of data exchange. I will focus on techniques to generate “core schema mappings”, that is, mappings that are able to materialize core solutions without post-processing computation. I will show that by using core schema mappings on top of common runtime engines, it is possible to achieve performance orders of magnitude better than computing the core as a post-processing step. Finally, I will discuss an application of schema mappings in the specific context of the automatic extraction and integration of data from the Web. The talk ends with a discussion of current and future lines of research.

Permanent link to this article: https://team.inria.fr/oak/2009/11/03/seminar-paolo-papotti/