Paolo Papotti: Core Mappings: Schema Mapping Revolution

Schema mappings are high-level specifications that describe the relationship between database schemas. They are an important tool in several areas of database research and have a central role in data exchange and data integration.

Research has investigated mappings from two perspectives. On one side, there are studies of practical tools for schema mapping generation (e.g., Clio at IBM Almaden). This work focuses on algorithms that generate mappings from visual specifications provided by users. On the other side, there is theoretical research on data exchange, which studies how to generate a solution, i.e., a target instance, given a set of mappings. In this context, the core of a data exchange solution has been formally identified as an optimal solution. However, until recently, the only way to produce core solutions was to post-process an intermediate materialization, since no mapping system supported core computation.
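
As a small illustration of what a core is, the following Python sketch chases two made-up mappings and then shrinks the result by brute force. It is only a toy under stated assumptions, not one of the algorithms presented in the talk: the relations Emp, EmpDept and Works, the two mappings, and the "_N" convention for labeled nulls are all hypothetical, and the exhaustive homomorphism search is practical only for tiny instances.

```python
from itertools import product

def is_null(value):
    # Assumed convention: labeled nulls are strings starting with "_N".
    return isinstance(value, str) and value.startswith("_N")

def canonical_solution(emp, emp_dept):
    """Naive chase of two hypothetical mappings:
         Emp(n)        -> exists D: Works(n, D)
         EmpDept(n, d) -> Works(n, d)
    """
    target = set()
    for i, name in enumerate(sorted(emp), start=1):
        target.add((name, f"_N{i}"))   # fresh null for the unknown department
    for name, dept in emp_dept:
        target.add((name, dept))
    return target

def find_homomorphism(instance, sub_instance):
    """Exhaustively search for a mapping of nulls to values (constants stay
    fixed) that sends every tuple of `instance` into `sub_instance`."""
    nulls = sorted({v for t in instance for v in t if is_null(v)})
    domain = sorted({v for t in sub_instance for v in t})
    for image in product(domain, repeat=len(nulls)):
        h = dict(zip(nulls, image))
        if all(tuple(h.get(v, v) for v in t) in sub_instance for t in instance):
            return h
    return None

def core(instance):
    """Shrink the instance until no homomorphism into a proper
    sub-instance exists; what remains is the core."""
    current = set(instance)
    shrunk = True
    while shrunk:
        shrunk = False
        for t in sorted(current):
            h = find_homomorphism(current, current - {t})
            if h is not None:
                # Replace the instance by its homomorphic image.
                current = {tuple(h.get(v, v) for v in u) for u in current}
                shrunk = True
                break
    return current

source_emp = {"alice", "bob"}
source_emp_dept = {("alice", "sales")}
solution = canonical_solution(source_emp, source_emp_dept)
print(sorted(solution))        # three tuples, including the redundant ("alice", "_N1")
print(sorted(core(solution)))  # two tuples: the redundant one is gone
```

Here the tuple ("alice", "_N1") is redundant because ("alice", "sales") already says everything it does, so the core drops it while keeping ("bob", "_N2"). The second function call plays the role of the post-processing step mentioned above.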

In this talk, I will start with a short history of recent schema mapping systems. I will then present algorithms that have helped bridge the gap between the practice of mapping generation and the theory of data exchange. I will focus on techniques to generate “core schema mappings”, that is, mappings that materialize core solutions without any post-processing. I will show that by running core schema mappings on top of common runtime engines, it is possible to achieve performance orders of magnitude better than computing the core as a post-processing step. Finally, I will discuss an application of schema mappings to the automatic extraction and integration of data from the Web. The talk ends with a discussion of current and future lines of research.

