Armin Roth: Efficient Query Answering in Peer Data Management Systems

13:30, Room N107 (Parc Club)


Peer Data Management Systems (PDMS) consist of a volatile set of peers. Each peer answers queries against its own schema both by exploiting local data and by passing queries to neighboring peers along so-called schema mappings. PDMS are highly flexible due to their decentralized nature, but query answering scales poorly because of the massive redundancy in the paths along which queries are routed. Additionally, repeated query rewriting often leads to increasing information loss.

Our work is based on the idea of trading completeness of query answers for speed of execution, thus turning completeness from a requirement into an optimization goal. To this end, peers can prune those paths during query answering for which they estimate a poor cost/benefit ratio. However, estimating this ratio in highly distributed systems such as PDMS is difficult. We present a technique based on self-adaptive multidimensional histograms that are updated by exploiting the queries passing through the network. Based on these histograms, we present several techniques to trade off benefit against cost. One approach limits the time budget available for query answering. An orthogonal strategy exploits statistics on the overlap between data sources to reduce redundancy in query processing. Experiments with our self-developed PDMS “System P” show efficiency gains of an order of magnitude or more.
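
To make the budget-limited pruning idea concrete, here is a minimal sketch in Python. It is not the algorithm used in System P, which the abstract does not specify. It is a simple greedy heuristic under stated assumptions: each outgoing path already carries an estimated benefit (for example, expected result tuples) and an estimated cost (for example, seconds), both hard-coded here rather than derived from histograms. The names `Path`, `select_paths`, `budget`, and `min_ratio` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """A hypothetical outgoing mapping path with estimated benefit and cost."""
    neighbor: str
    est_benefit: float  # assumed: estimated number of useful result tuples
    est_cost: float     # assumed: estimated time in seconds, must be > 0


def select_paths(paths, budget, min_ratio=0.0):
    """Greedily keep the paths with the best benefit/cost ratio.

    Paths whose ratio falls below min_ratio are pruned, and paths whose
    estimated cost no longer fits into the remaining budget are skipped.
    """
    def ratio(p):
        return p.est_benefit / p.est_cost

    chosen = []
    remaining = budget
    for p in sorted(paths, key=ratio, reverse=True):
        if ratio(p) < min_ratio:
            break  # all later paths have an even lower ratio
        if p.est_cost <= remaining:
            chosen.append(p)
            remaining -= p.est_cost
    return chosen


if __name__ == "__main__":
    candidates = [
        Path("A", est_benefit=100.0, est_cost=2.0),
        Path("B", est_benefit=30.0, est_cost=3.0),
        Path("C", est_benefit=5.0, est_cost=4.0),
    ]
    for p in select_paths(candidates, budget=5.0, min_ratio=2.0):
        print(p.neighbor)
```

With these illustrative numbers, the peer would forward the query along A and B and prune C, accepting a less complete answer in exchange for staying within the budget.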

Short Bio

Armin Roth is an external doctoral candidate at the Humboldt-Universität zu Berlin, Germany. His advisors are Ulf Leser and Felix Naumann from the Hasso Plattner Institut in Potsdam, Germany. Armin joined the IBM Lab in Boeblingen, Germany, as a development engineer in 2009. He contributes to the IBM InfoSphere Information Server. He received his diploma in mechanical engineering from Universität Stuttgart, Germany, and completed postgraduate studies in practical computer science at FernUniversität Hagen, Germany. His focus is on information integration and data quality, both in industry and academia.

In this talk, he will focus on research he performed independently of IBM.
