CIKM 2012: The Nautilus Analyzer – Understanding and Debugging Data Transformations

The Nautilus Analyzer – Understanding and Debugging Data Transformations
by Melanie Herschel and Hanno Eichelberger
Demonstration in ACM CIKM 2012

Permanent link to this article: https://team.inria.fr/oak/2012/08/14/cikm-2012-the-nautilus-analyzer-understanding-and-debugging-data-transformations/

CIKM 2012: AMADA: Web Data Repositories in the Amazon Cloud

AMADA: Web Data Repositories in the Amazon Cloud
by Andrés Aranda-Andújar, Francesca Bugiotti, Jesús Camacho-Rodríguez, Dario Colazzo, François Goasdoué, Zoi Kaoudi and Ioana Manolescu
Demonstration in ACM CIKM 2012

Permanent link to this article: https://team.inria.fr/oak/2012/08/05/cikm-2012-amada-web-data-repositories-in-the-amazon-cloud/

Asterios Katsifodimos: “ViP2P: Efficient XML Management in DHT Networks”

On Thursday, July 19, at 11am in room 445, Asterios will rehearse the talk he will soon give at ICWE 2012.

Abstract
We consider the problem of efficiently sharing large volumes of XML data based on distributed hash table (DHT) overlay networks. Over the last three years, we have built ViP2P (standing for Views in Peer-to-Peer), a platform for the distributed, parallel dissemination of XML data among peers. At the core of ViP2P stand distributed materialized XML views, defined as XML queries, filled in with data published anywhere in the network, and exploited to efficiently answer queries issued by any network peer. ViP2P is one of the very few fully implemented P2P platforms for XML sharing, and has been deployed on hundreds of peers in a WAN. This paper describes the system architecture and modules, and the engineering lessons learned. Our experimental results show that our design choices let ViP2P outperform related systems by orders of magnitude in terms of data volume, network size and data dissemination throughput.
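The abstract does not say how views are located over the DHT. As a rough illustration only, the Python sketch below shows one plausible way to use a DHT-style consistent-hashing ring to index view definitions by the element labels they mention, so that a peer publishing a document can find the views it might contribute data to. The `Ring` and `ViewIndex` classes, the SHA-1-based ring and the label-based indexing are assumptions made for this sketch, not ViP2P's actual code or protocol.

```python
import bisect
import hashlib


def ring_id(key: str, bits: int = 32) -> int:
    """Hash a string onto a 2**bits identifier ring (SHA-1, reduced modulo 2**bits)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


class Ring:
    """Minimal consistent-hashing ring: a key is owned by its successor peer."""

    def __init__(self, peers):
        self.points = sorted((ring_id(p), p) for p in peers)
        self.ids = [pid for pid, _ in self.points]

    def lookup(self, key: str) -> str:
        # First peer whose ID is >= the key's ID, wrapping around to the start.
        i = bisect.bisect_left(self.ids, ring_id(key))
        return self.points[i % len(self.points)][1]


class ViewIndex:
    """Hypothetical view index: each view is registered under every label it uses."""

    def __init__(self, ring: Ring):
        self.ring = ring
        self.store = {}  # peer -> {label -> set of view names}

    def publish_view(self, view_name, labels):
        for label in labels:
            peer = self.ring.lookup(label)
            self.store.setdefault(peer, {}).setdefault(label, set()).add(view_name)

    def views_for_labels(self, labels):
        # Candidate views for a document: union of the views indexed under its labels.
        found = set()
        for label in labels:
            peer = self.ring.lookup(label)
            found |= self.store.get(peer, {}).get(label, set())
        return found


ring = Ring([f"peer{i}" for i in range(8)])
index = ViewIndex(ring)
index.publish_view("v1", ["book", "title"])
index.publish_view("v2", ["article"])
print(index.views_for_labels(["book"]))  # {'v1'}
```

Because every peer computes the same hash, any peer can find the owner of a label without a central directory; the price is one DHT lookup per label of the published document.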

Permanent link to this article: https://team.inria.fr/oak/2012/07/19/asterios-katsifodimos-vip2p-efficient-xml-management-in-dht-networks/

BDA 2012: AMADA: Web Data Repositories in the Amazon Cloud

AMADA: Web Data Repositories in the Amazon Cloud
by Andrés Aranda-Andújar, Francesca Bugiotti, Jesús Camacho-Rodríguez and Zoi Kaoudi
Demonstration in BDA 2012

Permanent link to this article: https://team.inria.fr/oak/2012/07/12/bda-2012-amada-web-data-repositories-in-the-amazon-cloud/

EIT ICT Labs MSDA 2013 accepted

The Massive Shared Data Applications activity, led by Sandro Battisti from Trento RISE, in which we participate, has been accepted for 2013.

Permanent link to this article: https://team.inria.fr/oak/2012/07/03/eit-ict-labs-msda-2013-accepted/

EIT ICT Labs Europa 2013 accepted

The Europa EIT ICT Labs activity, led by Volker Markl from TU Berlin, in which we participate, has been accepted for 2013.

Permanent link to this article: https://team.inria.fr/oak/2012/07/03/eit-ict-labs-europa-2013-accepted/

Stamatis Zampetakis gets an Inria scholarship

Stamatis has obtained an Inria CORDI scholarship for a three-year PhD, starting in October 2012. Congrats!

Permanent link to this article: https://team.inria.fr/oak/2012/07/03/stamatis-zampetakis-gets-an-inria-scholarship/

XLDI 2012: Typing Massive JSON Datasets

Typing Massive JSON Datasets
by Dario Colazzo, Giorgio Ghelli and Carlo Sartiani
in the First International Workshop on Cross-model Language Design and Implementation (XLDI 2012)

Permanent link to this article: https://team.inria.fr/oak/2012/07/03/xldi-2012-typing-massive-json-datasets/

IDEAS 2012: Partitioning XML Documents for Iterative Queries

Partitioning XML Documents for Iterative Queries
by Nicole Bidoit, Dario Colazzo, Noor Malla and Carlo Sartiani
in IDEAS 2012

Permanent link to this article: https://team.inria.fr/oak/2012/07/02/ideas-2012-partitioning-xml-documents-for-iterative-queries/

PhD defense of Konstantinos Karanasos

11.00, room 455, PCRI

Title: “View-Based Techniques for the Efficient Management of Web Data”

Abstract:

Data is being published in digital formats at very high rates nowadays. A large share of this data has complex structure, typically organized as trees (Web documents such as HTML and XML being the most representative) or graphs (in particular, graph-structured Semantic Web databases, expressed in RDF). There is great interest in exploiting such complex data, whether in an Open Data access model or within companies owning it, and efficiently doing so for large data volumes remains challenging.
     Materialized views have long been used to obtain significant performance improvements when processing queries. The principle is that a view stores pre-computed results that can be used to evaluate (possibly part of) a query. Adapting materialized view techniques to the Web data setting we consider is particularly challenging due to the structural and semantic complexity of the data. This thesis tackles two problems in the broad context of materialized view-based management of Web data.
     First, we focus on the problem of view selection for RDF query workloads. We present a novel algorithm that, given a query workload, proposes the most appropriate views to be materialized in the database, in order to minimize the combined cost of query evaluation, view maintenance and view storage. Although RDF query workloads typically feature many joins, which hamper the view selection process, our algorithm scales to hundreds of queries, a number unattained by existing approaches. Furthermore, we propose new techniques to account for the implicit data that can be derived from RDF Schemas, which further complicates the view selection process.
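To make the cost trade-off concrete, here is a toy Python sketch that scores a set of candidate views by a weighted sum of query evaluation, view maintenance and view storage costs, and picks the cheapest set by brute force. The cost numbers, weights, data layout and function names are hypothetical. Exhaustive search is exponential in the number of candidate views, which is exactly what does not scale to hundreds of queries, so this is not the thesis algorithm, only an illustration of the objective it minimizes.

```python
from itertools import combinations


def workload_cost(selected, queries, views, w_eval=1.0, w_maint=1.0, w_store=1.0):
    """Weighted sum of query evaluation, view maintenance and view storage costs."""
    eval_cost = 0.0
    for q, base in queries.items():
        # Use the cheapest selected view that answers q, else evaluate q on the base data.
        options = [views[v]["answers"][q] for v in selected if q in views[v]["answers"]]
        eval_cost += min(options + [base])
    maint = sum(views[v]["maint"] for v in selected)
    store = sum(views[v]["size"] for v in selected)
    return w_eval * eval_cost + w_maint * maint + w_store * store


def select_views(queries, views, **weights):
    """Try every subset of candidate views (exponential: toy inputs only)."""
    names = sorted(views)
    best, best_cost = frozenset(), workload_cost((), queries, views, **weights)
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            cost = workload_cost(subset, queries, views, **weights)
            if cost < best_cost:
                best, best_cost = frozenset(subset), cost
    return best, best_cost


# Hypothetical workload: cost of evaluating each query directly on the base data.
queries = {"q1": 100.0, "q2": 80.0}
# Hypothetical candidate views: per-query evaluation cost when used, maintenance, size.
views = {
    "v_join": {"answers": {"q1": 5.0, "q2": 10.0}, "maint": 20.0, "size": 30.0},
    "v_q1": {"answers": {"q1": 2.0}, "maint": 5.0, "size": 10.0},
    "v_big": {"answers": {"q1": 1.0, "q2": 1.0}, "maint": 200.0, "size": 500.0},
}
print(select_views(queries, views))  # (frozenset({'v_join'}), 65.0)
```

With these numbers, materializing only `v_join` (total cost 65) beats materializing nothing (180) and also beats adding the cheaper single-query view `v_q1` (77), because the extra maintenance and storage outweigh the evaluation cost it saves.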
     The second contribution of our work concerns query rewriting based on materialized XML views. We start by identifying an expressive dialect of XQuery, corresponding to tree patterns with value joins, and study some important properties for these queries, such as containment and minimization. Based on these notions, we consider the problem of finding minimal equivalent rewritings of a query expressed in this dialect, using materialized views expressed in the same dialect, and provide a sound and complete algorithm for that purpose. Our work extends the state of the art by allowing each pattern node to return a set of attributes, supporting value joins in the patterns, and considering rewritings which combine many views. Finally, we show how our view-based query rewriting algorithm can be applied in a distributed setting, in order to efficiently disseminate corpora of XML documents carrying RDF annotations.
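Containment is the building block for both minimization and rewriting. The following Python sketch illustrates the classic homomorphism test on a much simpler case than the thesis dialect: Boolean tree patterns with child (`/`) and descendant (`//`) edges, without wildcards, returned attributes or value joins. If the nodes of `q` can be mapped onto those of `p` while preserving labels and edges, then `p` is contained in `q`. This is a sound check under those assumptions, not the thesis's algorithm; the `Node` class and the function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    # Outgoing edges as (axis, child) pairs, with axis "/" (child) or "//" (descendant).
    children: list = field(default_factory=list)


def proper_descendants(n):
    """All pattern nodes strictly below n."""
    out = []
    for _, c in n.children:
        out.append(c)
        out.extend(proper_descendants(c))
    return out


def maps_to(q, p):
    """Can node q (with its subtree) be mapped homomorphically onto node p?"""
    if q.label != p.label:
        return False
    for axis, qc in q.children:
        if axis == "/":
            # A child edge must land on a child edge of p.
            targets = [pc for ax, pc in p.children if ax == "/"]
        else:
            # A descendant edge may land on any node strictly below p.
            targets = proper_descendants(p)
        if not any(maps_to(qc, t) for t in targets):
            return False
    return True


def contained_in(p, q):
    """Sound check that p is contained in q: a root-preserving homomorphism q -> p exists."""
    return maps_to(q, p)


p = Node("a", [("/", Node("b", [("/", Node("c"))]))])  # a/b/c
q = Node("a", [("//", Node("c"))])  # a//c
print(contained_in(p, q), contained_in(q, p))  # True False
```

Every document matching `a/b/c` also matches `a//c`, but not the other way around, which is what the two calls report.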

Permanent link to this article: https://team.inria.fr/oak/2012/06/29/phd-defense-of-konstantinos-karanasos/