The PAXQuery project is demonstrated at the EIT ICT Labs Results Day!

As one of the EIT ICT Labs projects in 2014, the team behind PAXQuery (Ioana Manolescu, Dario Colazzo, Jesús Camacho Rodríguez and Juan Álvaro Muñoz Naranjo) had the pleasure of preparing a demo for the EIT ICT Labs Results Day, held in Helsinki on December 4th. Once there, they had the chance to introduce PAXQuery to approximately 300 attendees from both the scientific and industrial communities. It is worth mentioning that the Flink platform (on which PAXQuery relies) was the most widely represented, with up to five different projects demonstrated.

Thanks to EIT ICT Labs for a great event!


Permanent link to this article:

Yannis Velegrakis: On building more human query answering systems

When: Thursday, December 4, at 10.00
Where: PCRI building, room 455
Title: On building more human query answering systems

The underlying principle behind every query answering system is the existence of a query describing the information of interest. When this model is applied to non-expert users, two traditional issues become highly significant.
The first is that many queries are over-specified, leading to empty answers. We propose a principled, optimization-based, interactive query relaxation framework for such queries. The framework dynamically computes and suggests alternative queries with fewer conditions, to help the user arrive either at a query with a non-empty answer, or at a query for which it is clear that the answer will always be empty regardless of the relaxations applied.
The second issue is that the user may lack the expertise to accurately describe the requirements of the elements of interest. The user may, however, know examples of elements that they would like to have in the results. We introduce a novel query paradigm in which queries are no longer specifications of what the user is searching for, but simply a sample of what the user knows to be of interest. We refer to this novel form of queries as Exemplar Queries.
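
To make the idea of query relaxation concrete, here is a minimal sketch in Python. It is not the optimization-based framework presented in the talk: it simply tries, by brute force, to drop as few conditions as possible until the answer becomes non-empty. The data set `cars`, the query `query` and the function names are all hypothetical and exist only for illustration.

```python
from itertools import combinations


def answer(records, conditions):
    """Return the records that satisfy every condition of the query."""
    return [r for r in records if all(check(r) for check in conditions.values())]


def suggest_relaxations(records, conditions):
    """Return the smallest sets of condition names whose removal
    yields a non-empty answer (brute force, for illustration only).

    An empty list means that no relaxation can produce an answer.
    """
    names = list(conditions)
    # Try dropping 0 conditions, then 1, then 2, ... and stop at the
    # first size for which at least one relaxed query has answers.
    for k in range(len(names) + 1):
        found = []
        for dropped in combinations(names, k):
            kept = {n: c for n, c in conditions.items() if n not in dropped}
            if answer(records, kept):
                found.append(set(dropped))
        if found:
            return found
    return []


# Hypothetical data and an over-specified query: a cheap black Audi.
cars = [
    {"make": "Fiat", "price": 9000, "color": "red"},
    {"make": "Audi", "price": 30000, "color": "black"},
]
query = {
    "make": lambda r: r["make"] == "Audi",
    "price": lambda r: r["price"] < 10000,
    "color": lambda r: r["color"] == "black",
}

print(answer(cars, query))               # the original query has no answer
print(suggest_relaxations(cars, query))  # condition sets the user could drop
```

In this toy setting, the only single condition whose removal gives a non-empty answer is the price constraint. An interactive system would present such suggestions to the user rather than apply them automatically.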

Short bio:
Yannis Velegrakis is an associate professor at the Department of Information Engineering and Computer Science of the University of Trento, and the leader of the Data and Information Management group. He holds a PhD degree in Computer Science from the University of Toronto. His research areas of expertise include large-scale integration of highly heterogeneous and distributed data, efficient and effective query answering, social data analytics, and Big Data. Prior to joining the University of Trento, he was a researcher at the AT&T Research Labs in the United States. He has also spent time as a visitor at the University of California, Santa Cruz, the IBM Almaden Research Center, and the Center of Advanced Studies of the IBM Toronto Lab. He was a member of the committee for the CIMI cultural profile of the ANSI/NISO Z39.50 standard. He has served on many program committees of national and international conferences and as a reviewer for numerous international journals. He has been a general chair for VLDB 2013, WebDB 2012, DESWEB 2010/11 and SWAE 2007. He holds two US patents and was a Marie Curie Fellow for the period 2006-2008.


Konstantinos Karanasos: Dynamically Optimizing Queries over Large Scale Data Platforms

When: Wednesday, November 26, at 11.00
Where: PCRI building, room 445
Title: Dynamically Optimizing Queries over Large Scale Data Platforms
Enterprises are adopting large-scale data processing platforms, such as Hadoop, to gain actionable insights from their “big data”. Query optimization is still an open challenge in this environment due to the volume and heterogeneity of data, comprising both structured and un/semi-structured datasets. Moreover, it has become common practice to push business logic close to the data via user-defined functions (UDFs), which are usually opaque to the optimizer, further complicating cost-based optimization. As a result, classical relational query optimization techniques do not fit well in this setting, while at the same time, suboptimal query plans can be disastrous with large datasets. In this talk, I will present new techniques that take into account UDFs and correlations between relations for optimizing queries running on large-scale clusters. We introduce “pilot runs”, which execute part of the query over a sample of the data to estimate selectivities, and employ a cost-based optimizer that uses these selectivities to choose an initial query plan. Then, we follow a dynamic optimization approach, in which plans evolve as parts of the queries get executed. Our experimental results show that our techniques produce plans that are at least as good as the best hand-written left-deep query plans, and up to 2x better for Jaql and 4x better for Hive.
This work was done while I was with IBM Research at Almaden and appeared in SIGMOD 2014. It is joint work with Andrey Balmin, Marcel Kutsch, Fatma Ozcan, Vuk Ercegovac, Chunyang Xia, and Jesse Jackson.
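
The intuition behind pilot runs can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the technique from the SIGMOD 2014 paper: it estimates the selectivity of opaque filter UDFs on a random sample and then orders the filters so that the most selective one runs first. The function names, the `filters` dictionary and the sample sizes are hypothetical, and the cost-based join optimization and the dynamic re-optimization mentioned above are left out.

```python
import random


def pilot_run(udf, rows, sample_size, seed=0):
    """Estimate the selectivity of an opaque predicate by running it
    on a random sample of the rows (fraction of sampled rows kept)."""
    rng = random.Random(seed)
    sample = rng.sample(rows, min(sample_size, len(rows)))
    if not sample:
        return 1.0  # no data to sample: assume the filter keeps everything
    passed = sum(1 for row in sample if udf(row))
    return passed / len(sample)


def order_filters(filters, rows, sample_size=100):
    """Run one pilot run per filter and return the filter names sorted
    by estimated selectivity (most selective first), plus the estimates."""
    estimates = {
        name: pilot_run(udf, rows, sample_size) for name, udf in filters.items()
    }
    return sorted(estimates, key=estimates.get), estimates


# Hypothetical UDFs whose selectivity the optimizer cannot know upfront.
rows = list(range(1000))
filters = {
    "is_even": lambda x: x % 2 == 0,
    "is_small": lambda x: x < 10,
}

plan, estimates = order_filters(filters, rows, sample_size=200)
print(plan, estimates)
```

With a small sample the estimates are approximate, which is one reason a plan chosen this way may need to be revised as execution proceeds and real statistics become available.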


ISI 2014: “Analyse de données RDF. Lentilles pour graphes sémantiques”

Analyse de données RDF. Lentilles pour graphes sémantiques
by Dario Colazzo, François Goasdoué, Ioana Manolescu and Alexandra Roatiș
appeared in Ingénierie des Systèmes d’Information,  vol. 19/4, 2014, pp. 87-117


ICDE 2015: CliqueSquare: Optimizing RDF Queries for Parallel Execution

The demonstration “CliqueSquare: Optimizing RDF Queries for Parallel Execution” by Benjamin Djahandideh, François Goasdoué, Zoi Kaoudi, Ioana Manolescu, Jorge Quiané-Ruiz and Stamatis Zampetakis has been accepted for publication at ICDE 2015.


Tamer Özsu: Web Data Management in the RDF Age

When: Friday, October 24, at 14.00

Where: PCRI building, room 445

Who: Dan Olteanu, associate professor at University of Oxford

Title: *Yo Dawg, We Heard You Like Datalog Engines* … so we put a Datalog engine inside your Datalog engine, so you can derive while you derive!

The emerging category of smart database systems aims for integrated handling of mixed transactional and analytical workloads (aka HTAP), graph analyses, and predictive workloads that involve mathematical optimization and machine learning. Having such an integrated system is enormously useful for application developers, but building one poses formidable engineering and language design challenges. In this talk, I give a brief overview of the LogicBlox smart database system and its Datalog-based language LogiQL, then zoom in on one such technical challenge that I worked on during my sabbatical at LogicBlox: the problem of handling updates to LogiQL programs on running database servers. This problem turns out to be surprisingly difficult, but it is crucial to solve it properly in the system, for reasons which I shall explain. The solution I present is based on introducing an engine for meta-data that supports declarative rules in an object-oriented, Datalog-like language. Incremental view maintenance in the meta-engine takes care of propagating the effects of LogiQL code updates correctly and efficiently. This is joint work with TJ Green, Todd Veldhuizen, and the LogicBlox runtime team.


OAK at BDA 2014

The BDA conference celebrated its 30th anniversary in the beautiful scenery of Autrans.


It featured three captivating keynotes given by renowned invited speakers:

  • « Big Data Integration », Divesh Srivastava (AT&T Labs)
  • « Big Data: Hype and Reality », Dr C. Mohan (IBM Almaden Research Center)
  • « Declarative Modeling for Machine Learning and Data Mining », Luc De Raedt (Katholieke Universiteit Leuven)

And also a workshop with practical information tailored particularly for young researchers:

  • « Hints on publication: the story of Ike Antkare », Cyril Labbé (LIG Lab, Université Joseph Fourier)
  • « The life of a researcher : a personal viewpoint », Serge Abiteboul (INRIA & ENS Cachan)

OAK members presented a great variety of works.


Dario presented two papers: “PigReuse: Reuse-based Optimization for Pig Latin” (Jesús Camacho-Rodríguez, Dario Colazzo, Melanie Herschel, Ioana Manolescu and Soudip Roy Chowdhury) and “PAXQuery: Efficient Parallel Processing of Complex XQuery” (Jesús Camacho-Rodríguez, Dario Colazzo and Ioana Manolescu), with Juan assisting him with a small demo of the second work.

Damian presented “Reformulation-based Query Answering in RDF” (Damian Bursztyn, François Goasdoué, Ioana Manolescu and Alexandra Roatiș), and Katerina talked about “Immutably Answering Why-Not Questions for Equivalent Conjunctive Queries” (Nicole Bidoit, Melanie Herschel and Katerina Tzompanaki).

Even though they’re not official OAK members, I could not resist including Danai and Virginie, who also presented their works: “SAKey: Scalable Almost Key discovery in RDF data” (Danai Symeonidou, Vincent Armant, Nathalie Pernelle and Fatiha Saïs) and “Une algèbre floue pour l’interrogation flexible de bases de données graphe” (Olivier Pivert, Virginie Thion, Helene Jaudoin and Grégory Smits), respectively.

Finally, the popularity of the demo “How to deal with Cliques at Work” (Benjamin Djahandideh, François Goasdoué, Zoi Kaoudi, Ioana Manolescu, Jorge-Arnulfo Quiané-Ruiz and Stamatis Zampetakis) presented by Stamatis and Benjamin was so distracting that we forgot to take a photo. We consider them victims of their own success.

After the intellectual program, the conference featured some physical training.

It is with great pleasure that we introduce OAK Covert Ops:


While the team made up of Benjamin and Stamatis was the most intimidating at first, it turned out that Danai, Katerina and Soudip were the ones killing all the competition. Best not to mess with these young researchers!

With a final image, we say goodbye to BDA 2014. We look forward to next year!



ICDE 2015: CliqueSquare: Flat Plans for Massively Parallel RDF Queries

“CliqueSquare: Flat Plans for Massively Parallel RDF Queries” by François Goasdoué, Zoi Kaoudi, Ioana Manolescu, Jorge Quiané-Ruiz and Stamatis Zampetakis has been accepted for publication in ICDE 2015.


ICDE 2015 tutorial: RDF Data Management: Reasoning on Web Data

The tutorial “RDF Data Management: Reasoning on Web Data” by François Goasdoué, Ioana Manolescu and Alexandra Roatiș has been accepted at ICDE 2015.


CIDR 2015: Invisible Glue: Scalable Self-Tuning Multi-Stores

“Invisible Glue: Scalable Self-Tuning Multi-Stores” by Francesca Bugiotti, Damian Bursztyn, Alin Deutsch, Ioana Ileana and Ioana Manolescu has been accepted for publication in CIDR 2015.
