
PAXQuery: efficient parallel processing of XQuery


The PAXQuery engine seamlessly parallelizes the execution of XQuery queries. By translating and optimizing queries on the fly, PAXQuery evaluates user queries over massive collections of XML documents in a distributed fashion. PAXQuery runs on top of Apache Flink (previously known as Stratosphere), a parallel execution platform based on the PACT programming model.

Given an XQuery query, the engine first builds an equivalent tree of algebraic operators that operate on nested tuples. The operator set includes navigation, group-by, aggregation, selection, and projection, among others.
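To give a feel for an algebra over nested tuples, here is a minimal single-process sketch. It assumes tuples are Python dicts and that a nested attribute holds a list of tuples; the function names (`navigate`, `select`, `project`, `group_by`, `aggregate`) are hypothetical and are not PAXQuery's actual operators or API. The operator tree below counts, per city, the people aged 30 or more, roughly what an XQuery FLWOR expression with `where` and `group by` clauses would ask for.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Any, Callable, Dict, List

# Hypothetical representation: a tuple maps attribute names to values, and a
# nested attribute holds a list of tuples, so collections nest inside tuples.
NestedTuple = Dict[str, Any]


def navigate(doc: str, path: str, fields: List[str]) -> List[NestedTuple]:
    """Navigation: one tuple per element matched by `path`, holding the text
    of the listed child elements."""
    root = ET.fromstring(doc)
    return [{f: e.findtext(f) for f in fields} for e in root.iterfind(path)]


def select(rel: List[NestedTuple], pred: Callable[[NestedTuple], bool]) -> List[NestedTuple]:
    """Selection: keep only the tuples that satisfy the predicate."""
    return [t for t in rel if pred(t)]


def project(rel: List[NestedTuple], attrs: List[str]) -> List[NestedTuple]:
    """Projection: keep only the listed attributes of each tuple."""
    return [{a: t[a] for a in attrs} for t in rel]


def group_by(rel: List[NestedTuple], key: str, nested: str) -> List[NestedTuple]:
    """Group-by: one output tuple per key value; the matching input tuples
    (without the key) are collected into a nested attribute."""
    groups: Dict[Any, List[NestedTuple]] = defaultdict(list)
    for t in rel:
        groups[t[key]].append({a: v for a, v in t.items() if a != key})
    return [{key: k, nested: g} for k, g in groups.items()]


def aggregate(rel: List[NestedTuple], nested: str, attr: str, out: str,
              fn: Callable[[List[Any]], Any]) -> List[NestedTuple]:
    """Aggregation: apply `fn` to one attribute of each nested collection."""
    return [{**t, out: fn([n[attr] for n in t[nested]])} for t in rel]


DOC = """<people>
  <person><name>Ana</name><city>Paris</city><age>34</age></person>
  <person><name>Bo</name><city>Lyon</city><age>29</age></person>
  <person><name>Cy</name><city>Paris</city><age>41</age></person>
</people>"""

# Operator tree, evaluated bottom-up: navigate -> select -> group-by -> aggregate -> project.
people = navigate(DOC, ".//person", ["name", "city", "age"])
adults = select(people, lambda t: int(t["age"]) >= 30)
by_city = group_by(adults, "city", "persons")
counts = project(aggregate(by_city, "persons", "name", "n", len), ["city", "n"])
print(counts)  # [{'city': 'Paris', 'n': 2}]
```

Keeping the grouped tuples nested (rather than flattening them) is what lets an aggregation, or a later XML construction step, consume each group as a single value.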

Once the tree has been built and optimized, the engine compiles it into a PACT plan made of implicitly parallel operators such as Map, Reduce, Match, CoGroup, and Cross. This plan is handed to Apache Flink, which optimizes the PACT plan and executes it in parallel, for example over data stored in HDFS or in the local file system.
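The contract each PACT operator places on its user function can be illustrated with a small in-memory simulation. This is a sketch under stated assumptions: records are Python dicts, the function names (`pact_map`, `pact_reduce`, `pact_match`, `pact_cogroup`, `pact_cross`, `hash_partition`) are hypothetical, and nothing here is the Flink API or PAXQuery's actual translation rules. It shows why the operators are implicitly parallel: each one fixes which records a user function sees together, so the platform is free to partition the data accordingly.

```python
from collections import defaultdict
from itertools import product
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]


def pact_map(data: List[Record], udf: Callable[[Record], Iterable[Record]]) -> List[Record]:
    # Map: the function sees one record at a time, so any partitioning works.
    return [out for r in data for out in udf(r)]


def _groups(data: List[Record], key: str) -> Dict[Any, List[Record]]:
    groups: Dict[Any, List[Record]] = defaultdict(list)
    for r in data:
        groups[r[key]].append(r)
    return groups


def pact_reduce(data: List[Record], key: str, udf) -> List[Record]:
    # Reduce: the function sees all records sharing one key value.
    return [out for g in _groups(data, key).values() for out in udf(g)]


def pact_match(left, right, lkey: str, rkey: str, udf) -> List[Record]:
    # Match: the function sees each pair of records with equal keys (equi-join).
    index = _groups(right, rkey)
    return [out for l in left for r in index.get(l[lkey], []) for out in udf(l, r)]


def pact_cogroup(left, right, lkey: str, rkey: str, udf) -> List[Record]:
    # CoGroup: per key value, the function sees the groups from both inputs,
    # either of which may be empty (useful for outer joins and nesting).
    lg, rg = _groups(left, lkey), _groups(right, rkey)
    keys = list(lg) + [k for k in rg if k not in lg]
    return [out for k in keys for out in udf(lg.get(k, []), rg.get(k, []))]


def pact_cross(left, right, udf) -> List[Record]:
    # Cross: the function sees every pair of the Cartesian product.
    return [out for l, r in product(left, right) for out in udf(l, r)]


def hash_partition(data: List[Record], key: str, n: int) -> List[List[Record]]:
    # Sending equal keys to the same partition lets Reduce run per partition.
    parts: List[List[Record]] = [[] for _ in range(n)]
    for r in data:
        parts[hash(r[key]) % n].append(r)
    return parts


persons = [{"name": "Ana", "city": "Paris"}, {"name": "Bo", "city": "Lyon"},
           {"name": "Cy", "city": "Paris"}, {"name": "Di", "city": "Oslo"}]
cities = [{"city": "Paris", "country": "FR"}, {"city": "Lyon", "country": "FR"},
          {"city": "Oslo", "country": "NO"}, {"city": "Nice", "country": "FR"}]


def count_per_country(group: List[Record]) -> List[Record]:
    return [{"country": group[0]["country"], "n": len(group)}]


# A join expressed with Match, followed by a grouped count expressed with Reduce.
joined = pact_match(persons, cities, "city", "city",
                    lambda p, c: [{"name": p["name"], "country": c["country"]}])
per_country = pact_reduce(joined, "country", count_per_country)

# Same Reduce, run independently on 4 hash partitions: same records, any order.
parallel = [out for part in hash_partition(joined, "country", 4)
            for out in pact_reduce(part, "country", count_per_country)]

# CoGroup keeps cities without any person (n == 0), like an outer join.
per_city = pact_cogroup(cities, persons, "city", "city",
                        lambda cs, ps: [{"city": cs[0]["city"], "n": len(ps)}] if cs else [])
print(per_country)  # [{'country': 'FR', 'n': 3}, {'country': 'NO', 'n': 1}]
```

Because the per-partition Reduce results match the single-partition ones, a platform can choose the degree of parallelism without the user function changing; this is the property a PACT plan relies on.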

PAXQuery Architecture

Open-source release

PAXQuery is open source, and its code is publicly available. If you would like to get involved, send us a message!

People involved (listed in alphabetical order)



This project has been partially funded by the ICTLabs of the European Institute of Innovation and Technology.
