We are pleased to announce the source code release of CliqueSquare, an RDF data management system based on Hadoop.
CliqueSquare is a system for storing and querying large RDF graphs. It relies on Hadoop’s distributed file system (HDFS) and on Hadoop’s open-source MapReduce implementation. It provides a novel partitioning and storage scheme that allows first-level joins to be evaluated locally using efficient map-only joins. In addition, CliqueSquare is equipped with a unique optimization algorithm, based on graphs and cliques, that generates highly parallelizable flat query plans relying on n-ary equality joins.
The system is described in an upcoming ICDE 2015 paper as well as an ICDE 2015 demonstration (see https://team.inria.fr/oak/
Main features
* Scalable RDF storage using novel partitioning algorithms designed specifically for Hadoop and HDFS, which exploit the structure of RDF data to reduce the network traffic generated by queries
* Scalable processing of SPARQL Basic Graph Pattern (BGP) queries relying on:
(i) novel optimization algorithms aiming to produce highly parallelizable query plans;
(ii) efficient MapReduce physical operators maximizing the usage of the Hadoop cluster.
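To give an intuition for why a suitable partitioning lets joins run locally, here is a minimal single-process sketch in Python. It is an illustration under stated assumptions, not the CliqueSquare implementation: it assumes each triple is placed once per role (subject, property, object) on the node chosen by hashing that term, and the names `node_of`, `partition`, `local_subject_join` and `distributed_subject_join` are hypothetical. Under that assumption, all triples sharing a subject sit in the same subject partition, so a subject-subject join needs no data exchange between nodes, which is the situation a map-only join exploits.

```python
import zlib
from collections import defaultdict


def node_of(term, num_nodes):
    """Deterministically map an RDF term to a node id (assumed hash scheme)."""
    return zlib.crc32(term.encode("utf-8")) % num_nodes


def partition(triples, num_nodes):
    """Place every triple three times: by subject, by property and by object.

    Returns {node_id: {"s": set, "p": set, "o": set}}, where each set holds
    the triples routed to that node through the corresponding term.
    """
    nodes = defaultdict(lambda: {"s": set(), "p": set(), "o": set()})
    for s, p, o in triples:
        nodes[node_of(s, num_nodes)]["s"].add((s, p, o))
        nodes[node_of(p, num_nodes)]["p"].add((s, p, o))
        nodes[node_of(o, num_nodes)]["o"].add((s, p, o))
    return nodes


def local_subject_join(subject_partition, prop_a, prop_b):
    """Join ?x prop_a ?a with ?x prop_b ?b using one node's data only."""
    objects_a = defaultdict(list)
    objects_b = defaultdict(list)
    for s, p, o in subject_partition:
        if p == prop_a:
            objects_a[s].append(o)
        elif p == prop_b:
            objects_b[s].append(o)
    return {
        (s, a, b)
        for s in objects_a
        for a in objects_a[s]
        for b in objects_b.get(s, [])
    }


def distributed_subject_join(triples, num_nodes, prop_a, prop_b):
    """Union of the per-node joins; no triple is moved between nodes."""
    results = set()
    for node in partition(triples, num_nodes).values():
        results |= local_subject_join(node["s"], prop_a, prop_b)
    return results


if __name__ == "__main__":
    data = [
        ("a", "name", "Alice"),
        ("a", "age", "30"),
        ("b", "name", "Bob"),
        ("c", "age", "7"),
    ]
    print(distributed_subject_join(data, 3, "name", "age"))
```

In this sketch the price of local joins is storage: every triple is kept three times. Joins on a property or object value would read the `"p"` or `"o"` partitions in the same way.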
Minimum system requirements
* Hadoop 1.2.1
* Linux / Mac OS
* Java 6
The initial release of CliqueSquare is available at:
Coming soon: support for grouping and aggregation.
The CliqueSquare Team