14.00, room 455, PCRI
Abstract
An increasing share of the world's data is either shared through the Web or directly produced through and for Web platforms, in particular using structured formats like XML. Cloud platforms are attractive candidates for handling large data (and document) repositories, due to their elastic scaling properties. We present NubeX, a scalable store for managing large corpora of XML documents, built on top of off-the-shelf cloud infrastructure. NubeX implements smart indexes that both speed up query evaluation and reduce the warehouse operating cost, by directing each query only to the documents that may contain answers to it. We present the architecture, the indexing strategies, and extensive experiments demonstrating their performance.