Master’s internship: Developing a geo-replicated document store above AntidoteDB

Background

The document-based data model is a popular way of storing data, particularly for applications that handle semi-structured data or unstructured data accompanied by metadata, such as social media posts and multimedia. Document stores, including MongoDB and Apache CouchDB, store data as JSON-like objects whose sets of fields, and the types of those fields, can vary from one document to another. This model is attractive because it makes application code easier to write and enables applications to change their database schema as demands evolve.
Beyond the simple key-to-document lookup interface, document stores offer a query language that allows users to retrieve documents by their semi-structured content. To achieve efficient and scalable semi-structured search, document stores maintain secondary indexes on document attributes.
AntidoteDB is a cloud database developed by the Delys team at LIP6. It provides geo-replication and guarantees both high availability and a high level of consistency.  AntidoteDB supports replicated data types such as counters, sets and maps that are designed to work correctly in the presence of concurrent updates and network failures.
Implementing a document store API backed by AntidoteDB can allow a geo-distributed deployment where users update their documents concurrently in multiple data centres with low latency and high availability.
More importantly, Antidote’s replicated data types can provide a flexible mechanism for users to explicitly control how conflicts caused by concurrent updates will be resolved, by choosing the appropriate data types based on their application semantics.
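To make the role of replicated data types concrete, the following Python sketch shows a counter and a set converging after concurrent updates in two data centres. It is an illustration only: the class names `PNCounter` and `AddWinsSet`, their methods, and the replica names are assumptions made for this example and are not AntidoteDB's API.

```python
from dataclasses import dataclass, field


@dataclass
class PNCounter:
    """Counter CRDT: keeps per-replica increment and decrement totals."""
    incs: dict = field(default_factory=dict)
    decs: dict = field(default_factory=dict)

    def increment(self, replica, amount=1):
        self.incs[replica] = self.incs.get(replica, 0) + amount

    def decrement(self, replica, amount=1):
        self.decs[replica] = self.decs.get(replica, 0) + amount

    def value(self):
        return sum(self.incs.values()) - sum(self.decs.values())

    def merge(self, other):
        # Entry-wise maximum, so merging is commutative and idempotent.
        for mine, theirs in ((self.incs, other.incs), (self.decs, other.decs)):
            for replica, count in theirs.items():
                mine[replica] = max(mine.get(replica, 0), count)


@dataclass
class AddWinsSet:
    """Set CRDT in which an add beats a concurrent remove."""
    replica: str
    adds: set = field(default_factory=set)     # (element, unique tag) pairs
    removed: set = field(default_factory=set)  # tags observed by a remove
    _seq: int = 0

    def add(self, element):
        self._seq += 1
        self.adds.add((element, (self.replica, self._seq)))

    def remove(self, element):
        # Only tags already seen at this replica are cancelled.
        self.removed |= {tag for (e, tag) in self.adds if e == element}

    def elements(self):
        return {e for (e, tag) in self.adds if tag not in self.removed}

    def merge(self, other):
        self.adds |= other.adds
        self.removed |= other.removed


# Two data centres update a "likes" counter concurrently.
paris_likes, tokyo_likes = PNCounter(), PNCounter()
paris_likes.increment("paris", 2)
tokyo_likes.increment("tokyo")
paris_likes.merge(tokyo_likes)
tokyo_likes.merge(paris_likes)

# Both start with the tag "draft"; Paris removes it while Tokyo re-adds it.
paris_tags, tokyo_tags = AddWinsSet("paris"), AddWinsSet("tokyo")
paris_tags.add("draft")
tokyo_tags.merge(paris_tags)
paris_tags.remove("draft")
tokyo_tags.add("draft")
paris_tags.merge(tokyo_tags)
tokyo_tags.merge(paris_tags)

print(paris_likes.value(), tokyo_likes.value())      # 3 3
print(paris_tags.elements(), tokyo_tags.elements())  # {'draft'} {'draft'}
```

Both replicas end in the same state without coordination. Choosing a set with a remove-wins policy instead would resolve the same conflict the other way, which is the kind of application-level choice the paragraph above refers to.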

Research objectives and methods

The objective of this internship is to implement, on top of AntidoteDB, a document store interface and a query language that supports document retrieval by semi-structured content. The query language should support point queries on text attributes and interval queries on numerical attributes, and allow complex queries that combine them with logical operators (AND, OR, NOT). To achieve efficient and scalable search, the system should maintain secondary indexes on document attributes.
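As a rough illustration of the query features listed above, the following single-node, in-memory Python sketch answers point queries from an inverted index, interval queries from a sorted index, and combines them with AND, OR and NOT. The `DocumentStore` class, the tuple-based query syntax and the sample documents are all assumptions made for this example. They are not the interface the internship is expected to produce, and nothing here is replicated or backed by AntidoteDB.

```python
import bisect
from collections import defaultdict


def _is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


class DocumentStore:
    def __init__(self):
        self.docs = {}
        # Inverted index for text attributes: attr -> value -> {doc ids}
        self.text_index = defaultdict(lambda: defaultdict(set))
        # Sorted index for numeric attributes: attr -> [(value, doc id), ...]
        self.num_index = defaultdict(list)

    def put(self, doc_id, doc):
        if doc_id in self.docs:
            self._unindex(doc_id, self.docs[doc_id])
        self.docs[doc_id] = doc
        for attr, value in doc.items():
            if isinstance(value, str):
                self.text_index[attr][value].add(doc_id)
            elif _is_number(value):
                bisect.insort(self.num_index[attr], (value, doc_id))

    def _unindex(self, doc_id, doc):
        for attr, value in doc.items():
            if isinstance(value, str):
                self.text_index[attr][value].discard(doc_id)
            elif _is_number(value):
                self.num_index[attr].remove((value, doc_id))

    def _eq(self, attr, value):
        return set(self.text_index.get(attr, {}).get(value, set()))

    def _range(self, attr, low, high):
        entries = self.num_index.get(attr, [])
        # (low,) sorts before every (low, doc_id), so this finds the first
        # entry whose value is >= low.
        i = bisect.bisect_left(entries, (low,))
        result = set()
        while i < len(entries) and entries[i][0] <= high:
            result.add(entries[i][1])
            i += 1
        return result

    def query(self, q):
        op = q[0]
        if op == "eq":
            return self._eq(q[1], q[2])
        if op == "range":  # inclusive bounds
            return self._range(q[1], q[2], q[3])
        if op == "and":
            return set.intersection(*(self.query(sub) for sub in q[1:]))
        if op == "or":
            return set.union(*(self.query(sub) for sub in q[1:]))
        if op == "not":
            return set(self.docs) - self.query(q[1])
        raise ValueError(f"unknown operator: {op!r}")


store = DocumentStore()
store.put("p1", {"author": "alice", "kind": "photo", "likes": 12})
store.put("p2", {"author": "bob", "kind": "photo", "likes": 40})
store.put("p3", {"author": "alice", "kind": "video", "likes": 75})

q1 = ("and", ("eq", "author", "alice"), ("range", "likes", 10, 50))
q2 = ("and", ("eq", "kind", "photo"), ("not", ("eq", "author", "alice")))
print(sorted(store.query(q1)))  # ['p1']
print(sorted(store.query(q2)))  # ['p2']
```

NOT is evaluated here against the full set of document identifiers, which is simple but costly. A real design would have to decide how negation interacts with the indexes.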
The intern shall study different indexing techniques (e.g., inverted indexes, B-trees) [1], different strategies for organising distributed indexes (global or local indexes) [2], and different strategies for updating the index structures [3, 4]. The intern shall then select the most appropriate ones for the system, implement the described interface, and perform benchmarks.
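The difference between local and global index organisation can be sketched as follows. In this simplified Python model, a local index lives on the same partition as the documents it covers, so a lookup must contact every partition. A global index is partitioned by attribute value, so a lookup contacts one partition but a write may have to update an index entry on another partition. The `Cluster` class, the CRC32-based placement rule and the single indexed attribute are assumptions made for this example, not a description of any of the cited systems.

```python
import zlib
from collections import defaultdict


def partition_of(key, n_partitions):
    """Deterministic hash placement (a hypothetical rule for this sketch)."""
    return zlib.crc32(str(key).encode("utf-8")) % n_partitions


class Cluster:
    def __init__(self, n_partitions, indexed_attr):
        self.n = n_partitions
        self.attr = indexed_attr
        self.docs = [dict() for _ in range(n_partitions)]
        self.local_idx = [defaultdict(set) for _ in range(n_partitions)]
        self.global_idx = [defaultdict(set) for _ in range(n_partitions)]

    def put(self, doc_id, doc):
        home = partition_of(doc_id, self.n)
        self.docs[home][doc_id] = doc
        value = doc[self.attr]
        # Local index: updated on the document's own partition.
        self.local_idx[home][value].add(doc_id)
        # Global index: the entry lives wherever the value hashes to,
        # which may be a different partition from the document.
        owner = partition_of(value, self.n)
        self.global_idx[owner][value].add(doc_id)

    def lookup_local(self, value):
        """Scatter-gather: ask every partition's local index."""
        result, contacted = set(), 0
        for index in self.local_idx:
            contacted += 1
            result |= index.get(value, set())
        return result, contacted

    def lookup_global(self, value):
        """Ask only the partition that owns this attribute value."""
        owner = partition_of(value, self.n)
        return set(self.global_idx[owner].get(value, set())), 1


cluster = Cluster(n_partitions=4, indexed_attr="author")
cluster.put("p1", {"author": "alice"})
cluster.put("p2", {"author": "bob"})
cluster.put("p3", {"author": "alice"})

local_result, local_contacted = cluster.lookup_local("alice")
global_result, global_contacted = cluster.lookup_global("alice")
print(sorted(local_result), local_contacted)    # ['p1', 'p3'] 4
print(sorted(global_result), global_contacted)  # ['p1', 'p3'] 1
```

The sketch does not handle document updates or deletions and ignores replication. It is meant only to show where the read and write costs fall under each organisation.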

How to apply

The intern must:

  • Be enrolled in a Master’s programme in Computer Science / Informatics or a related field.
  • Have an excellent academic record.
  • Be strongly interested in, and have good knowledge of, distributed systems and/or distributed databases.
  • Be motivated by experimental research.

The internship is funded, and will take place in the Delys group, at Laboratoire d’Informatique de Paris-6 (LIP6), in Paris. The intern will be advised by Dimitrios Vasilas and Dr. Marc Shapiro. A successful intern will be invited to apply for a PhD.

To apply, contact Dimitrios Vasilas <dimitrios.vasilas@scality.com>, with the following information:

  • A resume or Curriculum Vitæ.
  • A list of courses and grades of the last two years of study (an informal transcript is OK).
  • Names and contact details of two references (people who can recommend you), whom we will contact directly.

Bibliography

[1] Qader, Mohiuddin Abdul, Shiwen Cheng, Abhinand Menon, and Vagelis Hristidis. “Efficient Secondary Attribute Lookup in Key-Value Stores.” (2015).
[2] Kejriwal, Ankita, Arjun Gopalan, Ashish Gupta, Zhihao Jia, Stephen Yang, and John K. Ousterhout. “SLIK: Scalable Low-Latency Indexes for a Key-Value Store.” In USENIX Annual Technical Conference, pp. 57-70. 2016.
[3] Tan, Wei, Sandeep Tata, Yuzhe Tang, and Liana L. Fong. “Diff-Index: Differentiated Index in Distributed Log-Structured Data Stores.” In EDBT, pp. 700-711. 2014.
[4] Tang, Yuzhe, Arun Iyengar, Wei Tan, Liana Fong, Ling Liu, and Balaji Palanisamy. “Deferred lightweight indexing for log-structured key-value stores.” In Cluster, Cloud and Grid Computing (CCGrid), 2015 15th IEEE/ACM International Symposium on, pp. 11-20. IEEE, 2015.