Despite the increasing popularity of data-centric applications, there remains a significant mismatch between the needs of application developers and the design goals of modern databases. As a result, developers are often forced, or even encouraged by programming frameworks, to move part of the data management logic to the application layer: data integrity and consistency logic is implemented through validation checks in the application, rather than through the synchronization mechanisms offered by the database (e.g., transactions and locks). Besides duplicating effort, this programming pattern can introduce subtle bugs due to the mismatch between the validation checks and the actual database abstractions to which these checks are mapped.
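To make the pattern concrete, here is a minimal sketch of an application-level uniqueness check next to a constraint enforced by the store itself. It is an illustration under stated assumptions, not code from any real application: the class and method names are hypothetical, and plain in-memory collections stand in for database tables. The interleaving of two requests is replayed sequentially so that the outcome is deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration: in-memory collections stand in for database tables.
public class FeralUniqueness {
    // Stand-in for a table WITHOUT a uniqueness constraint.
    static final List<String> unconstrainedTable = new ArrayList<>();
    // Stand-in for a table WITH a unique index on the key.
    static final Map<String, Boolean> uniqueIndexedTable = new ConcurrentHashMap<>();

    // Application-level validation: "is this email still free?"
    static boolean validateAbsent(String email) {
        return !unconstrainedTable.contains(email);
    }

    // Plain insert; the store itself enforces nothing.
    static void insert(String email) {
        unconstrainedTable.add(email);
    }

    // Store-enforced uniqueness: check and insert happen as one atomic step.
    static boolean insertUnique(String email) {
        return uniqueIndexedTable.putIfAbsent(email, Boolean.TRUE) == null;
    }

    // Replays the problematic interleaving of two requests A and B:
    // both validate before either inserts, so both checks pass.
    static int duplicatesAfterInterleaving(String email) {
        unconstrainedTable.clear();
        boolean okA = validateAbsent(email); // request A validates
        boolean okB = validateAbsent(email); // request B validates
        if (okA) insert(email);              // request A inserts
        if (okB) insert(email);              // request B inserts
        return unconstrainedTable.size();    // number of rows holding the same email
    }

    public static void main(String[] args) {
        System.out.println("rows after feral check: "
                + duplicatesAfterInterleaving("a@example.org"));
        System.out.println("first constrained insert accepted: "
                + insertUnique("b@example.org"));
        System.out.println("second constrained insert accepted: "
                + insertUnique("b@example.org"));
    }
}
```

In the sketch, the check-then-insert sequence lets both requests through, whereas the atomic insert rejects the second one. In a real deployment the same gap would appear between an ORM-level validation and a missing unique index or an insufficiently isolated transaction.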
Research objectives and methods
This project aims to conduct a large-scale investigation of the use of data integrity mechanisms in open source applications. The goal is to build a code mining tool that analyzes applications written in Java, one of the industry’s most widely used programming languages, and built on a variety of ORM frameworks (e.g., JPA and Hibernate). We target a set of 50-100 open source applications, drawn from different application domains and selected for their popularity. We intend to leverage state-of-the-art tools to perform static analysis of source code.
Furthermore, we plan to develop heuristics to infer the relation between the synchronization mechanisms and the specific application invariants they aim to enforce (e.g., all-or-nothing atomicity, existence or uniqueness of a certain datum, referential integrity, etc.).
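As a rough idea of what such mining and heuristics could look like, the sketch below scans Java source text for a few well-known persistence and validation annotations and maps each to the invariant it plausibly enforces. Everything here is an assumption made for illustration: the class name, the regex-based matching (a real tool would more likely work on a parsed syntax tree), and the annotation-to-invariant table are hypothetical and are not the project's actual method.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: text-level heuristics, not a real static analysis.
public class IntegrityMechanismScanner {
    // Assumed heuristic table: source pattern -> invariant it plausibly enforces.
    static final Map<Pattern, String> HEURISTICS = new LinkedHashMap<>();
    static {
        HEURISTICS.put(Pattern.compile("@Transactional\\b"),
                "all-or-nothing atomicity");
        HEURISTICS.put(Pattern.compile("@Column\\s*\\([^)]*unique\\s*=\\s*true"),
                "uniqueness");
        HEURISTICS.put(Pattern.compile("@NotNull\\b"),
                "existence");
        HEURISTICS.put(Pattern.compile("@(ManyToOne|OneToOne|JoinColumn)\\b"),
                "referential integrity");
    }

    // Counts, per invariant, how many matching mechanisms occur in the source text.
    static Map<String, Integer> scan(String source) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<Pattern, String> entry : HEURISTICS.entrySet()) {
            Matcher matcher = entry.getKey().matcher(source);
            int occurrences = 0;
            while (matcher.find()) {
                occurrences++;
            }
            if (occurrences > 0) {
                counts.merge(entry.getValue(), occurrences, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up entity used only as sample input.
        String sample = String.join("\n",
                "@Entity class Account {",
                "  @Column(unique = true) String email;",
                "  @NotNull @ManyToOne Customer owner;",
                "  @Transactional void close() { }",
                "}");
        System.out.println(scan(sample));
    }
}
```

Running the scanner over each file of a project and aggregating the counts would give a first, coarse picture of which invariants an application delegates to the database and which it enforces only through application-level checks.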
This work should prove useful both for informing the design of future database systems and for highlighting the misuse of synchronization mechanisms in existing applications.
How to apply
The intern must:
- Be enrolled in a Master’s program in Computer Science / Informatics or a related field.
- Have an excellent academic record.
- Be strongly interested in, and have good knowledge of, distributed systems and/or software engineering.
- Be motivated by experimental research.
The internship is funded and will take place in the Delys group at Laboratoire d’Informatique de Paris-6 (LIP6), in Paris. The intern will be advised by Dr. Paolo Viotti and Dr. Marc Shapiro. A successful intern will be invited to apply for a PhD.
To apply, contact Paolo Viotti <email@example.com>, with the following information:
- A resume or Curriculum Vitæ.
- A list of courses and grades of the last two years of study (an informal transcript is OK).
- Names and contact details of two references (people who can recommend you), whom we will contact directly.
 P. Bailis, A. Fekete, M. J. Franklin, A. Ghodsi, J. M. Hellerstein, and I. Stoica. Feral concurrency control: An empirical investigation of modern application integrity. In Int. Conf. on the Mgt. of Data (SIGMOD), 2015. http://doi.acm.org/10.1145/2723372.2737784.
 M. Shapiro, M. Saeida Ardekani, and G. Petri. Consistency in 3D. Rapport de Recherche RR-8932, Institut National de la Recherche en Informatique et Automatique (Inria), Rocquencourt, France, July 2016. https://hal.archives-ouvertes.fr/hal-01343592.