Internship 2023–2024: Rigorous development of database backends

Internship 2023–2024: Rigorous development of database backends

Keywords

Distributed systems; distributed databases; correct-by-construction software; concurrency

Background

Conceptually, a database storage backend is just a map of keys to values. However, to provide performance and reliability, a modern store is a complex, concurrent software system, opening many opportunities for bugs. Using a verification-based approach has the potential to avoid such bugs, but specifying the complexity of a modern backend upfront is a daunting task. To decompose the problem into smaller pieces, we base our approach on the following insights: (1) There are two broad classes of backends: eager, state-based maps, vs. lazy, operation-based journals. (2) We can write a high-level, abstract specification that accounts for both. (3) We can prove correctness and equivalence properties from the high-level specification. (4) The specification provides a kind of pseudocode for the implementation. (5) A similar high-level specification accounts for the composition of primitive stores. (6) We can express the complex internal architecture of a modern backend as such a composition.

Research perspectives

The project has so far tackled items 1 and 2 above, and we are making made good progress towards item 3. Work on item 4, implementing the primitive stores by translating the specification verbatim, is well advanced. An initial goal of the internship is to perform an experimental study (of performance and fault tolerance) of the implementation. The focus of the work will tackle items 5 and 6. The intern will design and implement a composition framework, and will use composition to mimic the functionality of an modern backend such as RocksDB. The research objective is to evaluate the backend designed following our rigorous approach, against comparable backends designed in an ad-hoc manner. We expect that ours will have comparable performance and smaller code footprint than the latter.

Requirements and application

The internship research will be advised by Marc Shapiro, of Sorbonne-Université and Inria. It will take place in Paris, France. Speaking French is not required.

The candidate should have some knowledge about databases and concurrent systems.

The candidate should also have a basic experience in software programming in Java and usage of collaborative tools such as git, as well as experimentation and measurement in distributed systems.

Please contact the advisor Marc Shapiro <marc.shapiro@acm.org>, with the following information:

  • A resume or Curriculum Vitæ.
  • A motivation letter.
  • A list of courses and grades of the last two years of study (an informal transcript is OK).

Comments are closed.