Return to Job offers

PhD topic: High-level distributed computing

Keywords

Distributed system; distributed programming; distributed database system; reactive programming; programming language and verification.

Background

Distributed computing has become a crucial technique across many areas of computing. Programming highly-available distributed applications is no longer reserved to expert programmers. Consider for instance, mobile computing, Internet of Things (IoT), high-performance computing, Network Function Virtualisation (NFV), neural networks, or internet gaming.

However, distributed programming remains difficult and error-prone, exposing users, the economy, and critical infrastructure to bugs and security violations. Indeed, concurrency and failures, essential features of a distributed system, are difficult to abstract away. Interacting concurrent processes do not compose well. Furthermore, any large-scale distributed system suffers undetectable failures and processes cannot reliably reach consensus on a single (“strongly consistent”) up-to-date view of shared state; this is a fundamental result of distributed system theory (the FLP and CAP theorems).

Furthermore, applications have conflicting requirements. On the one hand, correctness (controlling what the system does), requires events to happen in a reliable, deterministic way. On the other, application performance (including availability, responsiveness and throughput), requires concurrent, asynchronous execution. There is no single right solution to this trade-off; it depends on the application requirements, the expected environment and workload, the available resources, etc. A promising direction is a hybrid approach, where updates avoid coordination by default, but specific operations that are essential to application correctness are synchronised. Getting this right is difficult: current practice in building distributed systems rests on programmer expertise, i.e., trial and error, which is costly and dangerous. Thus, currently, large numbers of non-expert programmers to play it by ear among uncomfortable and momentous trade-offs, in the presence of non-composable, non-deterministic, and weak consistency.

Proposed research

We believe that the situation is ripe for a new, high-level approach. We propose to develop methods, tools and languages to aid the programmer of general distributed programs.

Highly successful and explicative abstractions already exist, such as (at opposite ends of the spectrum) consensus or data flow. Frameworks and languages are making distributed programming easily accessible in some restricted domains, for instance MapReduce, Spark or TensorFlow. These approaches work well in their restricted environment, but only by severely restricting the developer’s capabilities.

The closest to a general-purpose programming environment is Orleans [4], which allows the developer to define and compose abstract, location-independent. The runtime environment is in charge of connecting them together and of deploying them, elastically as the service demands and the availability of resources change over time. Another important concept is dataflow or reactive programming, whose main abstraction is a graph whose edges carry flows of information, and whose vertices are computation entities [2]. Finally, tierless programming describes distributed computation subsuming how sub-computations are deployed and placed at the different tiers of a cloud computing environment [3].

These approaches provide orthogonal computation, composition, communication and deployment abstractions, and are designed to maximise parallelism and to enable flexibility and elasticity. However, in order to hide the complexity of distribution to application developers, they come with arbitrary restrictions; for instance, communication abstractions are often unidirectional. Deployment, consistency, security and fault-tolerance are assumed addressed by a separate system, not programmable with the same first-class abstractions.

We believe that it is possible to take a similar approach for both application and system developers themselves. It should be possible to create and protect abstractions, including what is built-in or second-class in the above systems, by composing basic primitives that provide access to the full power of distribution. It should be possible, for instance, to build the equivalent of Antidote [1] or Orleans with flexible consistency levels and a pluggable, non-compromisable security architecture.

Such primitives might include:

  1. Shared and persistent data objects, with the capability to implement replication and versioning.
  2. Asynchronous (concurrent) and synchronous (consensus-based) operation invocation, with the capability to provide transactional and causal consistency guarantees.
  3. Publish-subscribe/data flow, with forward and backward paths, and combiners. Data flow carries any mixture of state, delta, or operation. Communication respects programmer-defined abstraction boundaries.
  4. Transparent piggy-backing of metadata, such as timestamps, provenance, security labels, or accounting information.
  5. Programmable deployment and elastic configuration of computation and data entities, transparently to their functional program text.

At the same time, our approach helps avoid many of the opportunities for error, by focusing on the essential properties of application correctness. It is often the invariants required over application data that dictates the protocol for accessing the data; this is an intuition that programmers commonly apply. Hence, we aim to apply leverage language and verification tools, to aid the programmer in choosing the best consistency level and in synthesising a program that respects its specification.

Requirements and application

The research has both a fundamental and an applied aspect and aims for practical results.

Candidates to this position should hold a Master’s in Computer Science/Informatics or a related field. They must be excited by research in systems, distributed systems, databases, and/or language and verification, and should have an excellent academic record in one of these areas. He or she will be developing and experimenting software at large scale. Teamwork and communication skills, industrial experience, and good knowledge of Erlang and/or node.js is a plus. This is offered as part of Inria’s annual PhD competition. To apply, please use the following link: https://jobs.inria.fr/public/classic/fr/offres/2018-00527, and click on the “Apply” button on the top right. Please provide the following information:

  • A resume or Curriculum Vitæ.
  • A list of courses and grades of the last two years of study (an informal transcript is OK).
  • Names and contact details of at least two references (people who can recommend you), whom we will contact directly.
  • If relevant, a link to your publications and/or open-source developments.

References

[1]
AntidoteDB, a planet-scale, available, transactional database with strong semantics. http://antidoteDB.eu/
[2]
The Reactive Manifesto. https://www.reactivemanifesto.org, September 2014.
[3]
Gérard Boudol, Zhengqin Luo, Tamara Rezk, and Manuel Serrano. Reasoning about Web applications: An operational semantics for HOP. ACM Transactions on Programming Languages and Systems, 34(2):10:1–10:40, June 2012. DOI: 10.1145/ 2220365.2220369.
[4]
Sergey Bykov, Alan Geller, Gabriel Kliot, James R. Larus, Ravi Pandya, and Jorgen Thelin. Orleans: cloud computing for everyone. In Symp. on Cloud Computing, pages 16:1–16:14, Cascais, Portugal, October 2011. Assoc. for Computing Machinery. DOI: 10.1145/2038916.2038932.

Contact: Marc Shapiro.

Last modified: Wed Mar 7 17:48:48 CET 2018