2021 and 2022 Activity

2021 and 2022 Scientific Progress

Hiding the latency of MPI communications

The Message Passing Interface (MPI) defines multiple functions to perform communications
over distributed architectures. Among these operations, the nonblocking ones allow
communications to progress asynchronously, thus enabling the overlap of communication
with computation. Yet these operations are harder to use, because they decompose each
communication into several calls, and they offer fewer safety mechanisms. Developers are
therefore more prone to programming errors that can lead to deadlocks or data corruption.
Consequently, nonblocking communications, and more specifically their collective forms, are
still not widely used to create overlapping opportunities. The goal of Van Man Nguyen's PhD
was to develop methods that help developers use these communications. First, we proposed
a method to match nonblocking calls at compile time and to detect programming errors
involving them, using control-flow and data-flow information. Second, we proposed a method
to automatically transform existing blocking calls into their nonblocking versions. This
method then reorganizes the code of a function by moving the dependencies of
communications, in order to maximize the length of the overlapping intervals. It also applies
to existing nonblocking calls, using the matching information found by the verification
method. Finally, building on the limitations of the automatic approach, we proposed a
method to improve the overlapping potential of MPI programs by identifying the boundaries
of overlapping intervals and suggesting code modifications to developers.

Code transformations for improving performance and productivity of PGAS applications

The PGAS model is an attractive means of handling irregular fine-grained communication on
distributed-memory systems: it provides a global memory abstraction that supports low-overhead
Remote Memory Access (RMA), that is, direct access to memory located in remote address spaces.
RMA benefits from the hardware support generally provided by modern high-performance
communication networks, delivering the low-overhead communication needed in irregular
applications such as metagenomics. The research program of Scott Baden's International Chair
applies source-to-source transformation to PGAS code. The project targets the UPC++ library, a
US Department of Energy Exascale Computing Project that S. Baden led for three years at the
Lawrence Berkeley National Laboratory (LBNL). More specifically, we have developed a new
primitive for SPMD programming called CARP. CARP can be implemented on top of RMA alone, and
it amortizes the startup overhead of RPC. We evaluated CARP using MPI and UPC++ backends on
up to 2K cores, and showed that it can realize many of the benefits of native RPC support.

Correctness of MPI 3.0 one-sided communication

One-sided communication is a well-known distributed programming paradigm for high-
performance computers, as its properties allow for greater asynchrony and
computation/communication overlap than classical message-passing mechanisms. In the
Remote Memory Access interface of MPI (MPI-RMA), each process explicitly exposes an area
of its local memory as accessible to other processes, enabling asynchronous one-sided
reads, writes, and updates. While MPI-RMA is expected to greatly enhance performance and
to permit efficient implementations on multiple platforms, it also comes with several
challenges with respect to memory consistency: developers must handle complex memory
consistency models and complex programming semantics. During her PhD, Célia Tassadit
Ait Kaci developed a method that detects memory consistency errors (also known as data
races) during MPI-RMA program executions. It collects relevant MPI-RMA operations and
load/store accesses during execution and performs an on-the-fly analysis that stops the
program in case of a consistency violation. To help programmers detect memory errors such
as race conditions as early as possible, we also proposed a static analysis of MPI-RMA
codes that shows the programmer the errors that can be detected at compile time. The
detection is based on a local concurrency-error detection algorithm that tracks accesses
through BFS searches on the control-flow graphs of a program, and it is complementary to
the runtime analysis developed before. To validate this analysis, we wrote about 30 small
codes that have been integrated into the MPI Bugs Initiative open-source test suite,
developed at Inria.
