M2 Internship: Identifying the Benefit of Communication Aggregation in HPC Applications

Communication aggregation is an important optimization in applications that communicate at fine granularity, for example metagenomics applications that employ data structures such as hash tables. In such applications, one-sided communication, which supports direct remote memory access (RMA) to another address space, has an advantage over classic two-sided communication (message passing), because it decouples data movement from synchronization. The effect is to lower communication costs, which are otherwise dominated by the overhead of making many small data transfers. Many libraries support RMA, including MPI [1] and "PGAS" libraries such as OpenSHMEM and UPC++ [2]. RMA is also supported in languages such as Coarray Fortran, Coarray C++, Chapel and UPC, and was also supported in X10. Today, library solutions are a growing trend in support for RMA.
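
To make the fine-grained pattern concrete, here is a minimal UPC++ sketch in which each rank issues one tiny one-sided put per random update of a distributed table. It only illustrates the access pattern; the slice size, update count and write-only update are illustrative assumptions and are not taken from any of the proxies discussed below.

// Fine-grained remote updates: one small rput per update (latency-bound).
#include <upcxx/upcxx.hpp>
#include <cstdint>
#include <random>
#include <vector>

int main() {
  upcxx::init();
  const std::size_t local_size = 1 << 20;                 // assumed per-rank slice size
  upcxx::global_ptr<std::uint64_t> my_slice =
      upcxx::new_array<std::uint64_t>(local_size);
  // A dist_object lets every rank look up every other rank's slice pointer.
  upcxx::dist_object<upcxx::global_ptr<std::uint64_t>> table(my_slice);
  std::vector<upcxx::global_ptr<std::uint64_t>> slices(upcxx::rank_n());
  for (int r = 0; r < upcxx::rank_n(); ++r)
    slices[r] = table.fetch(r).wait();
  upcxx::barrier();

  std::mt19937_64 rng(upcxx::rank_me());
  for (int i = 0; i < 100000; ++i) {                      // assumed update count
    std::uint64_t key = rng();
    int owner = key % upcxx::rank_n();
    std::size_t offset = (key / upcxx::rank_n()) % local_size;
    // One small one-sided transfer per update: this is exactly the case
    // that aggregation is meant to improve.
    upcxx::rput(key, slices[owner] + offset).wait();
  }
  upcxx::barrier();
  upcxx::delete_array(my_slice);
  upcxx::finalize();
}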

Compiler optimizations for PGAS languages have been explored [3], but library-based solutions require manual optimization. The goal of the internship is to demonstrate the benefit of aggregation in proxies that represent the patterns of communication and computation in technologically important applications, written using a PGAS library such as UPC++, or using MPI RMA. We will focus on a proxy written in UPC++ that can be found in the "UPC++-extras" repository of the UPC++ project, located at https://bitbucket.org/berkeleylab/upcxx/wiki/Home (see the example code called "GUPS"). An MPI one-sided version of this code will be available soon.

The project will involve hands-on coding of the proxies and experimentation on scalable distributed-memory computers. More specifically, the optimizations will first be performed manually in the source code of GUPS; the sketch below illustrates the kind of transformation involved. We will use the Plafrim machine for the experiments.
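
As a rough illustration of what such a manual aggregation could look like in UPC++, the sketch below buffers updates per destination rank and ships each full batch with a single upcxx::rpc that applies the whole batch on the owning rank. The batch size, slice size and XOR update rule are illustrative assumptions and are not taken from the GUPS source.

#include <upcxx/upcxx.hpp>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// Per-rank slice of the distributed table; a dist_object lets remote rpcs
// find the local slice on the destination rank.
using Slice  = std::vector<std::uint64_t>;
using Update = std::pair<std::size_t, std::uint64_t>;     // (offset, value)

int main() {
  upcxx::init();
  const std::size_t local_size = 1 << 20;                 // assumed slice size
  upcxx::dist_object<Slice> table(Slice(local_size, 0));

  std::vector<std::vector<Update>> buf(upcxx::rank_n());  // one buffer per destination
  const std::size_t batch = 1024;                         // assumed aggregation granularity

  auto flush = [&](int dest) {
    if (buf[dest].empty()) return;
    // One rpc carries a whole batch and applies it on the owner, replacing
    // many fine-grained transfers with a single larger one.
    upcxx::rpc(dest,
               [](upcxx::dist_object<Slice> &t, const std::vector<Update> &b) {
                 for (const Update &u : b) (*t)[u.first] ^= u.second;
               },
               table, buf[dest]).wait();
    buf[dest].clear();
  };

  std::mt19937_64 rng(upcxx::rank_me());
  for (int i = 0; i < 100000; ++i) {                      // assumed update count
    std::uint64_t key = rng();
    int owner = key % upcxx::rank_n();
    buf[owner].push_back({(key / upcxx::rank_n()) % local_size, key});
    if (buf[owner].size() == batch) flush(owner);         // aggregate, then send
  }
  for (int r = 0; r < upcxx::rank_n(); ++r) flush(r);     // drain remaining updates
  upcxx::barrier();
  upcxx::finalize();
}

Blocking on each rpc keeps the sketch short; a real implementation would keep the returned futures (for example conjoined with upcxx::when_all) so that batches to different destinations overlap.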

Depending on the progress achieved, the optimizations will then be performed automatically in LLVM. MPI and UPC++ programs will be analyzed using LLVM's static single assignment (SSA) form and the associated dataflow graph.
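
As a starting point for that analysis, the following is a minimal sketch of an out-of-tree LLVM pass (new pass manager) that walks a function's instructions, which are already in SSA form in LLVM IR, and reports direct calls to MPI RMA routines. The pass name find-rma-calls and the matched routine names are illustrative assumptions; UPC++ entry points would additionally require demangled-name matching, since they are C++ templates.

// Build as a shared library against LLVM, then run for example:
//   opt -load-pass-plugin=./FindRMACalls.so -passes=find-rma-calls input.ll
#include "llvm/IR/Function.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/PassManager.h"
#include "llvm/Passes/PassBuilder.h"
#include "llvm/Passes/PassPlugin.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

namespace {
// Reports every direct call to an MPI RMA routine; an aggregation pass would
// start from these call sites and follow the SSA def-use chains of their
// address and size operands.
struct FindRMACalls : PassInfoMixin<FindRMACalls> {
  PreservedAnalyses run(Function &F, FunctionAnalysisManager &) {
    for (BasicBlock &BB : F)
      for (Instruction &I : BB)
        if (auto *CI = dyn_cast<CallInst>(&I))
          if (Function *Callee = CI->getCalledFunction())
            if (Callee->getName() == "MPI_Put" ||
                Callee->getName() == "MPI_Get" ||
                Callee->getName() == "MPI_Accumulate")
              errs() << "RMA call in " << F.getName() << ": " << I << "\n";
    return PreservedAnalyses::all();  // analysis only, nothing is modified
  }
};
} // namespace

extern "C" LLVM_ATTRIBUTE_WEAK PassPluginLibraryInfo llvmGetPassPluginInfo() {
  return {LLVM_PLUGIN_API_VERSION, "find-rma-calls", "0.1",
          [](PassBuilder &PB) {
            PB.registerPipelineParsingCallback(
                [](StringRef Name, FunctionPassManager &FPM,
                   ArrayRef<PassBuilder::PipelineElement>) {
                  if (Name != "find-rma-calls") return false;
                  FPM.addPass(FindRMACalls());
                  return true;
                });
          }};
}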

 

Prerequisites: C/C++ programming and knowledge of MPI or UPC

Contacts:
Emmanuelle Saillard (emmanuelle.saillard@inria.fr)
Denis Barthou (denis.barthou@inria.fr)
Scott Baden (baden@eng.ucsd.edu)

Where: Inria Bordeaux Sud-Ouest, STORM team

References:

[1] The Message Passing Interface (MPI) standard. https://www.mpi-forum.org/

[2] J. Bachan, S. B. Baden, S. Hofmeyr, M. Jacquelin, A. Kamil, D. Bonachea, P. H. Hargrove, and H. Ahmed, "UPC++: A high-performance communication framework for asynchronous computation," in 33rd IEEE International Parallel and Distributed Processing Symposium (IPDPS'19), Rio de Janeiro, Brazil, May 2019.

[3] W.-Y. Chen, C. Iancu, and K. Yelick, "Communication optimizations for fine-grained UPC applications," in Parallel Architectures and Compilation Techniques (PACT 2005), pp. 267-278, October 2005.
