G. Pallez winner of the IEEE-CS TCHPC Early Career Researchers Award

Guillaume Pallez (Aupy) wins the 2019 IEEE-CS TCHPC Early Career Researchers Award for Excellence in High Performance Computing. The award will be presented at the SC19 conference in Denver next month.

hwloc 2.1.0 published

A new major hwloc release 2.1.0 was published. It brings many improvements for new processor and memory architectures.

Philippe Swartvagher joins the team as a PhD student

Philippe Swartvagher will work on interactions between high performance communication libraries and task-based runtime systems.

NewMadeleine version 2019-09-06 released

A new stable release of NewMadeleine was published. It contains mostly bug fixes, a new one-sided interface, non-blocking collectives, and optimizations of blocking collectives. NewMadeleine is included in the pm2 package.

Francieli Zanon Boito joins TADaaM as an assistant professor

Francieli Zanon Boito is now an assistant professor at the University of Bordeaux. Her research interests include storage and parallel I/O for high performance computing and big data.


PADAL Workshop in Bordeaux from Sept 9th to 11th

We are organizing the Fifth Workshop on Programming Abstractions for Data Locality (PADAL’19) at Inria Bordeaux from September 9th to 11th.

More information on the workshop webpage

Talk by P. Swartvagher on collective communication on July 9th

Philippe Swartvagher will present the results of his internship on dynamic collective communications for StarPU/NewMadeleine.

Talk by Brice Goglin on Cache Partitioning on June 11th

Brice will present his article published at Cluster 2018 with ENS Lyon, UTK and Georgia Tech.

Title: Co-scheduling HPC workloads on cache-partitioned CMP platforms

Co-scheduling techniques are used to improve the throughput of applications on chip multiprocessors (CMP), but sharing resources often generates critical interferences. We focus on the interferences in the last level of cache (LLC) and use the Cache Allocation Technology (CAT) recently provided by Intel to partition the LLC and give each co-scheduled application its own cache area.

We consider m iterative HPC applications running concurrently and answer the following questions: (i) how to precisely model the behavior of these applications on the cache-partitioned platform? and (ii) how many cores and cache fractions should be assigned to each application to maximize the platform efficiency? Here, platform efficiency is defined either as maximizing the global performance or as guaranteeing a fixed ratio of iterations per second for each application.

Through extensive experiments using CAT, we demonstrate the impact of cache partitioning when multiple HPC applications are co-scheduled onto CMP platforms.

Talk by Emmanuel Jeannot on May 29th

Emmanuel Jeannot will present an empirical study he conducted on process affinity.

Title: Process Affinity, Metrics and Impact on Performance: an Empirical Study

Process placement, also called topology mapping, is a well-known strategy to improve parallel program execution by reducing the communication cost between processes. It requires two inputs: the topology of the target machine and a measure of the affinity between processes. In the literature, the dominant affinity measure is the communication matrix that describes the amount of communication between processes. The goal of this paper is to study the accuracy of the communication matrix as a measure of affinity. We have done an extensive set of tests with two fat-tree machines and a 3d-torus machine to evaluate several hypotheses that are often made in the literature and to discuss their validity. First, we check the correlation between algorithmic metrics and the performance of the application. Then, we check whether a good generic process placement algorithm never degrades performance. And finally, we see whether the structure of the communication matrix can be used to predict gain.
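To make the two inputs mentioned in the abstract concrete, here is a minimal sketch of how a communication matrix and a topology can be combined to score a placement. It uses a distance-weighted communication volume, which is one common algorithmic metric. The abstract does not say which metrics the study evaluates, so the metric, the function name `placement_cost`, and the matrices `comm` and `dist` are all illustrative assumptions.

```python
from itertools import permutations

def placement_cost(comm, dist, placement):
    """Sum, over all process pairs, of communication volume times the
    distance between the cores the two processes are placed on."""
    n = len(comm)
    return sum(
        comm[i][j] * dist[placement[i]][placement[j]]
        for i in range(n)
        for j in range(i + 1, n)
    )

# Hypothetical symmetric communication matrix for 4 processes:
# processes 0 and 1 talk a lot, and so do processes 2 and 3.
comm = [
    [0, 10, 1, 1],
    [10, 0, 1, 1],
    [1, 1, 0, 10],
    [1, 1, 10, 0],
]

# Hypothetical distances between 4 cores: cores 0 and 1 share a node,
# cores 2 and 3 share another node, and crossing nodes costs more.
dist = [
    [0, 1, 4, 4],
    [1, 0, 4, 4],
    [4, 4, 0, 1],
    [4, 4, 1, 0],
]

# placement[i] is the core assigned to process i.
identity = (0, 1, 2, 3)    # heavy pairs stay on the same node
scattered = (0, 2, 1, 3)   # heavy pairs are split across nodes

# Exhaustive search is only feasible for tiny examples like this one.
best = min(permutations(range(4)), key=lambda p: placement_cost(comm, dist, p))

print(placement_cost(comm, dist, identity))
print(placement_cost(comm, dist, scattered))
print(placement_cost(comm, dist, best))
```

A lower score only means that the placement looks better under this metric. Whether such a score actually correlates with application performance is one of the hypotheses the study examines.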

Scotch 6.0.7 released!

Scotch, the software package for graph/mesh/hypergraph partitioning, graph clustering, and sparse matrix ordering, has a new 6.0.7 release. It extends the target architecture API and adds MeTiS v5 compatibility.