Two postdoc positions available

The team is recruiting two postdoctoral researchers.

Engineer position available

We are looking for an engineer to analyze the I/O behavior of HPC applications: https://jobs.inria.fr/public/classic/fr/offres/2020-03037

hwloc 2.3.0 published

hwloc 2.3.0, a new major release, has been published. It brings a new API for describing heterogeneous platforms, support for the AMD RSMI library for managing GPUs, and many small improvements.

V. Honoré defends his PhD thesis on October 15th

Valentin Honoré will defend his PhD thesis entitled “HPC – Big Data Convergence: Managing the Diversity of Application Profiles on HPC Facilities” at the University of Bordeaux on October 15th.

The jury will be composed of:

  • Gabriel Antoniu, Director of Research – Inria (Examiner)
  • Anne Benoit, Associate Professor – ENS Lyon (Reviewer)
  • Ewa Deelman, Research Director – USC Information Sciences Institute (Examiner)
  • Frédéric Suter, Research Director – IN2P3 (Reviewer)
  • Brice Goglin, Research Director – Inria (Director)
  • Guillaume Pallez, Researcher – Inria (Co-advisor)

B. Goglin gave a keynote at the SBAC-PAD conference

B. Goglin gave a keynote at the SBAC-PAD international conference. He talked about process placement, modeling hierarchical architectures and heterogeneous resources.

Hardware-based communicator split accepted in MPI 4.0

G. Mercier’s proposal for a hardware-topology-based MPI communicator split passed the 2nd vote at the MPI Forum meeting on June 30th, which means it will be part of revision 4.0 of the MPI standard (official release in 2020Q4).
The “Guided” mode is described here while the “Unguided” mode is here. A prototype implementation is already available here.
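To illustrate the idea behind a guided split, here is a minimal pure-Python sketch that does not call MPI at all. It assumes a hypothetical `placement` table that maps each rank to the hardware resource instances it runs on, and it groups the ranks that share the same instance of the requested resource type, which is what each resulting sub-communicator would contain. The resource names ("NUMANode", "Package") and the function `split_guided` are illustrative assumptions, not part of the MPI standard.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical placement: rank -> {resource type: instance id}.
# Real MPI libraries discover this from the hardware; here it is hard-coded.
Placement = Dict[int, Dict[str, int]]


def split_guided(placement: Placement, resource_type: str) -> Dict[int, List[int]]:
    """Group ranks by the instance of `resource_type` they are bound to.

    Each returned group plays the role of one sub-communicator.
    """
    groups: Dict[int, List[int]] = defaultdict(list)
    for rank in sorted(placement):
        instance = placement[rank][resource_type]
        groups[instance].append(rank)
    return dict(groups)


# Illustrative machine: two packages, three NUMA nodes, four ranks.
placement = {
    0: {"Package": 0, "NUMANode": 0},
    1: {"Package": 0, "NUMANode": 0},
    2: {"Package": 0, "NUMANode": 1},
    3: {"Package": 1, "NUMANode": 2},
}

by_numa = split_guided(placement, "NUMANode")
by_package = split_guided(placement, "Package")
```

In an actual MPI 4.0 program, the grouping would be requested through `MPI_Comm_split_type` with the `MPI_COMM_TYPE_HW_GUIDED` split type rather than computed by hand.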

Talk by Valentin Honoré

On June 24th, Valentin Honoré, a PhD student in our team, will present some results of his research, published at IPDPS’20.

Title: Reservation and Checkpointing Strategies for Stochastic Jobs

Abstract:
In this paper, we are interested in scheduling and checkpointing stochastic jobs on a reservation-based platform, whose cost depends both (i) on the reservation made, and (ii) on the actual execution time of the job. Stochastic jobs are jobs whose execution time cannot be determined easily. They arise from the heterogeneous, dynamic and data-intensive requirements of new emerging fields such as neuroscience. In this study, we assume that jobs can be interrupted at any time to take a checkpoint, and that job execution times follow a known probability distribution. Based on past experience, the user has to determine a sequence of fixed-length reservation requests, and to decide whether the state of the execution should be checkpointed at the end of each request. The objective is to minimize the expected cost of a successful execution of the jobs. We provide an optimal strategy for discrete probability distributions of job execution times, and we design fully polynomial-time approximation strategies for continuous distributions with bounded support. These strategies are then experimentally evaluated and compared to standard approaches such as periodic-length reservations and simple checkpointing strategies (either checkpoint all reservations, or none). The impact of an imprecise knowledge of checkpoint and restart costs is also assessed experimentally.
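As a rough illustration of the objective described in the abstract, the sketch below computes the expected cost of a fixed reservation sequence for a discrete distribution of execution times. It is not the paper's algorithm. The cost model (`alpha * reservation + beta * time_used + gamma` per reservation), the checkpoint and restart costs, and all function names are assumptions chosen for the example.

```python
from typing import Dict, List, Tuple

# A plan is a list of (reservation length, checkpoint at the end?) pairs.
Plan = List[Tuple[float, bool]]


def job_cost(work: float, plan: Plan, alpha: float = 1.0, beta: float = 0.0,
             gamma: float = 0.0, ckpt_cost: float = 0.0,
             restart_cost: float = 0.0) -> float:
    """Total cost paid until a job needing `work` time units completes."""
    saved = 0.0   # work protected by the last checkpoint
    total = 0.0
    for length, checkpoint in plan:
        # Restarting from a checkpoint costs time inside the reservation.
        restart = restart_cost if saved > 0 else 0.0
        # Time left for useful work, keeping room for the checkpoint if any.
        window = length - restart - (ckpt_cost if checkpoint else 0.0)
        remaining = work - saved
        if remaining <= window:
            used = restart + remaining
            return total + alpha * length + beta * used + gamma
        # The job did not finish: the whole reservation is consumed.
        total += alpha * length + beta * length + gamma
        if checkpoint:
            saved += max(window, 0.0)
        # Without a checkpoint, the work done in this reservation is lost.
    raise ValueError("plan is too short for this execution time")


def expected_cost(dist: Dict[float, float], plan: Plan, **params: float) -> float:
    """Expectation of job_cost over a discrete distribution {time: probability}."""
    return sum(p * job_cost(t, plan, **params) for t, p in dist.items())


# Illustrative distribution: the job takes 2 or 5 time units, equally likely.
dist = {2.0: 0.5, 5.0: 0.5}
single = expected_cost(dist, [(5.0, False)])
two_step = expected_cost(dist, [(2.0, False), (5.0, False)])
with_ckpt = expected_cost(dist, [(3.0, True), (4.0, False)],
                          ckpt_cost=1.0, restart_cost=1.0)
```

With these illustrative numbers, trying a short reservation first costs 4.5 on average against 5.0 for a single long reservation, which shows why the choice of the sequence matters.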

Slides will be available at http://people.bordeaux.inria.fr/vhonore/documents/ipdps_presentation.pdf.

Talk by Ana Gainaru on December 16th

Ana Gainaru from Vanderbilt University (TN, USA) will present a talk entitled “HPC for All: Easy deployment for heterogeneous dynamic applications.”

Guillaume Mercier defended his habilitation

Guillaume Mercier defended his habilitation on December 4th.

His habilitation is entitled “Challenges of Message Passing Evolution and Management of Hierarchical Hardware Topologies”.

Talk by Jesper Larsson Träff on December 3rd

Title: Cartesian Collective Communication: “Advice to users”, “Advice to implementers”, and “Advice to Standardizers”

Cartesian Collective Communication (or stencil communication) is a restricted form of general, sparse, graph neighborhood collective communication as known from, for instance, MPI. The prime characteristic of Cartesian Collective Communication is that processes organized in a d-dimensional torus (or mesh) all communicate with the same relative set of neighbors. In the talk, we discuss how Cartesian Collective Communication can be incorporated and used in MPI, giving “advice to users”, “advice to implementers”, and “advice to standardizers”. We also present new message-combining algorithms for efficiently supporting Cartesian Collective alltoall and allgather communication (for small problems), and give experimental results showing that this form of sparse collective communication can be supported with a performance advantage.
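The "same relative set of neighbors" property can be sketched in a few lines of Python. The example below assumes row-major rank numbering on a d-dimensional torus and computes, for any rank, the ranks reached by a shared list of relative offsets. The helper names are illustrative and the code does not use MPI.

```python
from typing import List, Sequence, Tuple


def rank_to_coords(rank: int, dims: Sequence[int]) -> Tuple[int, ...]:
    """Convert a row-major rank into torus coordinates."""
    coords = []
    for d in reversed(dims):
        coords.append(rank % d)
        rank //= d
    return tuple(reversed(coords))


def coords_to_rank(coords: Sequence[int], dims: Sequence[int]) -> int:
    """Convert torus coordinates (wrapped around each dimension) into a rank."""
    rank = 0
    for c, d in zip(coords, dims):
        rank = rank * d + (c % d)
    return rank


def neighbors(rank: int, dims: Sequence[int],
              offsets: Sequence[Sequence[int]]) -> List[int]:
    """Ranks reached from `rank` by applying each relative offset."""
    coords = rank_to_coords(rank, dims)
    return [
        coords_to_rank([c + o for c, o in zip(coords, off)], dims)
        for off in offsets
    ]


# A 3x3 torus with a 4-point stencil: every process uses the same offsets.
dims = (3, 3)
stencil = [(0, 1), (0, -1), (1, 0), (-1, 0)]
corner = neighbors(0, dims, stencil)   # [1, 2, 3, 6]
center = neighbors(4, dims, stencil)   # [5, 3, 7, 1]
```

Because the offset list is identical on every process, each process can derive its full neighborhood locally, without exchanging any information about the communication graph.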