We are looking for an engineer to analyze the I/O behavior of HPC applications: https://jobs.inria.fr/public/classic/fr/offres/2020-03037
Sep 22
V. Honoré defends his PhD thesis on October 15th
Valentin Honoré will defend his PhD thesis entitled “HPC – Big Data Convergence: Managing the Diversity of Application Profiles on HPC Facilities” at University of Bordeaux on October 15th.
The jury will be composed of:
- Gabriel Antoniu, Director of Research – Inria (Examiner)
- Anne Benoit, Associate Professor – ENS Lyon (Reviewer)
- Ewa Deelman, Research Director – USC Information Sciences Institute (Examiner)
- Frédéric Suter, Research Director – IN2P3 (Reviewer)
- Brice Goglin, Research Director – Inria (Director)
- Guillaume Pallez, Researcher – Inria (Co-advisor)
Sep 10
B. Goglin gave a keynote at the SBAC-PAD conference
B. Goglin gave a keynote at the SBAC-PAD international conference. He talked about process placement, modeling hierarchical architectures and heterogeneous resources.
Jun 30
Hardware-based communicator split accepted in MPI 4.0
Jun 17
Talk by Valentin Honoré
On June 24th, Valentin Honoré, a PhD student in our team, will present some results of his research, published at IPDPS’20.
Title: Reservation and Checkpointing Strategies for Stochastic Jobs
Abstract:
In this paper, we are interested in scheduling and checkpointing stochastic jobs on a reservation-based platform, whose cost depends both (i) on the reservation made, and (ii) on the actual execution time of the job. Stochastic jobs are jobs whose execution time cannot be determined easily. They arise from the heterogeneous, dynamic and data-intensive requirements of new emerging fields such as neuroscience. In this study, we assume that jobs can be interrupted at any time to take a checkpoint, and that job execution times follow a known probability distribution. Based on past experience, the user has to determine a sequence of fixed-length reservation requests, and to decide whether the state of the execution should be checkpointed at the end of each request. The objective is to minimize the expected cost of a successful execution of the jobs. We provide an optimal strategy for discrete probability distributions of job execution times, and we design fully polynomial-time approximation strategies for continuous distributions with bounded support. These strategies are then experimentally evaluated and compared to standard approaches such as periodic-length reservations and simple checkpointing strategies (either checkpoint all reservations, or none). The impact of an imprecise knowledge of checkpoint and restart costs is also assessed experimentally.
Slides will be available at http://people.bordeaux.inria.fr/vhonore/documents/ipdps_presentation.pdf.
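To make the reservation setting of the abstract concrete, here is a minimal Python sketch of the expected cost of a reservation sequence when job execution times follow a discrete distribution. It is not the strategy from the paper: it ignores checkpointing (a job restarts from scratch in each new reservation), and the cost model alpha * reservation + beta * time used + gamma, as well as all function and parameter names, are assumptions made for illustration.

```python
def cost_for_job(exec_time, reservations, alpha=1.0, beta=0.0, gamma=0.0):
    """Total cost paid for a job of known length, trying reservations in order.

    Assumed cost of one reservation of length r: alpha*r + beta*min(r, exec_time) + gamma.
    Without checkpoints, the job restarts from scratch in each reservation and
    succeeds in the first one that is at least as long as exec_time.
    """
    total = 0.0
    for r in reservations:
        total += alpha * r + beta * min(r, exec_time) + gamma
        if r >= exec_time:
            return total
    raise ValueError("no reservation is long enough for this execution time")


def expected_cost(distribution, reservations, alpha=1.0, beta=0.0, gamma=0.0):
    """Expected cost over a discrete distribution {execution_time: probability}."""
    return sum(
        prob * cost_for_job(t, reservations, alpha, beta, gamma)
        for t, prob in distribution.items()
    )


# Hypothetical workload: 75% of jobs take 1 hour, 25% take 4 hours.
dist = {1: 0.75, 4: 0.25}
short_then_long = expected_cost(dist, [1, 4])  # try 1 hour first, then 4 hours
single_long = expected_cost(dist, [4])         # reserve 4 hours directly
```

With this hypothetical workload, trying a short reservation first is cheaper on average than reserving the maximum length directly, which is the kind of trade-off the abstract refers to.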
Dec 16
Talk by Ana Gainaru on December 16th
Ana Gainaru from Vanderbilt University (TN, USA) will present a talk entitled “HPC for All: Easy deployment for heterogeneous dynamic applications.”
Dec 04
Guillaume Mercier defended his habilitation
Guillaume Mercier defended his habilitation on December 4th.
His habilitation is entitled “Challenges of Message Passing Evolution and Management of Hierarchical Hardware Topologies”.
Dec 03
Talk by Jesper Larsson Träff on December 3rd
Cartesian Collective Communication: “Advice to users”, “Advice to implementers”, and “Advice to standardizers”
Cartesian Collective Communication (or stencil communication) is a restricted form of the general, sparse, graph neighborhood collective communication found in, for instance, MPI. The prime characteristic of Cartesian Collective Communication is that processes organized in a d-dimensional torus (or mesh) all communicate with the same relative set of neighbors. In the talk, we discuss how Cartesian Collective Communication can be incorporated and used in MPI, giving “advice to users”, “advice to implementers”, and “advice to standardizers”. We also present new, message-combining algorithms for efficiently supporting Cartesian Collective alltoall and allgather Communication (for small problems), and give experimental results showing that this form of sparse collective communication can be supported with a performance advantage.
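The defining property mentioned in the abstract, every process using the same relative set of neighbors on a torus, can be illustrated with a short Python sketch. It only computes neighbor ranks and does not perform any communication or use MPI; the row-major rank ordering and all function names are assumptions made for illustration.

```python
def rank_to_coords(rank, dims):
    """Convert a rank to coordinates in a grid of shape dims (row-major order assumed)."""
    coords = []
    for d in reversed(dims):
        coords.append(rank % d)
        rank //= d
    return list(reversed(coords))


def coords_to_rank(coords, dims):
    """Convert coordinates back to a rank (row-major order assumed)."""
    rank = 0
    for c, d in zip(coords, dims):
        rank = rank * d + c
    return rank


def torus_neighbors(rank, dims, offsets):
    """Ranks of the neighbors of `rank`, one per relative offset, with wrap-around."""
    coords = rank_to_coords(rank, dims)
    return [
        coords_to_rank([(c + o) % d for c, o, d in zip(coords, off, dims)], dims)
        for off in offsets
    ]


# Hypothetical 3x3 torus with a 4-point stencil: the same offsets apply to every rank.
dims = (3, 3)
stencil = [(0, 1), (0, -1), (1, 0), (-1, 0)]
corner = torus_neighbors(0, dims, stencil)
center = torus_neighbors(4, dims, stencil)
```

Because the offset list is identical on all processes, each process can derive its neighbors locally from its own coordinates, without exchanging a general graph description.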
Nov 04
Talk by Laercio LIMA PILLA (LRI)
On November 4th, Laercio LIMA PILLA, CNRS researcher at LRI (Paris), will present his recent work on load balancing in distributed environments.
The talk will take place in room Grace Hopper 2 at 4pm.
Slides of the talks can be found here:
Scalable Scheduling Distributed Algorithms & the Packing Model
Oct 14