Talk by Robin Boezennec

On October 11th, Robin Boezennec, a PhD student in the TADaaM team, will present the work he did during his internship.

Title: Stochastic execution time modelling to improve scheduling performance in high performance computing

Current batch schedulers rely on a user-provided estimate of each job's execution time. It is well known that this estimate is almost always wrong. One direction of research is therefore to try to accurately predict the execution time of tasks, in particular via the analysis of their code. However, this approach has never succeeded. We assume that exact prediction of execution times is too complicated to be profitable, if not impossible. We therefore consider execution times as random variables, use statistics to characterize them, and incorporate this knowledge into the batch scheduler. This talk will present the first results obtained during a pre-thesis internship.

Talk by Luan Teylo

On September 2nd, Luan Teylo, a postdoc in our team, will present his recent work, described in a paper accepted at Cluster 2022.

Title: The role of storage target allocation in applications’ I/O performance with BeeGFS

Parallel file systems (PFS) are at the core of HPC I/O infrastructures. These systems minimize the I/O time of applications by splitting files into fixed-size chunks and distributing them across multiple storage targets. The I/O performance experienced with a PFS is therefore directly linked to the capacity of retrieving these chunks in parallel. In this work, we conduct an in-depth evaluation of the impact of the stripe count (the number of targets used for striping) on the write performance of BeeGFS, one of the most popular parallel file systems today. We consider different network configurations and show the fundamental role played by this parameter, in addition to the number of compute nodes, processes, and storage targets. Through a rigorous experimental evaluation, we directly contradict conclusions from related work. Notably, we show that sharing I/O targets does not lead to performance degradation and that applications should use as many storage targets as possible. Our recommendations have the potential to significantly improve the overall write performance of BeeGFS deployments, and also provide valuable information for future work on storage target allocation and stripe count tuning.

Talk by Jannis Klinkenberg

Jannis Klinkenberg, PhD student from RWTH Aachen, is visiting our team this summer. He will present his work next Tuesday, July 19th.

Title: Locality-Aware Scheduling in OpenMP

Today’s HPC systems typically consist of modern shared-memory NUMA machines, each comprising two or more multi-core processor packages with local memory. On such systems, affinity of data to computation is crucial for achieving high performance. OpenMP 3.0 introduced support for task-parallel programs in 2008 and has continued to extend its applicability and expressiveness. However, the ability to express data affinity of tasks was long missing. In this talk, I will present several approaches for task-to-data affinity that combine locality-aware task distribution and task stealing.
Further, the heterogeneity of HPC systems in the Top500 tends to increase, such that shared-memory machines additionally feature two or more GPUs. Programs usually start on the host and offload compute-intensive parts to GPUs to speed up the overall execution. Consequently, locality between the offloading threads, the data used by the computation, and the GPUs can have a significant impact on performance.

The slides are available here.

hwloc 2.8.0 published

A new stable release, hwloc 2.8.0, has been published. It brings several improvements all over the place.

Florian Reynier defends his PhD

F. Reynier will defend his PhD thesis, entitled “Étude sur la progression des communications MPI à base de ressources dédiées” (a study of MPI communication progression based on dedicated resources), on June 24th at the University of Bordeaux.

Postdoc position on Heterogeneous Memory

A 2-year postdoc position is available in the TADaaM team for working on heterogeneous memory in the H2M project.

Talk by Lucia Drummond

On June 20th, Lucia Drummond, from the Fluminense Federal University in Brazil, will be visiting us and will take this opportunity to present her work.

Title: Optimizing Computational Costs of High-Performance Applications on Clouds

Many applications of world strategic importance, such as those employed in the Oil and Gas industry, meteorology, and the areas of biodiversity and health, depend on High-Performance Computing (HPC) to provide accurate results in short time frames. Computational clouds have emerged as a low-cost alternative to HPC, offering a set of virtualized resources that can be quickly provisioned and dynamically allocated. However, there are still several barriers to their use, such as efficient and scalable use of resources, selection of virtual machines (VMs), and scheduling of tasks on these VMs, which have a direct impact on performance and financial costs. Furthermore, cloud VMs are prone to revocations in the cheaper markets, and to satisfy service level agreements, fault tolerance of cloud VMs is a major concern.
In this talk, we will introduce some resource management problems in clouds and we will present some strategies to solve them, aiming at the efficient use of these platforms by HPC applications.

Talk by Clément Gavoille

On June 28th, Clément Gavoille, a PhD student in our team and at the CEA, will present his recent work, described in a paper accepted at Euro-Par 2022.

Title: Relative performance projection on Arm architectures

With the advent of multi- and many-core processors and hardware accelerators, choosing a specific architecture to renew a supercomputer can become very tedious. This decision process should consider the current and future parallel application needs and the design of the target software stack. It should also consider the single-core behavior of the application, as it is one of the performance limitations in today’s machines.
In such a scheme, performance hints on the impact of hardware and software stack modifications are essential to drive this choice. This paper proposes a workflow for performance projection based on execution on an actual processor and on the application’s behavior. This projection evaluates the performance variation from an existing processor core to a hypothetical one to drive the design choice. For this purpose, we characterize the maximum sustainable performance of the target machine and analyze the application using the software stack of the target machine. To validate this approach, we apply it to three applications of the CORAL benchmark suite: LULESH, MiniFE, and Quicksilver, using a single core of two Arm-based architectures: Marvell ThunderX2 and Arm Neoverse N1. Finally, we follow this validation work with an example of design-space exploration around the SVE vector size, the choice of DDR4 or HBM2, and the choice of software stack on A64FX, using a pool of three source architectures: Arm Neoverse N1, Marvell ThunderX2, and Fujitsu A64FX.

Talk by Fanny Dufossé

On April 5th, Fanny Dufossé, from the Inria DataMove team, will present her work.

Title: Dimensioning of multi-clouds with follow-the-renewable approaches for environmental impact minimization

Cloud computing has become an essential component of our digital society. Efforts for reducing its environmental impact are being made by academics and industry alike, with commitments from major cloud providers to be fully operated by renewable energy in the future. One strategy to reduce nonrenewable energy usage is “follow-the-renewables”, in which the workload is migrated so as to be executed in the data centers with the highest availability of renewable energy.
The objective is to develop a realistic model of clouds supplied by green energy, together with a simulation platform to compare scheduling algorithms. We consider the problem of cloud dimensioning to minimize the ecological impact of data centers, both in terms of brown energy consumption and of the manufacturing of IT products.

The slides are available here.

Talk by Clément Foyer

On March 8th, Clément Foyer, a postdoc in the team, will present his latest paper, accepted at MCHPC21.

Title: Using Bandwidth Throttling to Quantify Application Sensitivity to Heterogeneous Memory

At the dawn of the exascale era, memory management is becoming increasingly hard, but also of primary importance. The plurality of processing systems, along with the emergence of heterogeneous memory systems, requires more care to be put into data placement. Yet, in order to test models, designs and heuristics for data placement, the programmer has to be able to access these expensive systems, or find a way to emulate them.
In this paper we propose to use the Resource Control features of the Linux kernel and x86 processors to add heterogeneity to a homogeneous memory system, in order to evaluate the impact of different bandwidths on application performance. We define a new metric to evaluate the sensitivity to bandwidth throttling as a way to investigate the benefits of using high-bandwidth memory (HBM) for any given application, without the need to access a platform offering this kind of memory. We evaluated 6 different well-known benchmarks with different sensitivities to bandwidth on an AMD platform, and validated our results on two Intel platforms with heterogeneous memory: Xeon Phi and Xeon with NVDIMMs. Although representing an idealized version of HBM, our method gives reliable insight into the potential gains of using HBM.
Finally, we envision a design based on Resource Control using both bandwidth restriction and cache partitioning to simulate a more complex heterogeneous environment that allows for hand-picked data placement on emulated heterogeneous memory. We believe our approach can help develop new tools to reliably test new algorithms that improve data placement for heterogeneous memory systems.

The paper is available here.