Talk by Jannis Klinkenberg

Jannis Klinkenberg, a PhD student from RWTH Aachen, is visiting our team this summer. He will present his work next Tuesday, July 19th.

Title: Locality-Aware Scheduling in OpenMP

Abstract:
Today’s HPC systems typically consist of shared-memory NUMA machines, each comprising two or more multi-core processor packages with local memory. On such systems, affinity of data to computation is crucial for achieving high performance. OpenMP introduced support for task-parallel programs in version 3.0, in 2008, and has continued to extend its applicability and expressiveness. However, the ability to express data affinity of tasks was missing. In this talk, I will present several approaches for task-to-data affinity that combine locality-aware task distribution and task stealing.
Further, the heterogeneity of HPC systems in the Top500 keeps increasing, and shared-memory machines now additionally feature two or more GPUs. Programs usually start on the host and offload compute-intensive parts to the GPUs to speed up the overall execution. Consequently, locality between the offloading threads, the data used by the computation, and the GPUs can have a significant impact on performance.
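
For context, OpenMP 5.0 later standardized one such mechanism: the affinity clause on tasks. The sketch below is a minimal illustration of the idea, not necessarily the approach presented in the talk; it assumes a compiler with OpenMP 5.0 task-affinity support, compiled with -fopenmp:

    #include <stdlib.h>

    #define N     (1 << 20)
    #define CHUNK (N / 8)

    int main(void)
    {
        double *a = calloc(N, sizeof(double));

        #pragma omp parallel
        #pragma omp single
        for (size_t i = 0; i < N; i += CHUNK) {
            /* Hint to the runtime: execute this task close to the
             * NUMA node that holds a[i] .. a[i + CHUNK - 1]. */
            #pragma omp task affinity(a[i])
            for (size_t j = i; j < i + CHUNK; j++)
                a[j] = 2.0 * a[j] + 1.0;
        }

        free(a);
        return 0;
    }

The clause is only a hint: a locality-aware runtime can use it to place the task near the NUMA node backing the indicated data, and still resort to task stealing when load balance requires it.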

The slides are available here.

Florian Reynier defends his PhD

Florian Reynier defends his PhD, entitled “Étude sur la progression des communications MPI à base de ressources dédiées” (a study of MPI communication progression using dedicated resources), on June 24th at the University of Bordeaux.

Postdoc position on Heterogeneous Memory

A 2-year postdoc position is available in the TADaaM team for working on heterogeneous memory in the H2M project.

Talk by Lucia Drummond

On June 20th, Lucia Drummond, from Fluminense Federal University in Brazil, will be visiting us and will take this opportunity to present her work.

Title: Optimizing Computational Costs of High-Performance Applications on Clouds

Abstract:
Many applications of world strategic importance, such as those employed in the Oil and Gas industry, meteorology, and the areas of biodiversity and health, depend on High-Performance Computing (HPC) to provide accurate results in short time frames. Computational clouds have emerged as a low-cost alternative to dedicated HPC platforms, offering a set of virtualized resources that can be quickly provisioned and dynamically allocated. However, there are still several barriers to their use, such as the efficient and scalable use of resources, the selection of virtual machines (VMs), and the scheduling of tasks on these VMs, all of which have a direct impact on performance and financial costs. Furthermore, cloud VMs are prone to revocations in the cheaper markets, so their fault tolerance is a major concern when service level agreements must be satisfied.
In this talk, we will introduce some resource management problems in clouds and present strategies to solve them, aiming at the efficient use of these platforms by HPC applications.

Talk by Clément Gavoille

On June 28th, Clément Gavoille, a PhD student in our team and at the CEA, will present his recent work, described in a paper accepted at Euro-Par 2022.

Title: Relative performance projection on Arm architectures

Abstract:
With the advent of multi- and many-core processors and hardware accelerators, choosing a specific architecture to renew a supercomputer can become very tedious. This decision process should consider the current and future parallel application needs and the design of the target software stack. It should also consider the single-core behavior of the application, as it is one of the performance limitations in today’s machines.
In such a scheme, performance hints on the impact of hardware and software-stack modifications are mandatory to drive this choice. This paper proposes a workflow for performance projection based on execution on an actual processor and on the application’s behavior. This projection evaluates the performance variation from an existing core of a processor to a hypothetical one, to drive the design choice. For this purpose, we characterize the maximum sustainable performance of the target machine and analyze the application using the software stack of the target machine. To validate this approach, we apply it to three applications of the CORAL benchmark suite: LULESH, MiniFE, and Quicksilver, using a single core of two Arm-based architectures: Marvell ThunderX2 and Arm Neoverse N1. Finally, we follow this validation work with an example of design-space exploration on A64FX, around the SVE vector size, the choice of DDR4 or HBM2, and the software stack, for our applications with a pool of three source architectures: Arm Neoverse N1, Marvell ThunderX2, and Fujitsu A64FX.

Talk by Fanny Dufossé

On April 5th, Fanny Dufossé, from the Inria DataMove team, will present her work.

Title: Dimensioning of multi-clouds with follow-the-renewable approaches for environmental impact minimization

Abstract:
Cloud computing has become an essential component of our digital society. Efforts to reduce its environmental impact are being made by academics and industry alike, with commitments from major cloud providers to run fully on renewable energy in the future. One strategy to reduce non-renewable energy usage is “follow-the-renewables”, in which the workload is migrated so as to execute in the data centers with the highest availability of renewable energy.
The objective is to develop a realistic model of clouds supplied by green energy, and a simulation platform to compare scheduling algorithms. We consider the problem of cloud dimensioning, to minimize the ecological impact of data centers in terms of both brown energy consumption and the manufacturing of IT equipment.

The slides are available here.

Talk by Clément Foyer

On March 8, Clément Foyer, a postdoc in the team, will present his latest paper, accepted at MCHPC’21.

Title: Using Bandwidth Throttling to Quantify Application Sensitivity to Heterogeneous Memory

Abstract:
At the dawn of the exascale era, memory management is becoming increasingly hard, but also of primary importance. The plurality of processing systems, along with the emergence of heterogeneous memory systems, requires more care to be put into data placement. Yet, in order to test models, designs and heuristics for data placement, the programmer has to be able to access these expensive systems, or find a way to emulate them.
In this paper, we propose to use the Resource Control features of the Linux kernel and x86 processors to add heterogeneity to a homogeneous memory system, in order to evaluate the impact of different bandwidths on application performance. We define a new metric to evaluate the sensitivity to bandwidth throttling, as a way to investigate the benefits of using high-bandwidth memory (HBM) for any given application without the need to access a platform offering this kind of memory. We evaluated six well-known benchmarks with different sensitivities to bandwidth on an AMD platform, and validated our results on two Intel platforms with heterogeneous memory: Xeon Phi and Xeon with NVDIMMs. Although representing an idealized version of HBM, our method gives reliable insight into the potential gains of using HBM.
Finally, we envision a design based on Resource Control using both bandwidth restriction and cache partitioning to simulate a more complex heterogeneous environment, allowing hand-picked data placement on emulated heterogeneous memory. We believe our approach can help develop new tools to reliably test new algorithms that improve data placement for heterogeneous memory systems.
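
As a rough illustration of the mechanism (a sketch, not the authors' tooling), the Linux resctrl filesystem lets a process cap its own memory bandwidth. The group name slow_mem and the 10% value below are arbitrary; the code assumes root privileges, hardware with memory-bandwidth allocation (e.g. Intel MBA), and resctrl mounted at /sys/fs/resctrl:

    /* Sketch only: requires root, a resctrl-capable CPU, and
     * "mount -t resctrl resctrl /sys/fs/resctrl" done beforehand. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/stat.h>

    static void write_file(const char *path, const char *text)
    {
        FILE *f = fopen(path, "w");
        if (!f) { perror(path); exit(EXIT_FAILURE); }
        fprintf(f, "%s\n", text);
        fclose(f);
    }

    int main(void)
    {
        /* Create a resource group and cap its memory bandwidth on
         * socket 0 to 10% of the maximum (illustrative value). */
        mkdir("/sys/fs/resctrl/slow_mem", 0755);
        write_file("/sys/fs/resctrl/slow_mem/schemata", "MB:0=10");

        /* Move this process into the group: from now on, all its
         * memory traffic is throttled, emulating slower memory. */
        char pid[32];
        snprintf(pid, sizeof pid, "%d", (int) getpid());
        write_file("/sys/fs/resctrl/slow_mem/tasks", pid);

        /* ... run and time the application kernel here ... */
        return 0;
    }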

The paper is available here.

Nicolas Vidal defends his PhD

Nicolas Vidal defends his PhD, entitled “Data-aware Scheduling at higher scale”, at Inria Bordeaux on Monday, January 31st.

Scotch 7.0 published

We announce the release, as free/libre software, of version 7.0 (codename “Sankara”) of the Scotch + PT-Scotch software package. This major release is the fruition of six years of development and brings many innovative features.


Talk by Philippe Swartvagher

On November 23, Philippe Swartvagher, a PhD student in the team, will present his latest paper, accepted at ICPP’21.

Title: Interferences between Communications and Computations in Distributed HPC Systems

Abstract:
Parallel runtime systems such as MPI or task-based libraries provide models to manage both computation and communication by allocating cores, scheduling threads, and executing communication algorithms. Efficiently implementing such models is challenging due to their interplay within the runtime system. In this paper, we assess the interferences between communications and computations when they run side by side. We study the impact of communications on computations, and conversely the impact of computations on communication performance. We consider two aspects: CPU frequency and memory contention. We have designed benchmarks to measure these phenomena. We show that CPU frequency variations caused by computation have a small impact on communication latency and bandwidth. However, we have observed, on Intel, AMD and ARM processors, that memory contention may cause a severe slowdown of computation and communication when they occur at the same time. We have designed a benchmark with a tunable arithmetic intensity that shows how interferences between communication and computation actually depend on the memory pressure of the application. Finally, we have observed up to 90% performance loss on communications with common HPC kernels such as CG and GEMM.
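
To give an idea of what a tunable arithmetic intensity means in practice, here is a minimal sketch in the spirit of that benchmark (an illustration, not the authors' code): the constant FLOPS_PER_LOAD sets how many floating-point operations are performed per element loaded, so lowering it increases the memory pressure of the kernel:

    #include <stdio.h>
    #include <stdlib.h>

    #define N              (1 << 24)
    #define FLOPS_PER_LOAD 4   /* raise: compute-bound; lower: memory-bound */

    int main(void)
    {
        double *a = malloc(N * sizeof(double));
        for (size_t i = 0; i < N; i++)
            a[i] = 1.0;

        double sum = 0.0;
        for (size_t i = 0; i < N; i++) {
            double x = a[i];                  /* one 8-byte load ...      */
            for (int k = 0; k < FLOPS_PER_LOAD; k++)
                x = x * 1.000001 + 0.000001;  /* ... then k multiply-adds */
            sum += x;
        }

        printf("checksum: %f\n", sum);  /* prevent dead-code elimination */
        free(a);
        return 0;
    }

Running such a kernel on some cores while a communication benchmark runs on others exposes how the slowdown on each side varies with the memory pressure applied.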

The paper is available here and the slides here.