Talk by Yves Robert on March 12th, 2018

Optimal Cooperative Checkpointing for Shared High-Performance Computing Platforms

joint work with Dorian Arnold, George Bosilca, Aurelien Bouteiller, Jack Dongarra, Kurt Ferreira and Thomas Hérault

Abstract:
In high-performance computing environments, input/output (I/O) from various sources often contends for scarce available bandwidth. In addition to the I/O operations inherent to the failure-free execution of an application, I/O from checkpoint/restart (CR) operations (used to ensure progress in the presence of failures) places an additional burden, as it increases I/O contention and leads to degraded performance. In this work, we consider a cooperative scheduling policy that optimizes the overall performance of concurrently executing CR-based applications that share valuable I/O resources. First, we provide a theoretical model, and then derive a set of necessary constraints to minimize the global waste on the platform.

Our results demonstrate that the optimal checkpoint interval, as defined by Young/Daly, despite providing a sensible metric for a single application, is not sufficient to optimally address resource contention at the platform scale. We therefore show that combining optimal checkpointing periods with I/O scheduling
strategies can provide a significant improvement on the overall application performance, thereby maximizing platform throughput. Overall, these results provide critical analysis and direct guidance on checkpointing large-scale workloads in the presence of competing I/O while minimizing the impact
on application performance.
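For context, the Young/Daly checkpoint interval mentioned in the abstract is the classical first-order approximation W = sqrt(2 μ C), where μ is the platform's mean time between failures (MTBF) and C is the time to write one checkpoint. A minimal sketch (the numbers below are illustrative, not from the talk):

```python
import math

def young_daly_period(mtbf_seconds, checkpoint_cost_seconds):
    """First-order optimal checkpoint period: W = sqrt(2 * mu * C)."""
    return math.sqrt(2 * mtbf_seconds * checkpoint_cost_seconds)

# Example: platform MTBF of 1 day, checkpoint cost of 10 minutes
period = young_daly_period(24 * 3600, 600)
print(round(period))  # → 10182 seconds, i.e. roughly 2.8 hours
```

As the talk argues, applying this formula independently to each application ignores I/O contention between their checkpoints, which is why platform-level scheduling is needed on top of it.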

Talk by Jalil Boukhobza on Feb 27, 2018

Title: Towards an orthogonal approach for optimizing (new) storage systems

Abstract: Today, in a single minute, more than 3 million posts are written on Facebook, more than 40,000 photos are uploaded to Instagram, and more than 120 hours of video are uploaded to YouTube. These are only a few examples of the deluge of digital data being stored in various data centers. Processing this data has become a major economic and societal challenge. A prerequisite for processing this information efficiently is having an efficient storage system.

During this talk, we will try to understand why flash memory has taken over the market, describing some of our contributions along the way. These contributions were designed at three complementary levels: architectural, system, and application. We will also introduce the characteristics of some new non-volatile memory technologies that may upend our conception of memory and storage in the near future.

Talk by Francieli Zanon Boito on Feb 15, 2018

Francieli Zanon Boito (postdoc in the Inria Corse team in Grenoble) will present her research work.

Title: I/O scheduling for HPC: finding the right access pattern and mitigating interference

Abstract: Scientific applications are executed in a high performance computing (HPC) environment, where a parallel file system (PFS) provides access to a shared storage infrastructure. The key characteristic of these systems is the use of multiple storage servers, from which clients can obtain data in parallel. The performance observed by applications when accessing a PFS is directly affected by the way they perform this access, i.e., their access pattern. Additionally, when multiple applications access the PFS concurrently, their performance suffers from interference.

In this seminar, I’ll discuss my previous and current work on I/O scheduling at different levels of the I/O stack, adapting policies to applications’ access patterns and working to mitigate interference.

Open PhD position

A PhD position is available in the team about Data Placement Strategies for Heterogeneous and Non-Volatile Memories in High Performance Computing.

Get more details and post your CV at
https://jobs.inria.fr/public/classic/en/offres/2018-00386

hwloc 2.0.0 and new memory technologies

TADaaM is releasing hwloc 2.0.0, a new major version that updates the way we model new memory technologies (HBM, NVDIMM, etc.). This is the result of two years of work and several research papers about this new modeling and about improving support for manycore architectures at scale.

The announcement of hwloc 2.0.0

Talk by Bruno Raffin on Jan 30, 2018

Title: High Performance Data Analysis for Parallel Numerical Simulations.

Author: Bruno Raffin, Director of Research, DataMove Team, Inria Grenoble

Abstract:
Large-scale numerical simulations produce an ever-growing amount of data that includes the simulation results as well as execution traces and logs. These data represent a double challenge.
First, such amounts of data are becoming increasingly difficult to analyze with traditional tools. Second, moving these data from the simulation to disks, and later retrieving them from disks to the analysis machine, is becoming increasingly costly in terms of time and energy.
This situation is expected to worsen, as supercomputer I/O and, more generally, data-movement capabilities are progressing more slowly than compute capabilities. While the simulation used to be the center of attention, it is now time to focus on high-performance data analysis.
This integration of data analytics with large-scale simulations represents a new kind of workflow that needs adapted software solutions.

Valentin Honoré joins TADaaM as a PhD student

Valentin will work on partitioning strategies for high-throughput applications. In particular, his focus will be on hierarchical memories, used in the context of in-situ/in-transit frameworks.

His thesis is supervised by Guillaume Aupy and Brice Goglin.

Welcome Valentin :).

Guillaume Aupy obtained an ANR JCJC — DASH

Guillaume Aupy was granted an ANR JCJC from the AAPG 2017 call. The project is scheduled to start on March 1st; more information can be found on the dedicated website.

The goal of the project is to study I/O congestion in supercomputers and to provide new static and dynamic algorithms to minimize it.

Talk by Amelie Zhou on Jun 15, 2017

Amelie Zhou, a postdoc in the Ascola research team, will give a talk entitled “On Achieving Efficient Data Transfer for Graph Processing in Geo-Distributed Datacenters”.

Graph partitioning, which distributes graph processing workloads to multiple machines for better parallelism, is an important problem for optimizing the performance and communication cost of graph processing jobs. Recently, many graph applications such as social networks store their data on geo-distributed datacenters (DCs) to ensure flexible and low-latency services. This raises new challenges to existing graph partitioning methods, due to the costly Wide Area Network (WAN) bandwidths and the heterogeneous network bandwidths in the geo-distributed DCs. In this paper, we propose a heterogeneity-aware graph partitioning method named G-Cut, which aims at minimizing the runtime of graph processing jobs in geo-distributed DCs while satisfying the WAN usage budget. G-Cut is a two-stage graph partitioning method. In the traffic-aware graph partitioning stage, we adopt the one-pass edge
assignment to place edges into different partitions while minimizing the inter-DC data traffic size. In the network-aware partition refinement stage, we map the partitions obtained in the first stage onto different
DCs in order to minimize the inter-DC data transfer time. We evaluate the effectiveness and efficiency of G-Cut using real-world graphs. The evaluation results show that G-Cut is able to obtain both lower data
transfer time and WAN usage compared to the state-of-the-art graph partitioning methods.
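To make the first stage more concrete, here is a toy sketch of the streaming one-pass edge-assignment idea (this is an illustrative greedy heuristic, not the actual G-Cut algorithm, which also accounts for WAN traffic and budget): each edge is placed in a partition that already hosts one of its endpoints, to limit cross-partition data traffic, breaking ties toward the least-loaded partition.

```python
from collections import defaultdict

def one_pass_edge_assignment(edges, num_partitions):
    """Greedy streaming edge placement (illustrative sketch):
    prefer a partition that already hosts one of the edge's endpoints,
    breaking ties toward the least-loaded partition."""
    part_of = defaultdict(set)     # vertex -> partitions it already appears in
    load = [0] * num_partitions    # number of edges per partition
    assignment = {}
    for (u, v) in edges:
        candidates = part_of[u] | part_of[v]
        if candidates:
            # Reuse a partition that knows u or v to avoid new cut edges
            p = min(candidates, key=lambda q: load[q])
        else:
            # Neither endpoint seen yet: pick the least-loaded partition
            p = min(range(num_partitions), key=lambda q: load[q])
        assignment[(u, v)] = p
        part_of[u].add(p)
        part_of[v].add(p)
        load[p] += 1
    return assignment

# A triangle stays together; the disconnected edge lands elsewhere
a = one_pass_edge_assignment([(1, 2), (2, 3), (3, 1), (4, 5)], 2)
```

The second, network-aware stage would then map the resulting partitions onto data centers according to the heterogeneous inter-DC bandwidths, which this sketch does not model.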

Talk by Georges Da Costa on Apr 07, 2017

Georges Da Costa will present his work on “Multi-objective resource optimization: Performance- and Energy-aware HPC and Clouds”.