Talk by Francieli Zanon Boito on Feb 15, 2018

Francieli Zanon Boito (postdoc in the Inria Corse team in Grenoble) will talk to us about her research work.

Title: I/O scheduling for HPC: finding the right access pattern and mitigating interference

Abstract: Scientific applications are executed in a high performance computing (HPC) environment, where a parallel file system (PFS) provides access to a shared storage infrastructure. The key characteristic of these systems is the use of multiple storage servers, from where data can be obtained by the clients in parallel. The performance observed by applications when accessing a PFS is directly affected by the way they perform this access, i.e. their access pattern. Additionally, when multiple applications concurrently access the PFS, their performance will suffer from interference.

In this seminar, I’ll discuss my previous and current work on I/O scheduling at different levels of the I/O stack, adapting policies to applications’ access patterns and working to mitigate interference.

Open PhD position

A PhD position is available in the team on Data Placement Strategies for Heterogeneous and Non-Volatile Memories in High Performance Computing.

Get more details and post your CV at
https://jobs.inria.fr/public/classic/en/offres/2018-00386

hwloc 2.0.0 and new memory technologies

TADaaM is releasing hwloc 2.0, a new major version that updates the way we model new memory technologies (HBM, NVDIMM, etc.). It is the result of two years of work and several research papers on this new modeling and on improving support for manycore architectures at scale.

The announcement of hwloc 2.0.0
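
For readers new to the library, here is a minimal C sketch (not taken from the announcement; error handling omitted) that uses the public hwloc API to enumerate the NUMA nodes of the machine, which hwloc 2.x exposes as memory children of normal objects:

```c
#include <stdio.h>
#include <hwloc.h>

int main(void)
{
    hwloc_topology_t topology;
    hwloc_topology_init(&topology);
    hwloc_topology_load(&topology);

    /* In hwloc 2.x, NUMA nodes (including HBM- or NVDIMM-backed ones) are
     * attached as memory children; they remain enumerable by type. */
    int n = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_NUMANODE);
    for (int i = 0; i < n; i++) {
        hwloc_obj_t node = hwloc_get_obj_by_type(topology, HWLOC_OBJ_NUMANODE, i);
        printf("NUMA node #%u: %llu bytes of local memory%s%s\n",
               node->os_index,
               (unsigned long long) node->attr->numanode.local_memory,
               node->subtype ? ", subtype " : "",
               node->subtype ? node->subtype : "");
    }

    hwloc_topology_destroy(topology);
    return 0;
}
```

Compile with, for example, cc example.c $(pkg-config --cflags --libs hwloc).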

Talk by Bruno Raffin on Jan 30, 2018

Title: High Performance Data Analysis for Parallel Numerical Simulations.

Author: Bruno Raffin, Director of Research, DataMove Team, Inria Grenoble

Abstract:
Large scale numerical simulations are producing an ever growing amount of data that includes the simulation results as well as execution traces and logs. These data represent a double challenge. First, such amounts of data are becoming increasingly difficult to analyse with traditional tools. Next, moving these data from the simulation to disks, and later retrieving them from disks to the analysis machine, is becoming increasingly costly in terms of time and energy. This situation is expected to worsen, as supercomputer I/O and, more generally, data movement capabilities are progressing more slowly than compute capabilities. While the simulation used to be at the center of all attention, it is now time to focus on high performance data analysis. This integration of data analytics with large scale simulations represents a new kind of workflow that needs adapted software solutions.

Valentin Honoré joins TADaaM as a PhD student

Valentin will work on Partitioning Strategies for High-Throughput Applications. In particular, his focus will be on hierarchical memories used in the context of in-situ/in-transit frameworks.

His thesis is supervised by Guillaume Aupy and Brice Goglin.

Welcome Valentin :).

Guillaume Aupy obtained an ANR JCJC — DASH

Guillaume Aupy was granted an ANR JCJC from the AAPG 2017 call. The project is scheduled to start on March 1st; more information can be found on the dedicated website.

The goal of the project is to study I/O congestion in supercomputers and to provide new static and dynamic algorithms to minimize it.

Talk by Amelie Zhou on Jun 15, 2017

Amelie Zhou, postdoc in the Ascola research team, will give a talk entitled “On Achieving Efficient Data Transfer for Graph Processing in Geo-Distributed Datacenters”.

Graph partitioning, which distributes graph processing workloads to multiple machines for better parallelism, is an important problem for optimizing the performance and communication cost of graph processing jobs. Recently, many graph applications such as social networks have started storing their data on geo-distributed datacenters (DCs) to ensure flexible and low-latency services. This raises new challenges for existing graph partitioning methods, due to the costly Wide Area Network (WAN) bandwidths and the heterogeneous network bandwidths in the geo-distributed DCs. In this work, we propose a heterogeneity-aware graph partitioning method named G-Cut, which aims at minimizing the runtime of graph processing jobs in geo-distributed DCs while satisfying the WAN usage budget. G-Cut is a two-stage graph partitioning method. In the traffic-aware graph partitioning stage, we adopt a one-pass edge assignment to place edges into different partitions while minimizing the inter-DC data traffic size. In the network-aware partition refinement stage, we map the partitions obtained in the first stage onto different DCs in order to minimize the inter-DC data transfer time. We evaluate the effectiveness and efficiency of G-Cut using real-world graphs. The evaluation results show that G-Cut obtains both lower data transfer time and lower WAN usage compared to state-of-the-art graph partitioning methods.
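
As a rough illustration of the first stage only, the sketch below implements a generic one-pass greedy edge assignment. This is a standard vertex-cut heuristic with a crude per-DC capacity cap standing in for load balance; it is not the authors' G-Cut algorithm and it ignores the heterogeneous WAN bandwidths that G-Cut takes into account. All values are made up.

```c
#include <stdio.h>
#include <stdbool.h>

#define NUM_DCS      3
#define NUM_VERTICES 6
#define DC_CAPACITY  3   /* crude stand-in for load balancing */

static bool replica[NUM_VERTICES][NUM_DCS];  /* replica[v][d]: vertex v already present in DC d */
static int  load[NUM_DCS];                   /* edges assigned to each DC */

/* One-pass greedy placement: prefer the DC that creates the fewest new vertex
 * replicas (a proxy for inter-DC traffic), breaking ties by current load. */
static int place_edge(int u, int v)
{
    int best = -1, best_new = 3;
    for (int d = 0; d < NUM_DCS; d++) {
        if (load[d] >= DC_CAPACITY)
            continue;
        int created = !replica[u][d] + !replica[v][d];
        if (created < best_new || (created == best_new && load[d] < load[best])) {
            best = d;
            best_new = created;
        }
    }
    if (best < 0) {                          /* all DCs full: pick the least loaded */
        best = 0;
        for (int d = 1; d < NUM_DCS; d++)
            if (load[d] < load[best]) best = d;
    }
    replica[u][best] = replica[v][best] = true;
    load[best]++;
    return best;
}

int main(void)
{
    int edges[][2] = { {0,1}, {1,2}, {0,2}, {2,3}, {3,4}, {4,5}, {3,5} };
    int n = sizeof(edges) / sizeof(edges[0]);

    for (int i = 0; i < n; i++) {
        int d = place_edge(edges[i][0], edges[i][1]);
        printf("edge (%d,%d) -> DC %d\n", edges[i][0], edges[i][1], d);
    }
    for (int d = 0; d < NUM_DCS; d++)
        printf("DC %d holds %d edges\n", d, load[d]);
    return 0;
}
```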

Talk by Georges Da Costa on Apr 07, 2017

Georges Da Costa will present his work on “Multi-objective resources optimization: Performance- and Energy-aware HPC and Clouds”.

Talk by Valentin Le Fèvre on Feb 28, 2017

Valentin Le Fèvre from LIP will present his work.
Title: Periodic I/O scheduling for super-computers
Abstract: With the ever-growing need for data in HPC applications, congestion at the I/O level becomes critical in supercomputers. Architectural enhancements such as burst buffers and pre-fetching are added to machines, but are not sufficient to prevent congestion. Recent online I/O scheduling strategies have been put in place, but they add an additional congestion point and overheads in the computation of applications. In this work, we show how to take advantage of the periodic nature of HPC applications in order to develop efficient periodic scheduling strategies for their I/O transfers. Our strategy computes, once during the job scheduling phase, a pattern that defines the I/O behavior of each application; the applications then run independently, transferring their I/O at the specified times. Our strategy limits the amount of I/O congestion at the I/O node level and can be easily integrated into current job schedulers. We validate this model through extensive simulations and experiments, comparing it to state-of-the-art online solutions. Not only does our scheduler have the advantage of being decentralized, thus avoiding the overhead of online schedulers, but it also performs better than these solutions, improving application dilation by up to 13% and maximum system efficiency by up to 18%.
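
To fix ideas, here is a toy C sketch of what a precomputed periodic I/O pattern might look like. It is purely illustrative: the common period, the per-application I/O durations, and the back-to-back assignment are made up and far simpler than the scheduling algorithm of the paper.

```c
#include <stdio.h>

/* Toy periodic pattern: within a common period, give each application an
 * exclusive, non-overlapping window for its I/O transfers. */
struct app {
    const char *name;
    double io_time;     /* time needed to flush its data each period (s) */
    double io_start;    /* assigned offset within the period (s) */
};

int main(void)
{
    const double period = 100.0;                 /* assumed common period (s) */
    struct app apps[] = {
        { "A", 20.0, 0.0 },
        { "B", 35.0, 0.0 },
        { "C", 15.0, 0.0 },
    };
    const int n = sizeof(apps) / sizeof(apps[0]);
    double t = 0.0;

    /* Assign windows back to back so no two applications transfer at once. */
    for (int i = 0; i < n; i++) {
        apps[i].io_start = t;
        t += apps[i].io_time;
    }

    if (t > period)
        printf("warning: %.1f s of I/O does not fit in a %.1f s period\n", t, period);

    for (int i = 0; i < n; i++)
        printf("app %s transfers during [%.1f, %.1f) s of every period\n",
               apps[i].name, apps[i].io_start, apps[i].io_start + apps[i].io_time);
    return 0;
}
```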

Talk by Jean-Thomas Acquaviva on Jan 09, 2017

Jean-Thomas Acquaviva from DataDirect Networks will give a talk about storage in HPC.
Title: IME and Managing the transition to Storage System Hierarchy

Abstract:

The current storage software stack is at the center of a perfect storm. Only five years ago, systems were assembled from clearly identified building blocks, each with its own order of magnitude: CPU (nanoseconds), network (microseconds), storage (milliseconds). But storage devices are filtering down to faster levels, from hard drives to storage class memory, and the resulting acceleration spans multiple orders of magnitude.
While the hardware landscape has been radically redefined, a new ecosystem is simultaneously flourishing: big data applications exhibit different workloads which are, at this time, still not well understood by the community.
Thus the storage stack has to deal simultaneously with new and more complex hardware, with a deeper hierarchy, and with new application requirements.
Before rushing to the keyboard, it seems the right time to get more and better-quality insights into what exactly is going on!
The DIO-pro effort is an ongoing development at DDN Storage that aims to build a new set of I/O performance analysis tools. Once an I/O trace has been captured on the client side, we propose to compress it with structural-compression-based methods and *replay* it against new hardware.
This is an original way to extrapolate the performance of a given application on different hardware platforms, and it opens the path to a better understanding of what is actually needed for future storage platforms. While still at a prototyping stage, initial results are promising and we hope to trigger the interest of the community in such methods.
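
DIO-pro itself is not public, but the replay idea can be pictured with a generic POSIX sketch. Everything here is hypothetical: the trace format (one "offset size" pair per line) and the write-only replay are illustrative assumptions, not DDN's tool or trace format.

```c
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s trace.txt target_file\n", argv[0]);
        return 1;
    }

    FILE *trace = fopen(argv[1], "r");
    int fd = open(argv[2], O_WRONLY | O_CREAT, 0644);
    if (!trace || fd < 0) { perror("open"); return 1; }

    static char buf[1 << 20];                /* 1 MiB payload buffer (contents irrelevant) */
    long long offset, size;
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);

    /* Re-issue each recorded write against the new storage target. */
    while (fscanf(trace, "%lld %lld", &offset, &size) == 2) {
        if (size > (long long) sizeof(buf)) size = sizeof(buf);
        if (pwrite(fd, buf, (size_t) size, (off_t) offset) < 0)
            perror("pwrite");
    }

    clock_gettime(CLOCK_MONOTONIC, &end);
    double elapsed = (end.tv_sec - start.tv_sec) + (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("replayed trace in %.3f s\n", elapsed);

    fclose(trace);
    close(fd);
    return 0;
}
```

Timing the same replay on different storage back-ends then gives a first-order estimate of how the original application's I/O phase would behave on that hardware.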