Research Program

Context

Exascale computing is the next major frontier in parallel computing research. In the near future, the community expects the appearance of large-scale machines with billions of processing cores and complex hierarchical structures. The size of parallel applications tends to follow such changes, with applications composed of many billions of threads. Several issues arise at such scale. The first is how to correctly schedule such a number of threads while taking into account the complex hierarchical organization and characteristics of exascale platforms. A second important issue related to scheduling is energy: how do scheduling decisions impact energy consumption, and can the application be managed so that it fits within the energy constraints imposed by scheduling decisions? A third, related issue is how to analyze scheduling decisions and their impact on application behavior at such scale. One possibility is to rely on platform and application simulation in order to better understand behavior before moving to a real exascale platform. This project intends to explore these three research questions in the exascale computing context and is divided into three axes of development:

Research direction 1: Fundamentals for the scaling of schedulers

Coordination: Denis Trystram (Inria Moais) and Alfredo Goldman (USP)
Subject: Exascale applications will be composed of several billion threads. Scheduling all such processing units on a potentially large-scale machine demands new scheduling algorithms that take into account new characteristics such as large-scale distribution and the lack of a centralized decision-making unit. This axis will focus on developing exascale schedulers capable of dealing with such systems and with the large number of processing units and threads to be executed.

Research direction 2: Design of schedulers for large-scale infrastructures

This axis is divided into two parts:

2.1 Many-core platforms and low consumption scheduling

Coordination: Philippe Olivier Alexandre Navaux (UFRGS), Jean-François Méhaut (Nanosim) and Henrique Cota de Freitas (PUC Minas)
Subject: Investigation of how to map the threads of parallel applications onto hybrid, heterogeneous many-core platforms in order to achieve high performance, low energy consumption, and scalability. The combination of platform heterogeneity and the diversity of application characteristics leads to a very large solution space that cannot be tackled by traditional approaches. As a consequence, we are particularly interested in applying machine learning techniques to design new thread mapping strategies that would be implemented in energy-aware schedulers for large-scale infrastructures.

2.2 Adaptive scheduling

Coordination: Thierry Gautier and Bruno Raffin (Inria Moais), and Nicolas Maillard (UFRGS)
Subject: Design and experimental validation of new parallel algorithmic schemes that make it possible to extract and schedule, on demand, a degree of parallelism adapted to the current state of the machine. In the classical bottom-up approach, the application provides fine-grain tasks that are then clustered to obtain a minimal parallel degree. However, relying on a fine grain leads to significant overheads. The top-down approach is based on work-stealing scheduling driven by idle resources. A local sequential depth-first execution of tasks is favored when recursive parallelism is available. Even so, tasks are created when no extra parallelism is required, which also leads to overheads. These overheads become critical when targeting exascale machines, calling for the development of new approaches. Adaptive scheduling proposes to extract parallelism on demand. This is quite straightforward for some simple patterns (loops with independent iterations), but the general case requires designing a new algorithm. Our goal is to pursue this effort, taking into consideration the increase in the number of cores as well as data movements across the different memory levels.
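
As an illustration of the simple pattern mentioned above (a loop with independent iterations), the following is a minimal sketch of on-demand parallelism extraction, written as a sequential Python simulation. It is not the algorithm developed in this project: the names `Worker`, `adaptive_loop` and `grain` are hypothetical, time is modeled as lock-step rounds, and an idle worker is assumed to always target the most loaded one. The point it shows is that work is split only when a worker is actually idle, so the number of splits depends on the number of workers rather than on the number of iterations.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    """A simulated core holding a half-open interval [begin, end) of iterations."""
    begin: int = 0
    end: int = 0
    done: int = 0  # iterations executed by this worker so far

    def remaining(self) -> int:
        return self.end - self.begin


def adaptive_loop(n_iterations: int, n_workers: int, grain: int = 1):
    """Simulate on-demand splitting of a loop with independent iterations.

    Assumes grain >= 1. Returns (iterations executed per worker, number of splits).
    """
    workers = [Worker() for _ in range(n_workers)]
    workers[0].end = n_iterations  # all the work starts on a single worker
    splits = 0

    while any(w.remaining() > 0 for w in workers):
        # Step 1: each idle worker asks the most loaded worker for work.
        for thief in workers:
            if thief.remaining() > 0:
                continue
            victim = max(workers, key=Worker.remaining)
            if victim.remaining() <= grain:
                continue  # too little work left to be worth splitting
            # The victim keeps the first half, the thief takes the second half.
            mid = victim.begin + (victim.remaining() + 1) // 2
            thief.begin, thief.end = mid, victim.end
            victim.end = mid
            splits += 1
        # Step 2: every worker holding work runs one iteration sequentially.
        for w in workers:
            if w.remaining() > 0:
                w.begin += 1
                w.done += 1

    return [w.done for w in workers], splits
```

Under these assumptions, calling `adaptive_loop(1000, 4)` distributes the 1000 iterations evenly over the four simulated workers with only three splits, whereas a fine-grain bottom-up decomposition would create one task per iteration.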

Research direction 3: Tools for the analysis of large-scale schedulers

This axis is divided into two parts:

3.1 Trace analysis and visualization

Coordination: Lucas Mello Schnorr (UFRGS) and Jean-Marc Vincent (Mescal)
Subject: Proposing new exascale schedulers requires understanding large-scale traces obtained from existing machines and parallel applications. Correlating all such information without the help of auxiliary tools is cumbersome and counter-productive. We intend to propose techniques and visualization tools that improve the understanding of such information, inspiring the development of better scheduling algorithms for exascale.

3.2 Simulation for prediction and tuning

Coordination: Arnaud Legrand (Mescal) and Lucas Mello Schnorr (UFRGS)
Subject: Obtaining good resource usage on exascale machines will require a lot of information about past scheduling actions and application behavior. Evaluating the sensitivity of scheduling algorithms to the accuracy of such information or to unexpected events will prove crucial at such scale. Hence the need for realistic and scalable simulation tools that allow schedulers to be evaluated in a variety of scenarios before real-world deployment. Furthermore, the impact of scheduling decisions may be very difficult to anticipate. It is thus tempting to design schedulers that use both past information and simulation outputs to try to anticipate the impact of scheduling decisions on application and platform behavior.