The goal of the TADaaM project is to design and build a stateful system-wide service layer for data management in HPC systems. This layer will be twofold. First, it will abstract low level features of the system (e.g. topology, network, resource usage) and the software stack (e.g. threads, data). Second, applications will be able to register their needs and behavior thanks to a carefully designed API. With these two sets of information, the layer will optimize the execution of all
the running applications in a coordinate fashion and at system-scale.
TADaaM, we propose a new approach where we allow the application to explicitly express its resource needs about its execution. The application needs to express its behavior, but in a different way from the compute-centric approach, as the additional information is not necessarily focused on computation and on instructions execution, but follows a high-level semantics (needs of large memory for some processes, start of a communication phase, need to refine the granularity, beginning of a storage access phase, description of data affinity, etc.). These needs will be expressed to a service layer though an API. The service layer will be system-wide (able to gather a global knowledge) and stateful (able to take decision based on the current request but also on previous ones). The API shall enable the application to access this service layer through a well-defined set of functions, based on carefully designed abstractions.
the goal of TADaaM is to design a stateful system-wide service layer for HPC systems, in order to optimize applications execution according to their needs.
This layer will abstract low-level details of the architecture and the software stack, and will allow applications to register their needs. Then, according to these requests and to the environment characteristics, this layer will feature an engine to optimize the execution of the applications at system-scale, taking into account the gathered global knowledge and previous requests.
This approach exhibits several key characteristics:
- It is independent from the application parallelization, the programming model, the numerical scheme and, largely, from the data layout. Indeed, high-level semantic requests can easily be added to the application code after the problem has been modeled, parallelized, and most of the time after the data layout has been designed and optimized. Therefore, this approach is – to a large extent – orthogonal to other optimization mechanisms and does not require application developers to rewrite their code.
- Application developers are the persons who know best their code and therefore the needs of their application. They can easily (if the interface is well designed and the abstractions are correctly exposed), express the application needs in terms of resource usage and interaction with the whole environment.
- Being stateful and shared by all the applications in the parallel environment, the proposed layer will therefore enable optimizations that:
- cannot be performed statically but require information only known at launch- or run-time,
- are incremental and require minimal changes to the application execution scheme,
- deal with several parts of the environment at the same time (e.g., batch scheduler, I/O, process manager and storage),
- take into account the needs of several applications at the same time and deal with their interaction. This will be useful, for instance, to handle network contention, storage access or any other shared resources.