DECoHPC is a joint team (équipe associée) between Inria, in France, and a few Brazilian institutions: the National Laboratory for Scientific Computing (LNCC), the Federal Fluminense University (UFF), the Federal University of Rio Grande do Sul (UFRGS), and the Federal Center for Technological Education of Rio de Janeiro (CEFET-RJ). It started at the beginning of 2024 and is expected to last until December 2026.
Other contributing institutions are the University of Bordeaux, LaBRI, and CNRS.
Context
Supercomputers were conceived to efficiently run traditional HPC applications, namely numerical simulations. However, in the context of the convergence between HPC and big data, their workload is becoming more heterogeneous.
In this new scenario, efficient application execution becomes more challenging. Moreover, energy consumption has emerged as an important concern for HPC and computer science in general. First, with the effects of climate change, environmental concerns have become a major focus across various scientific fields. Second, as more and more exascale machines emerge, their energy budget has become a primary constraint, driven not only by environmental considerations but also by economic ones.
Research goals
The previous HPCProSol associate team (2021–2023) provided us with performance insights about two kinds of representative applications from the Santos Dumont supercomputer: finite element methods (HPC) and bioinformatics workflows (HPDA). Moreover, we collaborated on advancing the system’s monitoring infrastructure by developing software to efficiently process the data it collects. Now, in the DECoHPC associate team, we aim to take these insights and tools and extend them towards our three main goals:
- (WP1) Based on Santos Dumont’s traces (recently made available), to obtain a holistic view of the I/O behavior of HPC applications. We want to classify applications according to their behaviors and to what they need from the system.
- (WP2) To study and characterize the energy consumption of moving applications’ data through the network and I/O infrastructure.
- (WP3) To characterize the I/O performance and energy consumption of AI applications, which were not explored in HPCProSol but are now among the most important users of HPC facilities.
These studies will follow a mostly experimental methodology, based on information collected from the Santos Dumont machine. We also benefit from the support of local French infrastructure (such as PlaFRIM at the Inria Center of the University of Bordeaux). The knowledge obtained from these studies will allow us to propose solutions that reduce the energy consumption and improve the I/O performance of HPC applications.