UNIFY Associate Team – Intelligent Unified Data Services for Hybrid Workflows Combining Compute-Intensive Simulations and Data-Intensive Analytics at Extreme Scales

In 2020, the volume of digital data is expected to reach 40 ZB and the number of network-connected devices (sensors, actuators, instruments, computers, and data stores) is expected to reach 50 billion, roughly five times the planet's projected population for that year. While these devices vary dramatically in their capabilities and number – billions of relatively weak components at the base of the pyramid, scaling up to exascale supercomputers and massive scientific instruments at its peak – taken collectively they represent a vast continuum of computing power and data generators that scientific, economic, social and governmental concerns of all kinds, public and private, will want and need to utilize.

The scientific context of the project is the emergence of complex, distributed, “hybrid” workflows (i.e., workflows combining HPC simulations, Big Data analytics and learning). In our first 3-year period (2020-2022), we addressed some of the data-related challenges posed by this context; in particular, we focused on storage modelling, considering the elastic management of distributed storage resources. In our second 3-year period (2023-2025), we intend to take into account multiple aspects of the hybrid execution infrastructure required by these workflows, which consists of edge devices interconnected with cloud infrastructures and supercomputers (also known as the Computing Continuum). In the general scheme, edge devices create streams of input data, which are processed by data analytics and machine learning applications in the cloud, whereas simulations on large, specialised HPC systems provide insights into, and predictions of, future system state.

The emergence of such workflows is reshaping the traditional view of the areas involved, as described in the ETP4HPC Research Agendas published in 2020 and 2022. Building software ecosystems that address the needs of such workflows poses challenges at several levels, as highlighted in a recent ETP4HPC white paper dedicated to the Computing Continuum.

In this context, this Associate Team will focus on three related challenges:

How to adequately handle the heterogeneity of storage resources within the Computing Continuum to support complex science workflows?
How to efficiently support deep-learning workloads across the Computing Continuum?
How to provide reproducibility support for experimentation across the Computing Continuum?

Presentation