By 2020, the volume of digital data is expected to reach 40 ZB, and the number of network-connected devices (sensors, actuators, instruments, computers, and data stores) is expected to reach 50 billion, roughly five times the world population projected for that year. While these devices vary dramatically in capability and number – billions of relatively weak components at the base of the pyramid, scaling up to exascale supercomputers and massive scientific instruments at its peak – taken collectively they represent a vast continuum of computing power and data generation that scientific, economic, social, and governmental concerns of all kinds, public and private, will want and need to utilize. The challenge of creating a transnational public infrastructure that can provide ubiquitous yet appropriate access to this shared continuum of scalable computing and data resources is daunting.
A major and widely recognized difficulty is the growing split between three software ecosystems: 1) the traditional high-performance computing (HPC) ecosystem; 2) the rapidly growing Big Data analytics (BDA) tool ecosystem; and 3) the recently emerged machine-learning technologies. At a time when scientific communities are striving to become more international, more interdisciplinary, and more collaborative than ever, the major technical differences between these ecosystems threaten to obstruct future cooperation and progress. International efforts are now focusing on creating shared distributed computing platforms that can manage the logistics of massive, multistage data workflows whose sources lie at the network edge.
In this new context, this Associate Team aims to explore innovative approaches to workflow optimization and to adaptive data management and processing, through hybrid techniques that leverage the strengths of the three aforementioned ecosystems.