Joint Talk of Alexandru Costan and Gabriel Antoniu @IPN: From Big Data to Fast Data


During their visit to IPN, Gabriel Antoniu and Alexandru Costan gave a talk on October 30, 2019, summarizing the main stream processing challenges faced by the SmartFastData project and highlighting the project's main contributions to address them.


From Big Data to Fast Data: Efficient Stream Data Management


The most explosive proliferation in data generation today is taking place at the edge of the network, on the opposite side from the cloud datacenters. Such new data sources include major scientific experiments (e.g., LHC at CERN) and instruments (e.g., the Square Kilometre Array telescope), as well as a deluge of distributed sensors from the Internet of Things (IoT). The traditional approach of shipping all data to the cloud for batch analytics is no longer a viable option due to the high latency of the Wide Area Networks (WANs) connecting the edge and the datacenters. This disruptive change makes progress in many different areas of research uncertain.

In this talk we will present the general context of stream data management in light of this recent transition from Big to Fast Data. After highlighting the data-level challenges associated with batch and real-time analytics, we will give a subjective overview of proposals to address them. These proposals bring solutions to the problems of in-transit stream storage and processing, fast data transfers, distributed metadata management, dynamic ingestion and transactional storage. The integration of these solutions into functional prototypes and the results of large-scale experimental evaluations on clusters, clouds and supercomputers demonstrate their effectiveness for several real-life applications ranging from neuroscience to LHC nuclear physics. Finally, these contributions are put into the perspective of the High Performance Computing – Big Data convergence.


Gabriel Antoniu is a Senior Research Scientist at Inria, Rennes. He leads the KerData research team, focusing on storage and I/O management for Big Data processing on scalable infrastructures (clouds, HPC systems) and on the HPC/Big Data convergence at the storage and data processing level. He currently serves as Vice Executive Director of JLESC – Joint Inria-Illinois-ANL-BSC-JSC-RIKEN/AICS Laboratory for Extreme-Scale Computing on behalf of Inria. He received his Ph.D. degree in Computer Science in 2001 from ENS Lyon. He has led several international projects in partnership with Microsoft Research, IBM, Argonne National Lab, the University of Illinois at Urbana-Champaign, and Huawei. He served as Program Chair for the IEEE Cluster conference in 2014 and 2017 and regularly serves as a PC member of major conferences in the area of HPC, cloud computing and Big Data (SC, HPDC, CCGRID, Cluster, Big Data, etc.). He has acted as advisor for 19 PhD theses and has co-authored over 140 international publications in the aforementioned areas.

Alexandru Costan is an Associate Professor at INSA Rennes and a researcher within the KerData team at IRISA Rennes. In 2011, he obtained a Ph.D. in Computer Science from the Politehnica University of Bucharest (PUB) for a thesis focused on self-adaptive behavior of large-scale distributed systems based on monitoring information, bringing several contributions to the MonALISA monitoring system, developed in collaboration with Caltech and CERN. In 2012, he became an Associate Professor at INSA Rennes, where he is currently leading the Big Data Science track. His research interests include Big Data management in HPC and clouds, fast data and stream processing, autonomic behavior and workflow management. Alexandru has published two books, more than 20 articles in international journals and 30 papers in international conferences. He serves as a PC member of several top-level conferences and workshops in the domain of distributed computing (SuperComputing, CCGrid, Cluster, Big Data). He has co-chaired the BigDataCloud workshop at EuroPar since 2011 and the ScienceCloud workshop at HPDC since 2015. He is currently leading the ANR OverFlow project and is a member of JLESC, the Joint Laboratory for Extreme-Scale Computing.

