Ghufran Khan: “Scalable Analytics on Multi-Streams Dynamic Graphs“

We continue our series of team seminars with our own Ghufran Khan, who will present his PhD thesis titled “Scalable Analytics on Multi-Streams Dynamic Graphs” as a rehearsal for his PhD defense.


The seminar will take place on Thursday, 11th of September, 14:00-15:00, at the Emmy Noether room on the 2nd floor (Alan Turing Building). The presentation will last approximately 45 minutes, followed by feedback and questions to Ghufran.

Abstract:

Many real-time applications, such as financial platforms, social networks, and transportation systems, rely on dynamic graphs built from multiple data streams that are highly dynamic and may arrive out of order. Supporting scalable ingestion with concurrent analytics, ensuring accuracy under out-of-order (ooo) updates, enabling efficient queries on present and historical snapshots, and handling both structural and property updates are major challenges. This thesis introduces HAL, a novel in-memory dynamic graph database engine that addresses these issues by processing updates from multiple streams while supporting concurrent and consistent analytics. At its core, HAL employs the Stream-Time Ordered Adjacency List (STAL), a data structure that maintains updates in stream-time order regardless of their arrival order, combined with a lightweight graph-aware multi-version concurrency control protocol for consistency. Experiments show that HAL achieves ingestion throughput of more than 7.5 million updates per second (up to 73× faster than existing systems) and delivers up to 357× speedups in analytical queries even under ooo updates. Beyond the core system, HAL supports historical queries that reconstruct graphs at any past stream-time moment, and property-rich graphs by allowing arbitrary properties on nodes and edges.