Yanlei Diao: Scalable, Low-Latency Data Analytics and its Applications

14.00, room 445, PCRI

Abstract
An integral part of many data-intensive applications is the need to collect and analyze enormous data sets, such as click streams, search logs, and sensor streams, to derive answers and insights with low latency. Concurrently, new programming models and architectures have been developed for large-scale cluster computing, exemplified by recent MapReduce systems. However, these systems are designed for batch processing and require the data set to be fully loaded into the cluster before analytical queries can run, which delays query answers considerably.

In this talk, I present the design of a scalable, low-latency analytics platform, called Scalla, that fundamentally transforms the existing cluster computing paradigm into an incremental parallel processing paradigm, providing the combined benefits of massive parallelism, incremental answers, and I/O efficiency. Our technical contributions include replacing an existing popular mechanism for partitioned parallelism with a purely hash-based mechanism and using dynamic frequency analysis to offer in-memory processing for most of the data. I will also examine two application scenarios: click stream analysis, which has been used in our evaluation, and genomic data analysis, a new project that leverages Scalla for massive-scale genomic data processing and analysis.

Short bio
Yanlei Diao is an Associate Professor of Computer Science at the University of Massachusetts Amherst. Her research interests are in information architectures and data management systems, with a focus on large-scale data analysis, data streams, uncertain data management, and flash memory databases. She received her PhD in Computer Science from the University of California, Berkeley in 2005, her M.S. in Computer Science from the Hong Kong University of Science and Technology in 2000, and her B.S. in Computer Science from Fudan University in 1998.

Yanlei Diao was a recipient of the NSF CAREER Award and the IBM Scalable Innovation Faculty Award, and was a finalist for the Microsoft Research New Faculty Fellowship. She spoke at the Distinguished Faculty Lecture Series at the University of Texas at Austin. Her PhD dissertation “Query Processing for Large-Scale XML Message Brokering” won the 2006 ACM SIGMOD Dissertation Award Honorable Mention. She is an associate editor of PVLDB 2013 and has served on the organizing committees of SIGMOD, CIDR, DMSN, the New Researcher Symposium, and the New England Database Summit. She has also served on the program committees of numerous international conferences and workshops.
