Fatemeh Nargesian “Tabular Data Discovery in Data Lakes”

Fatemeh Nargesian will present her work on March 12th, 2024, at 2 pm. The seminar will be held in the Grace Hopper room of the Alan Turing Building (Palaiseau) and online.

Title:

Tabular Data Discovery in Data Lakes

Abstract: 

Tabular data discovery streamlines the construction and integration of tables for downstream data science tasks, drawing on massive collections of data sources such as data lakes. This process involves efficiently identifying relevant tables and discovering queries that construct datasets useful for downstream data science tasks. In this talk, I will first describe how to build efficient index structures for table discovery based on equi-join, semantic-join, and union operations. Next, I will show how leveraging table version histories can reveal fine-grained schematic links among columns in data lakes. We will also see how to construct a navigational structure over data lakes, offering an alternative to conventional keyword search for discovery. Finally, I will conclude with a discussion of the challenges of using discovered queries for approximate query answering.

Bio:

Fatemeh Nargesian is an assistant professor of computer science at the University of Rochester. She obtained her PhD at the University of Toronto. Her research interests are data acquisition for AI and scientific time-series management. Her work has appeared at top-tier venues including VLDB, SIGMOD, and ICDE, and won the Best Demo Award at VLDB 2017.

 
