Huang Enhui: “Active Learning Methods for Interactive Exploration on Large Databases”

Huang Enhui will present her work on June 23rd at 3pm.

The talk will be held online via Zoom: https://ecolepolytechnique.zoom.us/j/86323133834?pwd=QzFqdlpwalBwTUtRbzQxYWQwSXNLUT09

Title
Active Learning Methods for Interactive Exploration on Large Databases

Abstract
Faced with an increasing gap between the fast growth of data and the limited human ability to comprehend it, data analytics tools are now in high demand in many applications across a broad range of domains. In particular, for interactive data exploration systems, the “explore-by-example” framework, which aims to help the user explore data effectively while minimizing human effort, is becoming increasingly popular. However, state-of-the-art explore-by-example systems still require a large number of labeled examples to reach the desired accuracy, and they cannot handle noisy labels.
To address both the slow convergence problem and the label noise problem, we cast the explore-by-example problem in a principled “active learning” framework and bring the properties of important classes of user interest to bear on the design of new algorithms and optimizations for active learning-based data exploration. Our main contributions are as follows. First, we combine a polytope-based model and a traditional active learner into a new Dual-Space Model (DSM), which jointly provides prediction and sample acquisition and thereby expedites model convergence. Second, when data exploration is performed in high dimensions, we factorize the high-dimensional space into a set of low-dimensional subspaces, build a DSM in each subspace, and combine them into a factorized DSM, which is theoretically guaranteed to improve accuracy and convergence speed. Last but not least, we address the label noise problem by integrating advanced data cleansing methods and a refinement of the polytope-based model into DSM.
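The abstract does not spell out how the polytope-based model or the factorization work, so the following is only a minimal Python sketch under assumptions added here, not the authors' implementation or AIDEme's API. It assumes that the user's interest region is convex in each 2-D subspace, that each subspace receives its own yes/no label, and that overall interest is the conjunction of the subspaces. Under convexity, a point inside the convex hull of the positive examples must be positive, and a point x must be negative if some known negative lies inside the hull of the positives plus x. Everything else is uncertain; in this sketch, those are the only points an active learner would still need to ask the user about. All names (`PolytopeModel2D`, `factorized_predict`, etc.) are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def in_hull(q: Point, hull: List[Point]) -> bool:
    """True if q lies inside or on a CCW hull (exact arithmetic assumed, no tolerance)."""
    if not hull:
        return False
    if len(hull) == 1:
        return q == hull[0]
    if len(hull) == 2:  # degenerate hull: a line segment
        a, b = hull
        return (cross(a, b, q) == 0
                and min(a[0], b[0]) <= q[0] <= max(a[0], b[0])
                and min(a[1], b[1]) <= q[1] <= max(a[1], b[1]))
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], q) >= 0 for i in range(n))


class PolytopeModel2D:
    """Hypothetical three-way classifier for one 2-D subspace, assuming a convex positive region."""

    def __init__(self) -> None:
        self.pos: List[Point] = []
        self.neg: List[Point] = []

    def add(self, x: Point, label: bool) -> None:
        (self.pos if label else self.neg).append(x)

    def predict(self, x: Point) -> int:
        """Return 1 (surely positive), -1 (surely negative) or 0 (uncertain)."""
        if self.pos and in_hull(x, convex_hull(self.pos)):
            return 1
        # If x were positive, convexity would make conv(pos + [x]) positive;
        # a known negative inside that hull therefore proves x is negative.
        hull = convex_hull(self.pos + [x])
        if any(in_hull(e, hull) for e in self.neg):
            return -1
        return 0


def factorized_predict(models: List[PolytopeModel2D], x: Sequence[float]) -> int:
    """Combine per-subspace labels, assuming interest is a conjunction over
    consecutive coordinate pairs (x[0], x[1]), (x[2], x[3]), ..."""
    labels = [m.predict((x[2 * i], x[2 * i + 1])) for i, m in enumerate(models)]
    if any(label == -1 for label in labels):
        return -1  # negative in one subspace is enough
    if all(label == 1 for label in labels):
        return 1   # positive only if positive everywhere
    return 0


if __name__ == "__main__":
    square = PolytopeModel2D()
    for p in [(0, 0), (4, 0), (0, 4), (4, 4)]:
        square.add(p, True)
    square.add((6, 2), False)
    print(square.predict((2, 2)), square.predict((8, 2)), square.predict((2, 6)))  # 1 -1 0
```

The conjunctive combination is what lets the factorized sketch stay cheap: each subspace only needs a 2-D hull, and one surely-negative subspace settles the label for the whole point.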
Evaluation results using real-world datasets and user interest patterns show that our system significantly outperforms the state-of-the-art explore-by-example systems in accuracy and convergence speed while achieving reasonable efficiency for interactive data exploration.

Bio
Enhui Huang is a final-year Ph.D. candidate at CEDAR, a joint research team of Ecole Polytechnique and Inria Saclay. Her research interests are Active Learning, Interactive Data Exploration, and Learning with Label Noise. She is currently working on AIDEme (www.lix.polytechnique.fr/aideme), an active learning-based data exploration project.

 
