Luciano Di Palma: “New Algorithms and Optimizations for ‘Human-in-the-Loop’ Model Development”

Luciano Di Palma will present his work on May 26th at 3pm. It will be online on Zoom at https://ecolepolytechnique.zoom.us/j/86323133834?pwd=QzFqdlpwalBwTUtRbzQxYWQwSXNLUT09

Title: New Algorithms and Optimizations for “Human-in-the-Loop” Model Development

Abstract:
In a recent trend known as “Machine Learning for Everyone”, IT companies are delivering cloud platforms to help every data user to develop machine learning models for their data sets with minimum effort. A key question, however, is how to obtain a high-quality training data set for model development with minimum user effort. While industry solutions to this problem are limited to manual labeling or crowdsourcing, recent research on interactive data exploration for model development bridges the gap between the data exploration and machine learning communities, and brings active learning-based data exploration to bear on the new process of model learning.

Existing active learning techniques, however, often fail to provide satisfactory performance when built over large data sets. Not only do such models often require hundreds of labeled data instances to reach high accuracy, but retrieving the next instance to label can also be time-consuming, which makes them incompatible with the interactive nature of the human exploration process. To address these issues, our work embodies two main ideas. First, we introduce a version-space-based active learning algorithm for kernel classifiers, which not only has strong theoretical guarantees on convergence but also allows for an efficient implementation in time and space. Second, by leveraging additional insights obtained in the user exploration and labeling process, we explore a new opportunity to factorize an active learner so that active learning can be performed in a set of low-dimensional subspaces, which further expedites convergence and reduces the user labeling effort. We also provide a new factorization learning algorithm that automatically learns a factorization structure from a set of labeled examples, which can then be used to expedite model convergence.

Using real-world data sets and a large suite of user interest patterns, our evaluation results show that our optimized version space (VS) algorithms outperform existing VS algorithms, as well as DSM, a factorization-aware algorithm, often by a wide margin, while maintaining interactive speed.

Bio:
Luciano Di Palma is a 4th-year PhD student in Active Learning and Data Exploration in the CEDAR team, a joint research group between Ecole Polytechnique and Inria Saclay. Luciano graduated from a double-degree international program between University of Sao Paulo (Brazil) and Ecole Polytechnique (France), obtaining a BSc in Physics and an “ingenieur polytechnicien” degree. For his master's, Luciano focused on the Machine Learning and Data Science domains, obtaining a Data Science degree from Université Paris-Saclay. His current interests lie in creating Machine Learning systems that solve real-world problems.