Deep Regression Models and Computer Vision Applications for Multiperson Human-Robot Interaction

PhD defense by Stéphane Lathuilière

Tuesday 22nd May 2018, 11:00, Grand Amphithéâtre

INRIA Grenoble Rhône-Alpes, Montbonnot Saint-Martin

Abstract:

In order to interact with humans, robots need to perform basic perception tasks such as face detection, human pose estimation or speech recognition. However, to interact naturally with humans, a robot also needs to model high-level concepts such as speech turns, focus of attention or interactions between participants in a conversation. In this manuscript, we follow a top-down approach. In the first part, we present two high-level methods that model collective human behaviors. The first is a model able to recognize activities that are performed jointly by different groups of people, such as queueing or talking. It handles the general case where several group activities can occur simultaneously and in sequence. The second is a novel neural network-based reinforcement learning approach for robot gaze control. It enables a robot to learn and adapt its gaze control strategy in the context of human-robot interaction: the robot learns to focus its attention on groups of people from its own audio-visual experiences.

In the second part, we study in detail deep learning approaches for regression problems. Regression problems are crucial in the context of human-robot interaction in order to obtain reliable information about head and body poses or the age of the persons facing the robot. These contributions are nonetheless general and can be applied in many different contexts. First, we propose to couple a Gaussian mixture of linear inverse regressions with a convolutional neural network. Second, we introduce a Gaussian-uniform mixture model in order to make the training algorithm more robust to noisy annotations. Finally, we perform a large-scale study to measure the impact of several architecture choices and extract practical recommendations for using deep learning approaches in regression tasks. Each of these contributions has been validated experimentally, either with real-time experiments on the NAO robot or on large and diverse datasets.

Jury:
Josef Sivic, INRIA Paris (reviewer)
Elisa Ricci, University of Perugia (reviewer)
Xavier Alameda-Pineda, INRIA Grenoble Rhône-Alpes (examiner)
Christian Wolf, INSA Lyon (examiner)
Radu Horaud, INRIA Grenoble Rhône-Alpes (advisor)
Cordelia Schmid, INRIA Grenoble Rhône-Alpes (chair)
