(Closed) MSc Project: Visual Multi-speaker Recognition for Human-robot Interaction

Duration: 6 months (with the possibility of continuing as a PhD)

Short description: The main goal of this project is to design and develop an automatic system capable of characterizing videos containing multiple speakers. Such a system would provide information useful for human-robot interaction, such as the number of people speaking at a given moment, their spatial location, or head-pose estimates predicting where speakers are looking while they talk. A system of this kind would help a robot better recognize human activities and therefore respond appropriately. Computer vision and machine learning techniques, with special emphasis on deep learning approaches for visual recognition, will be the main tools employed to achieve this objective. The approach to be developed also represents a first step towards combining video and audio information to perform audio-visual multi-speaker diarization.

Keywords: human activity recognition, deep learning, audio-visual scene analysis.

Information for applicants: Please send your complete CV to Pablo Mesejo (pablo.mesejo-santiago@inria.fr).