Guiding Robot Audition with Motor Data: A Hybrid Classifier-Dictionary Approach

Monday, March 7, 2016, 11:00 to 12:00, room F107, INRIA Montbonnot

Seminar by Antoine Deleforge, PANAMA team, INRIA Rennes – Bretagne Atlantique

A specificity of human-robot interaction (HRI), as opposed to human-computer interaction (HCI), is the ability of a robot to perform movements. On the one hand, this ability may constitute an advantage, enabling active perception. On the other hand, it often makes the signals recorded by the robot’s sensors more difficult to process in practice. In the case of audition, the noise created by the robot’s moving body parts and actuators, called “ego-noise”, severely degrades the sounds recorded at its microphones. This hinders audio source localization and speech recognition, which are instrumental in HRI. From a signal processing point of view, reducing ego-noise is particularly challenging because it is often louder than the signals of interest, non-stationary, non-point-like and non-static. However, two key features may be exploited. First, ego-noise has a well-defined structure both spatially and over the spectrum, due to the deterministic nature of a robotic system. Second, the motor state of the robot, e.g., its joint angles and speeds, may be available over time through proprioceptors, providing valuable extra information. We propose to exploit the first feature by learning a multi-channel dictionary from ego-noise signal examples. The learned dictionary captures both spectral and spatial characteristics of the noise, thanks to a novel phase-optimization scheme. We then show how the dictionary information can be advantageously fused with instantaneous motor-state information at runtime, using pre-trained support-vector machine classifiers. Results obtained with the proposed approach on real data will be compared to conventional methods and illustrated by audio and video examples.
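To make the pipeline concrete, here is a minimal, single-channel sketch of the general idea: learn a dictionary of ego-noise spectra offline, train a classifier that maps the instantaneous motor state to the relevant dictionary atoms, and suppress the estimated noise at runtime. All names, the per-class NMF dictionary, the synthetic data and the spectral-subtraction step are illustrative assumptions; the method presented in the talk uses a multi-channel dictionary with a phase-optimization scheme, which is not reproduced here.

```python
# Illustrative sketch only: a greatly simplified, single-channel stand-in for the
# approach described above (assumed components, not the talk's actual algorithm).
import numpy as np
from sklearn.decomposition import NMF
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# --- Offline training --------------------------------------------------------
# Synthetic stand-ins: magnitude spectrogram frames of recorded ego-noise
# (n_frames x n_bins) and the robot's motor state for each frame
# (joint angles and velocities, here 4 joints -> 8 features).
n_frames, n_bins, n_joints = 500, 257, 4
ego_noise_mag = np.abs(rng.normal(size=(n_frames, n_bins)))
motor_state = rng.normal(size=(n_frames, 2 * n_joints))
# Hypothetical labels: which joint dominates the ego-noise in each frame.
noise_class = rng.integers(0, n_joints, size=n_frames)

# 1) Learn a spectral dictionary of ego-noise atoms (simplified: per-class NMF on
#    magnitude spectrograms instead of the multi-channel, phase-aware dictionary).
dictionaries = {}
for c in range(n_joints):
    nmf = NMF(n_components=8, init="nndsvda", max_iter=300)
    nmf.fit(ego_noise_mag[noise_class == c])
    dictionaries[c] = nmf  # atoms are in nmf.components_ (8 x n_bins)

# 2) Train an SVM that maps the instantaneous motor state to a noise class,
#    i.e., to the subset of dictionary atoms expected to be active.
clf = SVC(kernel="rbf", C=1.0)
clf.fit(motor_state, noise_class)

# --- Runtime ------------------------------------------------------------------
# Given a new noisy frame and the current motor state, select the dictionary
# chosen by the classifier, estimate the ego-noise magnitude, and subtract it.
noisy_frame = np.abs(rng.normal(size=(1, n_bins)))
current_motor_state = rng.normal(size=(1, 2 * n_joints))

c_hat = int(clf.predict(current_motor_state)[0])
nmf = dictionaries[c_hat]
activations = nmf.transform(noisy_frame)                 # project onto noise atoms
noise_estimate = activations @ nmf.components_           # reconstructed ego-noise
enhanced = np.maximum(noisy_frame - noise_estimate, 0)   # crude spectral subtraction
print("Predicted noise class:", c_hat, "| residual energy:", float(enhanced.sum()))
```

In this toy setup the classifier plays the role of the motor-data guide: it restricts the noise model to the atoms consistent with the current joint configuration before any spectral fitting is done, which is the fusion idea the abstract describes, albeit without the spatial (multi-channel) and phase information exploited in the actual work.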
