Register (starts May 11, 2015 and ends June 12, 2015)
Robots have gradually moved from factory floors to populated areas. Therefore, there is a crucial need to endow robots with perceptual and interaction skills enabling them to communicate with people in the most natural way. With auditory signals distinctively characterizing physical environments and speech being the most effective means of communication among people, robots must be able to fully extract the rich auditory information from their environment. This course will address fundamental issues in robot hearing; it will describe methodologies requiring two or more microphones embedded into a robot head, thus enabling sound-source localization, sound-source separation, and fusion of auditory and visual information. The course will start by briefly describing the role of hearing in human-robot interaction, overviewing the human binaural system, and introducing the computational auditory scene analysis paradigm. Then, it will describe in detail sound propagation models, audio signal processing techniques, geometric models for source localization, and unsupervised and supervised machine learning techniques for characterizing binaural hearing, fusing acoustic and visual data, and designing practical algorithms. The course will be illustrated with numerous videos shot in the author’s laboratory.
- Week 1: Introduction to Robot Hearing
- Week 2: Methodological Foundations
- Week 3: Sound-Source Localization
- Week 4: Machine Learning and Binaural Hearing
- Week 5: Fusion of Audio and Vision
MOOC available online
Signal Processing for Communications. Paolo Prandoni and Martin Vetterli. EPFL Press, 2008.
Auditory Neuroscience. Jan Schnupp, Israel Nelken, and Andrew King. MIT Press, 2011.
Machine Audition: Principles, Algorithms, and Systems. Wenwu Wang. IGI Global, 2011.
Audio Signal Processing for Next-Generation Multimedia Communication Systems. Yiteng (Arden) Huang and Jacob Benesty (Eds.). Kluwer Academic Publishers, 2004.
Co-Localization of Audio Sources in Images Using Binaural Features and Locally-Linear Regression. Antoine Deleforge, Radu Horaud, Yoav Y. Schechner, Laurent Girin. IEEE Transactions on Audio, Speech and Language Processing, 2015, 23 (4), pp.718-731.
Acoustic Space Learning for Sound-Source Separation and Localization on Binaural Manifolds. Antoine Deleforge, Florence Forbes, Radu Horaud. International Journal of Neural Systems, World Scientific Publishing, 2015, 25 (1).
Vision-Guided Robot Hearing. Xavier Alameda-Pineda, Radu Horaud. The International Journal of Robotics Research, SAGE Publications, 2014.
A Geometric Approach to Sound Source Localization from Time-Delay Estimates. Xavier Alameda-Pineda, Radu Horaud. IEEE Transactions on Audio, Speech and Language Processing, 2014, 22 (6), pp.1082-1095.
Audio-Visual Speaker Localization via Weighted Clustering. Israel-Dejene Gebru, Xavier Alameda-Pineda, Radu Horaud, Florence Forbes. IEEE Workshop on Machine Learning for Signal Processing, Sep 2014, Reims, France.
Alignment of Binocular-Binaural Data Using a Moving Audio-Visual Target. Vasil Khalidov, Florence Forbes, Radu Horaud. MMSP 2013 – IEEE International Workshop on Multimedia Signal Processing, Sep 2013, Pula (Sardinia), Italy, pp.242-247.
Active-Speaker Detection and Localization with Microphones and Cameras Embedded into a Robotic Head. Jan Cech, Ravi Mittal, Antoine Deleforge, Jordi Sanchez-Riera, Xavier Alameda-Pineda, Radu Horaud. Humanoids 2013 – IEEE-RAS International Conference on Humanoid Robots, Oct 2013, Atlanta, United States.
Online Multimodal Speaker Detection for Humanoid Robots. Jordi Sanchez-Riera, Xavier Alameda-Pineda, Johannes Wienke, Antoine Deleforge, Soraya Arias, Jan Cech, Sebastian Wrede, Radu Horaud. Humanoids 2012 – IEEE International Conference on Humanoid Robotics, Nov 2012, Osaka, Japan. IEEE, pp.126-133.