↑ Return to Projects


Vision and Hearing In Action (VHIA)

ERC Advanced Grant #340113

VHIA’s list of publications | Research | Recently Submitted Papers | News

Principal Investigator: Radu Horaud | Duration: 1/2/2014 – 31/1/2019 (five years) | ERC funding: 2,497,000€
VHIA studies the fundamentals of audio-visual perception in the context of human-robot interaction




Research pages (please click here for a complete list of our research pages)

Audio-visual speaker diarization

Continuous action recognition

EM algorithms for weighted-data clustering

High-dimensional regression

Deep mixture of linear inverse regression (DMLIR)

Separation of time-varying audio mixtures

Skeletal quads

Supervised sound-source localization

Recently submitted papers

B. Massé, S. Ba and R. Horaud. (March 2017). Tracking Gaze and Visual Focus of Attention of People Involved in Social Interaction. Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence.

X. Li, L. Girin, S. Gannot and R. Horaud. (October 2016). Multiple-Speaker Localization Based on Direct-Path Features and Likelihood Maximization with Spatial Sparsity Regularization. Submitted to IEEE Transactions on Audio, Speech, and Language Processing.

G. Evangelidis and R. Horaud. (August 2016). Joint Registration of Multiple Point Sets. Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence. https://arxiv.org/abs/1609.01466.


March, 2017, best paper award: Israel Dejene Gebru (PhD student) and his co-authors received the best paper award at IEEE HSCMA’17.

January, 2017: R. Horaud was invited to join the editorial board of the IEEE Robotics and Automation Letters as an associate editor.

January, 2017, paper accepted for publication: V. Drouard, R. Horaud, A. Deleforge, S. Ba, and G. Evangelidis. (March 2016). Robust Head Pose Estimation Based on Partially-Latent Mixture of Linear Regression. Submitted to IEEE Transactions on Image Processinghttp://arxiv.org/abs/1603.09732

December, 2016, paper accepted for publication: I. D. Gebru, S. Ba, X. Li and R. Horaud. Audio-Visual Speaker Diarization Based on a Spatiotemporal Bayesian Model. IEEE Transactions on Pattern Analysis and Machine Intelligencehttp://arxiv.org/abs/1603.09725

July, 2016, paper accepted for publication: X. Li, L. Girin, R. Horaud, S. Gannot. Estimation of the Direct-Path Relative Transfer Function for Supervised Sound-Source Localization. IEEE/ACM Transactions on Audio, Speech, and Language Processinghttp://arxiv.org/abs/1509.03205.

July, 2016, paper accepted for publication: S. Ba, X. Alameda-Pineda, A. Xompero, R. Horaud. An On-line Variational Bayesian Model for Multi-Person Tracking from Cluttered Scenes.  http://arxiv.org/abs/1509.01520

June, 2016, paper accepted for publication: R.Horaud, G.Evangelidis, M. Hansard, C. Ménier. An Overview of Range Scanners and Depth Cameras Based on Time-of-Flight Technologies. Machine Vision and Applicationshttps://hal.inria.fr/hal-01325045.

May, 2016, award: A.Deleforge (former PhD student), F. Forbes (collaborator), and R.Horaud received the 2016 Award for Outstanding Contributions in Neural Systems for their paper:  “Acoustic Space Learning for Sound-source Separation and Localization on Binaural Manifolds,” International Journal of Neural Systems, 2015, 25:1, 1440003 (21 pages), http://arxiv.org/abs/1402.2683

April, 2016, paper accepted for publication: D. Kounnades-Bastian, L. Girin, X. Alameda-Pineda, S. Gannot,  R. Horaud. A Variational EM Algorithm for the Separation of Time-Varying Convolutive Audio Mixtures. IEEE/ACM Transaction on Audio, Speech, and Language Processing. http://arxiv.org/abs/1510.04595.

January, 2016, paper accepted for publication: I. D. Gebru, X. Alameda-Pineda, F. Forbes, R.Horaud. EM Algorithms for Weighted-Data Clustering with Application to Audio-Visual Scene Analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence.   http://arxiv.org/abs/1509.01509.

October, 2015, award: Dionyssos Kounades Bastian (PhD student in the PERCEPTION team) received the best student paper award at IEEE WASPAA’15.

September, 2015, award: Vincent Drouard (PhD student in the PERCEPTION team) received the best student paper award (2nd place) at IEEE ICIP’15

Project “official” summary (09/2013)

The objective of VHIA is to elaborate a holistic computational paradigm of perception and of perception-action loops. We propose to develop a completely novel twofold approach: (i) learn from mappings between auditory/visual inputs and structured outputs, and from sensorimotor contingencies, and (ii) execute perception-action interaction cycles in the real world with a humanoid robot. VHIA will launch and achieve a unique fine coupling between methodological findings and proof-of-concept implementations using the consumer humanoid NAO manufactured in Europe. The proposed multimodal approach is in strong contrast with current computational paradigms that are based on unimodal biological theories. These theories have hypothesized a modular view of perception, postulating that there are quasi-independent and parallel perceptual pathways in the brain. VHIA takes a radically different view than today’s audiovisual fusion models that rely on clean-speech signals and on accurate frontal-images of faces; These models assume that  videos and sounds are recorded with hand-held or head-mounted sensors, and hence there is a human in the loop whose intentions inherently supervise both perception and interaction. Our approach deeply contradicts the belief that complex and expensive humanoids (often manufactured in Japan) are required to implement research ideas. VHIA’s methodological program addresses extremely difficult issues, such as how to build a joint audiovisual space from heterogeneous, noisy,  ambiguous and physically different visual and auditory stimuli,  how to properly model seamless interaction based on perception and action, how to deal with high-dimensional input data, and how to achieve robust and efficient human-humanoid communication tasks through a well-thought tradeoff between offline training and online execution. VHIA bets on the high-risk idea that in the next decades robot technology will have a considerable social and economical impact and that there will be millions of humanoids, in our homes, schools and offices, which will  be able to naturally communicate with us.

Perception team (current and past) members involved in VHIA

Radu Horaud (PI), Laurent Girin, Xavi Alameda-Pineda, Georgios Evangelidis (2014-2015), Sileye Ba (2014-2016), Soraya Arias, Israel Dejene-Gebru, Vincent Drouard, Benoit Massé, Stéphane Lathuilière, Dionyssos Kounades-Bastian, Yutong Ban, Xiaofei Li, Pablo Mesejo, Guillaume Sarrazin, Bastien Mourgue.


Florence Forbes (INRIA), Sharon Gannot (Bar Ilan University), Xavi Alameda-Pineda (Trento University), Antoine Deleforge (INRIA Rennes), Olivier Alata (Mines Saint-Etienne), Christine Evers (Imperial College), Yoav Schechner (Technion).


All our articles are available for download on HAL (open archive)

VHIA’s list of publications