Research

You are welcome to browse through our recent and current research results, listed in alphabetical order. A broad, non-exhaustive list of the team’s research topics can be found on our homepage. Some of this research is directly linked to recently submitted or accepted publications that can be found here. Please also refer to our complete list of publications.

Acoustic Space Learning on Binaural Manifolds

Acoustic Space Learning for Sound-Source Separation and Localization on Binaural Manifolds
2016 IJNS Award for Outstanding Contributions to Neural Systems
Antoine Deleforge, Florence Forbes, and Radu Horaud
International Journal of Neural Systems, 25 (1), 2015
PDF on arXiv | BibTeX | HAL | Additional papers | Matlab Code | Dataset | Videos and more
Abstract: In this paper we …

View page »

Audio Source Separation: Yet Another NMF-Based Formulation

An Inverse-Gamma Source Variance Prior with Factorized Parameterization for Audio Source Separation
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2016)
D. Kounades-Bastian, L. Girin, X. Alameda-Pineda, S. Gannot, R. Horaud
Abstract: In this paper we present a new statistical model for the power spectral density (PSD) of an audio signal and its …
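As general background to the "NMF-based" title above, here is a minimal sketch of plain non-negative matrix factorization of a power spectrogram, V ≈ W H, fitted with the classic Lee-Seung multiplicative updates for the squared Euclidean cost. This is not the inverse-gamma formulation of the paper; audio work often uses the Itakura-Saito divergence instead. The function name nmf_euclidean and its parameters are hypothetical, chosen for illustration.

```python
import numpy as np

def nmf_euclidean(V, rank, n_iter=500, seed=0, eps=1e-12):
    """Factor a non-negative matrix V (freq x time) as W @ H using the
    Lee-Seung multiplicative updates for the squared Euclidean cost.
    Hypothetical helper, for illustration only."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, rank)) + eps  # spectral templates (F x rank)
    H = rng.random((rank, N)) + eps  # activations over time (rank x N)
    for _ in range(n_iter):
        # Multiplicative updates: ratios of non-negative terms keep W, H >= 0
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Usage on a synthetic non-negative "power spectrogram" of exact rank 3
rng = np.random.default_rng(42)
V = rng.random((20, 3)) @ rng.random((3, 30))
W, H = nmf_euclidean(V, rank=3)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Each column of W can be read as a spectral pattern and each row of H as its activation over time; the multiplicative form is chosen because it preserves non-negativity without any projection step.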

View page »

Audio-Visual Speaker Detection, Localization and Interaction with NAO

Publications | Videos | The NAO Robot
Abstract: In this research we address the problem of audio-visual speaker detection. We introduce an online system running on the humanoid robot NAO. The scene is perceived with two cameras and two microphones. A multimodal Gaussian Mixture Model fuses the information extracted from the auditory and visual sensors. The system …

View page »

Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion

IEEE Transactions on Pattern Analysis and Machine Intelligence, special issue on Learning with Shared Information for Computer Vision and Multimedia Analysis
Israel D. Gebru, Sileye Ba, Xiaofei Li, Radu P. Horaud
[PDF on arXiv] [IEEE Xplore] [HAL] [DATASET] [BibTeX]
Abstract: Speaker diarization consists of assigning speech signals to speakers engaged in dialog. …

View page »

Audio-Visual Speaker Localization via Weighted Clustering

Abstract: In this paper we address the problem of detecting and localizing speakers using audio-visual data. We cast this problem in the framework of clustering and propose a novel weighted clustering method based on a finite mixture model that explores the idea of non-uniform weighting of observations. Weighted-data clustering techniques have already been proposed, but …
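To illustrate the general idea of non-uniform observation weights, here is a minimal sketch of EM for a one-dimensional Gaussian mixture that maximizes a weighted log-likelihood, sum over n of w_n log p(x_n), with the weights assumed known in advance. This is a generic textbook variant, not the model of the paper above, which may incorporate the weights differently; weighted_gmm_em and its parameters are hypothetical.

```python
import numpy as np

def weighted_gmm_em(x, w, k=2, n_iter=100):
    """EM for a 1-D Gaussian mixture maximizing sum_n w[n] * log p(x[n]).

    x: observations, shape (N,); w: known non-negative weights, shape (N,).
    Hypothetical helper, for illustration only.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    # Deterministic initialization: means at evenly spaced quantiles of the data
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)
    var = np.full(k, x.var())
    alpha = np.full(k, 1.0 / k)  # mixing proportions
    for _ in range(n_iter):
        # E-step: log of alpha_k * N(x_n | mu_k, var_k), normalized per point
        log_p = (np.log(alpha) - 0.5 * np.log(2.0 * np.pi * var)
                 - 0.5 * (x[:, None] - mu) ** 2 / var)
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # Scale responsibilities by each observation's weight, so low-weight
        # (e.g. unreliable) observations have less influence on the M-step
        rw = r * w[:, None]
        nk = rw.sum(axis=0)
        mu = (rw * x[:, None]).sum(axis=0) / nk
        # Small variance floor avoids a component collapsing onto one point
        var = (rw * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
        alpha = nk / nk.sum()
    return mu, var, alpha

# Usage: two clusters plus ten far outliers that are given zero weight
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-5.0, 1.0, 200), rng.normal(5.0, 1.0, 200),
                    np.full(10, 50.0)])
w = np.concatenate([np.ones(400), np.zeros(10)])
mu, var, alpha = weighted_gmm_em(x, w)
```

Setting an observation's weight to zero removes it from the parameter updates entirely, so the outliers at 50 do not drag either component away from the two true clusters.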

View page »

Audio-Visual Speech-Turn Detection and Tracking

Abstract: Speaker diarization, the assignment of speech-signal segments among participants, is an important component of multi-party dialog systems. Diarization may well be viewed as the problem of detecting and tracking speech turns. We propose to address this problem by modeling the spatial coincidence of visual and auditory observations and by combining this coincidence …

View page »

Audio-Visual Tracking by Density Approximation in Sequential Bayesian Filtering Framework

Israel D. Gebru+, Christine Evers*, Patrick A. Naylor*, Radu P. Horaud+
+ INRIA Grenoble Rhône-Alpes, France
* Imperial College London, Department of Electrical and Electronic Engineering, UK
[PDF] [BibTeX] [Code] [Video]
Abstract: This paper proposes a novel audio-visual tracking approach that constructively exploits the audio and visual modalities …

View page »

Continuous Action Recognition

Continuous Action Recognition Based on Sequence Alignment
Kaustubh Kulkarni, Georgios Evangelidis, Jan Cech and Radu Horaud
International Journal of Computer Vision (online), vol. 112, issue 1, March 2015, pp. 90-114
PDF on arXiv | BibTeX | PDF from HAL | Matlab code | Additional Papers | Videos
Abstract: Continuous action recognition is more challenging than isolated recognition because classification and segmentation must be simultaneously carried …

View page »

Depth (TOF) and Stereo Fusion

Fusion of Range and Stereo Data for High-resolution Scene-modeling
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 11, 2015, pp. 2178-2192 (IEEE Xplore)
G. Evangelidis, M. Hansard, and R. Horaud
Abstract: This paper addresses the problem of range-stereo fusion for the construction of high-resolution depth maps. In particular, we combine low-resolution depth data with …

View page »

Direct-Path Relative Transfer Function for Audio Source Localization

Sound-Source Localization in Reverberant Rooms Based on the Direct-Path Relative Transfer Function
Xiaofei Li, Laurent Girin, Radu Horaud, Sharon Gannot. Estimation of the Direct-Path Relative Transfer Function for Supervised Sound-Source Localization. IEEE/ACM Transactions on Audio, Speech, and Language Processing, volume 24, number 11, 2016. [pdf] [bibtex]
Xiaofei Li, Laurent Girin, Fabien Badeig, Radu Horaud. Reverberant …

View page »

EM Algorithms for Weighted-Data Clustering with Application to Audio-Visual Scene Analysis

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), volume 38, number 12, pages 2402-2415, December 2016
Israel D. Gebru, Xavier Alameda-Pineda, Florence Forbes, Radu P. Horaud
[PDF on arXiv] [PDF on IEEE Xplore] [BibTeX] [CODE & DATASET] [ …

View page »

Eye Gaze and Visual Focus

We address the problem of estimating the visual focus of attention (VFOA), i.e., who is looking at whom. This is of particular interest in human-robot interaction scenarios, e.g., when the task requires identifying targets of interest and tracking them over time. We make the following contributions. We propose a Bayesian temporal model that …

View page »

Finding Audio-Visual Events in Informal Social Gatherings

by Xavier Alameda-Pineda, Vasil Khalidov, Florence Forbes and Radu Horaud
IEEE/ACM International Conference on Multimodal Interaction, 2011, Outstanding Paper Award
Abstract: In this paper we address the problem of detecting and localizing objects that can be both seen and heard, e.g., people. This may be solved within the framework of data clustering. We propose a new …

View page »

Geometric Sound Source Localization

A Geometric Approach to Sound Source Localization from Time-Delay Estimates
Xavier Alameda-Pineda and Radu Horaud
IEEE/ACM Transactions on Audio, Speech and Language Processing, 22(6), pages 1082-1095, June 2014
PDF on arXiv | BibTeX | HAL | Matlab toolbox | Additional Papers | Online multimedia
Abstract: We address the problem of sound-source localization from time-delay estimates using arbitrarily shaped, non-coplanar microphone arrays. A novel …
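The time-delay estimates mentioned above can be made concrete. For a source at position s and microphones at positions m_i, the time difference of arrival between microphones i and j is tau_ij = (||s - m_i|| - ||s - m_j||) / c, where c is the speed of sound. The sketch below computes this forward model and, purely for illustration, recovers a source by brute-force grid search over candidate positions. The grid search is a stand-in, not the geometric method of the paper; the speed-of-sound value and the function names are assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degrees C

def time_delays(source, mics, c=SPEED_OF_SOUND):
    """Pairwise TDOAs: tau[i, j] = (|s - m_i| - |s - m_j|) / c, in seconds."""
    # Distance from the source to every microphone
    d = np.linalg.norm(np.asarray(mics, dtype=float)
                       - np.asarray(source, dtype=float), axis=1)
    return (d[:, None] - d[None, :]) / c

def localize_grid(tau_obs, mics, grid, c=SPEED_OF_SOUND):
    """Return the candidate position whose predicted TDOAs best match
    tau_obs in the least-squares sense (illustrative brute-force search)."""
    errors = [np.sum((time_delays(p, mics, c) - tau_obs) ** 2) for p in grid]
    return grid[int(np.argmin(errors))]

# Usage: four non-coplanar microphones and a coarse 3-D candidate grid
mics = np.array([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0],
                 [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]])
source = np.array([1.0, 2.0, 0.5])
tau = time_delays(source, mics)
axis = np.arange(-3.0, 3.01, 0.5)
grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1).reshape(-1, 3)
estimate = localize_grid(tau, mics, grid)
```

The TDOA matrix is antisymmetric (tau_ij = -tau_ji), so only the pairs with i < j carry independent information; the non-coplanar microphone layout is what allows a full 3-D position to be distinguished.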

View page »

Head Pose Estimation

Head Pose Estimation via Probabilistic High-Dimensional Regression
Best Student Paper Award (2nd place)
V. Drouard, S. Ba, G. Evangelidis, A. Deleforge, and R. Horaud
IEEE International Conference on Image Processing (ICIP’15)
Extended version published in IEEE Transactions on Image Processing, available on HAL.
Also, please visit our High-dimensional regression webpage.
IEEE Publication | HAL Publication …

View page »

Head-Pose Tracking

Switching Linear Inverse-Regression Model for Tracking Head Pose
V. Drouard, S. Ba, and R. Horaud
IEEE Winter Conference on Applications of Computer Vision (WACV’17)
IEEE Publication | HAL Publication | Abstract | BibTeX | Results | Matlab code | Acknowledgement
Abstract: We propose to estimate the head-pose angles (pitch, yaw, and roll) by simultaneously predicting the …

View page »

High-Dimensional Regression

High-Dimensional Regression with Gaussian Mixtures and Partially-Latent Response Variables
Statistics and Computing, Springer, 2015, vol. 25, number 5, pages 893-911
Antoine Deleforge, Florence Forbes and Radu Horaud
Abstract | arXiv | HAL | Springer | Supplementary materials | Matlab toolbox | Slides | Citation and BibTeX
Abstract: The problem of approximating high-dimensional data with a low-dimensional representation is addressed. The article makes the …

View page »

Joint Registration of Multiple Point Sets

A Generative Model for the Joint Registration of Multiple Point Sets
European Conference on Computer Vision (Computer Vision – ECCV 2014), Lecture Notes in Computer Science, Volume 8695, 2014, pp. 109-122
G. Evangelidis, D. Kounades-Bastian, R. Horaud, E. Psarakis
An extended version submitted to IEEE TPAMI is available on arXiv: https://arxiv.org/abs/1609.01466
Abstract: This paper describes …

View page »

NAOLab

A Distributed Architecture for Interacting with NAO
NAOLab is a middleware library for developing robotic applications in C, C++, Python and Matlab, using the humanoid robot NAO.
Software Download | Publications | People | Support | Acknowledgements

View page »

Noise Power Spectral Density Estimation

Non-stationary Noise Power Spectral Density Estimation Based on Regional Statistics
Xiaofei Li, Laurent Girin, Sharon Gannot and Radu Horaud
The 41st IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016
[HAL] [pdf] [code]
Abstract: Estimating the noise power spectral density (PSD) is essential for single-channel speech enhancement algorithms. …

View page »

Online Variational Bayesian Tracking

Variational Bayesian Framework for Multi-Person Tracking
Sileye Ba, Yutong Ban, Xavi Alameda-Pineda, Alessio Xompero, and Radu Horaud
Papers | Matlab code | Results
Object tracking is a ubiquitous problem in computer vision with many applications in human-machine and human-robot interaction, augmented reality, driving assistance, surveillance, etc. Although thoroughly investigated, tracking multiple persons remains a challenging …

View page »

Point Registration with Expectation-Maximization

Rigid and Articulated Point Registration with Expectation Conditional Maximization
Radu Horaud, Florence Forbes, Manuel Yguel, Guillaume Dewaele, and Jian Zhang
IEEE Transactions on Pattern Analysis and Machine Intelligence, 33 (3), 587-602, March 2011
Abstract | code | pdf from HAL | IEEE Xplore | BibTeX | Video of a toy example
Abstract: This paper addresses the …

View page »

Recognition of Group Activities in Videos

Recognition of Group Activities in Videos Based on Single- and Two-Person Descriptors
Stéphane Lathuilière, Georgios Evangelidis, Radu Horaud
IEEE Winter Conference on Applications of Computer Vision (WACV’17)
IEEE Publication | HAL Publication | Abstract | BibTeX | Results | Acknowledgement
Abstract: Group activity recognition from videos is a very challenging problem that has barely been addressed. …

View page »

Scene Flow Estimation

Scene Flow Estimation by Growing Correspondence Seeds
Jan Cech, Jordi Sanchez-Riera, and Radu Horaud
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3129-3136, 2011
Abstract | Code | HAL | IEEE Xplore | BibTeX | Video | Papers
Software package as a Matlab toolbox (source code of binaries) available from Jan Cech’s website or here.
Abstract: A simple seed growing algorithm for estimating …

View page »

Separation of Time-Varying Audio Mixtures

A Variational EM Algorithm for the Separation of Time-Varying Convolutive Audio Mixtures
IEEE/ACM Transactions on Audio, Speech and Language Processing
D. Kounades-Bastian, L. Girin, X. Alameda-Pineda, S. Gannot, R. Horaud
Abstract: This paper addresses the problem of separating audio sources from time-varying convolutive mixtures. We propose a probabilistic framework based on the local complex-Gaussian model …

View page »

Skeletal Quads

Human Action and Gesture Recognition Using Joint Quadruples
Description | Publications | Code
G. Evangelidis, G. Singh, R. Horaud
Description: Recent advances in human motion analysis have made the extraction of the human skeleton structure feasible, even from single depth images. This structure has proven quite informative for discriminating actions in a recognition scenario. In …

View page »

Supervised Sound-Source Localization

Co-Localization of Audio Sources in Images Using Binaural Features and Locally-Linear Regression
Antoine Deleforge, Radu Horaud, Yoav Y. Schechner and Laurent Girin
IEEE/ACM Transactions on Audio, Speech and Language Processing, 23(4), 718-731, April 2015
Abstract | Videos | Dataset | Matlab code | pdf from HAL | IEEE Xplore | BibTeX
Setup: Two microphones plugged into the …

View page »

Three-Dimensional Sensors

Depth Cameras and Associated Computer Vision Methods
Radu Horaud (INRIA), Miles Hansard (QMUL), and Georgios Evangelidis (DAQRI)
The emergence of three-dimensional sensors, e.g., Microsoft Kinect v1 and v2, Asus Xtion Pro Live (structured-light sensors), Mesa Imaging SR4000, or the Velodyne HDL-64 laser range finder, to cite just a few, has introduced a revolution in the …

View page »

Tracking the Active Speaker Based on Joint Audio-Visual Observation

IEEE International Conference on Computer Vision Workshops, Dec 2015
Israel D. Gebru, Sileye Ba, Georgios Evangelidis, Radu P. Horaud
[PDF] [BibTeX] [VIDEO] [DATASET]
Abstract: Any multi-party conversation system benefits from speaker diarization, that is, the assignment of speech signals among …

View page »

Video Grounding

From Video Matching to Video Grounding
G. Evangelidis, F. Diego, R. Horaud
Abstract: This paper addresses the background estimation problem for videos captured by moving cameras, referred to as video grounding. It essentially aims at reconstructing a video as it would appear without foreground objects, e.g., cars or people. What differentiates video grounding from …

View page »