Diffusion-based Unsupervised Audio-visual Speech Enhancement

by Jean-Eudes Ayilo, Mostafa Sadeghi, Romain Serizel, Xavier Alameda-Pineda
IEEE International Conference on Acoustics, Speech, and Signal Processing
[ paper ] [ code ]
Abstract: This paper proposes a new unsupervised audiovisual speech enhancement (AVSE) approach that combines a diffusion-based audio-visual speech generative model with a non-negative matrix factorization (NMF)…


AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder

by Samir Sadok, Simon Leglaive, Laurent Girin, Gaël Richard, Xavier Alameda-Pineda
IEEE International Conference on Acoustics, Speech, and Signal Processing
[ paper ] [ code ]
Abstract: This article introduces AnCoGen, a novel method that leverages a masked autoencoder to unify the analysis, control, and generation of speech signals within…


Lost and found: Overcoming detector failures in online multi-object tracking

by Lorenzo Vaquero, Yihong Xu, Xavier Alameda-Pineda, Víctor M Brea, Manuel Mucientes
European Conference on Computer Vision
[ paper ] [ code ]
Abstract: Multi-object tracking (MOT) endeavors to precisely estimate the positions and identities of multiple objects over time. The prevailing approach, tracking-by-detection (TbD), first detects objects and then…


VQ-HPS: Human pose and shape estimation in a vector-quantized latent space

by Guénolé Fiche, Simon Leglaive, Xavier Alameda-Pineda, Antonio Agudo, Francesc Moreno-Noguer
European Conference on Computer Vision
[ paper ] [ code ]
Abstract: Previous works on Human Pose and Shape Estimation (HPSE) from RGB images can be broadly categorized into two main groups: parametric and non-parametric approaches. Parametric techniques leverage…


Navigating the Practical Pitfalls of Reinforcement Learning for Social Robot Navigation

by Dhimiter Pikuli, Jordan Cosio, Xavier Alameda-Pineda, Pierre-Brice Wieber, Thierry Fraichard
Robotics: Science and Systems (RSS) Workshop on Unsolved Problems in Social Robot Navigation
[ paper ]
Navigation is one of the essential tasks in order for robots to be deployed in environments shared with humans. The problem becomes increasingly…


Deep Regression Models and Computer Vision Applications for Multiperson Human-Robot Interaction

PhD defense by Stéphane Lathuilière
Tuesday 22nd May 2018, 11:00, Grand Amphithéatre, INRIA Grenoble Rhône-Alpes, Montbonnot Saint-Martin
Abstract: In order to interact with humans, robots need to perform basic perception tasks such as face detection, human pose estimation or speech recognition. However, in order to have a natural interaction with humans,…


January 2015: two accepted papers

Two papers have just been accepted for publication in IEEE TPAMI and IEEE TASLP:

Fusion of Range and Stereo Data for High-Resolution Scene-Modeling
Georgios Evangelidis, Miles Hansard, Radu Horaud
IEEE Transactions on Pattern Analysis and Machine Intelligence, Institute of Electrical and Electronics Engineers (IEEE), 2015, pp.14. <http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7031946>. <10.1109/TPAMI.2015.2400465>

Co-Localization of Audio Sources in…
