Diffusion-based Unsupervised Audio-visual Speech Enhancement

by Jean-Eudes Ayilo, Mostafa Sadeghi, Romain Serizel, Xavier Alameda-Pineda IEEE International Conference on Acoustics, Speech, and Signal Processing [ paper ] [ code ] Abstract: This paper proposes a new unsupervised audio-visual speech enhancement (AVSE) approach that combines a diffusion-based audio-visual speech generative model with a non-negative matrix factorization (NMF)…

AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder

by Samir Sadok, Simon Leglaive, Laurent Girin, Gaël Richard, Xavier Alameda-Pineda IEEE International Conference on Acoustics, Speech, and Signal Processing [ paper ] [ code ] Abstract: This article introduces AnCoGen, a novel method that leverages a masked autoencoder to unify the analysis, control, and generation of speech signals within…

Lost and found: Overcoming detector failures in online multi-object tracking

by Lorenzo Vaquero, Yihong Xu, Xavier Alameda-Pineda, Víctor M Brea, Manuel Mucientes European Conference on Computer Vision [ paper ] [ code ] Abstract: Multi-object tracking (MOT) endeavors to precisely estimate the positions and identities of multiple objects over time. The prevailing approach, tracking-by-detection (TbD), first detects objects and then…

VQ-HPS: Human pose and shape estimation in a vector-quantized latent space

by Guénolé Fiche, Simon Leglaive, Xavier Alameda-Pineda, Antonio Agudo, Francesc Moreno-Noguer European Conference on Computer Vision [ paper ] [ code ] Abstract: Previous works on Human Pose and Shape Estimation (HPSE) from RGB images can be broadly categorized into two main groups: parametric and non-parametric approaches. Parametric techniques leverage…

Navigating the Practical Pitfalls of Reinforcement Learning for Social Robot Navigation

by Dhimiter Pikuli, Jordan Cosio, Xavier Alameda-Pineda, Pierre-Brice Wieber, Thierry Fraichard Robotics: Science and Systems (RSS) Workshop on Unsolved Problems in Social Robot Navigation [ paper ] Abstract: Navigation is one of the essential tasks in order for robots to be deployed in environments shared with humans. The problem becomes increasingly…

A weighted-variance variational autoencoder model for speech enhancement

by Ali Golmakani, Mostafa Sadeghi, Xavier Alameda-Pineda, Romain Serizel [ preprint ] Abstract: We address speech enhancement based on variational autoencoders, which involves learning a speech prior distribution in the time-frequency (TF) domain. A zero-mean complex-valued Gaussian distribution is usually assumed for the generative model, where the speech information is encoded in…

Robust audio-visual contrastive learning for proposal-based self-supervised sound source localization in videos

by Hanyu Xuan, Zhiliang Wu, Jian Yang, Bo Jiang, Lei Luo, Xavier Alameda-Pineda, Yan Yan IEEE Transactions on Pattern Analysis and Machine Intelligence Abstract: By observing a scene and listening to corresponding audio cues, humans can easily recognize where the sound is. To achieve such cross-modal perception on machines, existing…

Autoregressive GAN for Semantic Unconditional Head Motion Generation

by Louis Airale, Xavier Alameda-Pineda, Stéphane Lathuilière, Dominique Vaufreydaz ACM Transactions on Multimedia Tools and Applications [ paper ] [ code ] Abstract: We address the task of unconditional head motion generation to animate still human faces in a low-dimensional semantic space. Deviating from talking head generation conditioned on audio that seldom emphasizes realistic…
