Xavier ALAMEDA-PINEDA

MEGA: Masked Generative Autoencoder for Human Mesh Recovery

Xavier ALAMEDA-PINEDA | 2025/03/11 | 2025/04/11 | Research, Vision

by Guénolé Fiche, Simon Leglaive, Xavier Alameda-Pineda, Francesc Moreno-Noguer. IEEE/CVF Conference on Computer Vision and Pattern Recognition. [paper] [code] Abstract: Human Mesh Recovery (HMR) from a single RGB image is a highly ambiguous problem, as similar 2D projections can correspond to multiple 3D interpretations. Nevertheless,…

Diffusion-based Unsupervised Audio-visual Speech Enhancement

Xavier ALAMEDA-PINEDA | 2025/01/11 | 2025/04/11 | Research, Sound, Uncategorized

by Jean-Eudes Ayilo, Mostafa Sadeghi, Romain Serizel, Xavier Alameda-Pineda. IEEE International Conference on Acoustics, Speech, and Signal Processing. [paper] [code] Abstract: This paper proposes a new unsupervised audiovisual speech enhancement (AVSE) approach that combines a diffusion-based audio-visual speech generative model with a non-negative matrix factorization (NMF)…

AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder

Xavier ALAMEDA-PINEDA | 2025/01/11 | 2025/04/11 | Research, Sound, Uncategorized

by Samir Sadok, Simon Leglaive, Laurent Girin, Gaël Richard, Xavier Alameda-Pineda. IEEE International Conference on Acoustics, Speech, and Signal Processing. [paper] [code] Abstract: This article introduces AnCoGen, a novel method that leverages a masked autoencoder to unify the analysis, control, and generation of speech signals within…

Lost and found: Overcoming detector failures in online multi-object tracking

Xavier ALAMEDA-PINEDA | 2024/09/02 | 2025/04/11 | Research, Uncategorized, Vision

by Lorenzo Vaquero, Yihong Xu, Xavier Alameda-Pineda, Víctor M Brea, Manuel Mucientes. European Conference on Computer Vision. [paper] [code] Abstract: Multi-object tracking (MOT) endeavors to precisely estimate the positions and identities of multiple objects over time. The prevailing approach, tracking-by-detection (TbD), first detects objects and then…

VQ-HPS: Human pose and shape estimation in a vector-quantized latent space

Xavier ALAMEDA-PINEDA | 2024/09/02 | 2025/04/11 | Research, Uncategorized, Vision

by Guénolé Fiche, Simon Leglaive, Xavier Alameda-Pineda, Antonio Agudo, Francesc Moreno-Noguer. European Conference on Computer Vision. [paper] [code] Abstract: Previous works on Human Pose and Shape Estimation (HPSE) from RGB images can be broadly categorized into two main groups: parametric and non-parametric approaches. Parametric techniques leverage…

Navigating the Practical Pitfalls of Reinforcement Learning for Social Robot Navigation

Xavier ALAMEDA-PINEDA | 2024/08/03 | 2025/04/11 | Reinforcement Learning, Research, Uncategorized

by Dhimiter Pikuli, Jordan Cosio, Xavier Alameda-Pineda, Pierre-Brice Wieber, Thierry Fraichard. Robotics: Science and Systems (RSS) Workshop on Unsolved Problems in Social Robot Navigation. [paper] Abstract: Navigation is one of the essential tasks in order for robots to be deployed in environments shared with humans. The problem becomes increasingly…

Learning for Companion Robots: Preparation and Adaptation

Xavier ALAMEDA-PINEDA | 2024/07/11 | 2025/04/11 | Reinforcement Learning, Research, Sound, Uncategorized, Vision

Xavier Alameda-Pineda was a keynote speaker at RFIAP/CAp 2024, on the topic of Learning for Companion Robots: Preparation and Adaptation.

A weighted-variance variational autoencoder model for speech enhancement

Xavier ALAMEDA-PINEDA | 2024/07/01 | 2025/04/11 | Reinforcement Learning, Research

by Ali Golmakani, Mostafa Sadeghi, Xavier Alameda-Pineda, Romain Serizel. [preprint] Abstract: We address speech enhancement based on variational autoencoders, which involves learning a speech prior distribution in the time-frequency (TF) domain. A zero-mean complex-valued Gaussian distribution is usually assumed for the generative model, where the speech information is encoded in…

Robust audio-visual contrastive learning for proposal-based self-supervised sound source localization in videos

Xavier ALAMEDA-PINEDA | 2024/01/11 | 2025/04/11 | Reinforcement Learning, Research

by Hanyu Xuan, Zhiliang Wu, Jian Yang, Bo Jiang, Lei Luo, Xavier Alameda-Pineda, Yan Yan. IEEE Transactions on Pattern Analysis and Machine Intelligence. Abstract: By observing a scene and listening to corresponding audio cues, humans can easily recognize where the sound is. To achieve such cross-modal perception on machines, existing…

Autoregressive GAN for Semantic Unconditional Head Motion Generation

Xavier ALAMEDA-PINEDA | 2023/12/13 | 2024/03/11 | Research, Software, Vision

by Louis Airale, Xavier Alameda-Pineda, Stéphane Lathuilière, and Dominique Vaufreydaz. ACM Transactions on Multimedia Tools and Applications. [paper] [code] Abstract: We address the task of unconditional head motion generation to animate still human faces in a low-dimensional semantic space. Deviating from talking head generation conditioned on audio that seldom emphasizes realistic…