Autoregressive GAN for Semantic Unconditional Head Motion Generation

by Louis Airale, Xavier Alameda-Pineda, Stéphane Lathuilière, and Dominique Vaufreydaz
ACM Transactions on Multimedia Tools and Applications
[paper][code]
Abstract: We address the task of unconditional head motion generation to animate still human faces in a low-dimensional semantic space. Deviating from talking head generation conditioned on audio that seldom emphasizes realistic…

A Multimodal Dynamical Variational Autoencoder for Audiovisual Speech Representation Learning

by Samir Sadok, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda, and Renaud Séguier
Neural Networks
[paper][demo][code]
Abstract: In this paper, we present a multimodal and dynamical VAE (MDVAE) applied to unsupervised audio-visual speech representation learning. The latent space is structured to dissociate the latent dynamical factors that are shared between the…

Mixture of Dynamical Variational Autoencoders for Multi-Source Trajectory Modeling and Separation

by Xiaoyu Lin, Laurent Girin and Xavier Alameda-Pineda
Transactions on Machine Learning Research
[paper][code]
Abstract: In this paper, we propose a latent-variable generative model called mixture of dynamical variational autoencoders (MixDVAE) to model the dynamics of a system composed of multiple moving sources. A DVAE model is pre-trained on a single-source dataset to…

Unsupervised Performance Analysis of 3D Face Alignment with a Statistically Robust Confidence Test

by Mostafa Sadeghi, Xavier Alameda-Pineda and Radu Horaud
Neurocomputing, volume 564, January 2024
[Code & Data]
Abstract: We address the problem of analyzing the performance of 3D face alignment (3DFA), or facial landmark localization. Performance analysis is usually based on annotated datasets. Nevertheless, in the particular case of 3DFA, the…

Motion-DVAE: Unsupervised learning for fast human motion denoising

by Guénolé Fiche, Simon Leglaive, Xavier Alameda-Pineda, and Renaud Séguier
ACM SIGGRAPH Conference on Motion, Interaction and Games
[paper][code]
Abstract: Pose and motion priors are crucial for recovering realistic and accurate human motion from noisy observations. Substantial progress has been made on pose and shape estimation from images, and recent…

On the Effectiveness of LayerNorm Tuning for Continual Learning in Vision Transformers

by Thomas De Min, Massimiliano Mancini, Karteek Alahari, Xavier Alameda-Pineda, and Elisa Ricci
ICCV 2023 Workshops
[paper][code]
Abstract: State-of-the-art rehearsal-free continual learning methods exploit the peculiarities of Vision Transformers to learn task-specific prompts, drastically reducing catastrophic forgetting. However, there is a tradeoff between the number of learned parameters and the…

A Comprehensive Multi-scale Approach for Speech and Dynamics Synchrony in Talking Head Generation

by Louis Airale, Dominique Vaufreydaz, and Xavier Alameda-Pineda
[paper][code]
Abstract: Animating still face images with deep generative models using a speech input signal is an active research topic and has seen important recent progress. However, much of the effort has been put into lip syncing and rendering quality while the…

Unsupervised speech enhancement with deep dynamical generative speech and noise models

by Xiaoyu Lin, Simon Leglaive, Laurent Girin, and Xavier Alameda-Pineda
Interspeech 2023
[paper][code]
Abstract: This work builds on previous work on unsupervised speech enhancement using a dynamical variational autoencoder (DVAE) as the clean speech model and non-negative matrix factorization (NMF) as the noise model. We propose to replace the NMF…

Semi-supervised learning made simple with self-supervised clustering

by Enrico Fini, Pietro Astolfi, Karteek Alahari, Xavier Alameda-Pineda, Julien Mairal, Moin Nabi, and Elisa Ricci
IEEE/CVF Conference on Computer Vision and Pattern Recognition 2023
[paper][code]
Abstract: Self-supervised learning models have been shown to learn rich visual representations without requiring human annotations. However, in many real-world scenarios, labels are partially…

Speech Modeling with a Hierarchical Transformer Dynamical VAE

by Xiaoyu Lin, Xiaoyu Bie, Simon Leglaive, Laurent Girin, and Xavier Alameda-Pineda
IEEE International Conference on Acoustics, Speech and Signal Processing 2023
[paper][code]
Abstract: The dynamical variational autoencoders (DVAEs) are a family of latent-variable deep generative models that extends the VAE to model a sequence of observed data and a…
