PI-Net: Pose Interacting Network for Multi-Person Monocular 3D Pose Estimation

by Wen Guo, Enric Corona, Francesc Moreno-Noguer, Xavier Alameda-Pineda. IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2021). [paper] [code] Abstract. Recent literature addressed the monocular 3D pose estimation task very satisfactorily. In these studies, different persons are usually treated as independent pose instances to estimate. However, in many everyday situations,…


Robust Face Frontalization For Visual Speech Recognition

by Zhiqi Kang, Radu Horaud and Mostafa Sadeghi. ICCV’21 Workshop on Traditional Computer Vision in the Age of Deep Learning (TradiCV’21). [paper (extended version)] [code] [bibtex] Abstract. Face frontalization consists of synthesizing a frontally-viewed face from an arbitrarily-viewed one. The main contribution is a robust method that preserves non-rigid facial deformations, i.e….


Mixture of Inference Networks for VAE-based Audio-visual Speech Enhancement

by Mostafa Sadeghi, Xavier Alameda-Pineda. IEEE TSP, 2021. [paper] [arXiv] Abstract. In this paper, we are interested in unsupervised (unknown noise) speech enhancement, where the probability distribution of clean speech spectrogram is simulated via a latent variable generative model, also called the decoder. Recently, variational autoencoders (VAEs) have gained much popularity…


Variational Inference and Learning of Piecewise-linear Dynamical Systems

by Xavier Alameda-Pineda, Vincent Drouard, Radu Horaud. IEEE TNNLS, 2021. [PDF] [arXiv] Abstract. Modeling the temporal behavior of data is of primordial importance in many scientific and engineering fields. Baseline methods assume that both the dynamic and observation equations follow linear-Gaussian models. However, there are many real-world processes that cannot be characterized by…


ODANet: Online Deep Appearance Network for Identity-Consistent Multi-Person Tracking

by Guillaume Delorme, Yutong Ban, Guillaume Sarrazin and Xavier Alameda-Pineda. ICPR’20 Workshop on Multimodal pattern recognition for social signal processing in human computer interaction. [paper] Abstract. The analysis of affective states through time in multi-person scenarios is very challenging, because it requires consistently tracking all persons over time. This requires…


Probabilistic Graph Attention Network with Conditional Kernels for Pixel-Wise Prediction

by Dan Xu, Xavier Alameda-Pineda, Wanli Ouyang, Elisa Ricci, Xiaogang Wang and Nicu Sebe. IEEE TPAMI, 2020. [paper] [arXiv] Abstract. Multi-scale representations deeply learned via convolutional neural networks have shown tremendous importance for various pixel-level prediction problems. In this paper we present a novel approach that advances the state of…
