Vision – Page 3 – RobotLearn

PI-Net: Pose Interacting Network for Multi-Person Monocular 3D Pose Estimation

Wen GUO 2021/09/10 2024/03/07 Research, Software, Vision

by Wen Guo, Enric Corona, Francesc Moreno-Noguer, Xavier Alameda-Pineda, IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2021) [paper] [code] Abstract. Recent literature addressed the monocular 3D pose estimation task very satisfactorily. In these studies, different persons are usually treated as independent pose instances to estimate. However, in many everyday situations,…

Robust Face Frontalization For Visual Speech Recognition

Radu HORAUD 2021/08/17 2021/09/03 Research, Vision

by Zhiqi Kang, Radu Horaud and Mostafa Sadeghi, ICCV’21 Workshop on Traditional Computer Vision in the Age of Deep Learning (TradiCV’21) [paper (extended version)] [code] [bibtex] Abstract. Face frontalization consists of synthesizing a frontally-viewed face from an arbitrarily-viewed one. The main contribution is a robust method that preserves non-rigid facial deformations, i.e….

TransCenter: Transformers with Dense Representations for Multiple-Object Tracking

Yihong XU 2021/08/04 2024/03/07 Research, Software, Vision

by Yihong Xu*, Yutong Ban*, Guillaume Delorme, Chuang Gan, Daniela Rus and Xavier Alameda-Pineda [arXiv] [paper] [code] Abstract. Transformers have demonstrated superior performance for a wide variety of tasks since they were introduced, which in recent years has drawn the attention of the vision community, where efforts were made such as…

Mixture of Inference Networks for VAE-based Audio-visual Speech Enhancement

Xavier ALAMEDA-PINEDA 2021/03/30 2024/03/07 Research, Software, Sound, Vision

by Mostafa Sadeghi and Xavier Alameda-Pineda, IEEE TSP, 2021 [paper] [arXiv] Abstract. In this paper, we are interested in unsupervised (unknown noise) speech enhancement, where the probability distribution of clean speech spectrogram is simulated via a latent variable generative model, also called the decoder. Recently, variational autoencoders (VAEs) have gained much popularity…

Variational Inference and Learning of Piecewise-linear Dynamical Systems

Xavier ALAMEDA-PINEDA 2021/01/30 2021/08/30 Research, Vision

by Xavier Alameda-Pineda, Vincent Drouard and Radu Horaud, IEEE TNNLS, 2021 [PDF] [arXiv] Abstract. Modeling the temporal behavior of data is of primordial importance in many scientific and engineering fields. Baseline methods assume that both the dynamic and observation equations follow linear-Gaussian models. However, there are many real-world processes that cannot be characterized by…

ODANet: Online Deep Appearance Network for Identity-Consistent Multi-Person Tracking

Xavier ALAMEDA-PINEDA 2021/01/25 2021/08/30 Research, Sound, Vision

by Guillaume Delorme, Yutong Ban, Guillaume Sarrazin and Xavier Alameda-Pineda, ICPR’20 Workshop on Multimodal pattern recognition for social signal processing in human computer interaction [paper] Abstract. The analysis of affective states through time in multi-person scenarios is very challenging, because it requires consistently tracking all persons over time. This requires…

Probabilistic Graph Attention Network with Conditional Kernels for Pixel-Wise Prediction

Xavier ALAMEDA-PINEDA 2020/12/30 2021/08/30 Research, Vision

by Dan Xu, Xavier Alameda-Pineda, Wanli Ouyang, Elisa Ricci, Xiaogang Wang and Nicu Sebe, IEEE TPAMI, 2020 [paper] [arXiv] Abstract. Multi-scale representations deeply learned via convolutional neural networks have shown tremendous importance for various pixel-level prediction problems. In this paper we present a novel approach that advances the state of…