Back to MLP: A Simple Baseline for Human Motion Prediction

by Wen Guo*, Yuming Du*, Xi Shen, Vincent Lepetit, Xavier Alameda-Pineda, and Francesc Moreno-Noguer IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023, Waikoloa, Hawaii [paper] [code] [HAL] Abstract. This paper tackles the problem of human motion prediction, i.e., forecasting future body poses from historically observed sequences. State-of-the-art…
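
As a rough, hypothetical illustration of the kind of MLP baseline the title refers to (the joint count, hidden size, and observation/prediction horizons below are assumptions, not the paper's settings), a pose forecaster of this family can be sketched in a few lines of PyTorch:

```python
# Hypothetical sketch of an MLP pose forecaster (illustrative only, not the
# exact architecture from "Back to MLP"). Assumes 22 joints in 3D, 50 observed
# frames and 25 predicted frames.
import torch
import torch.nn as nn

N_JOINTS, OBS_FRAMES, PRED_FRAMES = 22, 50, 25
POSE_DIM = N_JOINTS * 3

class MLPForecaster(nn.Module):
    def __init__(self, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_FRAMES * POSE_DIM, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, PRED_FRAMES * POSE_DIM),
        )

    def forward(self, past):  # past: (batch, OBS_FRAMES, POSE_DIM)
        batch = past.shape[0]
        out = self.net(past.flatten(1))
        # Predict offsets from the last observed pose rather than absolute
        # coordinates, a common choice in motion prediction.
        last = past[:, -1:, :]
        return last + out.view(batch, PRED_FRAMES, POSE_DIM)

poses = torch.randn(8, OBS_FRAMES, POSE_DIM)
future = MLPForecaster()(poses)   # (8, PRED_FRAMES, POSE_DIM)
```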

Continue reading

Expression-preserving face frontalization improves visually assisted speech processing

by Zhiqi Kang, Mostafa Sadeghi, Radu Horaud and Xavier Alameda-Pineda International Journal of Computer Vision, 2023, 131 (5), pp. 1122-1140 [arXiv] [HAL] [webpage] Abstract. Face frontalization consists of synthesizing a frontally-viewed face from an arbitrarily-viewed one. The main contribution of this paper is a frontalization methodology that preserves non-rigid facial deformations in order to boost…

Continue reading

Continual Attentive Fusion for Incremental Learning in Semantic Segmentation

by Guanglei Yang, Enrico Fini, Dan Xu, Paolo Rota, Mingli Ding, Hao Tang, Xavier Alameda-Pineda, Elisa Ricci IEEE Transactions on Multimedia [arXiv][HAL] Abstract. Over the past years, semantic segmentation, similar to many other tasks in computer vision, has benefited from the progress in deep neural networks, resulting in significantly improved performance. However, deep architectures trained…

Continue reading

A Proposal-based Paradigm for Self-supervised Sound Source Localization in Videos

by Hanyu Xuan, Zhiliang Wu, Jian Yang, Yan Yan, Xavier Alameda-Pineda IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022, New Orleans, USA [HAL] Abstract. Humans can easily recognize where and how a sound is produced by watching a scene and listening to the corresponding audio cues. To achieve such cross-modal perception on machines, existing methods…

Continue reading

Self-Supervised Models are Continual Learners

by Enrico Fini, Victor G. Turrisi da Costa, Xavier Alameda-Pineda, Elisa Ricci, Karteek Alahari, Julien Mairal IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022, New Orleans, USA [arXiv][Code][HAL] Abstract. Self-supervised models have been shown to produce comparable or better visual representations than their supervised counterparts when trained offline on unlabeled data at scale. However,…

Continue reading

The impact of removing head movements on audio-visual speech enhancement

by Zhiqi Kang, Mostafa Sadeghi, Radu Horaud, Xavier Alameda-Pineda, Jacob Donley, Anurag Kumar ICASSP’22, Singapore [paper][examples][code][slides] Abstract. This paper investigates the impact of head movements on audio-visual speech enhancement (AVSE). Although they are a common conversational feature, head movements have been ignored by past and recent studies: they challenge today’s learning-based…

Continue reading

Dynamical Variational AutoEncoders

by Laurent Girin, Simon Leglaive, Xiaoyu Bie, Julien Diard, Thomas Hueber, and Xavier Alameda-Pineda Foundations and Trends in Machine Learning, 2021, Vol. 15, No. 1-2, pp. 1–175. [Review paper] [Code] [Tutorial @ICASSP 2021] Abstract. Variational autoencoders (VAEs) are powerful deep generative models widely used to represent high-dimensional complex data through a low-dimensional…
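
For readers less familiar with the building block that the review generalizes, here is a minimal, purely illustrative VAE sketch (the dimensions, activations and Gaussian decoder are assumptions, not choices from the paper); dynamical VAEs extend this static model to sequences of latent and observed variables with temporal dependencies:

```python
# Minimal VAE sketch (illustrative only): encoder, reparameterization trick,
# and a negative-ELBO training loss. All sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=16, hidden=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, hidden), nn.Tanh())
        self.mu = nn.Linear(hidden, z_dim)
        self.logvar = nn.Linear(hidden, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, x_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: sample z while keeping gradients w.r.t. mu/logvar.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        x_hat = self.dec(z)
        # Negative ELBO = reconstruction error + KL(q(z|x) || p(z)),
        # here with a (assumed) Gaussian decoder, i.e. a squared-error term.
        recon = F.mse_loss(x_hat, x, reduction="sum")
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return recon + kl

loss = VAE()(torch.rand(32, 784))
loss.backward()
```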

Continue reading

SocialInteractionGAN: Multi-person Interaction Sequence Generation

by Louis Airale, Dominique Vaufreydaz and Xavier Alameda-Pineda [paper] Abstract. Prediction of human actions in social interactions has important applications in the design of social robots or artificial avatars. In this paper, we model human interaction generation as a discrete multi-sequence generation problem and present SocialInteractionGAN, a novel adversarial architecture for conditional interaction…
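
As a purely illustrative sketch of conditional adversarial generation of discrete action sequences (not the SocialInteractionGAN architecture; the action vocabulary, recurrent encoder and Gumbel-softmax sampling below are assumptions), the basic generator/discriminator pair might look as follows:

```python
# Hypothetical sketch of conditional adversarial generation of discrete
# action sequences (illustrative only). The generator emits action tokens
# via Gumbel-softmax so the discriminator's signal can be backpropagated.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_ACTIONS, SEQ_LEN, COND_LEN, HID = 10, 16, 8, 64

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(N_ACTIONS, HID, batch_first=True)
        self.out = nn.Linear(HID, N_ACTIONS)

    def forward(self, observed):  # observed: (B, COND_LEN, N_ACTIONS) one-hot context
        _, h = self.rnn(observed)          # encode the observed interaction
        tok = observed[:, -1:, :]          # start from the last observed action
        seq = []
        for _ in range(SEQ_LEN):
            o, h = self.rnn(tok, h)
            tok = F.gumbel_softmax(self.out(o), tau=1.0, hard=True)  # differentiable sampling
            seq.append(tok)
        return torch.cat(seq, dim=1)       # (B, SEQ_LEN, N_ACTIONS)

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(N_ACTIONS, HID, batch_first=True)
        self.score = nn.Linear(HID, 1)

    def forward(self, seq):
        _, h = self.rnn(seq)
        return self.score(h[-1])           # real/fake logit per sequence

ctx = F.one_hot(torch.randint(N_ACTIONS, (4, COND_LEN)), N_ACTIONS).float()
fake = Generator()(ctx)
d_logit = Discriminator()(torch.cat([ctx, fake], dim=1))
```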

Continue reading