Research – Page 4 – RobotLearn

Unsupervised Multiple-Object Tracking with a Dynamical Variational Autoencoder

Xiaoyu LIN 2022/02/18 2024/03/07 Research, Vision

by Xiaoyu Lin, Laurent Girin and Xavier Alameda-Pineda. Introduction. Multi-object tracking (MOT), or multi-target tracking, is a fundamental and very general pattern recognition task. Given an input time series, the aim of MOT is to recover the trajectories of an unknown number of sources that may appear and disappear at any point in time…

The impact of removing head movements on audio-visual speech enhancement

Radu HORAUD 2022/02/01 2022/04/06 Research, Sound, Vision

by Zhiqi Kang, Mostafa Sadeghi, Radu Horaud, Xavier Alameda-Pineda, Jacob Donley and Anurag Kumar. ICASSP’22, Singapore. [paper][examples][code][slides] Abstract. This paper investigates the impact of head movements on audio-visual speech enhancement (AVSE). Although they are a common conversational feature, head movements have been ignored by past and recent studies: they challenge today’s learning-based…

Successor Feature Neural Episodic Control

Xavier ALAMEDA-PINEDA 2021/11/19 2024/03/07 Reinforcement Learning, Research, Software

by David Emukpere, Xavier Alameda-Pineda and Chris Reinke. [Paper][code] Abstract. A longstanding goal in reinforcement learning is to build intelligent agents that show fast learning and a flexible transfer of skills akin to humans and animals. This paper investigates the integration of two frameworks for tackling those goals: episodic control and successor features. Episodic…

Dynamical Variational AutoEncoders

Xavier ALAMEDA-PINEDA 2021/10/12 2024/03/07 Research, Software, Sound, Vision

by Laurent Girin, Simon Leglaive, Xiaoyu Bie, Julien Diard, Thomas Hueber, and Xavier Alameda-Pineda. Foundations and Trends in Machine Learning, 2021, Vol. 15, No. 1-2, pp. 1–175. [Review paper] [Code] [Tutorial @ICASSP 2021] Abstract. Variational autoencoders (VAEs) are powerful deep generative models widely used to represent high-dimensional complex data through a low-dimensional…

SocialInteractionGAN: Multi-person Interaction Sequence Generation

Louis AIRALE 2021/09/27 2022/04/04 Research, Vision

by Louis Airale, Dominique Vaufreydaz and Xavier Alameda-Pineda [paper] Abstract. Prediction of human actions in social interactions has important applications in the design of social robots or artificial avatars. In this paper, we model human interaction generation as a discrete multi-sequence generation problem and present SocialInteractionGAN, a novel adversarial architecture for conditional interaction…

A Benchmark of Dynamical Variational Autoencoders applied to Speech Spectrogram Modeling

Xiaoyu BIE 2021/09/10 2024/03/07 Research, Software, Sound

by Xiaoyu Bie, Laurent Girin, Simon Leglaive, Thomas Hueber and Xavier Alameda-Pineda Interspeech’21, Brno, Czech Republic [paper][slides][code][bibtex] Abstract. The Variational Autoencoder (VAE) is a powerful deep generative model that is now extensively used to represent high-dimensional complex data via a low-dimensional latent space learned in an unsupervised manner. In the…

PI-Net: Pose Interacting Network for Multi-Person Monocular 3D Pose Estimation

Wen GUO 2021/09/10 2024/03/07 Research, Software, Vision

by Wen Guo, Enric Corona, Francesc Moreno-Noguer and Xavier Alameda-Pineda. IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2021) [paper][code] Abstract. Recent literature has addressed the monocular 3D pose estimation task very satisfactorily. In these studies, different persons are usually treated as independent pose instances to estimate. However, in many everyday situations,…

Robust Face Frontalization For Visual Speech Recognition

Radu HORAUD 2021/08/17 2021/09/03 Research, Vision

by Zhiqi Kang, Radu Horaud and Mostafa Sadeghi ICCV’21 Workshop on Traditional Computer Vision in the Age of Deep Learning (TradiCV’21) [paper (extended version)][code][bibtex] Abstract. Face frontalization consists of synthesizing a frontally-viewed face from an arbitrarily-viewed one. The main contribution is a robust method that preserves non-rigid facial deformations, i.e….

TransCenter: Transformers with Dense Representations for Multiple-Object Tracking

Yihong XU 2021/08/04 2024/03/07 Research, Software, Vision

by Yihong Xu*, Yutong Ban*, Guillaume Delorme, Chuang Gan, Daniela Rus and Xavier Alameda-Pineda. [arXiv] [paper] [code] Abstract: Transformers have demonstrated superior performance on a wide variety of tasks since they were introduced. In recent years this has drawn the attention of the vision community, where efforts were made such as…

FullSubNet: a full-band and sub-band fusion model for real-time single-channel speech enhancement

Radu HORAUD 2021/05/06 2022/04/06 Research, Sound

by Xiang Hao*,#, Xiangdong Su#, Radu Horaud and Xiaofei Li* (*Westlake University, #Inner Mongolia University, China). ICASSP 2021 [arXiv][github][youtube] Abstract. This paper proposes a full-band and sub-band fusion model, named FullSubNet, for single-channel real-time speech enhancement. Full-band and sub-band refer to the models that input full-band and sub-band noisy…