Unsupervised Speech Enhancement using Dynamical Variational Auto-Encoders

by Xiaoyu Bie, Simon Leglaive, Xavier Alameda-Pineda and Laurent Girin. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 2022. [arXiv][Code] Abstract. Dynamical variational autoencoders (DVAEs) are a class of deep generative models with latent variables, dedicated to modeling time series of high-dimensional data. DVAEs can be considered as extensions of…

Continual Attentive Fusion for Incremental Learning in Semantic Segmentation

by Guanglei Yang, Enrico Fini, Dan Xu, Paolo Rota, Mingli Ding, Hao Tang, Xavier Alameda-Pineda, Elisa Ricci. IEEE Transactions on Multimedia. [arXiv][HAL] Abstract. Over the past years, semantic segmentation, similar to many other tasks in computer vision, has benefited from the progress in deep neural networks, resulting in significantly improved performance. However, deep architectures trained…

A Proposal-based Paradigm for Self-supervised Sound Source Localization in Videos

by Hanyu Xuan, Zhiliang Wu, Jian Yang, Yan Yan, Xavier Alameda-Pineda. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022, New Orleans, USA. [HAL] Abstract. Humans can easily recognize where and how a sound is produced by watching a scene and listening to the corresponding audio cues. To achieve such cross-modal perception on machines, existing methods…

Self-Supervised Models are Continual Learners

by Enrico Fini, Victor G. Turrisi da Costa, Xavier Alameda-Pineda, Elisa Ricci, Karteek Alahari, Julien Mairal. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022, New Orleans, USA. [arXiv][Code][HAL] Abstract. Self-supervised models have been shown to produce comparable or better visual representations than their supervised counterparts when trained offline on unlabeled data at scale. However,…

Multi Person Extreme Motion Prediction

by Wen Guo*, Xiaoyu Bie*, Xavier Alameda-Pineda and Francesc Moreno-Noguer. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022, New Orleans, USA. [paper][code][data] Abstract. Human motion prediction aims to forecast future poses given a sequence of past 3D skeletons. While this problem has recently received increasing attention, it has mostly been…

The impact of removing head movements on audio-visual speech enhancement

by Zhiqi Kang, Mostafa Sadeghi, Radu Horaud, Xavier Alameda-Pineda, Jacob Donley, Anurag Kumar. ICASSP’22, Singapore. [paper][examples][code][slides] Abstract. This paper investigates the impact of head movements on audio-visual speech enhancement (AVSE). Although they are a common conversational feature, head movements have been ignored by past and recent studies: they challenge today’s learning-based…

Successor Feature Neural Episodic Control

by David Emukpere, Xavier Alameda-Pineda and Chris Reinke. [Paper] Abstract. A longstanding goal in reinforcement learning is to build intelligent agents that show fast learning and a flexible transfer of skills akin to humans and animals. This paper investigates the integration of two frameworks for tackling those goals: episodic control and successor features. Episodic…

ξ-Learning: Successor Feature Transfer Learning for General Reward Functions

by Chris Reinke and Xavier Alameda-Pineda. [Paper][Code] Abstract. Transfer in Reinforcement Learning aims to improve learning performance on target tasks using knowledge from experienced source tasks. Successor features (SF) are a prominent transfer mechanism in domains where the reward function changes between…
