Research – Page 5 – RobotLearn

Mixture of Inference Networks for VAE-based Audio-visual Speech Enhancement

Xavier ALAMEDA-PINEDA 2021/03/30 2024/03/07 Research, Software, Sound, Vision

by Mostafa Sadeghi and Xavier Alameda-Pineda. IEEE TSP, 2021. [paper] [arXiv] Abstract. In this paper, we are interested in unsupervised (unknown noise) speech enhancement, where the probability distribution of the clean speech spectrogram is simulated via a latent variable generative model, also called the decoder. Recently, variational autoencoders (VAEs) have gained much popularity…

Variational Inference and Learning of Piecewise-linear Dynamical Systems

Xavier ALAMEDA-PINEDA 2021/01/30 2021/08/30 Research, Vision

by Xavier Alameda-Pineda, Vincent Drouard and Radu Horaud. IEEE TNNLS, 2021. [PDF] [arXiv] Abstract. Modeling the temporal behavior of data is of primordial importance in many scientific and engineering fields. Baseline methods assume that both the dynamic and observation equations follow linear-Gaussian models. However, there are many real-world processes that cannot be characterized by…

ODANet: Online Deep Appearance Network for Identity-Consistent Multi-Person Tracking

Xavier ALAMEDA-PINEDA 2021/01/25 2021/08/30 Research, Sound, Vision

by Guillaume Delorme, Yutong Ban, Guillaume Sarrazin and Xavier Alameda-Pineda. ICPR’20 Workshop on Multimodal pattern recognition for social signal processing in human computer interaction. [paper] Abstract. The analysis of affective states through time in multi-person scenarios is very challenging, because it requires consistently tracking all persons over time. This requires…

Probabilistic Graph Attention Network with Conditional Kernels for Pixel-Wise Prediction

Xavier ALAMEDA-PINEDA 2020/12/30 2021/08/30 Research, Vision

by Dan Xu, Xavier Alameda-Pineda, Wanli Ouyang, Elisa Ricci, Xiaogang Wang and Nicu Sebe. IEEE TPAMI, 2020. [paper] [arXiv] Abstract. Multi-scale representations deeply learned via convolutional neural networks have shown tremendous importance for various pixel-level prediction problems. In this paper we present a novel approach that advances the state of…

Switching Variational Auto-Encoders for Noise-Agnostic Audio-visual Speech Enhancement

Xavier ALAMEDA-PINEDA 2020/10/19 2022/11/12 Research

by Mostafa Sadeghi and Xavier Alameda-Pineda. Presented at IEEE ICASSP 2021. [arXiv] Abstract: Recently, audio-visual speech enhancement has been tackled in unsupervised settings based on variational auto-encoders (VAEs), where during training only clean data is used to train a generative model for speech, which at test time is combined…

Deep Variational Generative Models for Audio-visual Speech Separation

Xavier ALAMEDA-PINEDA 2020/08/04 2022/11/12 Research

by Viet-Nhat Nguyen, Mostafa Sadeghi, Elisa Ricci and Xavier Alameda-Pineda. Presented at IEEE MLSP 2021. [arXiv] Abstract: In this paper, we are interested in audio-visual speech separation given a single-channel audio recording as well as visual information (lip movements) associated with each speaker. We propose an unsupervised technique based on…

Online Monaural Speech Enhancement using Delayed Subband LSTM

Xavier ALAMEDA-PINEDA 2020/05/04 2022/04/06 Research

by Xiaofei Li and Radu Horaud. INTERSPEECH 2020. [arXiv] [speech enhancement examples] Abstract. This paper proposes a delayed subband LSTM network for online monaural (single-channel) speech enhancement. The proposed method is developed in the short-time Fourier transform (STFT) domain. Online processing requires frame-by-frame signal reception and processing. A paramount…

CANU-ReID: A Conditional Adversarial Network for Unsupervised person Re-IDentification

Xavier ALAMEDA-PINEDA 2020/04/04 2021/08/04 Research

by Guillaume Delorme, Stéphane Lathuilière, Radu Horaud and Xavier Alameda-Pineda. Presented at ICPR, 2021. [arXiv] [HAL] [poster] [slides] [code] Abstract: Unsupervised person re-ID is the task of identifying people on a target dataset for which the ID labels are unavailable during training. In this paper, we propose to unify two…

Learning Visual Voice Activity Detection with an Automatically Annotated Dataset

Xavier ALAMEDA-PINEDA 2020/03/31 2021/08/04 Research

by Sylvain Guy, Stéphane Lathuilière, Pablo Mesejo and Radu Horaud. Presented at ICPR 2021. [paper] [bibtex] Abstract. Visual voice activity detection (V-VAD) uses visual features to predict whether a person is speaking or not. V-VAD is useful whenever audio VAD (A-VAD) is inefficient, either because the acoustic signal is difficult to…

How To Train Your Deep Multi-Object Tracker

Xavier ALAMEDA-PINEDA 2020/03/25 2021/08/04 Research

by Yihong Xu, Aljoša Ošep, Yutong Ban, Radu Horaud, Laura Leal-Taixé and Xavier Alameda-Pineda. Presented at IEEE CVPR 2020. [arXiv] [paper] [code] Abstract: The recent trend in vision-based multi-object tracking (MOT) is heading towards leveraging the representational power of deep learning to jointly learn to detect and track objects. However,…