Fullsubnet: a full-band and sub-band fusion model for real-time single-channel speech enhancement

By Xiang Hao*,#, Xiangdong Su#, Radu Horaud and Xiaofei Li* (*Westlake University, #Inner Mongolia University, China) ICASSP 2021 [arXiv][github][youtube] Abstract. This paper proposes a full-band and sub-band fusion model, named as FullSubNet, for single-channel real-time speech enhancement. Full-band and sub-band refer to the models that input full-band and sub-band noisy…

Continue reading

Mixture of Inference Networks for VAE-based Audio-visual Speech Enhancement

by Mostafa Sadeghi, Xavier Alameda-Pineda IEEE TSP, 2021 [paper] [arXiv] Abstract. In this paper, we are interested in unsupervised (unknown noise) speech enhancement, where the probability distribution of clean speech spectrogram is simulated via a latent variable generative model, also called the decoder. Recently, variational autoencoders (VAEs) have gained much popularity…

Continue reading

Variational Inference and Learning of Piecewise-linear Dynamical Systems

by Xavier Alameda-Pineda, Vincent Drouard, Radu Horaud IEEE TNNLS 2021 [PDF] [arXiv] Abstract Modeling the temporal behavior of data is of primordial importance in many scientific and engineering fields. Baseline methods assume that both the dynamic and observation equations follow linear-Gaussian  models. However, there are many real-world processes that cannot be characterized by…

Continue reading

ODANet: Online Deep Appearance Network for Identity-Consistent Multi-Person Tracking

by Guillaume Delorme , Yutong Ban , Guillaume Sarrazin and Xavier Alameda-Pineda ICPR’20 Workshop on Multimodal pattern recognition for social signal processing in human computer interaction [paper] Abstract. The analysis of effective states through time in multi-person scenarii is very challenging, because it requires to consistently track all persons over time. This requires…

Continue reading

Probabilistic Graph Attention Network with Conditional Kernels for Pixel-Wise Prediction

by Dan Xu, Xavier Alameda-Pineda, Wanli Ouyang, Elisa Ricci, Xiaogang Wang and Nicu Sebe IEEE TPAMI, 2020 [paper] [arXiv] Abstract. Multi-scale representations deeply learned via convolutional neural networks have shown tremendous importance for various pixel-level prediction problems. In this paper we present a novel approach that advances the state of…

Continue reading

Switching Variational Auto-Encoders for Noise-Agnostic Audio-visual Speech Enhancement

by Mostafa Sadeghi and Xavier Alameda-Pineda Presented at IEEE ICASSP 2021 [arXiv] Abstract: Recently, audio-visual speech enhancement has been tackled in the unsupervised settings based on variational auto-encoders (VAEs), where during training only clean data is used to train a generative model for speech, which at test time is combined…

Continue reading

Online Monaural Speech Enhancement using Delayed Subband LSTM

by Xiaofei Li and Radu Horaud INTERSPEECH 2020 [arXiv] [speech enhancement examples] Abstract.  This paper proposes a delayed subband LSTM network for online monaural (single-channel) speech enhancement. The proposed method is developed in the short time Fourier transform (STFT) domain. Online processing requires frame-by-frame signal reception and processing. A paramount…

Continue reading