Variational Inference and Learning of Piecewise-linear Dynamical Systems

by Xavier Alameda-Pineda, Vincent Drouard and Radu Horaud. IEEE TNNLS, 2021. [PDF] [arXiv] Abstract. Modeling the temporal behavior of data is of primordial importance in many scientific and engineering fields. Baseline methods assume that both the dynamic and observation equations follow linear-Gaussian models. However, there are many real-world processes that cannot be characterized by…

ODANet: Online Deep Appearance Network for Identity-Consistent Multi-Person Tracking

by Guillaume Delorme, Yutong Ban, Guillaume Sarrazin and Xavier Alameda-Pineda. ICPR’20 Workshop on Multimodal pattern recognition for social signal processing in human computer interaction. [paper] Abstract. The analysis of affective states through time in multi-person scenarios is very challenging, because it requires consistently tracking all persons over time. This requires…

Probabilistic Graph Attention Network with Conditional Kernels for Pixel-Wise Prediction

by Dan Xu, Xavier Alameda-Pineda, Wanli Ouyang, Elisa Ricci, Xiaogang Wang and Nicu Sebe. IEEE TPAMI, 2020. [paper] [arXiv] Abstract. Multi-scale representations deeply learned via convolutional neural networks have shown tremendous importance for various pixel-level prediction problems. In this paper we present a novel approach that advances the state of…

Switching Variational Auto-Encoders for Noise-Agnostic Audio-visual Speech Enhancement

by Mostafa Sadeghi and Xavier Alameda-Pineda. Presented at IEEE ICASSP 2021. [arXiv] Abstract. Recently, audio-visual speech enhancement has been tackled in an unsupervised setting based on variational auto-encoders (VAEs), where during training only clean data is used to train a generative model for speech, which at test time is combined…

Online Monaural Speech Enhancement using Delayed Subband LSTM

by Xiaofei Li and Radu Horaud. INTERSPEECH 2020. [arXiv] [speech enhancement examples] Abstract. This paper proposes a delayed subband LSTM network for online monaural (single-channel) speech enhancement. The proposed method is developed in the short-time Fourier transform (STFT) domain. Online processing requires frame-by-frame signal reception and processing. A paramount…
