Mixture of Inference Networks for VAE-based Audio-visual Speech Enhancement

by Mostafa Sadeghi, Xavier Alameda-Pineda IEEE TSP, 2021 [paper] [arXiv] Abstract. In this paper, we are interested in unsupervised (unknown noise) speech enhancement, where the probability distribution of clean speech spectrogram is simulated via a latent variable generative model, also called the decoder. Recently, variational autoencoders (VAEs) have gained much popularity…