(Closed) MSc. Project: Speech enhancement with deep neural networks

MSc project on “Speech enhancement with deep neural networks”

Duration: about 6 months

Short description:

Speech enhancement [1] is an important preprocessing step for various speech information retrieval tasks such as automatic speech recognition. The goal of a speech enhancement method is to recover a clean speech signal from a noisy recording that contains interfering audio sources (other people talking, ambient noise, etc.).

The goal of this project is to develop algorithms based on deep neural networks (DNNs) for speech enhancement, with a specific focus on variational autoencoders (VAEs) [2]. VAEs are generative models that learn the probability distribution of the data. They were originally used in computer vision, but they have very recently shown interesting results in speech enhancement [3, 4].
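As a rough illustration of what "learning the probability distribution of the data" means for a VAE [2], the sketch below computes a single-sample estimate of the evidence lower bound (ELBO), the quantity a VAE maximizes during training. It is a minimal sketch in plain Python, not the models of [3, 4]: `toy_encoder` and `toy_decoder` are hypothetical stand-ins for the neural networks, and the Gaussian likelihood and standard normal prior are assumptions made for illustration.

```python
import math
import random


def reparameterize(mean, log_var, rng):
    """Draw z = mean + std * eps with eps ~ N(0, 1), element-wise."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, log_var)]


def kl_to_standard_normal(mean, log_var):
    """Closed-form KL( N(mean, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mean, log_var))


def gaussian_log_likelihood(x, x_mean, x_log_var):
    """log N(x; x_mean, diag(exp(x_log_var))), summed over dimensions."""
    return -0.5 * sum(
        math.log(2.0 * math.pi) + lv + (xi - mi) ** 2 / math.exp(lv)
        for xi, mi, lv in zip(x, x_mean, x_log_var)
    )


def elbo(x, encoder, decoder, rng):
    """Single-sample Monte Carlo estimate of the evidence lower bound."""
    z_mean, z_log_var = encoder(x)              # approximate posterior q(z | x)
    z = reparameterize(z_mean, z_log_var, rng)  # sample a latent vector
    x_mean, x_log_var = decoder(z)              # generative model p(x | z)
    reconstruction = gaussian_log_likelihood(x, x_mean, x_log_var)
    return reconstruction - kl_to_standard_normal(z_mean, z_log_var)


# Hypothetical stand-ins for trained networks (assumptions for illustration).
def toy_encoder(x):
    # One latent dimension: mean is the average of x, fixed log-variance.
    return [sum(x) / len(x)], [-2.0]


def toy_decoder(z):
    # Three output dimensions: each mean equals z[0], unit variance.
    return [z[0]] * 3, [0.0] * 3


rng = random.Random(0)
value = elbo([0.1, 0.2, 0.3], toy_encoder, toy_decoder, rng)
print("ELBO estimate:", value)
```

In an actual VAE the encoder and decoder would be neural networks whose parameters are trained by gradient ascent on this bound, averaged over the training data.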

Two kinds of approaches may be considered and compared:

  1. “Fully-supervised” methods which assume knowledge of the possible noise types (traffic noise, nature noise, etc.).
  2. “Semi-supervised” methods which do not rely on this knowledge.

After getting familiar with the literature, the intern will work on developing new methods for speech enhancement based on DNNs and, more precisely, VAEs.

Keywords: speech enhancement, deep neural networks, speech signal processing.

Information for applicants: Please send your complete CV and a motivation letter to Simon Leglaive (simon.leglaive [at] inria.fr) and Xavier Alameda-Pineda (xavier.alameda-pineda [at] inria.fr).


[1] DeLiang Wang and Jitong Chen, “Supervised speech separation based on deep learning: An overview,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018.

[2] Diederik P. Kingma and Max Welling, “Auto-encoding variational Bayes,” International Conference on Learning Representations (ICLR), 2014.

[3] Yoshiaki Bando, Masato Mimura, Katsutoshi Itoyama, Kazuyoshi Yoshii, and Tatsuya Kawahara, “Statistical speech enhancement based on probabilistic integration of variational autoencoder and non-negative matrix factorization,” IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018.

[4] Simon Leglaive, Laurent Girin, and Radu Horaud, “A variance modeling framework based on variational autoencoders for speech enhancement,” IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 2018.