MSc project on “Robust voice activity detection with deep neural networks”
Duration: about 6 months
Voice activity detection (VAD) is the problem of segmenting a given audio signal into speech and non-speech sections. It is an essential part of many modern speech-based systems, such as those for speech and speaker recognition, speech enhancement, emotion recognition, and human-computer or human-robot interaction. In many realistic situations, the recorded speech signal is contaminated by interfering noise from other audio sources. This noise can strongly degrade the performance of a VAD system.
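To make the segmentation view concrete, here is a minimal sketch of how frame-wise speech/non-speech decisions can be turned into speech sections. The function name frames_to_segments, the 10 ms frame hop, and the toy decision sequence are illustrative assumptions, not part of the project description; in practice the decisions would come from a VAD model.

```python
from typing import List, Tuple


def frames_to_segments(decisions: List[int], hop_ms: int = 10) -> List[Tuple[int, int]]:
    """Merge runs of consecutive speech frames (1) into (start_ms, end_ms) segments.

    `decisions` holds one binary label per frame (1 = speech, 0 = non-speech).
    `hop_ms` is an assumed frame hop in milliseconds (hypothetical default).
    """
    segments = []
    start = None  # index of the first frame of the current speech run, if any
    for i, d in enumerate(decisions):
        if d == 1 and start is None:
            start = i  # a speech run begins
        elif d == 0 and start is not None:
            segments.append((start * hop_ms, i * hop_ms))  # the run ends
            start = None
    if start is not None:
        # the signal ends while speech is still active
        segments.append((start * hop_ms, len(decisions) * hop_ms))
    return segments


# Toy frame-wise decisions (made up for illustration)
print(frames_to_segments([0, 0, 1, 1, 1, 0, 1, 1]))  # [(20, 50), (60, 80)]
```

A robust VAD system differs mainly in how the frame-wise decisions are produced under noise; this grouping step stays the same.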
The goal of this project is to develop robust algorithms for VAD based on deep neural networks (DNNs). Due to the sequential nature of the data, a natural choice would be to work with recurrent neural networks (RNNs) such as long short-term memory (LSTM) networks [1, 2].
After getting familiar with the literature, the intern will work on developing new methods for robust VAD based on deep neural networks.
Keywords: voice activity detection, deep neural networks, speech signal processing.
Information for applicants: Please send your complete CV and a motivation letter to Simon Leglaive (simon.leglaive [at] inria.fr). Feel free to contact Simon Leglaive for any further information about the internship.
[1] Thad Hughes and Keir Mierle, “Recurrent neural networks for voice activity detection”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013.
[2] Simon Leglaive, Romain Hennequin, and Roland Badeau, “Singing voice detection with deep recurrent neural networks”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015.