(Closed) MSc Project: Deep Learning for Voice Activity Detection

MSc project on “Deep Learning for Voice Activity Detection”

Duration: 6 months

Short description: Voice Activity Detection (VAD) is a technique that classifies a (possibly noisy) audio signal into speech and non-speech segments. It is an essential building block for many speech-based systems, such as speech recognition and spoken dialog for human-computer and human-robot interaction, as well as multi-party situated dialog. Standard VAD techniques assume that the noise is stationary (e.g. fan noise), but this is unrealistic in many situations, where competing audio sources such as environmental acoustic events are non-stationary. In this project, we plan to investigate deep learning methods for designing a speech/non-speech classifier that copes with such non-stationary noise. The collection, preparation, and annotation of training and test data will be carefully addressed. Existing VAD methods (based on deep learning and on more traditional signal processing techniques) will be investigated, tested, and benchmarked.
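
To make the speech/non-speech classification task concrete, here is a minimal sketch of a traditional energy-based VAD of the kind that could serve as a benchmark baseline. It is not the project's method. The function name `energy_vad`, the frame and hop durations, the percentile-based noise floor, and the 6 dB margin are all illustrative assumptions. The sketch estimates a single noise floor for the whole signal, which is exactly the stationary-noise assumption discussed above.

```python
import numpy as np


def energy_vad(signal, sample_rate, frame_ms=25.0, hop_ms=10.0,
               margin_db=6.0, noise_percentile=10.0):
    """Label each frame as speech (True) or non-speech (False).

    Hypothetical baseline: a frame counts as speech when its energy
    exceeds a global noise-floor estimate by `margin_db` decibels.
    All default parameter values are illustrative assumptions.
    """
    signal = np.asarray(signal, dtype=float)
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    if len(signal) < frame_len:
        return np.zeros(0, dtype=bool)

    n_frames = 1 + (len(signal) - frame_len) // hop_len
    energies_db = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop_len: i * hop_len + frame_len]
        # Mean power in dB; the small constant avoids log(0) on silence.
        energies_db[i] = 10.0 * np.log10(np.mean(frame ** 2) + 1e-12)

    # One noise floor for the whole signal: assumes stationary noise.
    noise_floor_db = np.percentile(energies_db, noise_percentile)
    return energies_db > noise_floor_db + margin_db
```

Because the threshold is fixed over the whole recording, a loud non-stationary event (a door slam, for instance) would be labelled as speech by this sketch, which illustrates why learned classifiers are of interest here.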

Keywords: voice activity detection, deep learning, speech signal processing.

Information for applicants: Please send your complete CV to Xiaofei Li (xiaofei.li@inria.fr)