Speaker: Raphaël Duroselle
Date: July 12, 2018 at 10:30 – C103
Language identification systems are very common in speech processing and are used to classify the spoken language given a recorded audio sample. They are often used as a front-end for subsequent processing tasks such as automatic speech recognition or speaker identification. Standard methodologies such as i-vectors or neural networks can achieve satisfactory results. Nevertheless, these classifiers can perform very poorly on operational data when training and test data have different distributions.
Typically, audio signals may have been recorded in different contexts, over different channels (GSM, UHF, VHF, etc.), and with different devices. Handling this mismatch is known as a domain adaptation problem.
Generative adversarial networks (GANs) were introduced recently (2014) in the field of image processing. Such generative models have been shown to produce highly realistic samples from numerous image distributions. We aim to use this kind of neural network methodology in a semi-supervised learning framework, transferring the knowledge of a classifier trained on a specific domain to other channels.
Our first experiments on OpenSAD are promising, but controlling the convergence of a GAN is not easy. Understanding the dynamics of adversarial optimization is therefore crucial to achieving good results, and it is an active field of research.
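As a rough illustration of why adversarial optimization can be hard to control, the sketch below runs simultaneous gradient descent-ascent on the toy two-player game f(x, y) = x * y, where x minimizes and y maximizes. This is a standard textbook example, not the model or the data discussed in the talk; the function name `simultaneous_gda`, the learning rate, and the starting point are assumptions chosen for illustration.

```python
import math


def simultaneous_gda(x, y, lr=0.1, steps=100):
    """Simultaneous gradient descent-ascent on the toy game f(x, y) = x * y.

    x plays the minimizing role and y the maximizing role. The only
    equilibrium is (0, 0). Returns the distance to that equilibrium
    before the first step and after each step.
    """
    distances = [math.hypot(x, y)]
    for _ in range(steps):
        grad_x = y  # df/dx
        grad_y = x  # df/dy
        # Both players update at the same time from the current point.
        x, y = x - lr * grad_x, y + lr * grad_y
        distances.append(math.hypot(x, y))
    return distances


# Hypothetical starting point, chosen only for illustration.
distances = simultaneous_gda(1.0, 1.0)
print(f"start: {distances[0]:.3f}, end: {distances[-1]:.3f}")
```

In this toy game each simultaneous update multiplies the distance to the equilibrium by sqrt(1 + lr^2), so the iterates spiral away from the solution instead of converging, however small the learning rate. Real GAN training is far more complex, but the example gives a flavor of why the dynamics deserve study.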
About the speaker:
Raphaël Duroselle entered École polytechnique in 2014. He majored in applied mathematics and joined the Mathématiques, Vision, Apprentissage (MVA) master's program at ENS Cachan, where he studied machine learning. In 2018, he is completing his master's thesis at the French Direction Générale de l’Armement (DGA) on the use of adversarial neural networks for speech processing.