Analysis and development of speech enhancement features in cochlear implants

Speaker: Nicolas Furnon

Date: October 18, 2018 at 10:30 – C005

Abstract:

Language identification systems are very common in speech processing: they classify the spoken language given a recorded audio sample. They are often used as a front-end for subsequent processing tasks such as automatic speech recognition or speaker identification. Standard methodologies such as i-vectors or neural networks can achieve satisfactory results. Nevertheless, these classifiers can perform very poorly on operational data, when training and testing data have different distributions.

Typically, audio signals may have been recorded in different contexts, over different channels (GSM, UHF, VHF, etc.) and with different devices. This challenge is known as a domain adaptation problem.

Generative Adversarial Networks (GANs) were recently introduced (2014) in the field of image processing, where generative models have been shown to produce highly realistic samples from numerous image distributions. We aim to use this kind of neural network methodology in a semi-supervised learning framework, transferring the knowledge of a classifier trained on a specific domain to other channels.

Our first experiments on OpenSAD are promising, but controlling the convergence of a GAN is not easy: understanding the dynamics of its adversarial optimization is crucial to achieving good results, and is itself an active field of research.
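To make the adversarial optimization mentioned above concrete, here is a minimal toy sketch in NumPy, not the speaker's actual system: a linear "generator" tries to match real samples drawn from N(3, 1), while a logistic "discriminator" tries to tell real from generated samples. All parameter names, learning rates, and the 1-D setting are illustrative assumptions; real GANs use deep networks, but the two-player gradient updates below are the dynamics whose convergence is hard to control.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

# Generator: x_fake = a * z + b, with noise z ~ N(0, 1) (illustrative)
a, b = 1.0, 0.0
# Discriminator: D(x) = sigmoid(w * x + c) (illustrative)
w, c = 0.1, 0.0

lr, batch = 0.02, 64
for step in range(5000):
    z = rng.standard_normal(batch)
    x_real = 3.0 + rng.standard_normal(batch)  # real data ~ N(3, 1)
    x_fake = a * z + b

    # Discriminator ascends log D(x_real) + log(1 - D(x_fake))
    d_real = sigmoid(w * x_real + c)
    d_fake = sigmoid(w * x_fake + c)
    w += lr * np.mean((1 - d_real) * x_real - d_fake * x_fake)
    c += lr * np.mean((1 - d_real) - d_fake)

    # Generator ascends the non-saturating objective log D(x_fake)
    d_fake = sigmoid(w * x_fake + c)
    a += lr * np.mean((1 - d_fake) * w * z)
    b += lr * np.mean((1 - d_fake) * w)

samples = a * rng.standard_normal(1000) + b
print(f"generated mean ~ {samples.mean():.2f}, std ~ {samples.std():.2f}")
```

Even in this 1-D example, the generated mean typically drifts toward the real mean of 3 but oscillates around the equilibrium rather than settling cleanly, which is exactly the kind of convergence behaviour the abstract refers to.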