Regularization of the embedding extractor for robust language identification

Speaker: Raphaël Duroselle

Date and place: September 17, 2020 at 10:30, videoconference

Abstract:

Language identification systems achieve impressive performance in matched conditions, when the training data corresponds to the testing conditions. However, in the presence of a significant domain shift, performance drops drastically. The main focus of this work is to address this issue and make a language identification system robust to a change of transmission channel (telephone, VHF, radio, television).

State-of-the-art language identification systems are based on embeddings extracted from a discriminatively trained neural network. We modify the loss function of this neural network in order to make the embeddings invariant to domains. This effect can be achieved by adding a regularization term to the classification loss of the embedding extractor. We demonstrate the effectiveness of two regularization losses: Maximum Mean Discrepancy and the N-pair loss (metric learning). We evaluate these methods in two scenarios: unsupervised domain adaptation, where we only have access to unlabeled recordings of the target domain, and multi-domain training with labeled data from several domains. Finally, we evaluate the impact of the different training losses on the structure of the embedding space by measuring the proximity between groups of embeddings corresponding to different languages or domains.
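To illustrate the kind of regularization discussed in the talk, here is a minimal PyTorch sketch (not the speaker's actual implementation) of a training step that adds an RBF-kernel Maximum Mean Discrepancy penalty between source-domain and target-domain embeddings to the language classification loss, as in the unsupervised domain adaptation scenario. The network architecture, dimensions, and the weight lambda_reg are illustrative assumptions.

```python
# Illustrative sketch: classification loss + MMD regularization
# to encourage domain-invariant embeddings (all names/values assumed).
import torch
import torch.nn as nn

def rbf_mmd(x, y, sigma=1.0):
    """Biased empirical Maximum Mean Discrepancy between two batches of
    embeddings, using a Gaussian (RBF) kernel with bandwidth sigma."""
    def kernel(a, b):
        sq_dists = torch.cdist(a, b) ** 2
        return torch.exp(-sq_dists / (2 * sigma ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()

class EmbeddingExtractor(nn.Module):
    """Toy stand-in for a discriminatively trained embedding extractor."""
    def __init__(self, feat_dim=40, emb_dim=256, n_languages=10):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(feat_dim, 512), nn.ReLU(),
            nn.Linear(512, emb_dim),
        )
        self.classifier = nn.Linear(emb_dim, n_languages)

    def forward(self, feats):
        emb = self.backbone(feats)
        return emb, self.classifier(emb)

model = EmbeddingExtractor()
ce_loss = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
lambda_reg = 0.5  # weight of the regularization term (assumed value)

def training_step(src_feats, src_labels, tgt_feats):
    """One step of unsupervised domain adaptation: labeled source batch,
    unlabeled target batch; MMD pulls the two embedding distributions together."""
    src_emb, src_logits = model(src_feats)
    tgt_emb, _ = model(tgt_feats)  # target language labels are not used
    loss = ce_loss(src_logits, src_labels) + lambda_reg * rbf_mmd(src_emb, tgt_emb)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In the multi-domain training scenario, the same penalty could be computed between labeled batches from different domains instead of between a source and an unlabeled target batch.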