[Closed] Master's Internship on Deep Speaker Recognition

Topic: Identification models have improved substantially with the recent development of deep learning, especially in the visual domain, where they have driven progress in face recognition [1] and person re-identification [2]. However, comparable performance has yet to be achieved in audio-based speaker recognition. Recent dataset collection efforts [3] have made it possible to apply deep learning and classic training strategies for identification models to this task, with promising results [3-4].

The speaker recognition literature has traditionally been divided into text-dependent and text-independent models: the former assume that the compared utterances share the same semantic content (for instance, the phrase ‘OK Google’), whereas the latter do not. Text-dependent models are known to yield better identification performance than text-independent ones [5], which suggests that text-independent models trained within a classic identification framework do not fully account for the semantic content of speech. The goal of this internship is to investigate to what extent speech content impacts identification, through the combined use of speech recognition systems, adversarial strategies and speaker identification models.

Environment: This project will be carried out in the Perception Team at Inria Grenoble Rhône-Alpes. The research will be closely supervised by Guillaume Delorme, Dr. Xavier Alameda-Pineda, and Dr. Radu Horaud, head of the Perception Team. The Perception Team has the computational resources (GPU and CPU) needed to carry out the proposed research.

[1] Florian Schroff, Dmitry Kalenichenko, James Philbin: FaceNet: A Unified Embedding for Face Recognition and Clustering, 2015
[2] Guillaume Delorme, Xavier Alameda-Pineda, Stéphane Lathuilière, Radu Horaud: Camera Adversarial Transfer for Unsupervised Person Re-Identification, 2019
[3] Arsha Nagrani, Joon Son Chung, Andrew Zisserman: VoxCeleb: a large-scale speaker identification dataset, 2017
[4] Joon Son Chung, Arsha Nagrani, Andrew Zisserman: VoxCeleb2: Deep Speaker Recognition, 2018
[5] Anthony Larcher, Kong Aik Lee, Bin Ma, Haizhou Li: Text-dependent speaker verification: Classifiers, databases and RSR2015, 2014