MSc. Project on Speaker identity modeling with deep learning for re-identification

Short description: Speaker identification is the task of determining which speaker produced a given utterance [1]. Speaker verification, or re-identification, instead aims at determining whether a given speech utterance matches a target speaker identity model [2]. Re-identification becomes difficult when multiple speakers interact with each other [3,4]. In this project, we propose to explore the use of Siamese networks for learning a speaker identity model, which can then be used for re-identification in a multi-speaker scenario. The goal is to develop a system that can handle previously unseen speakers entering an ongoing recorded conversation. After getting familiar with the literature, the intern will develop new methods for modeling speaker identities in the context of this re-identification task.
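
To give applicants a concrete picture, here is a minimal sketch of how a Siamese-style identity model could be trained and used. It is only an illustration under stated assumptions, not the method the project will adopt: the embeddings are assumed to come from some shared encoder network (not shown), and the function names, the contrastive loss, the cosine score, and the threshold value are all hypothetical choices.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def contrastive_loss(emb_a, emb_b, same_speaker, margin=1.0):
    """Contrastive loss on a pair of embeddings from the shared encoder.

    Pairs from the same speaker are pulled together; pairs from different
    speakers are pushed at least `margin` apart (assumed value).
    """
    d = math.dist(emb_a, emb_b)
    if same_speaker:
        return d ** 2
    return max(0.0, margin - d) ** 2

def re_identify(embedding, enrolled, threshold=0.7):
    """Return the best-matching enrolled speaker id, or None.

    `enrolled` maps speaker ids to identity-model embeddings. None means
    no score reached the (assumed) threshold, i.e. a possibly unseen speaker.
    """
    best_id, best_score = None, threshold
    for speaker_id, model in enrolled.items():
        score = cosine_similarity(embedding, model)
        if score >= best_score:
            best_id, best_score = speaker_id, score
    return best_id
```

In such a setup, a `None` result from `re_identify` would be the point at which a new identity model is created for a speaker entering the conversation.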

Information for applicants: Please send your complete CV and a motivation letter to Simon Leglaive (simon.leglaive [at] inria.fr) and Xavier Alameda-Pineda (xavier.alameda-pineda [at] inria.fr). Feel free to contact us with any questions.

References:

[1] Yanick Lukic et al. “Speaker identification and clustering using convolutional neural networks.” IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 2016.
[2] Arsha Nagrani et al. “VoxCeleb: a large-scale speaker identification dataset.” Interspeech, 2017.
[3] Y. Ban, X. Alameda-Pineda, L. Girin, and R. Horaud. “Variational Bayesian Inference for Audio-Visual Tracking of Multiple Speakers.” 2018.
[4] Y. Ban, X. Li, X. Alameda-Pineda, L. Girin, and R. Horaud. “Accounting for Room Acoustics in Audio-Visual Multi-Speaker Tracking.” IEEE ICASSP, 2018.