MSc project on speaker identity modeling with deep learning for re-identification
Short description: Speaker identification is the task of determining which speaker produced a given utterance. Speaker verification, or re-identification, instead aims at determining whether a given speech utterance matches a target speaker identity model. Re-identification becomes difficult when multiple speakers interact with each other [3,4]. In this project, we propose to explore the use of Siamese networks for learning a speaker identity model, which can then be used for a re-identification task in a multi-speaker scenario. The goal is to develop a system that can handle previously unseen speakers entering an ongoing recorded conversation. After getting familiar with the literature, the intern will work on developing new methods for modeling speaker identities in the context of this re-identification task.
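To make the Siamese idea concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the method to be developed in the project: the shared branch is a single hypothetical linear layer with L2 normalization, the training criterion is a standard contrastive loss, and re-identification is done by thresholding the Euclidean distance to enrolled embeddings. All names (`embed`, `contrastive_loss`, `reidentify`), the margin, and the threshold are assumptions chosen for the example.

```python
import math


def embed(features, weights):
    """Shared (Siamese) branch: both utterances go through the SAME weights.

    Hypothetical stand-in for a deep network: one linear layer followed
    by L2 normalization of the output vector.
    """
    out = [sum(w * x for w, x in zip(row, features)) for row in weights]
    norm = math.sqrt(sum(v * v for v in out)) or 1.0
    return [v / norm for v in out]


def distance(a, b):
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(d, same_speaker, margin=1.0):
    """Contrastive loss on a pair distance d.

    Same-speaker pairs are pulled together; different-speaker pairs are
    pushed apart until they are at least `margin` away.
    """
    if same_speaker:
        return d ** 2
    return max(0.0, margin - d) ** 2


def reidentify(embedding, enrolled, threshold=0.5):
    """Return the closest enrolled speaker id, or None if no enrolled
    speaker is within `threshold` (i.e. a previously unseen speaker)."""
    best_id, best_d = None, float("inf")
    for speaker_id, reference in enrolled.items():
        d = distance(embedding, reference)
        if d < best_d:
            best_id, best_d = speaker_id, d
    return best_id if best_d <= threshold else None


# Toy usage with made-up 2-D features and identity weights (assumptions).
weights = [[1.0, 0.0], [0.0, 1.0]]
enrolled = {"spk_a": embed([1.0, 0.0], weights)}
known = reidentify(embed([2.0, 0.0], weights), enrolled)
unseen = reidentify(embed([0.0, 3.0], weights), enrolled)
```

In this toy setup, an utterance that matches no enrolled speaker returns `None`, which is where a system could choose to enroll a new identity for a speaker entering the conversation.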
Information for applicants: Please send your complete CV and a motivation letter to Simon Leglaive (simon.leglaive [at] inria.fr) and Xavier Alameda-Pineda (xavier.alameda-pineda [at] inria.fr). Feel free to contact us with any questions.
[1] Yanick Lukic et al., “Speaker identification and clustering using convolutional neural networks,” IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 2016.
[2] Arsha Nagrani et al., “Voxceleb: a large-scale speaker identification dataset,” Interspeech, 2017.
[3] Y. Ban, X. Alameda-Pineda, L. Girin, and R. Horaud, “Variational Bayesian Inference for Audio-Visual Tracking of Multiple Speakers,” 2018.
[4] Y. Ban, X. Li, X. Alameda-Pineda, L. Girin, and R. Horaud, “Accounting for Room Acoustics in Audio-Visual Multi-Speaker Tracking,” IEEE ICASSP, 2018.