Speaker: Xuechen Liu
Date and place: September 2, 2021 at 10:30, by videoconference
Modern automatic speaker verification relies largely on deep neural networks (DNNs) trained on mel-frequency cepstral coefficient (MFCC) features. Alternative feature extraction methods based on phase, prosody, and long-term temporal operations exist, but they have not been studied extensively with DNN-based methods. We aim to fill this gap through several attempts at optimizing robust frontend methods, with the goal of better speaker verification performance under challenging conditions. We start by comparing 14 different features that have been addressed in other tasks but not in deep-learning-based speaker verification, and then present pilot investigations of data-driven methods for various hand-designed feature extractors such as MFCCs, PNCCs, and multi-taper features. We hope this work offers some insight into the importance and potential of frontend methods for modern speaker verification. This is a rough and intuitive description of what I have done so far during my PhD while I was physically in France.