Compact Multiview Representation of Documents Based on the Total Variability Space

Speaker: Mohamed Bouallegue (post-doctoral fellow)

Date: April 21, 2016

In this talk, I present the research I carried out during my thesis at the Laboratoire Informatique d’Avignon and my postdoctoral work at the Laboratoire d’Informatique de l’Université du Maine. This work explores the Factor Analysis/i-vector paradigm for topic identification in spoken documents. We identify themes in dialogues from telephone conversation services using multiple topic spaces estimated with Latent Dirichlet Allocation (LDA) topic models. Estimating several topic models offers different views of a document. Unfortunately, such a multi-model approach also introduces additional variability due to the diversity of the models. We propose to extract the useful information from the full set of models using a Factor Analysis/i-vector based approach previously developed for speaker recognition. The effectiveness of the approach is shown in experiments conducted on the DECODA corpus, which contains recordings from the call center of the Paris Transportation Company. Combining the i-vector features with different types of word and semantic features using a Deep Neural Network (DNN) further improves automatic theme identification.
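
For readers unfamiliar with the total variability space, the following is a sketch of the standard i-vector model from speaker recognition, written here for a document rather than a speech segment. The notation (s_d, m, T, w_d) is assumed for illustration and is not taken from the talk, as is the reading of s_d as the stacked representation of document d across the LDA topic spaces.

```latex
% Total variability model (notation assumed, not from the talk):
%   s_d : supervector of document d, assumed here to stack its
%         representations across the multiple LDA topic spaces
%   m   : mean supervector, independent of the document
%   T   : low-rank total variability matrix
%   w_d : low-dimensional latent factor; its posterior mean is the i-vector
\[
  \mathbf{s}_d = \mathbf{m} + \mathbf{T}\,\mathbf{w}_d ,
  \qquad
  \mathbf{w}_d \sim \mathcal{N}(\mathbf{0}, \mathbf{I})
\]
```

Under this reading, the compact representation of a document is the low-dimensional vector w_d, which summarizes the multiple views while the low rank of T limits how much model-dependent variability is carried over.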