Modelling Context of OOV Words in Large Vocabulary Continuous Speech Recognition

Speaker: Imran Sheikh (PhD student)

Date: June 16, 2016


The diachronic nature of broadcast news causes frequent changes in linguistic content and vocabulary, leading to Out-Of-Vocabulary (OOV) words, especially OOV proper names. OOV words missed by the speech recognition system can be recovered with a dynamic-vocabulary, multi-pass recognition approach in which relevant proper names are retrieved by exploiting the semantic and topical context of the spoken content. This talk will discuss our exploration of probabilistic topic models and of word embeddings from neural network models for the task of retrieving relevant proper names. We will present our adaptation of the Neural Bag-of-Words (NBOW) model to learn word and context vector representations that outperform generic representations on this task. Our Neural Bag-of-Weighted-Words (NBOW2) model learns to assign a degree of importance to each input word and can thereby capture task-specific keywords. Through experiments on speech recognition of French broadcast news and on standard text classification tasks, we will show the effectiveness of the proposed model.
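To make the difference between NBOW and NBOW2 concrete, here is a minimal sketch in plain Python. It rests on an assumption not spelled out in this abstract: each word's importance is taken to be a sigmoid of the dot product between its embedding and a learned vector `a`, and the context vector is the importance-weighted average of the word embeddings. The function names and the toy values are hypothetical, the actual NBOW2 formulation may differ, and the training of `a` and of the embeddings is omitted.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(u: list[float], v: list[float]) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def nbow(vectors: list[list[float]]) -> list[float]:
    """Plain NBOW context vector: unweighted mean of the word vectors."""
    n, dim = len(vectors), len(vectors[0])
    return [sum(v[d] for v in vectors) / n for d in range(dim)]


def word_importance(vectors: list[list[float]], a: list[float]) -> list[float]:
    """Assumed NBOW2-style importance: alpha_i = sigmoid(w_i . a)."""
    return [sigmoid(dot(v, a)) for v in vectors]


def nbow2(vectors: list[list[float]], a: list[float]) -> list[float]:
    """Weighted bag of words: each word vector is scaled by its importance alpha_i."""
    alphas = word_importance(vectors, a)
    n, dim = len(vectors), len(vectors[0])
    return [sum(al * v[d] for al, v in zip(alphas, vectors)) / n for d in range(dim)]


# Hypothetical 2-d embeddings for a three-word context.
context = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
a = [2.0, 0.0]  # hypothetical learned importance vector

print(nbow(context))                # every word contributes equally
print(word_importance(context, a))  # the first word receives the largest weight
print(nbow2(context, a))            # the context vector leans toward that word
```

In a full model, `a` and the embeddings would presumably be learned jointly with the downstream objective, so that the words receiving large weights play the role of the task-specific keywords described above.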