Speaker: Badr Abdullah
Date: May 31, 2018
Automatic Speech Recognition (ASR) systems are usually trained on static data with a finite vocabulary. When a spoken utterance contains Out-Of-Vocabulary (OOV) words, ASR systems misrecognize them as acoustically similar in-vocabulary words with entirely different meanings. The majority of OOV words are information-rich proper nouns (e.g., person names, geographic locations, commercial products) that are vital to spoken content understanding. Therefore, failing to recognize OOV words has a significant adverse impact on many downstream applications such as spoken document indexing and translation.
In this thesis, we address this problem by dynamically extending the ASR vocabulary based on the context obtained from the ASR system's first-pass hypothesis. In other words, given the in-vocabulary transcription of a spoken utterance, the goal is to retrieve a ranked list of OOV proper nouns that might be relevant to the context, add the words in this list to the ASR lexicon, and perform a second-pass decoding with the ASR system. To this end, we explore different techniques that leverage the topical contexts of OOV words in Wikipedia to develop neural models for OOV word retrieval.
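
The retrieve-and-extend step described above can be illustrated with a minimal Python sketch. It is not the thesis's method: a simple bag-of-words cosine similarity stands in for the neural retrieval models, the function names (`retrieve_oov_candidates`, `extend_lexicon`) are hypothetical, and the toy contexts are made-up stand-ins for topical contexts that would come from Wikipedia. The second-pass decoding itself is outside the scope of the sketch.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercased whitespace tokens with their counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_oov_candidates(first_pass_hypothesis, oov_contexts, top_k=2):
    """Rank OOV proper nouns by similarity between the first-pass
    hypothesis and each word's topical context (assumed stand-in for
    a neural retrieval model). Words with zero similarity are dropped."""
    query = bag_of_words(first_pass_hypothesis)
    scored = [
        (cosine(query, bag_of_words(context)), word)
        for word, context in oov_contexts.items()
    ]
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for score, word in scored[:top_k] if score > 0]


def extend_lexicon(lexicon, candidates):
    """Return a new lexicon that also contains the retrieved OOV words."""
    return set(lexicon) | set(candidates)


# Toy topical contexts (illustrative only, not real data).
oov_contexts = {
    "Merkel": "german chancellor government berlin election",
    "Nairobi": "city kenya capital africa river",
    "Ronaldo": "football striker goal match league",
}
hypothesis = "the german chancellor spoke about the election in berlin"

candidates = retrieve_oov_candidates(hypothesis, oov_contexts, top_k=2)
lexicon = extend_lexicon({"the", "german", "chancellor"}, candidates)
print(candidates)
```

In a real system the extended lexicon would then be handed to the ASR decoder for the second pass; how pronunciations for the new words are obtained is not specified here.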