Feb 05

Post Doctoral Position (12 months). Natural Language Processing: automatic speech recognition system using deep neural networks without out-of-vocabulary words

Location: INRIA Nancy Grand Est research center, France


Project-team: Multispeech

Deadline to apply: June 6, 2018

Scientific Context:

More and more audio and video content appears on the Internet each day: about 300 hours of multimedia are uploaded per minute. In these multimedia sources, audio data represents a very important part. If these documents are not transcribed, automatic content retrieval is difficult or impossible. The classical approach to spoken content retrieval from audio documents is automatic speech recognition followed by text retrieval.

An automatic speech recognition (ASR) system uses a lexicon containing the most frequent words of the language, and only the words in the lexicon can be recognized by the system. New proper names (PNs) appear constantly, requiring dynamic updates of the lexicons used by the ASR system. These PNs evolve over time, and no vocabulary will ever contain all existing PNs. When a person searches for a document, proper names are used in the query. If these PNs have not been recognized, the document cannot be found. These missing PNs can be very important for understanding the document.

In this study, we will focus on the problem of proper names in automatic recognition systems. The problem is how to model relevant proper names for the audio document we want to transcribe.

Missions:

We assume that an audio document to be transcribed contains missing proper names, i.e. proper names that are pronounced in the audio document but are not in the lexicon of the automatic speech recognition system. These proper names cannot be recognized (out-of-vocabulary proper names, OOV PNs). The purpose of this work is to design a methodology to find and model a list of relevant OOV PNs that correspond to an audio document.

Assuming that we have an approximate transcription of the audio document and a huge text corpus extracted from the Internet, several methodologies could be studied:

  • From the approximate OOV pronunciation in the transcription, generate the possible spellings of the word (phoneme-to-character conversion) and search for these spellings in the text corpus.

  • A deep neural network can be designed to predict OOV proper names and their pronunciations with the training objective to maximize the retrieval of relevant OOV proper names.
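
As an illustration of the first approach only, here is a minimal sketch in Python. It assumes a toy, hand-written phoneme-to-grapheme table and a tiny in-memory corpus; the table contents and the names `PHONEME_TO_GRAPHEMES`, `candidate_spellings` and `find_oov_candidates` are hypothetical and are not part of the team's ASR system. A real system would presumably rely on a trained phoneme-to-character model and a much larger corpus.

```python
from itertools import product

# Hypothetical toy mapping from phonemes to possible spellings.
# These values are illustrative assumptions, not a real conversion model.
PHONEME_TO_GRAPHEMES = {
    "k": ["c", "k"],
    "a": ["a"],
    "r": ["r"],
    "l": ["l"],
    "o": ["o", "au"],
    "s": ["s", "ss"],
}


def candidate_spellings(phonemes):
    """Return every spelling obtained by picking one grapheme per phoneme."""
    options = [PHONEME_TO_GRAPHEMES[p] for p in phonemes]
    return {"".join(choice) for choice in product(*options)}


def find_oov_candidates(phonemes, corpus_tokens):
    """Keep only the candidate spellings that actually occur in the corpus."""
    vocabulary = {token.lower() for token in corpus_tokens}
    return sorted(candidate_spellings(phonemes) & vocabulary)


# Toy corpus standing in for the large text corpus extracted from the Internet.
corpus = "Yesterday Carlos met the mayor of Nancy".split()

# Approximate pronunciation of an OOV word taken from the transcription.
matches = find_oov_candidates(["k", "a", "r", "l", "o", "s"], corpus)
print(matches)
```

In this sketch, the corpus lookup acts as a filter: most generated spellings are not real words, and only those attested in the text corpus are kept as candidate OOV proper names.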

The proposed approaches will be validated using the ASR developed in our team.

Keywords: deep neural networks, automatic speech recognition, lexicon, out-of-vocabulary words.


[Mikolov2013] Mikolov, T., Chen, K., Corrado, G. and Dean, J. “Efficient estimation of word representations in vector space”, Workshop at ICLR, 2013.

[Deng2013] Deng, L., Li, J., Huang, J.-T., Yao, K., Yu, D., Seide, F., Seltzer, M., Zweig, G., He, X., Williams, J., Gong, Y. and Acero, A. “Recent advances in deep learning for speech research at Microsoft”, Proceedings of ICASSP, 2013.

[Sheikh2016] Sheikh, I., Illina, I., Fohr, D. and Linarès, G. “Improved Neural Bag-of-Words Model to Retrieve Out-of-Vocabulary Words in Speech Recognition”, Interspeech, 2016.

[Li2017] Li, J., Ye, G., Zhao, R., Droppo, J. and Gong, Y. “Acoustic-to-Word Model without OOV”, ASRU, 2017.

Skills and profile: PhD in computer science, background in statistics and natural language processing, experience with deep learning tools (Keras, Kaldi, etc.) and programming skills (Perl, Python).

Additional information:

Supervision and contact: Irina Illina, LORIA/INRIA (illina@loria.fr), Dominique Fohr INRIA/LORIA (dominique.fohr@loria.fr) https://members.loria.fr/IIllina/, https://members.loria.fr/DFohr/

Additional links: Ecole Doctorale IAEM Lorraine

Deadline to apply: June 6th

Selection results: end of June

Duration: 12 months.

Starting date: between Nov. 1st 2018 and Jan. 1st 2019

Salary: about 2,115 euros net, medical insurance included

The candidates must have defended their PhD later than Sept. 1st 2016 and before the end of 2018.

The candidates are required to provide the following documents in a single PDF or ZIP file:

  • CV including a description of your research activities (2 pages max) and a short description of what you consider to be your best contributions and why (1 page max and 3 contributions max); the contributions can be theoretical or practical. Web links to the contributions should be provided. Also include a brief description of your scientific and career projects, and your scientific positioning regarding the proposed subject.

  • The report(s) from your PhD external reviewer(s), if applicable.

  • If you haven’t defended yet, the list of expected members of your PhD committee (if known) and the expected date of defence.

In addition, at least one recommendation letter from the PhD advisor should be sent directly by its author to the prospective postdoc advisor.

Help and benefits:

  • Possibility of free French courses

  • Help for finding housing

  • Help with the residence card procedure and with a visa for a spouse