Speaker: Aman Zaid Berhe
Date: June 29, 2017
Deep neural networks are now the state of the art in acoustic modeling for automatic speech recognition. They allow obtaining robust, high-accuracy acoustic models. However, these models have many hyper-parameters, and hyper-parameter optimization is a tedious yet essential task for successfully training very deep neural networks. We propose to optimize these hyper-parameters automatically for several architectures: long short-term memory (LSTM) networks, wide residual networks combined with LSTMs, and highway networks combined with LSTMs, which have recently obtained state-of-the-art results on various automatic speech recognition tasks.
Experiments are conducted on a subset of ESTER, a French corpus for automatic speech recognition. Automatic hyper-parameter optimization allows several architectures to be explored, resulting in a large performance improvement: from 56% frame accuracy with the previous baseline (a multi-layer perceptron implemented in Kaldi) to about 85.5% with an LSTM-based architecture.
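As a minimal sketch of automatic hyper-parameter optimization, the snippet below implements random search over a hypothetical LSTM search space. The abstract does not specify which search strategy, hyper-parameters, or ranges were used, so the space, the parameter names, and the toy objective here are all illustrative assumptions; in practice the objective would be a full train-and-evaluate cycle reporting frame accuracy on a validation set.

```python
import random

# Hypothetical search space for an LSTM acoustic model; the actual
# hyper-parameters and ranges used in the talk are not given in the abstract.
SEARCH_SPACE = {
    "num_layers": [2, 3, 4, 5],
    "hidden_units": [256, 512, 1024],
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3],
    "dropout": [0.0, 0.1, 0.2, 0.3],
}

def sample_config(space, rng):
    """Draw one hyper-parameter configuration uniformly at random."""
    return {name: rng.choice(values) for name, values in space.items()}

def random_search(objective, space, n_trials=20, seed=0):
    """Return the configuration (and score) maximizing the objective.

    `objective` stands in for a full training run scored on held-out data
    (e.g. frame accuracy); here it is just a cheap callable.
    """
    rng = random.Random(seed)
    best_score, best_config = float("-inf"), None
    for _ in range(n_trials):
        config = sample_config(space, rng)
        score = objective(config)
        if score > best_score:
            best_score, best_config = score, config
    return best_config, best_score

# Toy stand-in objective (illustration only): favors deeper, wider models.
def toy_objective(config):
    return config["num_layers"] * 0.1 + config["hidden_units"] / 1024

best, score = random_search(toy_objective, SEARCH_SPACE, n_trials=50)
```

Random search is only one option; grid search or Bayesian optimization plug into the same loop by replacing how configurations are proposed.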