Speaker: Karan Nathwani (post-doctoral fellow)
Date: June 15, 2017
Recently, the idea of estimating the uncertainty about the features obtained after speech enhancement and propagating it to dynamically adapt deep neural network (DNN) based acoustic models has attracted some interest. However, the results in the literature were reported on simulated noisy datasets and for a limited variety of uncertainty estimators. We found that these results vary significantly across conditions. Hence, the main contribution of this work is to assess DNN uncertainty decoding performance for different data conditions and different uncertainty estimation/propagation techniques. In addition, we propose a neural network based uncertainty estimator and compare it with other uncertainty estimators.
However, uncertainty features are used only during decoding. In noisy conditions, this results in a mismatch between the training and decoding phases. Hence, in another work, we utilize GMM-derived (GMMD) uncertainty features during both DNN-based acoustic model training and decoding. The GMMD features are computed by taking the difference between the GMM log-likelihoods obtained with uncertainty and the GMM log-likelihoods obtained without uncertainty. These difference features (DF) are then concatenated with the enhanced features to form robust input features.
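The difference-feature computation described above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the implementation used in the work: it assumes a diagonal-covariance GMM, assumes that uncertainty is taken into account by adding the feature uncertainty variance to each component variance, and assumes one difference value per GMM component. The function names `component_loglik` and `gmmd_input` are hypothetical.

```python
import numpy as np

def component_loglik(x, weights, means, variances, uncertainty=None):
    """Per-component log-likelihoods of a diagonal-covariance GMM.

    x: (D,) enhanced feature vector
    weights: (K,), means: (K, D), variances: (K, D)
    uncertainty: (D,) feature uncertainty variance, or None
    Assumption: uncertainty is added to the component variances.
    """
    var = variances if uncertainty is None else variances + uncertainty
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * var), axis=1)
    log_exp = -0.5 * np.sum((x - means) ** 2 / var, axis=1)
    return np.log(weights) + log_norm + log_exp

def gmmd_input(x, uncertainty, weights, means, variances):
    """Concatenate enhanced features with difference features (DF)."""
    with_unc = component_loglik(x, weights, means, variances, uncertainty)
    without_unc = component_loglik(x, weights, means, variances)
    diff = with_unc - without_unc          # one value per GMM component
    return np.concatenate([x, diff])       # shape (D + K,)
```

With zero uncertainty the two sets of log-likelihoods coincide, so the difference features vanish and only the enhanced features carry information.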
Because GMMD features are computationally complex, we propose instead to utilize the unscented transform (UT) representation of the enhanced features, in which the uncertainty in the enhanced features is modelled using the UT. During uncertainty decoding, two uncertainty propagation techniques, Monte Carlo (MC) sampling and UT propagation, have been used for computing the acoustic scores. We have reported DNN uncertainty decoding performance on the CHiME-2 and CHiME-3 datasets for different uncertainty estimation/propagation techniques. Each of the contributions improved ASR results compared to baseline techniques.
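To make the two propagation techniques concrete, here is a minimal sketch of how a Gaussian feature uncertainty could be pushed through a nonlinear function by UT propagation and by MC sampling. It is an illustration under assumptions rather than the setup used in the work: the uncertainty is assumed Gaussian with diagonal covariance, the standard 2D+1 sigma-point scheme is used with a scaling parameter `kappa` chosen arbitrarily, and `f` stands in for whatever maps features to acoustic scores. All function names are hypothetical.

```python
import numpy as np

def sigma_points(mean, var, kappa=1.0):
    """Standard 2D+1 sigma points and weights for a diagonal Gaussian."""
    d = mean.shape[0]
    spread = np.sqrt((d + kappa) * var)            # per-dimension offset
    offsets = np.diag(spread)                      # (D, D)
    points = np.vstack([mean, mean + offsets, mean - offsets])
    weights = np.full(2 * d + 1, 1.0 / (2.0 * (d + kappa)))
    weights[0] = kappa / (d + kappa)               # weight of the mean point
    return points, weights

def ut_propagate(f, mean, var, kappa=1.0):
    """Approximate E[f(x)] with 2D+1 deterministic evaluations of f."""
    points, weights = sigma_points(mean, var, kappa)
    outputs = np.array([f(p) for p in points])
    return weights @ outputs

def mc_propagate(f, mean, var, n_samples=1000, seed=0):
    """Approximate E[f(x)] by averaging f over random samples."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((n_samples, mean.shape[0]))
    samples = mean + np.sqrt(var) * noise
    return np.mean([f(s) for s in samples], axis=0)
```

The design trade-off is in the number of evaluations of `f`: UT needs a fixed 2D+1 evaluations per frame, while MC accuracy depends on the number of samples drawn.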