Speaker: Ziteng Wang
Date: December 8, 2016
Estimating the time-frequency speech presence probability, or mask, is crucial in speech enhancement. This is especially true for the Multichannel Wiener Filter (MWF), whose solution relies only on the second-order statistics of speech and noise. Estimation methods have shifted from empirical thresholding of multichannel features to approaches based on Deep Neural Networks (DNNs). Here we introduce an adaptive thresholding approach that uses the Beam-to-Reference Ratio (BRR) feature, and a more recent approach based on a BLSTM. Combining each with a parametric MWF, we present recognition results on the CHiME-4 dataset.
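To illustrate why the mask matters for the MWF, the following is a minimal sketch of how a time-frequency mask can yield the second-order statistics and, from them, a parametric MWF. It is not necessarily the formulation used in the talk. It assumes one common form of the filter, w = (Phi_n^{-1} Phi_s u) / (mu + tr(Phi_n^{-1} Phi_s)), where u selects a reference microphone and mu trades noise reduction against speech distortion. The function names, the trade-off parameter `mu`, the diagonal loading, and the use of the mask-weighted covariance of the noisy signal as the speech covariance are all illustrative assumptions.

```python
import numpy as np

def masked_covariance(Y, mask):
    """Mask-weighted spatial covariance per frequency bin.

    Y:    complex STFT, shape (F, T, C) = (bins, frames, channels)
    mask: values in [0, 1], shape (F, T)
    Returns an array of shape (F, C, C).
    """
    weighted = Y * mask[..., None]
    cov = np.einsum("ftc,ftd->fcd", weighted, Y.conj())
    norm = np.maximum(mask.sum(axis=1), 1e-10)  # avoid division by zero
    return cov / norm[:, None, None]

def parametric_mwf(Y, speech_mask, mu=1.0, ref=0, eps=1e-6):
    """Apply a parametric MWF built from a speech mask (assumed formulation).

    Simplification: the speech covariance is taken as the mask-weighted
    covariance of the noisy signal, and the noise mask is 1 - speech_mask.
    """
    n_channels = Y.shape[-1]
    phi_s = masked_covariance(Y, speech_mask)
    phi_n = masked_covariance(Y, 1.0 - speech_mask)

    # Diagonal loading (an assumption) keeps the noise covariance invertible.
    load = eps * np.trace(phi_n, axis1=1, axis2=2).real / n_channels + 1e-12
    phi_n = phi_n + load[:, None, None] * np.eye(n_channels)

    ratio = np.linalg.solve(phi_n, phi_s)            # Phi_n^{-1} Phi_s, (F, C, C)
    lam = np.trace(ratio, axis1=1, axis2=2).real     # tr(Phi_n^{-1} Phi_s), (F,)
    w = ratio[:, :, ref] / (mu + lam)[:, None]       # filter per bin, (F, C)

    # Filter output: w^H y for every time-frequency point.
    return np.einsum("fc,ftc->ft", w.conj(), Y)
```

In this sketch, `mu = 0` reduces to a distortionless (MVDR-like) filter under a rank-1 speech model, `mu = 1` gives the standard MWF, and larger values suppress more noise at the cost of more speech distortion. The mask itself could come from either estimator described above.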