Time-frequency masking and optimal Wiener filter for multichannel speech enhancement

Speaker: Ziteng Wang

Date: December 8, 2016


Estimating the time-frequency speech presence probability, or a time-frequency mask, is crucial in speech enhancement. This is especially true for the Multichannel Wiener Filter (MWF), whose solution depends only on the second-order statistics of speech and noise. Estimation methods have shifted from empirical thresholding of multichannel features to approaches based on Deep Neural Networks (DNNs). We introduce one adaptive thresholding approach that uses the Beam-to-Reference Ratio (BRR) feature and one more recent approach based on a bidirectional LSTM (BLSTM). Combining each with a parametric MWF, we present speech recognition results on the CHiME-4 dataset.
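
As a rough illustration of how a time-frequency mask can feed a parametric MWF, the sketch below estimates speech and noise spatial covariance matrices as mask-weighted averages and then forms the filter w = (Phi_ss + mu * Phi_nn)^{-1} Phi_ss u, where u selects a reference channel. This is one common formulation and not necessarily the one used in the talk. The function names, the trade-off parameter mu, the diagonal loading eps, and the choice to approximate the speech covariance directly from mask-weighted noisy observations are all assumptions made for illustration.

```python
import numpy as np


def masked_covariance(Y, mask):
    """Mask-weighted spatial covariance per frequency bin.

    Y:    complex STFT, shape (F, T, M) = (bins, frames, channels)
    mask: real weights in [0, 1], shape (F, T)
    Returns an array of shape (F, M, M).
    """
    # Weighted sum over frames of the outer products y y^H.
    num = np.einsum("ft,ftm,ftn->fmn", mask, Y, Y.conj())
    # Normalise by the total mask weight; guard against empty masks.
    den = np.maximum(mask.sum(axis=1), 1e-10)
    return num / den[:, None, None]


def parametric_mwf(Y, speech_mask, mu=1.0, ref=0, eps=1e-6):
    """Apply a parametric MWF built from a speech mask (illustrative sketch).

    mu  : assumed trade-off between noise reduction and speech distortion
    ref : index of the reference microphone
    eps : assumed relative diagonal loading for numerical stability
    Returns the enhanced single-channel STFT, shape (F, T).
    """
    F, T, M = Y.shape
    phi_ss = masked_covariance(Y, speech_mask)        # speech statistics (approx.)
    phi_nn = masked_covariance(Y, 1.0 - speech_mask)  # noise statistics

    # Diagonal loading scaled by the average noise power in each bin.
    load = eps * np.trace(phi_nn, axis1=1, axis2=2).real / M
    A = phi_ss + mu * phi_nn + load[:, None, None] * np.eye(M)

    # w = A^{-1} Phi_ss u, with u the one-hot vector for the reference channel.
    w = np.linalg.solve(A, phi_ss[:, :, ref:ref + 1])[..., 0]  # (F, M)

    # Filter output: w^H y for every bin and frame.
    return np.einsum("fm,ftm->ft", w.conj(), Y)
```

In this formulation mu = 1 gives the standard MWF, while larger values trade more speech distortion for more noise reduction. The mask itself could come from either kind of estimator described above, a thresholded multichannel feature or a neural network output.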