Speech signals are highly variable and often degraded by noise or overlapping speech. The output of speech enhancement or source separation methods typically differs from the true “clean” speech signal, and these estimation errors must be taken into account in subsequent processing. We also aim to estimate the reliability of phonetic segment boundaries and prosodic parameters, for which no such information is currently available.
Uncertainty and acoustic modeling
One objective is to estimate the posterior distribution of the separated source signals more accurately, accounting for, e.g., posterior correlations over time and frequency. The estimated uncertainties are then exploited for acoustic modeling in speech recognition.
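As an illustration of how an estimated uncertainty can enter acoustic modeling, the sketch below shows one common scheme, often called uncertainty decoding: the variance of the enhanced feature is added to the variance of a Gaussian acoustic model before the likelihood is evaluated. This is a minimal sketch and not necessarily the approach pursued here. It assumes diagonal covariances, so it ignores the posterior correlations over time and frequency mentioned above, and the function names are hypothetical.

```python
import numpy as np


def gaussian_loglik(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def uncertainty_decoding_loglik(x_hat, x_var, mean, var):
    """Likelihood of an uncertain feature under a Gaussian acoustic model.

    x_hat : estimated (enhanced) feature vector
    x_var : per-dimension variance of the estimation error (assumed diagonal)
    mean, var : parameters of the Gaussian acoustic model

    Marginalizing over the unknown clean feature adds the two variances.
    """
    return gaussian_loglik(x_hat, mean, var + x_var)


# Toy model and feature: the second dimension is far from the model mean,
# but it is also flagged as highly uncertain.
model_mean = np.zeros(2)
model_var = np.ones(2)
x_hat = np.array([0.0, 5.0])
x_var = np.array([0.0, 10.0])

plain = gaussian_loglik(x_hat, model_mean, model_var)
compensated = uncertainty_decoding_loglik(x_hat, x_var, model_mean, model_var)
```

In this toy case the unreliable dimension is penalized less once its uncertainty is taken into account, so `compensated` is larger than `plain`. With zero uncertainty the two scores coincide.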
Uncertainty and phonetic segmentation
The goal here is to investigate the reliability of automatically determined phone segment boundaries. Such information is critical when working with aligned speech-text data and when computing phone durations for language learning.
Uncertainty and prosody
Here we investigate the performance of fundamental frequency estimation and the reliability of the estimated values. The fundamental frequency is one of the prosodic parameters.
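To make the idea of a reliability value attached to a fundamental frequency estimate concrete, the following sketch estimates F0 on a single frame by autocorrelation and returns the height of the normalized autocorrelation peak as a crude confidence score. It is only an illustration under stated assumptions: the autocorrelation method, the 60 to 400 Hz search range, and the name `estimate_f0` are choices made for this example, not the estimators or reliability measures studied here.

```python
import numpy as np


def estimate_f0(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Return (f0_hz, confidence) for one frame of speech.

    The confidence is the normalized autocorrelation at the selected lag,
    close to 1 for strongly periodic frames and small for noise.
    A silent frame yields (0.0, 0.0).
    """
    frame = frame - np.mean(frame)
    # One-sided autocorrelation: index k holds the value at lag k.
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if acf[0] <= 0.0:
        return 0.0, 0.0
    acf = acf / acf[0]
    # Restrict the search to lags matching the allowed F0 range.
    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), len(acf) - 1)
    lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return sample_rate / lag, float(acf[lag])


# Synthetic check: a 40 ms frame of a 200 Hz sine sampled at 16 kHz,
# compared with a frame of white noise of the same length.
sample_rate = 16000
t = np.arange(640) / sample_rate
f0_sine, conf_sine = estimate_f0(np.sin(2.0 * np.pi * 200.0 * t), sample_rate)

rng = np.random.default_rng(0)
f0_noise, conf_noise = estimate_f0(rng.standard_normal(640), sample_rate)
```

The periodic frame receives a much higher confidence than the noise frame. A score of this kind is the sort of per-frame reliability information that downstream prosodic analysis could use to discount doubtful F0 values.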