Statistical approaches are common in speech processing, and their performance enables their use in real-world applications. However, they still have limited capabilities in certain scenarios, e.g., when dealing with degraded (noisy, reverberant, or overlapped) speech.
Source localization and separation
The focus is on source localization and separation methods using multiple microphones and/or models of speech and noise. Room acoustics, including the modeling of early echoes, is investigated for improved source separation. Acoustic echo suppression is also addressed. Other challenges include getting the most out of deep learning and of a new modeling framework based on alpha-stable distributions, and combining both with established spatial filtering approaches.
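To illustrate why alpha-stable distributions are attractive for modeling impulsive acoustic interference, the sketch below draws symmetric alpha-stable samples with the Chambers-Mallows-Stuck method and compares tail behavior against the Gaussian case (alpha = 2). This is a toy illustration of the distribution family only, not the team's actual separation model; the function name and thresholds are ours.

```python
import math
import random

def sample_sas(alpha, rng):
    """One draw from a symmetric alpha-stable distribution (beta = 0,
    unit scale) via the Chambers-Mallows-Stuck method."""
    u = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    if alpha == 1.0:
        return math.tan(u)  # Cauchy special case
    return (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
            * (math.cos(u - alpha * u) / w) ** ((1.0 - alpha) / alpha))

rng = random.Random(0)
heavy = [sample_sas(1.5, rng) for _ in range(20000)]  # heavy-tailed
gauss = [sample_sas(2.0, rng) for _ in range(20000)]  # alpha=2 is Gaussian

# Count large outliers: the alpha=1.5 samples produce far more of them,
# which is the property exploited when modeling impulsive noise.
def tail(xs):
    return sum(abs(x) > 5 for x in xs)

print(tail(heavy), tail(gauss))
```

The polynomial tail of the alpha-stable family (versus the Gaussian's exponential tail) is what makes it a candidate for robustly modeling sources with occasional large-amplitude events.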
Our main focus is on robust acoustic modeling of speech for speech recognition in reverberant, noisy, distant-microphone conditions, and for speaker identification that is robust to short utterances and spoofing attacks. We also consider acoustic modeling in scenarios where labeled data are scarce. Finally, we study acoustic modeling of ambient noise, which carries rich information about our environment.
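One common ingredient of robust acoustic modeling in noisy conditions is multi-condition training, where clean speech is mixed with noise at controlled signal-to-noise ratios (SNRs). The sketch below shows this mixing step under the assumption of plain sample lists; `mix_at_snr` is a hypothetical helper, not a function from the team's toolchain.

```python
import math

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio of the mixture
    equals `snr_db`, then add it to the speech sample by sample."""
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]

# Toy signals: a 440 Hz tone as "speech", an alternating square as "noise".
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [1.0 if t % 2 else -1.0 for t in range(16000)]
mixed = mix_at_snr(speech, noise, 5.0)

# Verify the realised SNR by recovering the scaled noise component.
scaled = [m - s for m, s in zip(mixed, speech)]
snr = 10 * math.log10(sum(s * s for s in speech)
                      / sum(n * n for n in scaled))
print(round(snr, 3))
```

Training on mixtures spanning a range of SNRs (and, analogously, convolving with measured room impulse responses for reverberation) exposes the acoustic model to the degraded conditions it will face at test time.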
Our goal is to deal with the finite size of speech recognition lexicons by predicting possible candidates for out-of-vocabulary (OOV) words based on the estimated context.
This item concerns audio-only voice conversion and parametric speech synthesis, including the introduction of expressivity into the generated speech.