[Closed] Master internship on Switching Variational Autoencoders for Audio-visual Speech Separation

Context: Over the past years, variational autoencoders (VAEs) have proven efficient for generative modeling of complicated signals, e.g. speech and audio [1]. Recently, they have successfully been applied to audio-visual speech separation (AVSS) [2], where the goal is to separate a target speech from a mixture of several speech signals,…

Continue reading

[Closed] Master Internship on face alignment for audio-visual speech enhancement

In many audio-visual applications, e.g., speech enhancement and speech recognition, it is desirable to have aligned images of the mouth region such that a deep neural network can extract reliable visual features. Indeed, the quality of the extracted visual features impacts the performance of audio-visual based applications. In reality, however,…

Continue reading