Author's posts

Master Internship on Switching Variational Autoencoders for Audio-visual Speech Separation

Context: Over the past years, variational autoencoders (VAEs) have proven effective for generative modeling of complex signals, e.g., speech and audio [1]. Recently, they have been successfully applied to audio-visual speech separation (AVSS) [2], where the goal is to separate a target speech signal from a mixture of several speech signals, utilizing the visual information of …

Master Internship on Disentanglement of Latent Codes in Dynamical Variational Autoencoders

Context: Deep latent variable models (DLVMs) provide an effective way to model the underlying hidden generative process of natural signals and images [1]. This allows us to approximate the probability density functions of the data, which in turn can be used either to generate new examples resembling the training data or to perform probabilistic inference and estimation. Variational …

[Closed] Master Internship on face alignment for audio-visual speech enhancement

In many audio-visual applications, e.g., speech enhancement and speech recognition, it is desirable to have aligned images of the mouth region so that a deep neural network can extract reliable visual features. Indeed, the quality of the extracted visual features affects the performance of audio-visual applications. In reality, however, a speaker’s face is constantly …
