Seminar: LAEO-Net++: Revisiting People Looking at Each Other in Videos

Manuel J. Marin-Jimenez, University of Cordoba, Spain
Thursday, 7 July 2022, 14:00-15:00, room F107, Inria Montbonnot Saint-Martin
Attend online: https://inria.webex.com/inria/j.php?MTID=mb256349fcf231701cb7e004536b4f398

Abstract: Capturing the ‘mutual gaze’ of people is essential for understanding and interpreting the social interactions between them. To this end, this paper addresses the problem of detecting people Looking At Each…

Continue reading

Seminar: Machine Learning for Indoor Acoustics

Antoine Deleforge, Multispeech team, Inria Nancy Grand-Est
Wednesday, 15 June 2022, 15:30, room F107, Inria Montbonnot Saint-Martin
Attend online: https://inria.webex.com/inria/j.php?MTID=m30df5cc25af1cc7f052683154f4f7638

Abstract: Close your eyes, clap your hands. Can you hear the shape of the room? Is there carpet on the floor? Answering these peculiar questions may have applications in acoustic diagnosis,…

Continue reading

The impact of removing head movements on audio-visual speech enhancement

by Zhiqi Kang, Mostafa Sadeghi, Radu Horaud, Xavier Alameda-Pineda, Jacob Donley, Anurag Kumar
ICASSP’22, Singapore
[paper][examples][code][slides]

Abstract. This paper investigates the impact of head movements on audio-visual speech enhancement (AVSE). Although a common conversational feature, head movements have been ignored by past and recent studies: they challenge today’s learning-based…

Continue reading

Robust Face Frontalization For Visual Speech Recognition

by Zhiqi Kang, Radu Horaud and Mostafa Sadeghi
ICCV’21 Workshop on Traditional Computer Vision in the Age of Deep Learning (TradiCV’21)
[paper (extended version)][code][bibtex]

Abstract. Face frontalization consists of synthesizing a frontally-viewed face from an arbitrarily-viewed one. The main contribution is a robust method that preserves non-rigid facial deformations, i.e….

Continue reading

Fullsubnet: a full-band and sub-band fusion model for real-time single-channel speech enhancement

By Xiang Hao*,#, Xiangdong Su#, Radu Horaud and Xiaofei Li* (*Westlake University, #Inner Mongolia University, China)
ICASSP 2021
[arXiv][github][youtube]

Abstract. This paper proposes a full-band and sub-band fusion model, named FullSubNet, for single-channel real-time speech enhancement. Full-band and sub-band refer to the models that input full-band and sub-band noisy…

Continue reading
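As a rough illustration of the sub-band input scheme the abstract alludes to, the sketch below builds one sub-band unit per frequency bin of a noisy magnitude spectrogram: the bin itself plus a few neighbouring bins. This is a minimal sketch for intuition only; the neighbour width, padding mode, and function names are assumptions, not details taken from the FullSubNet paper.

```python
import numpy as np

def subband_units(spectrogram, n_neighbors=2):
    """Split a (freq_bins, frames) magnitude spectrogram into per-bin
    sub-band units of shape (2*n_neighbors+1, frames).

    Returns an array of shape (freq_bins, 2*n_neighbors+1, frames),
    i.e. one local frequency neighbourhood per bin.
    """
    n_bins, _ = spectrogram.shape
    # Reflect-pad along the frequency axis so edge bins also get
    # a full neighbourhood (padding choice is an assumption).
    padded = np.pad(spectrogram, ((n_neighbors, n_neighbors), (0, 0)),
                    mode="reflect")
    width = 2 * n_neighbors + 1
    units = np.stack([padded[f:f + width] for f in range(n_bins)])
    return units

# Toy example: 257 frequency bins (512-point STFT), 100 frames.
spec = np.abs(np.random.randn(257, 100))
units = subband_units(spec)
print(units.shape)  # (257, 5, 100)
```

In this scheme a full-band model would consume the whole `spec` at once, while a sub-band model processes each of the 257 local units independently; a fusion model combines both views.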

Paper published in IEEE Transactions on PAMI

The paper Variational Bayesian Inference for Audio-Visual Tracking of Multiple Speakers has been published in the IEEE Transactions on Pattern Analysis and Machine Intelligence (a journal with one of the highest impact factors in the computational intelligence category). This work is part of the Ph.D. thesis of Yutong Ban, now with…

Continue reading