This video summarizes some of the work carried out by the Perception team in 2018. It shows multiple-person tracking, audio-source localization, audiovisual alignment, and speaker diarization, as well as a complete pipeline that includes the assignment of speech segments to persons and speech recognition.
Acknowledgments: Work funded by the European Union under the ERC Advanced Grant VHIA and ERC Proof of Concept VHIALab.