Audio-visual multiple-speaker tracking

We exploit the complementary nature of audio and visual information to track multiple persons and to assign segments of speech to each person over time. The tracker is based on a variational Bayesian formulation, which yields a computationally tractable solution. Please visit our research page for more details.
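The full tracker is described on our research page; as a minimal illustration only, the core idea behind such variational formulations can be sketched as a soft-assignment (responsibility) computation, where each audio-visual observation is probabilistically associated with each tracked speaker, modeled here as a Gaussian component. All names, dimensions, and parameters below are illustrative assumptions, not the actual implementation:

```python
import numpy as np

def log_gaussian(x, mean, cov):
    """Log density of a multivariate Gaussian at x."""
    d = x.shape[-1]
    diff = x - mean
    inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + diff @ inv @ diff)

def responsibilities(obs, means, covs, weights):
    """One variational E-step: posterior probability that the
    observation was generated by each tracked speaker."""
    K = len(means)
    log_r = np.array([np.log(weights[k]) + log_gaussian(obs, means[k], covs[k])
                      for k in range(K)])
    log_r -= log_r.max()          # subtract max for numerical stability
    r = np.exp(log_r)
    return r / r.sum()            # normalize to a probability vector

# Two hypothetical speakers in a 2-D (image-plane) state space.
means = [np.array([0.0, 0.0]), np.array([5.0, 5.0])]
covs = [np.eye(2), np.eye(2)]
weights = [0.5, 0.5]

obs = np.array([0.2, -0.1])       # an observation near the first speaker
r = responsibilities(obs, means, covs, weights)
```

In a full variational Bayesian tracker these responsibilities would be alternated with updates of the speaker states and model parameters; this sketch shows only the assignment step.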

Acknowledgments: Work funded by the European Union under the ERC Advanced Grant VHIA.