Audio-Visual Tracking by Density Approximation in a Sequential Bayesian Filtering Framework
Israel D. Gebru, Christine Evers*, Patrick A. Naylor*, Radu P. Horaud
IEEE Workshop on Hands-free Speech Communication and Microphone Arrays
Best Paper Award
*Imperial College London
Abstract
This paper proposes a novel audio-visual tracking approach that constructively exploits the audio and visual modalities in order to estimate the trajectories of multiple people in a joint state space. The tracking problem is modeled using a sequential Bayesian filtering framework. Within this framework, we propose to represent the posterior density with a Gaussian Mixture Model (GMM). To ensure that a GMM representation can be retained sequentially over time, the predictive density is approximated by a GMM using the Unscented Transform, and a density interpolation technique is introduced to obtain a continuous representation of the observation likelihood, which is also a GMM. Furthermore, to prevent the number of mixture components from growing exponentially over time, a density approximation based on the Expectation-Maximization (EM) algorithm is applied, resulting in a compact GMM representation of the posterior density. Recordings made with a camcorder and a microphone array are used to evaluate the proposed approach, demonstrating significant improvements in tracking performance compared to two benchmark visual trackers.
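To make the recursion in the abstract concrete, below is a minimal, hypothetical Python sketch of one filtering step: the standard Bayes recursion p(x_t | z_1:t) ∝ p(z_t | x_t) ∫ p(x_t | x_t-1) p(x_t-1 | z_1:t-1) dx_t-1 is carried out entirely in GMM form, with Unscented Transform prediction applied per mixture component, a GMM-times-GMM measurement update, and an EM-based reduction of the expanded mixture. All function names are illustrative, and the sampling-and-refit reduction (via scikit-learn) is an assumed stand-in for the paper's EM approximation, not its actual implementation.

```python
# Hypothetical sketch of a GMM-based sequential Bayesian filter step.
# Names and the reduction strategy are illustrative assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture

def unscented_transform(mean, cov, f, Q, alpha=1e-1, beta=2.0, kappa=0.0):
    """Propagate one Gaussian (mean, cov) through a nonlinear motion model f,
    adding process noise Q; returns the predicted mean and covariance."""
    d = mean.size
    lam = alpha**2 * (d + kappa) - d
    L = np.linalg.cholesky((d + lam) * cov)
    sigma = np.vstack([mean, mean + L.T, mean - L.T])   # 2d+1 sigma points
    wm = np.full(2 * d + 1, 0.5 / (d + lam)); wm[0] = lam / (d + lam)
    wc = wm.copy(); wc[0] += 1.0 - alpha**2 + beta
    Y = np.array([f(s) for s in sigma])
    m = wm @ Y
    P = Q + sum(w * np.outer(y - m, y - m) for w, y in zip(wc, Y))
    return m, P

def predict(weights, means, covs, f, Q):
    """UT prediction applied component-wise; the result stays a GMM."""
    out = [unscented_transform(m, P, f, Q) for m, P in zip(means, covs)]
    return weights, np.array([m for m, _ in out]), np.array([P for _, P in out])

def gaussian_product(m1, P1, m2, P2):
    """Moments and normalizing constant of N(x; m1, P1) * N(x; m2, P2)."""
    S = P1 + P2
    K = P1 @ np.linalg.inv(S)
    m, P = m1 + K @ (m2 - m1), P1 - K @ P1
    diff = m2 - m1
    z = np.exp(-0.5 * diff @ np.linalg.solve(S, diff)) / \
        np.sqrt((2 * np.pi) ** m1.size * np.linalg.det(S))
    return m, P, z

def update(w_p, m_p, P_p, w_l, m_l, P_l):
    """Multiply the predictive GMM by a GMM likelihood; the posterior is a
    GMM whose component count is the product of the two input counts."""
    ws, ms, Ps = [], [], []
    for wi, mi, Pi in zip(w_p, m_p, P_p):
        for wj, mj, Pj in zip(w_l, m_l, P_l):
            m, P, z = gaussian_product(mi, Pi, mj, Pj)
            ws.append(wi * wj * z); ms.append(m); Ps.append(P)
    ws = np.array(ws) / np.sum(ws)
    return ws, np.array(ms), np.array(Ps)

def reduce_mixture(weights, means, covs, n_components, n_samples=5000, seed=0):
    """EM-based reduction: sample from the expanded GMM and refit a compact
    GMM (an assumed stand-in for the paper's EM approximation step)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(weights), size=n_samples, p=weights)
    X = np.array([rng.multivariate_normal(means[i], covs[i]) for i in idx])
    gmm = GaussianMixture(n_components=n_components).fit(X)
    return gmm.weights_, gmm.means_, gmm.covariances_
```

A per-frame call would chain these three functions: predict the current GMM posterior through the motion model, update it with the interpolated GMM observation likelihood, then reduce the expanded mixture back to a fixed component budget before the next frame.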
BibTeX
@inproceedings{gebru2017audio,
  title     = {Audio-Visual Tracking by Density Approximation in a Sequential Bayesian Filtering Framework},
  author    = {Gebru, Israel D. and Evers, Christine and Naylor, Patrick A. and Horaud, Radu},
  booktitle = {IEEE Workshop on Hands-free Speech Communication and Microphone Arrays},
  month     = {March},
  year      = {2017},
  address   = {San Francisco, CA}
}
Code
Coming soon!
Video
Coming soon!