Audio-Visual Tracking by Density Approximation in a Sequential Bayesian Filtering Framework

Israel D. Gebru, Christine Evers*, Patrick A. Naylor*, Radu P. Horaud

IEEE Workshop on Hands-free Speech Communication and Microphone Arrays

Best Paper Award
*Imperial College London

Abstract

This paper proposes a novel audio-visual tracking approach that constructively exploits the audio and visual modalities in order to estimate the trajectories of multiple people in a joint state space. The tracking problem is modeled in a sequential Bayesian filtering framework, within which we propose to represent the posterior density with a Gaussian Mixture Model (GMM). To ensure that this GMM representation can be retained sequentially over time, the predictive density is approximated by a GMM using the Unscented Transform, and a density interpolation technique is introduced to obtain a continuous GMM representation of the observation likelihood. Furthermore, to prevent the number of mixture components from growing exponentially over time, a density approximation based on the Expectation Maximization (EM) algorithm is applied, resulting in a compact GMM representation of the posterior density. Recordings from a camcorder and a microphone array are used to evaluate the proposed approach, demonstrating significant improvements in tracking performance over two benchmark visual trackers.
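The paper's own implementation is not yet released (see Code below). As a rough, self-contained illustration of the prediction step described in the abstract, the NumPy sketch below propagates each Gaussian component of a GMM posterior through a (possibly nonlinear) motion model with the Unscented Transform, keeping the predictive density in GMM form. All names here (unscented_transform, predict_gmm, the motion model f, and the process noise Q) are illustrative assumptions, not taken from the paper.

import numpy as np

def unscented_transform(mean, cov, f, alpha=1e-3, beta=2.0, kappa=0.0):
    # Propagate a single Gaussian N(mean, cov) through a nonlinear function f
    # by transforming 2n+1 deterministically chosen sigma points.
    n = mean.shape[0]
    lam = alpha**2 * (n + kappa) - n
    # Columns of L span the sigma-point spread: L @ L.T == (n + lam) * cov
    L = np.linalg.cholesky((n + lam) * cov)
    sigma = np.vstack([mean, mean + L.T, mean - L.T])   # (2n+1, n)
    wm = np.full(2 * n + 1, 1.0 / (2.0 * (n + lam)))    # mean weights
    wc = wm.copy()                                      # covariance weights
    wm[0] = lam / (n + lam)
    wc[0] = lam / (n + lam) + (1.0 - alpha**2 + beta)
    Y = np.array([f(s) for s in sigma])                 # transformed sigma points
    mean_y = wm @ Y
    diff = Y - mean_y
    cov_y = (wc[:, None] * diff).T @ diff
    return mean_y, cov_y

def predict_gmm(weights, means, covs, f, Q):
    # Approximate the predictive density of a GMM under dynamics f:
    # each component is propagated with the Unscented Transform and the
    # process-noise covariance Q is added; the mixture weights are unchanged.
    out = [unscented_transform(m, P, f) for m, P in zip(means, covs)]
    pred_means = np.array([m for m, _ in out])
    pred_covs = np.array([P + Q for _, P in out])
    return weights, pred_means, pred_covs

# Hypothetical usage: a near-constant-velocity model for a 2-D
# position/velocity state (for a linear f the transform is exact;
# the same code handles nonlinear motion models).
dt = 0.04
F = np.block([[np.eye(2), dt * np.eye(2)],
              [np.zeros((2, 2)), np.eye(2)]])
f = lambda x: F @ x
Q = 0.01 * np.eye(4)

The EM-based reduction step that keeps the number of mixture components bounded, and the interpolation of the audio-visual likelihood, are detailed in the paper itself.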

BibTeX


@inproceedings{gebru2017audio,
  title     = {Audio-Visual Tracking by Density Approximation in a Sequential Bayesian Filtering Framework},
  author    = {Gebru, Israel D. and Evers, Christine and Naylor, Patrick A. and Horaud, Radu P.},
  booktitle = {IEEE Workshop on Hands-free Speech Communication and Microphone Arrays},
  month     = {March},
  year      = {2017},
  address   = {San Francisco, CA}
}

Code

coming soon!

Video

coming soon!