Tracking Multiple Audio Sources with the von Mises Distribution and Variational Expectation Maximization
Yutong Ban, Xavier Alameda-Pineda, Christine Evers (Imperial College) and Radu Horaud
IEEE Signal Processing Letters 26(6), 798–802, 2019 | pdf | pdf of supplemental material | code | video
Abstract. In this work we address the problem of simultaneously tracking several audio sources, namely the problem of estimating source trajectories from a sequence of observed features. We propose to use the von Mises distribution to model audio-source directions of arrival (DOAs) with circular random variables. This leads to a multi-target Kalman filter formulation which is intractable because of the combinatorial explosion of associating observations to state variables over time. We propose a variational approximation of the filter’s posterior distribution and we infer a variational expectation-maximization (VEM) algorithm which is computationally efficient. We also propose an audio-source birth method that favors smooth source trajectories and which is used both to initialize the number of active sources and to detect new sources. We perform experiments with a recently released dataset comprising several moving sources as well as a moving microphone array.
The video below shows an example of tracking two moving speakers from the LOCATA corpus (task #6, recording #1). The directions of arrival (DOAs) are modeled as circular random variables drawn from the von Mises distribution.
Tracking results on the LOCATA dataset
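To illustrate why a circular distribution suits DOA modeling, the sketch below evaluates the von Mises density and compares a circular mean with a naive arithmetic mean for two angles that straddle the ±180° wrap-around. It is a minimal illustration and not the authors' released code: the function names (bessel_i0, von_mises_pdf, circular_mean) are hypothetical, and the Bessel function is approximated by a truncated power series that is only adequate for moderate concentration values.

```python
import math


def bessel_i0(x, terms=50):
    """Modified Bessel function of the first kind, order 0.

    Truncated power series sum_k (x/2)^(2k) / (k!)^2. The truncation length
    is an assumption chosen for moderate x, not a general-purpose routine.
    """
    total, term = 0.0, 1.0
    half_sq = (x / 2.0) ** 2
    for k in range(terms):
        total += term
        term *= half_sq / ((k + 1) ** 2)
    return total


def von_mises_pdf(theta, mu, kappa):
    """von Mises density at angle theta (radians) with mean direction mu
    and concentration kappa >= 0."""
    return math.exp(kappa * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i0(kappa))


def circular_mean(angles):
    """Mean direction of a list of angles (radians), computed from the
    summed unit vectors so that wrap-around is handled correctly."""
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return math.atan2(s, c)


# Two hypothetical DOA observations on either side of the +/-180 degree boundary.
doas = [math.radians(179.0), math.radians(-179.0)]
naive = sum(doas) / len(doas)   # arithmetic mean points the opposite way
circ = circular_mean(doas)      # circular mean stays at the boundary
```

Here the arithmetic mean is 0°, which points away from both observations, while the circular mean lies at ±180°. A linear Gaussian model on the raw angle has the same wrap-around problem, whereas the von Mises density depends on the angle only through cos(theta - mu) and is therefore periodic by construction.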
Acknowledgments: This work is funded by the European Union ERC Advanced Grant VHIA #340113 (Y. Ban, X. Alameda-Pineda, and R. Horaud) and by the UK EPSRC Fellowship grant EP/P001017/1 (C. Evers).