The Kinovis Multiple-Speaker Tracking Dataset
The Kinovis multiple speaker tracking (Kinovis-MST) dataset contains live acoustic recordings of multiple moving speakers in a reverberant environment. The data were recorded in the Kinovis multiple-camera laboratory at INRIA Grenoble Rhône-Alpes. The room measures 10.2 m × 9.9 m × 5.6 m and has a reverberation time of T60 = 0.53 s. The recordings were made with four microphones embedded in the head of a NAO robot (please refer to the attached picture). Because a fan is located inside the robot head near the microphones, the microphone signals contain a fair amount of stationary, spatially correlated noise; their SNR is approximately 2.7 dB. The recordings contain between one and three moving participants who speak naturally, so the number of active speech sources varies over time. The robot-to-speaker distance ranges from 1.5 to 3.5 meters.
Ground-truth trajectories and speech activity information were obtained as follows. Participants wore optical markers on their heads so that the Kinovis motion capture system provides accurate 3D trajectories for each participant. In addition, an infrared marker was placed on each participant's forehead, which allows each participant to be identified over time. Whenever a participant is silent, they hide their infrared marker, which allows the recordings to be annotated as speaking or silent.
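To put the quoted SNR of roughly 2.7 dB in perspective, the short sketch below converts a decibel value to a linear power ratio. It assumes the usual definition of SNR as 10·log10 of the signal-to-noise power ratio; the page does not state how the figure was measured, and the function name is purely illustrative.

```python
def db_to_power_ratio(snr_db):
    """Convert an SNR in decibels to a linear signal-to-noise power ratio.

    Assumes the standard definition SNR_dB = 10 * log10(P_signal / P_noise).
    """
    return 10.0 ** (snr_db / 10.0)


# SNR value quoted in the dataset description.
ratio = db_to_power_ratio(2.7)
print(f"{ratio:.2f}")  # prints 1.86
```

Under that assumption, the speech power is less than twice the noise power, which is why the fan noise is a significant factor in these recordings.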
Data:
Sequence | Audio recording | Ground truth | Speaker trajectories
2PC1 | | ground_truth.tsv |
3P01 | | ground_truth.tsv |
3P03 | | ground_truth.tsv |
3P04 | | ground_truth.tsv |
3P05 | | ground_truth.tsv |
3P06 | | ground_truth.tsv |
3P07 | | ground_truth.tsv |
3P08 | | ground_truth.tsv |
3P09 | | ground_truth.tsv |
3P10 | | ground_truth.tsv |
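Each sequence comes with a ground_truth.tsv file. The column layout of these files is not documented on this page, so the following is only a minimal sketch of a generic reader: it assumes a plain tab-separated text file and makes no assumption about what each column means. The function name read_tsv_rows and the example path are hypothetical.

```python
import csv
from pathlib import Path


def read_tsv_rows(path):
    """Read a tab-separated file into a list of rows.

    Fields that parse as numbers become floats; all other fields stay
    strings. Blank lines are skipped. The meaning of each column is NOT
    assumed here and must be checked against the downloaded files.
    """
    def parse(field):
        try:
            return float(field)
        except ValueError:
            return field

    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        return [[parse(field) for field in row] for row in reader if row]


# Hypothetical usage; the actual directory layout of the download may differ:
# rows = read_tsv_rows("3P01/ground_truth.tsv")
```

Inspecting the first few rows returned by such a reader is a reasonable way to identify which columns hold timestamps, positions, and speaking/silent flags before writing any sequence-specific parsing.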
For the microphone specifications, please refer to the documentation of version 5 (v5) of the NAO robot [link].
Download the Kinovis-MST dataset
Please cite the following paper:
@article{LiBanGirinAlamedaHoraud2018,
  author        = {Xiaofei Li and Yutong Ban and Laurent Girin and Xavier Alameda{-}Pineda and Radu Horaud},
  title         = {Online Localization and Tracking of Multiple Moving Speakers in Reverberant Environments},
  journal       = {CoRR},
  volume        = {abs/1809.10936},
  year          = {2018},
  url           = {http://arxiv.org/abs/1809.10936},
  archivePrefix = {arXiv},
  eprint        = {1809.10936},
  timestamp     = {Fri, 05 Oct 2018 11:34:52 +0200},
  biburl        = {https://dblp.org/rec/bib/journals/corr/abs-1809-10936},
  bibsource     = {dblp computer science bibliography, https://dblp.org}
}