The Kinovis-MST Dataset

The Kinovis Multiple-Speaker Tracking Dataset

Data | pdf from arXiv | download | reference

The Kinovis multiple-speaker tracking (Kinovis-MST) dataset contains live acoustic recordings of multiple moving speakers in a reverberant environment. The data were recorded in the Kinovis multiple-camera laboratory at INRIA Grenoble Rhône-Alpes. The room measures 10.2 m × 9.9 m × 5.6 m and has a reverberation time of T60 = 0.53 s.

The data were recorded with four microphones embedded in the head of a NAO robot (please refer to the attached picture). Because a fan is located inside the robot head near the microphones, the recordings contain a fair amount of stationary and spatially correlated microphone noise. The SNR of the microphone signals is approximately 2.7 dB. Each recording contains between one and three moving participants who speak naturally, hence the number of active speech sources varies over time. The robot-to-speaker distance ranges between 1.5 and 3.5 meters.

Ground-truth trajectories and speech activity information were obtained as follows. Participants wore optical markers on their heads, so that the Kinovis motion capture system provides an accurate 3D trajectory for each participant. In addition, an infrared marker was placed on each participant's forehead, which makes it possible to identify each participant over time. Whenever participants are silent, they hide their infrared marker, which yields speaking/silent annotations of the recordings.
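The speaking/silent annotation rule described above (marker visible while speaking, hidden while silent) can be illustrated with a minimal sketch. The input format is an assumption made for illustration only: `marker_visible` is a hypothetical per-frame list of booleans and `frame_rate` a hypothetical motion-capture frame rate; neither name comes from the dataset itself.

```python
from typing import List, Tuple


def speaking_segments(marker_visible: List[bool],
                      frame_rate: float) -> List[Tuple[float, float]]:
    """Turn per-frame marker visibility into (start, end) speaking intervals.

    Assumption: a visible infrared marker means the participant is speaking,
    a hidden marker means the participant is silent. Times are in seconds.
    """
    segments = []
    start = None  # index of the first frame of the current speaking run
    for i, visible in enumerate(marker_visible):
        if visible and start is None:
            start = i  # a speaking run begins
        elif not visible and start is not None:
            segments.append((start / frame_rate, i / frame_rate))
            start = None  # the speaking run ends
    if start is not None:
        # The sequence ends while the participant is still speaking.
        segments.append((start / frame_rate, len(marker_visible) / frame_rate))
    return segments
```

For example, `speaking_segments([False, True, True, False, True], 2.0)` treats frames 1 to 2 and frame 4 as speech and reports the corresponding intervals in seconds.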

Data:

Sequences	Audio recordings	Ground truth	Speaker trajectories
2PC1		ground_truth.tsv
3P01		ground_truth.tsv
3P03		ground_truth.tsv
3P04		ground_truth.tsv
3P05		ground_truth.tsv
3P06		ground_truth.tsv
3P07		ground_truth.tsv
3P08		ground_truth.tsv
3P09		ground_truth.tsv
3P10		ground_truth.tsv
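
The ground-truth files listed above are tab-separated, so they can be inspected with the Python standard library. The sketch below is hedged: the column layout of `ground_truth.tsv` is not documented on this page, so the code only reads the rows and reports their shape rather than interpreting any column. The directory layout `<root>/<sequence>/ground_truth.tsv` and the helper names are assumptions for illustration.

```python
import csv
from pathlib import Path
from typing import Dict, List

# Sequence identifiers as listed in the table above.
SEQUENCES = ["2PC1", "3P01", "3P03", "3P04", "3P05",
             "3P06", "3P07", "3P08", "3P09", "3P10"]


def ground_truth_path(root: Path, sequence: str) -> Path:
    """Hypothetical layout: one folder per sequence under `root`."""
    return root / sequence / "ground_truth.tsv"


def read_tsv(path: Path) -> List[List[str]]:
    """Read a tab-separated file into a list of rows, skipping empty lines."""
    with path.open(newline="", encoding="utf-8") as f:
        return [row for row in csv.reader(f, delimiter="\t") if row]


def summarize(rows: List[List[str]]) -> Dict[str, int]:
    """Report the number of rows and the widest row, without assuming columns."""
    return {
        "n_rows": len(rows),
        "n_columns": max((len(row) for row in rows), default=0),
    }
```

A typical use would be `summarize(read_tsv(ground_truth_path(Path("Kinovis-MST"), "3P01")))`, where `Kinovis-MST` stands for wherever the downloaded archive was extracted.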

For the microphone specifications, please refer to version 5 (v5) of the NAO robot [link].

Download the Kinovis-MST dataset

Please cite the following paper:

@article{LiBanGirinAlamedaHoraud2018,
  author    = {Xiaofei Li and
               Yutong Ban and
               Laurent Girin and
               Xavier Alameda{-}Pineda and
               Radu Horaud},
  title     = {Online Localization and Tracking of Multiple Moving Speakers in Reverberant
               Environments},
  journal   = {CoRR},
  volume    = {abs/1809.10936},
  year      = {2018},
  url       = {http://arxiv.org/abs/1809.10936},
  archivePrefix = {arXiv},
  eprint    = {1809.10936},
  timestamp = {Fri, 05 Oct 2018 11:34:52 +0200},
  biburl    = {https://dblp.org/rec/bib/journals/corr/abs-1809-10936},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Radu HORAUD