The Kinovis-MST Dataset

The Kinovis Multiple-Speaker Tracking Dataset

The Kinovis multiple-speaker tracking (Kinovis-MST) dataset contains live acoustic recordings of multiple moving speakers in a reverberant environment. The data were recorded in the Kinovis multiple-camera laboratory at INRIA Grenoble Rhône-Alpes. The room measures 10.2 m × 9.9 m × 5.6 m, with T60 = 0.53 s. The data were recorded with four microphones embedded in the head of a NAO robot (please refer to the attached picture). Because a fan is located inside the robot head near the microphones, the recordings contain a fair amount of stationary and spatially correlated microphone noise. The SNR of the microphone signals is approximately 2.7 dB. The recordings feature between one and three moving participants who speak naturally, so the number of active speech sources varies over time. The robot-to-speaker distance ranges between 1.5 m and 3.5 m.

Ground-truth trajectories and speech activity information were obtained as follows. Participants wore optical markers on their heads, so that the Kinovis motion capture system provides accurate 3D trajectories for each participant. In addition, an infrared marker was placed on each participant's forehead, which makes it possible to identify each participant over time. Whenever participants are silent, they hide their infrared marker, which allows the recordings to be annotated as speaking or silent.
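
As a quick sanity check on the figures quoted above, the following Python sketch computes the room volume from the stated dimensions and converts the stated SNR from decibels to linear ratios. The function and variable names are illustrative only and are not part of the dataset; the conversions use the standard decibel definitions (10 log10 for power, 20 log10 for amplitude).

```python
# Figures quoted in the dataset description.
ROOM_DIMS_M = (10.2, 9.9, 5.6)  # length, width, height in meters
SNR_DB = 2.7                    # approximate microphone-signal SNR


def room_volume(dims):
    """Volume of a box-shaped room, in cubic meters."""
    length, width, height = dims
    return length * width * height


def db_to_power_ratio(db):
    """Convert decibels to a linear power ratio."""
    return 10.0 ** (db / 10.0)


def db_to_amplitude_ratio(db):
    """Convert decibels to a linear amplitude ratio."""
    return 10.0 ** (db / 20.0)


volume_m3 = room_volume(ROOM_DIMS_M)
snr_power = db_to_power_ratio(SNR_DB)
snr_amplitude = db_to_amplitude_ratio(SNR_DB)

print(f"Room volume: {volume_m3:.1f} m^3")
print(f"SNR as power ratio: {snr_power:.2f}")
print(f"SNR as amplitude ratio: {snr_amplitude:.2f}")
```

At 2.7 dB the signal power is less than twice the noise power, which gives a sense of how noisy the recordings are.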

 

Data:

Sequence | Audio recording | Ground-truth speaker trajectories
2PC1 | | ground_truth.tsv
3P01 | | ground_truth.tsv
3P03 | | ground_truth.tsv
3P04 | | ground_truth.tsv
3P05 | | ground_truth.tsv
3P06 | | ground_truth.tsv
3P07 | | ground_truth.tsv
3P08 | | ground_truth.tsv
3P09 | | ground_truth.tsv
3P10 | | ground_truth.tsv
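
The column layout of the ground_truth.tsv files is not documented on this page, so the following Python sketch only shows a generic way to read a tab-separated file into rows of strings using the standard csv module. The function name read_tsv_rows and the sample content are hypothetical and are not taken from the dataset.

```python
import csv
import io


def read_tsv_rows(stream):
    """Return the non-empty rows of a tab-separated text stream as lists of strings."""
    reader = csv.reader(stream, delimiter="\t")
    return [row for row in reader if row]  # csv yields [] for blank lines


# Hypothetical sample content, used only to demonstrate the function.
# It does NOT reflect the actual columns of ground_truth.tsv.
sample = "a\tb\tc\n1\t2\t3\n"
rows = read_tsv_rows(io.StringIO(sample))
print(rows)
```

For a downloaded sequence, the same function can be applied to a file opened with open("ground_truth.tsv", newline=""); inspecting the first few rows is a reasonable way to discover the actual columns before any numeric conversion.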

 


For the microphone specifications, please refer to version 5 (v5) of the NAO robot [link].


Download the Kinovis-MST dataset

 

Please cite the following paper:

@article{LiBanGirinAlamedaHoraud2018,
  author    = {Xiaofei Li and
               Yutong Ban and
               Laurent Girin and
               Xavier Alameda{-}Pineda and
               Radu Horaud},
  title     = {Online Localization and Tracking of Multiple Moving Speakers in Reverberant
               Environments},
  journal   = {CoRR},
  volume    = {abs/1809.10936},
  year      = {2018},
  url       = {http://arxiv.org/abs/1809.10936},
  archivePrefix = {arXiv},
  eprint    = {1809.10936},
  timestamp = {Fri, 05 Oct 2018 11:34:52 +0200},
  biburl    = {https://dblp.org/rec/bib/journals/corr/abs-1809-10936},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}