The CAVA dataset

The CAVA database is a unique set of audiovisual recordings made with a binocular camera pair and a binaural microphone pair, both mounted on a person’s head. The database was gathered in order to develop computational methods and cognitive models for audiovisual scene analysis, as part of the European project POP (Perception on Purpose, FP6-IST-027268). It was recorded in May 2007 by two POP partners: the University of Sheffield and INRIA Grenoble Rhône-Alpes. We recorded a large variety of scenarios representative of typical audiovisual tasks, such as tracking a speaker in a complex and dynamic environment: multiple speakers participating in an informal meeting, both static and moving speakers, acoustic noise, occluded speakers, speakers’ faces turning away from the cameras, etc.

CAVA website for download:

Please cite the following publication:

The CAVA corpus: synchronised stereoscopic and binaural datasets with head movements. Elise Arnaud; Heidi Christensen; Yan-Chen Lu; Jon Barker; Vasil Khalidov; Miles Hansard; Bertrand Holveck; Herve Mathieu; Ramya Narasimha; Elise Taillant; Florence Forbes; Radu Horaud. ICMI 2008 – ACM/IEEE International Conference on Multimodal Interfaces, Oct 2008, Chania, Greece, pp. 109-116

icmi08-cava.pdf BibTex