Datasets

CHiME-3


The CHiME-3 dataset consists of Wall Street Journal utterances read by 12 US English talkers and recorded by a 6-microphone tablet device in 4 varied noise settings: café, street junction, public transport and pedestrian area. It also contains simulated noisy utterances. All data has been fully transcribed. It was used for the CHiME-3 and CHiME-4 challenges.

CHiME-5


CHiME-5 targets the problem of distant-microphone conversational speech recognition in everyday home environments. Speech material was collected from 20 real dinner parties held in real homes. The parties were recorded using multiple 4-channel microphone arrays and have been fully transcribed.

DCASE 2018 – TASK4

DCASE 2018 – TASK4 evaluates systems for the large-scale detection of sound events using weakly labeled data. The challenge is to explore whether a large amount of unbalanced, unlabeled training data can be exploited together with a small weakly annotated training set to improve system performance.

DCASE 2019 – TASK4

DCASE 2019 – TASK4 evaluates systems for the large-scale detection of sound events using real data that is either weakly labeled or unlabeled, together with simulated data that is strongly labeled (with time stamps). The scientific question the task investigates is whether real but partially and weakly annotated data is really needed, whether synthetic data alone is sufficient, or whether both are needed.
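
To make the distinction between strong and weak labels concrete, the following is a minimal sketch, not the challenge's actual tooling or file format. It assumes a hypothetical record layout (`StrongLabel`) and made-up clip identifiers and event names. A strong label carries onset and offset time stamps for each event; a weak label only states which events occur somewhere in a clip.

```python
from typing import NamedTuple


class StrongLabel(NamedTuple):
    """One time-stamped sound event (hypothetical layout, for illustration only)."""
    clip_id: str
    onset_s: float   # event start, in seconds from the start of the clip
    offset_s: float  # event end, in seconds from the start of the clip
    event: str


def to_weak_labels(strong_labels):
    """Collapse time-stamped events into clip-level tags by dropping the time stamps."""
    weak = {}
    for label in strong_labels:
        weak.setdefault(label.clip_id, set()).add(label.event)
    # Sort the tags so the result is deterministic.
    return {clip_id: sorted(events) for clip_id, events in weak.items()}


# Made-up annotations: clip identifiers and event names are assumptions.
strong = [
    StrongLabel("clip_a", 0.5, 2.0, "Speech"),
    StrongLabel("clip_a", 3.1, 4.7, "Dog"),
    StrongLabel("clip_a", 6.0, 7.2, "Speech"),
    StrongLabel("clip_b", 1.0, 9.5, "Vacuum_cleaner"),
]
weak = to_weak_labels(strong)
```

Going from strong to weak labels is a lossy, one-way operation: the clip-level tags cannot be turned back into time stamps, which is why strongly labeled data is costlier to obtain for real recordings and comparatively easy to generate for simulated ones.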

DREGON

The DREGON (DRone EGonoise and localizatiON) dataset consists of sounds recorded with an 8-channel microphone array embedded in a quadrotor UAV (Unmanned Aerial Vehicle), annotated with the precise 3D position of the sound source relative to the drone as well as other sensor measurements. It aims to promote research in UAV-embedded sound source localization for search-and-rescue and was used for the IEEE Signal Processing Cup 2019.
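
Localization results are often expressed as a direction of arrival rather than a 3D position. As a rough sketch, a relative source position can be converted to azimuth, elevation and distance as shown below. The axis convention used here (x forward, y left, z up, in meters) and the function name `position_to_direction` are assumptions for illustration; the dataset's own documentation defines the actual coordinate frame.

```python
import math


def position_to_direction(x, y, z):
    """Convert a source position relative to the array into a direction.

    Assumed convention (not taken from the dataset): x forward, y left, z up.
    Returns (azimuth_deg, elevation_deg, distance), distance in the input units.
    """
    distance = math.sqrt(x * x + y * y + z * z)
    # Azimuth: angle in the horizontal plane, measured from the x axis.
    azimuth_deg = math.degrees(math.atan2(y, x))
    # Elevation: angle above (positive) or below (negative) the horizontal plane.
    elevation_deg = math.degrees(math.atan2(z, math.hypot(x, y)))
    return azimuth_deg, elevation_deg, distance


# Made-up example: a source ahead, to the left of, and below the drone.
azimuth, elevation, distance = position_to_direction(1.0, 1.0, -math.sqrt(2.0))
```

With this convention, a source on the ground beneath a hovering drone has a negative elevation.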

VAST

The VAST (Virtual Acoustic Space Traveling) project gathers large datasets of simulated room impulse responses annotated with acoustical and geometrical properties of the corresponding rooms, sources and microphones. The aim is to investigate the generalizability of source propagation models learned on simulated datasets to real-world data.