Virtual Acoustic Space Learning for Auditory Scene Geometry Estimation

Speaker: Antoine Deleforge (Researcher, INRIA Rennes)

Date: September 7, 2017


Most auditory scene analysis methods (source separation, denoising, dereverberation, etc.) rely on some geometrical information about the system: Where are the sources? Where are the microphones? What is around or between them? Since the geometrical configuration of real-world systems is often very complex, classical approaches rely on an approximate physical model of the system. For instance, a unique direct propagation path is assumed from each source to each microphone, or the wave-propagation equation is solved within a known enclosure. While these methods work well in idealized conditions, their performance degrades in complex or unknown environments. Alternatively, “data-driven” approaches use a training dataset to learn an implicit mapping from high-dimensional acoustic features to low-dimensional physical properties. They work well in arbitrarily complex environments, but require carefully annotated real recordings and are hence often system-specific.
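To make the data-driven idea concrete, here is a minimal sketch (not from the talk; all names and the free-field model are illustrative assumptions) of learning a mapping from a high-dimensional acoustic feature — here reduced to a single interaural time difference (ITD) — back to a low-dimensional physical property, the source azimuth, using a simple nearest-neighbour lookup over a labeled training set:

```python
import numpy as np

# Hypothetical setup: two microphones 0.2 m apart, sound speed 343 m/s,
# far-field sources. None of this comes from the talk; it is a toy model.
C = 343.0  # speed of sound (m/s)
D = 0.2    # microphone spacing (m)

def itd(azimuth_rad):
    """Free-field interaural time difference for a far-field source."""
    return (D / C) * np.sin(azimuth_rad)

# Annotated training set: sampled azimuths and their ITD "features".
train_az = np.linspace(-np.pi / 2, np.pi / 2, 181)  # 1-degree grid
train_itd = itd(train_az)

def estimate_azimuth(observed_itd):
    """Map an acoustic feature back to geometry by nearest-neighbour lookup."""
    return train_az[np.argmin(np.abs(train_itd - observed_itd))]

# Query with a source at 30 degrees.
est = estimate_azimuth(itd(np.deg2rad(30.0)))
```

In a real system the feature would be high-dimensional (e.g. interaural cues across many frequency bins) and the training labels would come from annotated recordings, which is precisely what makes such methods system-specific.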

In this talk, I will first review some physics-driven and data-driven methods, and then introduce a third approach referred to as “Virtual Acoustic Space Learning”, which aims to take the best of both worlds. The idea is to build a massive dataset containing arbitrarily many auditory scene configurations, thanks to a physics-based room acoustic simulator. A model implicitly learned from this dataset can then be used to perform advanced geometry estimation tasks on real data from unknown environments. Both the machine learning and the audio signal processing methodologies will be detailed, and preliminary results demonstrating the feasibility of the approach will be presented.
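The simulate-then-learn pipeline can be sketched as follows. This is a toy stand-in, not the method from the talk: a full room acoustic simulator would produce reverberant impulse responses, whereas here direct-path time differences of arrival (TDOAs) at three hypothetical microphones serve as the simulated features, and a nearest-neighbour model plays the role of the implicitly learned mapping:

```python
import numpy as np

rng = np.random.default_rng(0)
C = 343.0  # speed of sound (m/s)

# Hypothetical scene: three widely spaced microphones in a 6 m x 5 m room.
mics = np.array([[1.0, 1.0], [5.0, 1.0], [3.0, 4.0]])

def features(src):
    """Simulated acoustic features: TDOAs (pairs 0-1 and 0-2), direct path only."""
    d = np.linalg.norm(mics - src, axis=1)
    return np.array([d[1] - d[0], d[2] - d[0]]) / C

# "Virtual acoustic space": simulate arbitrarily many scene configurations.
train_pos = rng.uniform([0.5, 0.5], [5.5, 4.5], size=(5000, 2))
train_feat = np.array([features(p) for p in train_pos])

def localize(feat):
    """Learned feature-to-geometry mapping (nearest neighbour in feature space)."""
    return train_pos[np.argmin(np.sum((train_feat - feat) ** 2, axis=1))]

# Geometry estimation for a held-out source position.
true_src = np.array([4.2, 1.3])
est_src = localize(features(true_src))
```

The appeal of the approach is visible even in this sketch: the training set costs nothing to annotate because every label is known by construction from the simulator, so the dataset can be made as large and as varied as desired.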