Complex-valued and hybrid models for audio processing

Speaker: Paul Magron

Date and place: January 14, 2021 at 10:30, by videoconference

Abstract:

In this talk, I will give an overview of my work, whose main application is sound source separation: the task of automatically extracting the constitutive components of an audio recording from their observed mixture. I will address it in the time-frequency domain, which reveals the underlying structure of sounds. Most methods process only spectrogram-like quantities and discard the phase information, which limits their performance. I propose to tackle phase processing and recovery by means of signal analysis: my approach consists of extracting phase properties from time-domain signal models (such as mixtures of sinusoids) and incorporating them into source separation models. I will also present a phase-aware probabilistic framework based on the von Mises and anisotropic Gaussian distributions. These approaches will be combined with spectrogram decomposition techniques such as nonnegative matrix factorization and deep neural networks. Lastly, if time allows, I will present some contributions on other topics such as acoustic scene analysis and music recommendation.

In the second part of this talk, I will present my research program for the upcoming years. The current trend in audio research consists of leveraging deep models along with large collections of data and increased processing power. Although they perform impressively in controlled conditions, these approaches are hard to deploy in practical (unseen) situations and raise the question of their economic and energy costs. By contrast, my vision falls within a paradigm of reduced supervision and consists of leveraging expert knowledge in conjunction with machine learning for data-efficient and flexible audio processing. Building on my past research, I propose to develop the analysis and modeling of complex-valued representations, since deep learning techniques could greatly benefit from processing all the available data instead of nonnegative representations only. In particular, I will develop phase-aware probabilistic modeling by extending the notion of anisotropy to alternative distributions. Finally, I will study the interface between factorization models and deep learning in order to design semi- (or un-) supervised methods that are lightweight and interpretable.

Antoine DELEFORGE