Seminar: Complex-valued and hybrid models for audio processing

Seminar by Paul Magron, IRIT

Tuesday, February 2nd, 16:00 – 17:00

INRIA Montbonnot Saint-Martin


Abstract: In this talk, I will give an overview of my work, whose main application is sound source separation: the task of automatically extracting constitutive components from their observed mixture in an audio recording. I will address it in the time-frequency domain, which reveals the underlying structure of sounds. Most methods process only spectrogram-like quantities and discard the phase information, which limits their performance. I propose to tackle phase processing and recovery by means of signal analysis: my approach consists in extracting phase properties from time-domain signal models (such as mixtures of sinusoids) and incorporating them into source separation models. I will also present a phase-aware probabilistic framework based on the von Mises and anisotropic Gaussian distributions. These approaches will be combined with spectrogram decomposition techniques such as nonnegative matrix factorization and deep neural networks. Lastly, if time allows, I will present some contributions on other topics such as acoustic scene analysis and music recommendation.
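To make the phase limitation concrete, here is a minimal, hypothetical sketch (not the speaker's actual method) of the standard pipeline the abstract critiques: nonnegative matrix factorization with multiplicative updates is applied to a magnitude spectrogram V ≈ WH, while the phase is simply set aside. All variable names and the toy data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def nmf(V, rank, n_iter=200, eps=1e-10):
    """Euclidean-distance NMF via Lee-Seung multiplicative updates.

    Factorizes a nonnegative matrix V (e.g. a magnitude spectrogram)
    as V ~ W @ H, with W, H kept nonnegative by construction.
    """
    F, T = V.shape
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, T)) + eps
    for _ in range(n_iter):
        # Multiplicative updates preserve nonnegativity and
        # monotonically decrease the Euclidean reconstruction error.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy "magnitude spectrogram": sum of two rank-1 spectral patterns,
# standing in for two sources mixed together. The phase of the mixture
# is never modeled here -- exactly the gap phase-aware methods address.
V = (np.outer(rng.random(64), rng.random(100))
     + np.outer(rng.random(64), rng.random(100)))
W, H = nmf(V, rank=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.4f}")
```

In a full separation system, each source estimate W_k H_k would be combined with the unmodified mixture phase at synthesis time; improving on that reused phase is the subject of the phase-recovery work presented in the talk.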

In the second part of this talk, I will present my research program for the upcoming years. The current trend in audio research consists in leveraging deep models along with large collections of data and increased processing power: although these approaches perform impressively in controlled conditions, they are hard to deploy in practical (unseen) situations and raise questions of economic and energy costs. Conversely, my vision falls within a paradigm of reduced supervision and consists in leveraging expert knowledge in conjunction with machine learning for data-efficient and flexible audio processing. In continuity with my past research, I propose to further develop the analysis and modeling of complex-valued representations, since deep learning techniques could greatly benefit from processing all the available data instead of nonnegative representations only. In particular, I will develop phase-aware probabilistic modeling by extending the notion of anisotropy to alternative distributions. Finally, I will study the interface between factorization models and deep learning in order to design semi- (or un-) supervised methods that are light and interpretable.

Biography: Paul Magron received the State Engineering degree from the Ecole des Ponts ParisTech (Paris, France) in 2013, the M.Sc. degree in acoustics, signal processing and computer science applied to music from Sorbonne University (Paris, France) in 2013, and the Ph.D. degree in signal processing from Télécom ParisTech (Paris, France) in 2016. From 2017 to 2019, he worked as a postdoctoral researcher within the Audio Research Group, Tampere University (Tampere, Finland). Since 2019, he has worked as a postdoctoral researcher within the Signal and Communications group, Institut de Recherche en Informatique de Toulouse (IRIT, Toulouse, France). His research interests include audio signal processing, sound source separation, phase recovery, nonnegative matrix factorization, probabilistic modeling, and music recommendation. He has authored about 20 scientific publications on these topics and is the recipient of the iWAENC 2018 best paper award for his work on complex nonnegative matrix factorization with beta-divergences.