Audio signal processing and modeling are a central theme of the PANAMA project-team, with strong and growing industrial connections in the field. However, because the models, methods and algorithms that PANAMA develops have many potential applications beyond audio, other types of signals may also be considered, in particular biomedical signals. These applications will be investigated primarily in partnership with research groups that have the relevant expertise.
The research conducted in PANAMA relies on continuous feedback between the design of mathematically founded and algorithmically efficient frameworks and their assessment on a set of targeted applications, which in turn fuel the proposed frameworks. PANAMA’s primary targeted applications are:
- Acoustic scene capture. Acoustic fields carry much information about audio sources (musical instruments, speakers, etc.) and their environment (e.g., church acoustics differ greatly from office room acoustics). A particular challenge is to capture as much information as possible from the complete 3D+t acoustic field associated with an audio scene, using as few sensors as possible. The ECHANGE ANR-DEFIS project, which METISS coordinated, showed the feasibility of compressive sensing to address this challenge in certain scenarios. The actual implementation of this framework is one of the first applications considered, with possible practical scenarios such as remote surveillance to detect abnormal events, e.g. for health care of the elderly or public transport surveillance.
- Audio signal separation in reverberant environments. Audio signal separation consists in extracting the individual sounds of the different instruments or speakers that were mixed in a recording. It is now successfully addressed in the academic setting of linear instantaneous mixtures. Yet real-life recordings, generally made in reverberant environments, remain a difficult unsolved challenge, especially with many sources and few audio channels. Much of the difficulty comes from estimating the unknown room impulse response associated with a matrix of mixing filters, an estimation that can be expressed as a dictionary-learning problem. Solutions to this problem have the potential to impact, for example, the music and game industries, through the development of new digital re-mastering techniques and virtual reality tools, but also surveillance and monitoring applications, where localizing audio sources is important.
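To make the distinction between linear instantaneous mixtures and reverberant recordings concrete, here is a minimal NumPy sketch of the two mixing models. It is an illustration under stated assumptions, not PANAMA's implementation: the function names, the random sources, and the exponentially decaying random filters standing in for room impulse responses are all hypothetical.

```python
import numpy as np

def mix_instantaneous(sources, A):
    """Linear instantaneous mixture: each channel is a weighted sum of sources.

    sources: (n_sources, n_samples), A: (n_channels, n_sources)
    """
    return A @ sources

def mix_convolutive(sources, filters):
    """Convolutive (reverberant) mixture: each source is filtered before summing.

    sources: (n_sources, n_samples)
    filters: (n_channels, n_sources, filter_len), one mixing filter per
             (channel, source) pair
    returns: (n_channels, n_samples + filter_len - 1)
    """
    n_channels, n_sources, filter_len = filters.shape
    n_samples = sources.shape[1]
    mixture = np.zeros((n_channels, n_samples + filter_len - 1))
    for i in range(n_channels):
        for j in range(n_sources):
            mixture[i] += np.convolve(sources[j], filters[i, j])
    return mixture

rng = np.random.default_rng(0)
sources = rng.standard_normal((3, 1000))   # 3 synthetic sources (assumption)
A = rng.standard_normal((2, 3))            # 2 channels: fewer channels than sources
# Hypothetical stand-in for room impulse responses: decaying random filters.
filters = rng.standard_normal((2, 3, 64)) * np.exp(-np.arange(64) / 16.0)

x_inst = mix_instantaneous(sources, A)
x_conv = mix_convolutive(sources, filters)
```

With filters of length one, the convolutive model reduces to the instantaneous one, which is why the reverberant case is strictly harder: the unknowns are whole filters rather than scalar gains.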
PANAMA’s strategy is to achieve a good balance between applications of the developed models and methods to audio, which will benefit from METISS’s solidly established know-how, and applications to other domains (multimedia indexing, biomedical data processing), where collaboration with existing specialized teams (e.g. TEXMEX, VISAGES) is expected to maximize their potential impact.
Audiovisual and multimedia content generates large data streams (audio, video, associated data such as text, etc.). Manipulating large databases of such content requires efficient techniques to: segment the streams into coherent sequences; label them according to words, language, speaker identity, and more generally the type of content; index them for easy querying and retrieval, etc. The next generation of online search engines will need to offer content-based means of searching, so drastically reducing the computational burden of these tasks is becoming all the more important: we can envision the end of the era of wasteful datacenters whose energy consumption can grow forever. Most of today’s techniques for dealing with such large audio streams involve extracting features such as Mel Frequency Cepstral Coefficients (MFCC) and learning high-dimensional statistical models such as Gaussian Mixture Models, with several thousand parameters. Through the exploration of a compressive learning framework, PANAMA is expected to contribute new techniques to efficiently process such streams and perform segmentation, classification, etc., in the compressed domain. A particular challenge will be to understand how this paradigm can help exploit truly multimedia features, which combine information from different associated streams such as audio and video, for joint audiovisual processing.
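As a rough illustration of what processing in the compressed domain can look like, the sketch below summarizes a large set of feature vectors by a fixed-size vector of averaged random Fourier moments. The text does not specify which compression is used, so this particular choice is an assumption, as are the function name `sketch`, the dimensions, and the synthetic Gaussian data standing in for MFCC frames.

```python
import numpy as np

def sketch(X, W):
    """Compress a dataset into a fixed-size vector.

    X: (n_frames, dim) feature vectors
    W: (dim, m) random frequencies
    returns: (m,) complex vector, the average over frames of exp(i * x @ W)
    """
    return np.exp(1j * (X @ W)).mean(axis=0)

rng = np.random.default_rng(0)
n_frames, dim, m = 10000, 13, 200            # sizes chosen for illustration only
X = rng.standard_normal((n_frames, dim))     # synthetic stand-in for MFCC frames
W = rng.standard_normal((dim, m))            # random frequencies (assumption)

z_full = sketch(X, W)

# The same summary can be built from a stream, chunk by chunk, by combining
# per-chunk sketches with weights proportional to the chunk sizes.
z_a = sketch(X[:4000], W)
z_b = sketch(X[4000:], W)
z_stream = (4000 * z_a + 6000 * z_b) / n_frames
```

The summary has `m` entries regardless of how many frames are seen, and it can be updated in a single pass, which is the property that makes learning from the summary rather than from the raw stream attractive for large collections.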