The scientific foundations of PANAMA lie in sparse representations and probabilistic modeling, and its scientific scope extends beyond what was investigated in METISS in three major directions:
- The extension of the sparse representation paradigm towards that of “sparse modeling”, with the challenge of establishing, strengthening and clarifying connections between sparse representations and machine learning (a central aspect of the ERC PLEASE research program).
- A focus on sophisticated probabilistic models and advanced statistical methods to account for complex dependencies between multi-layered variables (such as in audio-visual streams, musical content, biomedical data, etc.).
- The investigation of graph-based representations, processing and transforms, with the goal of describing, modeling and inferring underlying structures within content streams or data sets.
Sparse signal processing and inverse problems
A large part of the planned activities of PANAMA will revolve around inverse problems and their strong connections with sparse and structured signal models.
Approaches that combine sparsity and random low-dimensional projections rest on solid mathematical foundations, come with computationally efficient algorithms, and have led to state-of-the-art results in many domains, from the signal level (denoising, source localization and separation) to the “semantic” level (classification and recognition). A flagship application of sparsity is the new paradigm of compressive sensing, which exploits sparsity to acquire high-resolution data with limited resources (e.g., fewer or less expensive sensors, lower energy consumption).
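To make the compressive-sensing idea concrete, here is a minimal sketch in Python with NumPy, assuming a synthetic k-sparse signal observed through a random Gaussian matrix with far fewer rows than columns. Recovery uses Orthogonal Matching Pursuit (OMP), a standard greedy algorithm chosen here only for brevity; it is not presented as PANAMA's method, and the dimensions, amplitudes and seed are illustrative assumptions.

```python
import numpy as np


def omp(A, y, k):
    """Orthogonal Matching Pursuit: greedy k-sparse approximation of y over the columns of A."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Select the column most correlated with the current residual.
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        # Re-fit all selected coefficients by least squares (the "orthogonal" step).
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x


rng = np.random.default_rng(0)
n, m, k = 256, 64, 3  # ambient dimension, number of measurements (m < n), sparsity (assumed values)

# A k-sparse signal whose nonzero entries have magnitude between 1 and 2.
x_true = np.zeros(n)
true_support = rng.choice(n, size=k, replace=False)
x_true[true_support] = rng.choice([-1.0, 1.0], size=k) * (1.0 + rng.random(k))

# Random Gaussian sensing matrix with unit-norm columns: y holds only m linear measurements.
A = rng.standard_normal((m, n))
A /= np.linalg.norm(A, axis=0)
y = A @ x_true

x_hat = omp(A, y, k)
print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```

With these sizes (64 random measurements of a 3-sparse vector in dimension 256), greedy recovery is typically exact; convex ℓ1 minimization is the other classical recovery route.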
These well-established tools still raise a number of difficult theoretical questions, but the main challenge today is to bring them to a level of maturity that enables their deployment and use in a wide range of application scenarios. This primarily involves removing algorithmic bottlenecks to address truly large-scale problems, but also, and perhaps more importantly, developing methods to finely adjust the model parameters to the nature of the processed data and, for compressive sensing, understanding how the principles of random sensing can be reconciled with hardware constraints.
Structure plays an important role in signal models. Whether deterministic or probabilistic, signal models are rarely characterized by a simple vector of parameters alone: their description often critically involves various types of underlying structure that capture connections, interrelations, similarities and disparities between parameters, states, nodes, parts, etc.
Another focus of PANAMA's activities will be machine learning.
A key objective of machine learning is to infer properties of functions (which can be seen as infinite-dimensional vectors in some abstract function space) from a limited number of observations. Typical examples include regression, which interpolates a function $f(x)$ from a collection of noisy observations $y_n \approx f(x_n)$ of its values at $N$ known points $x_n$, $n = 1, \dots, N$; classification, where the values $y_n$ are class labels; and density estimation, where one wishes to infer the probability density function (pdf) $f(x)$ of the data from finitely many observations $x_n$. More generally, learning a signal model from a collection of examples has versatile applications, not only in signal estimation and restoration (denoising, declipping, dereverberation, source separation, etc.) but also in information extraction and structuring (source localization, segmentation, classification, diarization, etc.).
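As an illustration of inferring a function from finitely many noisy samples $y_n \approx f(x_n)$, the following sketch uses Gaussian-kernel ridge regression, one classical choice among many and not necessarily the one used in PANAMA. The test function, kernel width, regularization weight, noise level and sample size are all illustrative assumptions.

```python
import numpy as np


def gaussian_kernel(a, b, width):
    """Gram matrix k(a_i, b_j) = exp(-(a_i - b_j)^2 / (2 * width^2)) for 1-D inputs."""
    d = a[:, None] - b[None, :]
    return np.exp(-(d ** 2) / (2.0 * width ** 2))


def fit_kernel_ridge(x, y, width=0.1, lam=1e-2):
    """Return an estimate of f from noisy samples y_n ~ f(x_n) (kernel ridge regression)."""
    K = gaussian_kernel(x, x, width)
    # Solve (K + lam I) alpha = y; the estimate is f_hat(t) = sum_n alpha_n k(t, x_n).
    alpha = np.linalg.solve(K + lam * np.eye(len(x)), y)
    return lambda t: gaussian_kernel(np.asarray(t, dtype=float), x, width) @ alpha


rng = np.random.default_rng(0)
N = 50
x_obs = np.sort(rng.random(N))  # N known sample points in [0, 1]
f = lambda t: np.sin(2 * np.pi * t)  # the "unknown" function (illustrative choice)
y_obs = f(x_obs) + 0.05 * rng.standard_normal(N)  # noisy observations

f_hat = fit_kernel_ridge(x_obs, y_obs)
t = np.linspace(0.05, 0.95, 200)  # evaluate away from the ends of the interval
max_err = np.max(np.abs(f_hat(t) - f(t)))
print("max interpolation error:", max_err)
```

The estimate lives in an infinite-dimensional function space, yet it is determined by only N coefficients alpha_n, which is what makes learning from a finite collection tractable here.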
There is now converging evidence of strong links between machine learning and sparse signal processing. One of the key objectives of PANAMA, in the context of the ERC Starting Grant PLEASE, will be to propose a novel, comprehensive theoretical and algorithmic framework at the confluence of these research domains, based on the concepts of sparsity and low-dimensional random projections.
Model design, discovery and learning
In practice, the use of signal models must be accompanied by a number of adjustments to account for problems that occur in real contexts of use, such as model inaccuracy, insufficient (or even absent) training data, or poor statistical coverage of that data. A particular challenge is to adapt the models to the data by learning from a training corpus.
For instance, sparse models can only be deployed efficiently on large-scale data if the models themselves are efficient: they must combine computational efficiency with the ability to provide sparse and structured data representations through an appropriate choice of dictionary. Hence, for sparse modeling, it is now crucial to develop automated techniques that “industrialize” the design of a dictionary suited to a class of signals of interest. This is all the more challenging as dictionary learning is intrinsically non-convex (jointly in the dictionary and the coefficients), in contrast to the now well-established convex-optimization methodology for sparse decomposition.
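To illustrate the contrast between the convex sparse-decomposition step and the non-convex dictionary-learning problem, here is a minimal alternating-minimization sketch on synthetic data. The sparse coding step uses ISTA (iterative soft thresholding) for the convex ℓ1-penalized problem with the dictionary held fixed; the dictionary update uses the Method of Optimal Directions (MOD), a least-squares step. Both are standard techniques swapped in for illustration, not PANAMA's algorithms, and the sizes, penalty `lam` and iteration counts are assumptions.

```python
import numpy as np


def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def sparse_code(D, Y, lam, n_iter=200):
    """ISTA for the convex problem min_X 0.5*||Y - D X||_F^2 + lam*||X||_1 (D fixed)."""
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the gradient in X
    X = np.zeros((D.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        X = soft_threshold(X - D.T @ (D @ X - Y) / L, lam / L)
    return X


def update_dictionary(Y, X):
    """MOD step: least-squares dictionary D = Y X^+, atoms then rescaled to unit norm."""
    D = Y @ np.linalg.pinv(X)
    norms = np.linalg.norm(D, axis=0)
    norms[norms == 0] = 1.0  # leave unused (all-zero) atoms untouched
    return D / norms


def objective(D, X, Y, lam):
    return 0.5 * np.sum((Y - D @ X) ** 2) + lam * np.sum(np.abs(X))


rng = np.random.default_rng(0)
d, K, P, s, lam = 16, 32, 500, 3, 0.05  # signal dim, atoms, training signals, sparsity, penalty

# Synthetic training corpus: each signal combines s atoms of a hidden dictionary.
D_true = rng.standard_normal((d, K))
D_true /= np.linalg.norm(D_true, axis=0)
X_true = np.zeros((K, P))
for p in range(P):
    X_true[rng.choice(K, size=s, replace=False), p] = rng.standard_normal(s)
Y = D_true @ X_true

# Initialize with normalized training signals, then alternate the two steps.
D = Y[:, rng.choice(P, size=K, replace=False)]
D = D / np.linalg.norm(D, axis=0)
obj_init = objective(D, sparse_code(D, Y, lam), Y, lam)
for _ in range(30):
    X = sparse_code(D, Y, lam)  # convex step in X
    D = update_dictionary(Y, X)  # least-squares step in D; the joint problem is non-convex
obj_final = objective(D, sparse_code(D, Y, lam), Y, lam)
print(f"objective: {obj_init:.2f} -> {obj_final:.2f}")
```

Each step is easy on its own, but alternating them only reaches a local solution that depends on the initialization, which is the practical consequence of non-convexity.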
Similarly, the design of Bayesian networks and associated inference algorithms for modeling audiovisual streams is currently something of an “art” that relies on expert knowledge; there is now a crucial need to infer the underlying structures (graphs, recurring motifs, etc.) instead, by learning and discovering them from a training corpus.
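As one classical instance of learning a graph structure from a training corpus rather than specifying it by hand, the sketch below implements the Chow-Liu algorithm: the maximum-likelihood tree-structured model is obtained as a maximum-weight spanning tree over pairwise empirical mutual information. It is an illustrative and much simpler stand-in for the Bayesian-network structure learning discussed above, assuming binary variables and a synthetic chain-structured corpus.

```python
import numpy as np


def mutual_information(a, b):
    """Empirical mutual information (in nats) between two binary sample vectors."""
    mi = 0.0
    for u in (0, 1):
        for v in (0, 1):
            p_uv = np.mean((a == u) & (b == v))
            if p_uv > 0:
                mi += p_uv * np.log(p_uv / (np.mean(a == u) * np.mean(b == v)))
    return mi


def chow_liu_tree(data):
    """Edges of the maximum-weight spanning tree on pairwise mutual information (Kruskal)."""
    n_vars = data.shape[1]
    weighted = sorted(
        ((mutual_information(data[:, i], data[:, j]), i, j)
         for i in range(n_vars) for j in range(i + 1, n_vars)),
        reverse=True,
    )
    parent = list(range(n_vars))  # union-find forest used to reject cycle-forming edges

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = []
    for _, i, j in weighted:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            edges.append((i, j))
    return sorted(edges)


# Training corpus from a hidden chain X0 -> X1 -> X2 -> X3: each variable copies
# its predecessor and is flipped with probability 0.1 (assumed generative model).
rng = np.random.default_rng(0)
n_samples, n_vars = 5000, 4
data = np.zeros((n_samples, n_vars), dtype=int)
data[:, 0] = rng.integers(0, 2, n_samples)
for v in range(1, n_vars):
    flip = (rng.random(n_samples) < 0.1).astype(int)
    data[:, v] = data[:, v - 1] ^ flip

print(chow_liu_tree(data))  # [(0, 1), (1, 2), (2, 3)]
```

On this corpus the learned tree coincides with the hidden chain, because adjacent variables share clearly more information than distant ones.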
Investigating new frameworks, tools and concepts to robustly design, discover and learn signal models from training data will be one of the most pioneering exploratory activities of PANAMA.