Master internship on deep Bayesian filtering

This figure shows a set of landmarks detected on a human face. It is proposed to develop a deep Bayesian filter that tracks these landmarks over time in order to remove noise and detection errors and to obtain smooth trajectories.

In signal processing and in computer vision, some of the most powerful tracking methods are based on the Kalman filter. The latter belongs to the generative class of machine learning techniques and may be viewed either as the simplest dynamic Bayesian network (DBN) or as a state-space model: it recursively predicts a set of state variables over time, using an observation model (which takes into account the incoming measurements) and a dynamic model (which follows a Markov process). State-space models further decompose into linear dynamical systems (LDSs) and non-linear ones. LDSs are solved under the hypothesis that both the state and observed variables are Gaussian. The case of non-linear systems is more involved and several solvers are available, such as the extended Kalman filter (EKF), the unscented Kalman filter (UKF) or the switching Kalman filter (SKF). Recently, a new class of Kalman/Bayesian filters based on deep learning architectures, more precisely on recurrent neural networks (RNNs), has triggered a lot of interest among machine learning, signal processing and computer vision researchers. To date, long short-term memory (LSTM) is among the most widely used RNN architectures in this line of work. In parallel, there is a burgeoning literature on discriminative Kalman filters (see References).
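For concreteness, the linear-Gaussian case described above can be written out as follows. This is the standard textbook form of the Kalman filter and is not taken from the references below; the notation (state x_t, observation y_t, matrices A, C, Q, R, gain K_t) is an assumption introduced here for illustration.

```latex
% Linear dynamical system: Markov dynamic model and observation model
\begin{align*}
x_t &= A\,x_{t-1} + w_t, & w_t &\sim \mathcal{N}(0, Q) \\
y_t &= C\,x_t + v_t,     & v_t &\sim \mathcal{N}(0, R)
\end{align*}

% Prediction step
\begin{align*}
\hat{x}_{t|t-1} &= A\,\hat{x}_{t-1|t-1} \\
P_{t|t-1}       &= A\,P_{t-1|t-1}\,A^{\top} + Q
\end{align*}

% Update step, once the measurement y_t is available
\begin{align*}
K_t           &= P_{t|t-1}\,C^{\top}\left(C\,P_{t|t-1}\,C^{\top} + R\right)^{-1} \\
\hat{x}_{t|t} &= \hat{x}_{t|t-1} + K_t\left(y_t - C\,\hat{x}_{t|t-1}\right) \\
P_{t|t}       &= \left(I - K_t\,C\right)P_{t|t-1}
\end{align*}
```

In deep variants of the filter, parts of this recursion (typically the observation model, and sometimes the dynamic model) are replaced by learned networks.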

In this project we propose to study various deep Bayesian/Kalman filters available in the literature and to develop a deterministic tracking algorithm. In particular, we are interested in developing an algorithm for learning to track human facial expressions. A human face can be characterized by facial landmarks (or key points) and a number of facial landmark detectors are available today. Nevertheless, these detectors are applied to one image at a time, and they often provide inconsistent landmarks over long periods of time. It is therefore necessary to track each facial landmark individually in order to extract reliable landmark trajectories and to recognize facial expressions.
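As a point of comparison for the per-landmark tracking described above, here is a minimal sketch of a classical baseline. It is not the deep filter proposed in this project: it is a plain constant-velocity Kalman filter applied to the noisy (x, y) detections of a single landmark. The function name track_landmark, the noise parameters and the synthetic trajectory are all assumptions chosen for illustration.

```python
import numpy as np


def track_landmark(detections, dt=1.0, process_var=1e-2, meas_var=4.0):
    """Filter one landmark's noisy (x, y) detections with a constant-velocity
    Kalman filter (hypothetical helper, for illustration only).

    detections: array of shape (T, 2), one detection per frame.
    Returns an array of shape (T, 2) with the filtered positions.
    """
    detections = np.asarray(detections, dtype=float)
    # State is [x, y, vx, vy]; only the position is observed.
    A = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    C = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], dtype=float)
    Q = process_var * np.eye(4)  # process noise covariance (assumed isotropic)
    R = meas_var * np.eye(2)     # measurement noise covariance (assumed isotropic)

    # Initialise at the first detection with zero velocity.
    x = np.array([detections[0, 0], detections[0, 1], 0.0, 0.0])
    P = np.diag([meas_var, meas_var, 1.0, 1.0])
    filtered = np.empty_like(detections)
    filtered[0] = x[:2]

    for t in range(1, len(detections)):
        # Prediction step: propagate the state through the dynamic model.
        x = A @ x
        P = A @ P @ A.T + Q
        # Update step: correct the prediction with the new detection.
        S = C @ P @ C.T + R
        K = P @ C.T @ np.linalg.inv(S)
        x = x + K @ (detections[t] - C @ x)
        P = (np.eye(4) - K @ C) @ P
        filtered[t] = x[:2]
    return filtered


# Usage on a synthetic landmark moving along a straight line.
rng = np.random.default_rng(0)
frames = np.arange(200.0)
truth = np.stack([frames, 0.5 * frames], axis=1)
noisy = truth + rng.normal(0.0, 2.0, size=truth.shape)  # simulated detector noise
smooth = track_landmark(noisy)
print("raw RMSE:     ", np.sqrt(np.mean((noisy - truth) ** 2)))
print("filtered RMSE:", np.sqrt(np.mean((smooth - truth) ** 2)))
```

A hand-tuned filter of this kind needs the noise covariances to be set manually for each landmark, which is one motivation for learning the filter from data instead.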

Start date and duration: February/March 2020, for up to six months.

References:

T. Haarnoja, A. Ajay, S. Levine, P. Abbeel. Backprop KF: Learning discriminative deterministic state estimators. Neural Information Processing Systems Conference, 2016.

J. Gu, X. Yang, S. de Mello, J. Kautz. Dynamic facial analysis: from Bayesian filtering to recurrent neural networks. IEEE Computer Vision and Pattern Recognition Conference, 2017.

S. Nie, M. Zheng, Q. Ji. The deep regression Bayesian network and its applications. IEEE Signal Processing Magazine, 2018.

Contact: Radu.Horaud@inria.fr