Address: INRIA Rennes – Bretagne Atlantique,
Campus Universitaire de Beaulieu,
35042 Rennes cedex – France
Léo Maczyta obtained a Masters degree in Robotics and embedded systems from ENSTA ParisTech Université Paris-Saclay, France, and a Masters degree in Advanced systems and robotics from University Pierre and Marie Curie, France in 2017. He is conducting a PhD thesis at Inria Rennes on the research topic of motion saliency estimation in video sequences.
L. Maczyta, P. Bouthemy, O. Le Meur. CNN-based temporal detection of motion saliency in videos. Pattern Recognition Letters, 2019
L. Maczyta, P. Bouthemy, O. Le Meur. Unsupervised motion saliency map estimation based on optical flow inpainting. Proc. IEEE Int. Conf. on Image Processing (ICIP’19), Taipei, 2019
National conference papers
L. Maczyta, P. Bouthemy, O. Le Meur. Estimation non supervisée de cartes de saillance dynamique dans des vidéos. In Reconnaissance des Formes, Image, Apprentissage et Perception (RFIAP’20), 2020
L. Maczyta, P. Bouthemy, O. Le Meur. Détection temporelle de saillance dynamique dans des vidéos par apprentissage profond. In Reconnaissance des Formes, Image, Apprentissage et Perception (RFIAP’18), Marne-la-Vallée, France, 2018
Motion saliency detection
We consider the problem of frame-based motion saliency detection, that is, estimating which frames of a video contain motion saliency. The method we propose is based on two steps. First, the dominant motion due to the camera is cancelled. For the variant RFS-Motion2D, this is done by subtracting to the optical flow the dominant motion, which is estimated with a parametric model. Then, the classification into the salient or non salient class is achieved with a convolutional neural network, applied to the residual flow.
Framework for motion saliency detection.
Architecture of the classification Convolutional Neural Network (CNN) of the motion saliency detection framework.
Results obtained with our method RFS-Motion2D. The colour of the frame border, orange or blue, designates the prediction, respectively dynamic saliency or non saliency. The green (respectively red) square at the bottom right indicates that the prediction is correct (respectively wrong). The first part of the video is constituted of non salient clips, and the second part is constituted of salient clips. The video shows that non saliency is correctly predicted even when the shallow scene assumption is not valid, with for instance walls, trees or pillars in the foreground.
Motion saliency map estimation
The problem of motion saliency map estimation consists in predicting a saliency value for each pixel of video frames. To solve it, we build a method based on optical flow inpainting. In a first step, candidate salient regions are extracted. Then, the optical flow inside these regions is inpainted from the surrounding flow. The idea is that, if the region is salient, the reconstructed flow should be clearly distinct from the flow present originally. On the contrary, if the region is not salient, the reconstructed and original flow should be similar. We then leverage the difference between these two flows (the residual flow) to compute the motion saliency map.
Framework for motion saliency map estimation based on optical flow inpainting.
Motion saliency map estimation results. Top to bottom: video frame, computed residual flow, motion saliency map predicted by our method MSI-ns, and binary ground truth. The three samples on the left come from the DAVIS 2016 dataset. In the fourth sample, a square region has been artificially moved. The fifth sample comes from an infra-red video.