Seminar by Manuel J. Marin-Jimenez, Universidad de Córdoba
Tuesday 19 December 2017, 11:00 – 12:00, room F107
INRIA Montbonnot Saint-Martin
Abstract: This talk targets identifying people in video from the way they walk (i.e. their gait). While classical methods typically derive gait signatures from sequences of binary silhouettes, in this talk we present the use of convolutional neural networks (CNNs) for learning high-level descriptors from low-level motion features (i.e. optical flow components). We carry out a thorough experimental evaluation of the proposed CNN architecture on the challenging TUM-GAID dataset. The results indicate that using spatio-temporal cuboids of optical flow as CNN input yields state-of-the-art results on the gait-identification task, at an image resolution eight times lower than in previously reported results (i.e. 80×60 pixels).
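As a rough illustration of the idea (not the exact architecture presented in the talk), the sketch below builds a small CNN whose input is a spatio-temporal cuboid of optical flow at the 80×60 resolution mentioned in the abstract. The number of stacked frames (25) and the number of identity classes (10) are assumptions for the example.

```python
import torch
import torch.nn as nn

class GaitFlowCNN(nn.Module):
    """Toy CNN over an optical-flow cuboid of shape (B, 2*L, 60, 80).

    Each of the L frames contributes two flow components (x and y),
    stacked as input channels. Layer sizes are illustrative only.
    """
    def __init__(self, num_frames=25, num_classes=10):
        super().__init__()
        in_ch = 2 * num_frames  # x- and y-flow per frame
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 64, kernel_size=7, stride=2, padding=3), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x):
        f = self.features(x).flatten(1)  # (B, 128) learned gait signature
        return self.classifier(f)        # (B, num_classes) identity logits

# One batch of 4 flow cuboids at 80x60 pixels (height 60, width 80)
cuboid = torch.randn(4, 2 * 25, 60, 80)
logits = GaitFlowCNN()(cuboid)
print(logits.shape)  # torch.Size([4, 10])
```

The penultimate layer's 128-dimensional vector plays the role of a high-level gait descriptor learned directly from motion, replacing hand-crafted silhouette features.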
In the second part of this talk, we argue that, although gait is mainly used for identification, additional tasks such as gender recognition or age estimation can be addressed from gait as well. Traditional approaches treat those tasks as independent, defining separate task-specific features and models for each. We show that by training several gait-based tasks jointly, the identification task converges faster than when trained alone, and the recognition performance of multi-task models equals or exceeds that of more complex single-task ones. Our model is a multi-task CNN that receives a fixed-length sequence of optical flow channels as input and outputs several biometric attributes (identity, gender and age).
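A minimal sketch of such a multi-task design, under the same illustrative assumptions as before (25 flow frames, 10 identities; the loss weighting is also an assumption): a shared convolutional trunk feeds three task-specific heads, and the joint loss sums the per-task losses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskGaitCNN(nn.Module):
    """Shared trunk over a fixed-length flow sequence, with three heads."""
    def __init__(self, num_frames=25, num_ids=10):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(2 * num_frames, 64, 7, stride=2, padding=3), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.id_head = nn.Linear(128, num_ids)  # identity (classification)
        self.gender_head = nn.Linear(128, 2)    # gender (classification)
        self.age_head = nn.Linear(128, 1)       # age (regression)

    def forward(self, x):
        f = self.trunk(x)
        return self.id_head(f), self.gender_head(f), self.age_head(f)

x = torch.randn(4, 50, 60, 80)
ids, gender, age = MultiTaskGaitCNN()(x)

# Joint training objective: weighted sum of task losses (weights illustrative)
loss = (F.cross_entropy(ids, torch.randint(0, 10, (4,)))
        + F.cross_entropy(gender, torch.randint(0, 2, (4,)))
        + 0.1 * F.mse_loss(age.squeeze(1), torch.rand(4) * 60.0))
```

Because all three heads share the trunk's gradients, the auxiliary tasks act as regularizers on the shared representation, which is one plausible mechanism behind the faster convergence of the identification task reported above.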
Finally, we will show preliminary results on CNN-based multimodal feature fusion for improving recognition. The input sources are gray-level pixels, depth maps and optical flow. The experiments show promising results that encourage continuing this line of work.
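One common way to fuse such modalities (a sketch only; the talk's actual fusion scheme may differ) is late fusion: one CNN stream per modality, with the per-stream features concatenated before a shared classifier. Channel counts and feature sizes below are assumptions.

```python
import torch
import torch.nn as nn

def stream(in_ch):
    """One per-modality CNN branch producing a 64-d feature vector."""
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )

class FusionNet(nn.Module):
    """Late fusion of gray, depth, and optical-flow streams."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.gray = stream(1)   # gray-level pixels: 1 channel
        self.depth = stream(1)  # depth maps: 1 channel
        self.flow = stream(2)   # optical flow: x and y components
        self.classifier = nn.Linear(3 * 64, num_classes)

    def forward(self, g, d, f):
        fused = torch.cat([self.gray(g), self.depth(d), self.flow(f)], dim=1)
        return self.classifier(fused)

b = 4
out = FusionNet()(torch.randn(b, 1, 60, 80),
                  torch.randn(b, 1, 60, 80),
                  torch.randn(b, 2, 60, 80))
print(out.shape)  # torch.Size([4, 10])
```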