Back to MLP: A Simple Baseline for Human Motion Prediction

by Wen Guo*, Yuming Du*, Xi Shen, Vincent Lepetit, Xavier Alameda-Pineda, and Francesc Moreno-Noguer

IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023, Waikoloa, Hawaii

Abstract. This paper tackles the problem of human motion prediction, which consists of forecasting future body poses from historically observed sequences. State-of-the-art approaches provide good results; however, they rely on deep learning architectures of arbitrary complexity, such as Recurrent Neural Networks (RNN), Transformers, or Graph Convolutional Networks (GCN), typically requiring multiple training stages and more than 2 million parameters. In this paper, we show that, when combined with a series of standard practices, such as applying the Discrete Cosine Transform (DCT), predicting the residual displacement of joints, and optimizing velocity as an auxiliary loss, a light-weight network based on multi-layer perceptrons (MLPs) with only 0.14 million parameters can surpass the state-of-the-art performance. An exhaustive evaluation on the Human3.6M, AMASS, and 3DPW datasets shows that our method, named SiMLPe, consistently outperforms all other approaches. We hope that our simple method can serve as a strong baseline for the community and allow re-thinking of the human motion prediction problem.
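As a rough illustration of "optimizing velocity as an auxiliary loss", the sketch below adds a velocity term to a position term. It is not taken from the paper: the function names, the (frames, joints, 3) array layout, the use of frame-to-frame differences as velocity, and the weight `w` are all assumptions made for this example.

```python
import numpy as np

def position_loss(pred, target):
    # Mean Euclidean distance between predicted and true joint positions.
    # pred, target: arrays of shape (frames, joints, 3) -- assumed layout.
    return float(np.mean(np.linalg.norm(pred - target, axis=-1)))

def velocity_loss(pred, target):
    # Velocity is approximated here by the difference between
    # consecutive frames (an assumption of this sketch).
    v_pred = np.diff(pred, axis=0)
    v_target = np.diff(target, axis=0)
    return float(np.mean(np.linalg.norm(v_pred - v_target, axis=-1)))

def total_loss(pred, target, w=1.0):
    # w is a hypothetical weight balancing the two terms.
    return position_loss(pred, target) + w * velocity_loss(pred, target)
```

A prediction that is shifted by a constant offset has the right velocities, so only the position term penalizes it; a prediction that jitters around the target is penalized by both terms.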

Pipeline

An overview of our approach SiMLPe for human motion prediction. FC denotes a fully connected layer; LN denotes layer normalization, which is the only non-linear layer in this architecture; and Trans represents the transpose operation. DCT and IDCT denote the discrete cosine transform and the inverse discrete cosine transform, respectively. The MLP blocks (in gray), each composed of FC and LN, are repeated m times.
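The caption can be read as the following forward pass, written here as a minimal NumPy sketch rather than the released implementation. It follows only what the caption and abstract state; everything else is an assumption of this example: the helper names (`dct_matrix`, `layer_norm`, `init_params`, `forward`), the random weights, the exact placement of the FC layers and transposes, the axis over which LN normalizes, predicting as many frames as are observed, and adding the last observed pose to obtain the residual prediction.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II matrix D of shape (n, n); its inverse is D.T.
    k = np.arange(n)[:, None]  # frequency index
    t = np.arange(n)[None, :]  # time index
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)
    return d

def layer_norm(x, eps=1e-5):
    # Normalize over the last axis (no learned scale/shift in this sketch).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def init_params(T, C, m, seed=0):
    # Hypothetical random initialization: T frames, C = joints * 3 channels.
    rng = np.random.default_rng(seed)
    def fc(n):
        return rng.normal(0.0, 0.02, size=(n, n)), np.zeros(n)
    w_in, b_in = fc(C)
    w_out, b_out = fc(C)
    return {"w_in": w_in, "b_in": b_in,
            "blocks": [fc(T) for _ in range(m)],
            "w_out": w_out, "b_out": b_out}

def forward(x, params, m):
    # x: observed poses of shape (T, C).
    D = dct_matrix(x.shape[0])
    h = D @ x                                # DCT along the time axis
    h = h @ params["w_in"] + params["b_in"]  # FC over the spatial dimension
    h = h.T                                  # Trans: (C, T)
    for w, b in params["blocks"][:m]:        # m MLP blocks: FC then LN
        h = layer_norm(h @ w + b)
    h = h.T                                  # Trans back: (T, C)
    h = h @ params["w_out"] + params["b_out"]
    out = D.T @ h                            # IDCT back to the time domain
    return out + x[-1:]                      # residual w.r.t. last observed pose
```

For example, `forward(x, init_params(T=8, C=6, m=2), m=2)` maps an `(8, 6)` input to an `(8, 6)` output. Because the DCT matrix is orthonormal, the IDCT is simply its transpose.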

Performance

The vertical axis of this figure shows the MPJPE (mean per-joint position error) in mm at 1000 ms on the Human3.6M dataset. The closer a method is to the bottom-left, the better. Our method (SiMLPe, in red) achieves the lowest error with significantly fewer parameters. More detailed results are available in the paper.
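For readers unfamiliar with the metric, MPJPE is the Euclidean distance between predicted and ground-truth joint positions, averaged over joints. The sketch below is illustrative only: the function names and the (joints, 3) layout are assumptions, and the frame rate passed to `frame_index_at` is a parameter of this example rather than a value stated on this page.

```python
import numpy as np

def mpjpe(pred, target):
    # pred, target: joint positions of shape (joints, 3), in mm.
    # Returns the mean Euclidean distance over joints, in mm.
    return float(np.mean(np.linalg.norm(pred - target, axis=-1)))

def frame_index_at(horizon_ms, fps):
    # Zero-based index of the predicted frame at a given horizon,
    # assuming the first predicted frame lies 1/fps seconds ahead.
    return int(round(horizon_ms / 1000.0 * fps)) - 1

pred = np.zeros((2, 3))
target = np.array([[3.0, 4.0, 0.0], [0.0, 0.0, 0.0]])
print(mpjpe(pred, target))  # joint errors are 5 mm and 0 mm, so the mean is 2.5
```

Under these assumptions, evaluating "at 1000 ms" means computing `mpjpe` on the single predicted frame selected by `frame_index_at(1000, fps)` and averaging over test sequences.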

Reference

@article{guo2022back,
  title={Back to MLP: A Simple Baseline for Human Motion Prediction},
  author={Guo, Wen and Du, Yuming and Shen, Xi and Lepetit, Vincent and Alameda-Pineda, Xavier and Moreno-Noguer, Francesc},
  journal={arXiv preprint arXiv:2207.01567},
  year={2022}
}