
Extreme Pose Interaction (ExPI) Dataset


[Paper]    [code](coming soon)    [data](coming soon)     [toolbox](coming soon)

We present the Extreme Pose Interaction (ExPI) Dataset, a new person interaction dataset of Lindy Hop aerial steps [1].

Our dataset contains 2 couples of dancers performing 16 different aerials (dancing actions), yielding 115 sequences with 30k frames for each viewpoint and 60k instances with annotated 3D body poses and shapes.

In Lindy Hop, the two dancers have different roles, referred to as leader and follower. To perform these aerials, the two dancers execute different movements that require a high level of synchronization. The aerials are composed of extreme poses and demand strict, close cooperation between the two persons, which makes them well suited to the study of human interactions.

The first seven aerials are performed by both couples. Six more aerials are performed only by Couple 1, and three others only by Couple 2.

Table 1: Aerials of the ExPI Dataset.

1. Dataset Structure

In the ExPI dataset, 16 different aerials are performed, some by both dancer couples and some by only one of the couples, as shown in Table 1. The first seven aerials (A1-A7) are performed by both couples. Six more aerials (A8-A13) are performed by Couple 1, and three others (A14-A16) by Couple 2.
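The split above can be checked against the total of 115 sequences. The following is a minimal sketch, assuming each couple repeats each of its aerials five times, as stated in this section; the variable names are illustrative and not part of any released toolbox.

```python
# Aerials performed by each couple, following the split described in the text.
aerials_by_couple = {
    "Couple 1": [f"A{i}" for i in range(1, 14)],                             # A1-A13
    "Couple 2": [f"A{i}" for i in list(range(1, 8)) + list(range(14, 17))],  # A1-A7, A14-A16
}
REPETITIONS = 5  # each aerial is repeated five times

# Number of (couple, aerial) pairs: 13 for Couple 1 plus 10 for Couple 2.
performances = sum(len(aerials) for aerials in aerials_by_couple.values())
total_sequences = performances * REPETITIONS
distinct_aerials = len(set().union(*aerials_by_couple.values()))

print(distinct_aerials, performances, total_sequences)  # 16 23 115
```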

Each of the aerials was repeated five times to account for variability. Overall, ExPI contains 115 short sequences, each depicting one execution of an aerial. More precisely, for each recorded sequence, ExPI provides:

  • Multi-view image sequences at 50 FPS, from all the available cameras.
  • Mocap data: 3D position of all joints per frame. Mocap data is recorded at 100 FPS.
  • Camera calibration information.
  • 3D shapes as a textured mesh for each frame.
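Because the images are recorded at 50 FPS and the mocap data at 100 FPS, pairing a pose with an image requires resampling one of the two streams. The sketch below keeps every second mocap frame. It assumes that both streams start at the same instant and that the mocap data is stored as an array of shape (frames, joints, 3); neither assumption is confirmed by this page, and the function name is hypothetical.

```python
import numpy as np

IMAGE_FPS = 50   # frame rate of the multi-view images
MOCAP_FPS = 100  # frame rate of the mocap data

def mocap_for_image_frames(mocap, n_image_frames):
    """Return the mocap frame matching each image frame.

    mocap: array of shape (T_mocap, J, 3).
    Assumes (hypothetically) that both streams share the same start time.
    """
    step = MOCAP_FPS // IMAGE_FPS             # 2 mocap frames per image frame
    idx = np.arange(n_image_frames) * step    # 0, 2, 4, ...
    idx = idx[idx < len(mocap)]               # drop indices past the end of the recording
    return mocap[idx]

# Toy data: 200 mocap frames, 36 joints, with the frame number stored in x of joint 0.
mocap = np.zeros((200, 36, 3))
mocap[:, 0, 0] = np.arange(200)
aligned = mocap_for_image_frames(mocap, n_image_frames=100)
print(aligned.shape)  # (100, 36, 3)
```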

2.  Data Collection

The data were collected on the Kinovis platform [2], which has 2 acquisition systems: a system of 68 color cameras (4 MPixels) that provides full shape and appearance information as 3D textured meshes, and a standard motion capture (Mocap) system of 20 cameras that provides infrared-reflective marker trajectories.

Figure 1: Order of the joints

We track 18 joints per person. The order of the keypoints is as follows, where “F” and “L” denote the Follower and the Leader respectively, and “f”, “l” and “r” denote “forward”, “left” and “right” (see also Figure 1): (0) ‘L-fhead’, (1) ‘L-lhead’, (2) ‘L-rhead’, (3) ‘L-back’, (4) ‘L-lshoulder’, (5) ‘L-rshoulder’, (6) ‘L-lelbow’, (7) ‘L-relbow’, (8) ‘L-lwrist’, (9) ‘L-rwrist’, (10) ‘L-lhip’, (11) ‘L-rhip’, (12) ‘L-lknee’, (13) ‘L-rknee’, (14) ‘L-lheel’, (15) ‘L-rheel’, (16) ‘L-ltoes’, (17) ‘L-rtoes’, (18) ‘F-fhead’, (19) ‘F-lhead’, (20) ‘F-rhead’, (21) ‘F-back’, (22) ‘F-lshoulder’, (23) ‘F-rshoulder’, (24) ‘F-lelbow’, (25) ‘F-relbow’, (26) ‘F-lwrist’, (27) ‘F-rwrist’, (28) ‘F-lhip’, (29) ‘F-rhip’, (30) ‘F-lknee’, (31) ‘F-rknee’, (32) ‘F-lheel’, (33) ‘F-rheel’, (34) ‘F-ltoes’, and (35) ‘F-rtoes’.
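The joint order above can be captured in a small lookup table. This is a minimal sketch rather than the released toolbox: the names `BODY_JOINTS`, `JOINT_NAMES`, `JOINT_INDEX` and `split_roles` are illustrative, and the pose layout of shape (..., 36, 3) is an assumption about how the data might be stored.

```python
import numpy as np

# The 18 per-person joints, in the order listed in the text.
BODY_JOINTS = [
    "fhead", "lhead", "rhead", "back",
    "lshoulder", "rshoulder", "lelbow", "relbow", "lwrist", "rwrist",
    "lhip", "rhip", "lknee", "rknee", "lheel", "rheel", "ltoes", "rtoes",
]

# Leader joints come first (0-17), then Follower joints (18-35).
JOINT_NAMES = [f"{role}-{joint}" for role in ("L", "F") for joint in BODY_JOINTS]
JOINT_INDEX = {name: i for i, name in enumerate(JOINT_NAMES)}

def split_roles(pose):
    """Split a pose array of shape (..., 36, 3) into (leader, follower) halves.

    The (..., 36, 3) layout is an assumption made for illustration.
    """
    pose = np.asarray(pose)
    n = len(BODY_JOINTS)
    return pose[..., :n, :], pose[..., n:, :]

leader, follower = split_roles(np.zeros((10, 36, 3)))
print(JOINT_INDEX["F-lwrist"], leader.shape, follower.shape)  # 26 (10, 18, 3) (10, 18, 3)
```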


When collecting the motion capture data, some points were missed by the system due to occlusions or tracking losses. To overcome this issue, we manually post-processed the missing points, using a 3D hand-labeling toolbox that we designed and implemented to ease this process. The labeled joints are projected into 3D and onto various 2D images so that the quality of the approximation can be confirmed by visual inspection.
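Checking a labeled joint against a 2D image amounts to projecting the 3D point with that camera's calibration. The sketch below assumes a plain pinhole model with an intrinsic matrix `K` and extrinsics `R`, `t`, and ignores lens distortion. The format of the ExPI calibration files is not described on this page, and all numeric values here are made up for illustration.

```python
import numpy as np

def project_points(points_3d, K, R, t):
    """Project (N, 3) world points to (N, 2) pixel coordinates.

    Assumes a pinhole camera without distortion: x_cam = R @ X + t.
    """
    points_3d = np.asarray(points_3d, dtype=float)
    cam = points_3d @ R.T + t            # world frame -> camera frame
    uvw = cam @ K.T                      # apply intrinsics
    return uvw[:, :2] / uvw[:, 2:3]      # perspective division by depth

# Hypothetical calibration: focal length 1000 px, principal point (640, 360).
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)       # camera axes aligned with the world axes
t = np.zeros(3)     # camera at the world origin

points = np.array([[0.0, 0.0, 2.0],
                   [0.2, -0.1, 2.0]])
pixels = project_points(points, K, R, t)
print(pixels)       # rows: (640, 360) and (740, 310)
```

Overlaying the returned pixel coordinates on the corresponding camera image is one way to inspect a labeled joint visually.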

3.  Evaluation

In the paper, we propose several evaluation metrics for the human motion prediction task. The corresponding code will be released soon.
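The paper's metrics are not reproduced on this page. As a generic point of reference only, the sketch below computes the mean per-joint position error (MPJPE), a measure commonly used in human motion prediction. It should not be read as the metric proposed in the paper, and the array layout of shape (..., joints, 3) is an assumption.

```python
import numpy as np

def mpjpe(pred, target):
    """Mean Euclidean distance between predicted and ground-truth joints.

    pred, target: arrays of identical shape (..., J, 3).
    Generic metric shown for illustration, not the paper's own metric.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.linalg.norm(pred - target, axis=-1).mean())

# Toy example: every joint is displaced by the vector (3, 4, 0), i.e. by 5 units.
target = np.zeros((10, 36, 3))
pred = target + np.array([3.0, 4.0, 0.0])
error = mpjpe(pred, target)
print(error)  # 5.0
```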


If you find our data useful, please cite our work:

         title={Multi-Person Extreme Motion Prediction with Cross-Interaction Attention}, 
         author={Guo, Wen and Bie, Xiaoyu and Alameda-Pineda, Xavier and Moreno-Noguer, Francesc}, 
         journal={arXiv preprint arXiv:2105.08825}, 
         year={2021} }

[1] Lindy Hop aerial steps: The Lindy Hop is an African-American couple dance born in the 1930s in Harlem, New York.
[2] Kinovis-platform: