AIST++ Dataset - Download

Trouble downloading the dataset? Let us know.

Download the videos/images/music

Before downloading, please make sure you have read and agreed to the Terms of Use of the AIST Dance Video Database.

If you're interested in downloading the full set of AIST Dance Videos (with music), please visit their website. If you only need to download the subset of AIST Dance Videos that we annotated, we provide a Python script that downloads the videos (with music) from their website.

Download the file downloader.py, or fetch it directly with:

wget https://raw.githubusercontent.com/google/aistplusplus_api/main/downloader.py

Then run the downloader, replacing $DOWNLOAD_FOLDER with the directory where the videos should be saved:

python downloader.py --download_folder=$DOWNLOAD_FOLDER --num_processes=5

For help, run:

python downloader.py -h

Convert the videos to images at exactly 60 FPS rather than at their raw frame rate. Our Python API code provides an ffmpeg-based function for this, but you are welcome to use alternatives.
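The sketch below illustrates the conversion step. It calls ffmpeg and uses ffmpeg's fps filter to resample a video to exactly 60 FPS. It is not the ffmpeg-based function from the Python API. The helper names (build_ffmpeg_command, video_to_images), the JPEG output pattern, and the choice to number frames from 0 are all illustrative assumptions. ffmpeg must be available on the PATH.

```python
import os
import subprocess
from typing import List


def build_ffmpeg_command(video_path: str, image_dir: str, fps: int = 60) -> List[str]:
    """Hypothetical helper: build an ffmpeg argument list that resamples a video
    to a fixed frame rate and writes one JPEG per output frame."""
    output_pattern = os.path.join(image_dir, "%08d.jpg")
    return [
        "ffmpeg",
        "-i", video_path,
        # The fps filter drops or duplicates frames so the output is exactly
        # `fps` frames per second, whatever the source video's raw rate is.
        "-vf", f"fps={fps}",
        # Number output images from 0 (the image2 muxer defaults to 1).
        "-start_number", "0",
        output_pattern,
    ]


def video_to_images(video_path: str, image_dir: str, fps: int = 60) -> None:
    """Hypothetical helper: extract frames at a fixed rate into image_dir."""
    os.makedirs(image_dir, exist_ok=True)
    subprocess.run(build_ffmpeg_command(video_path, image_dir, fps), check=True)
```

For example, video_to_images("videos/gBR_sBM_c01_d04_mBR0_ch01.mp4", "images/gBR_sBM_c01_d04_mBR0_ch01") would write 00000000.jpg, 00000001.jpg, and so on. The directory layout and the .mp4 extension in this call are hypothetical.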

Download the annotations

Update 07/27/2021: Released preprocessed train/test data.

Data API

We provide Python API code for loading and visualizing the AIST++ annotations. The API repository also contains the code we used to construct the annotations, so the process can be reproduced. This includes camera parameter estimation, 3D keypoint reconstruction, and 3D human motion fitting.

Data Formats

Sequence Names

Annotations in AIST++ are indexed by sequence name. Each sequence name corresponds to a set of multi-view video names in the AIST Dance Video DB.

For example, gBR_sBM_cAll_d04_mBR0_ch01 in our annotations corresponds to gBR_sBM_{c01, ..., c09}_d04_mBR0_ch01 in the AIST database.
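This correspondence can be written as a small string transformation. The helpers below are hypothetical and not part of the API. They assume the camera field is the only underscore-separated token that is exactly "cAll" (or "c" followed by two digits). They also assume any file extension has already been stripped from the video name.

```python
import re
from typing import List

# Matches a single-camera field such as "c01" ... "c09".
_CAMERA_TOKEN = re.compile(r"c\d{2}")


def sequence_to_video_names(seq_name: str, num_cameras: int = 9) -> List[str]:
    """Expand a sequence name with a 'cAll' field into per-camera video names."""
    parts = seq_name.split("_")
    if "cAll" not in parts:
        raise ValueError(f"expected a 'cAll' camera field in {seq_name!r}")
    cam_idx = parts.index("cAll")
    names = []
    for cam in range(1, num_cameras + 1):
        parts[cam_idx] = f"c{cam:02d}"  # c01, c02, ..., c09
        names.append("_".join(parts))
    return names


def video_to_sequence_name(video_name: str) -> str:
    """Collapse a single-camera video name (e.g. ..._c05_...) to its sequence name."""
    parts = video_name.split("_")
    for i, part in enumerate(parts):
        if _CAMERA_TOKEN.fullmatch(part):
            parts[i] = "cAll"
            return "_".join(parts)
    raise ValueError(f"no camera field like 'c01' in {video_name!r}")
```

For example, sequence_to_video_names("gBR_sBM_cAll_d04_mBR0_ch01") returns the nine names from gBR_sBM_c01_d04_mBR0_ch01 to gBR_sBM_c09_d04_mBR0_ch01.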

Camera Parameters

Camera intrinsic and extrinsic parameters are stored in two kinds of files:

Individual environment settings, each describing the 9 cameras in that environment (setting_<suffix>.json).
A mapping from environment settings to sequence names (mapping.txt).
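Suppose mapping.txt stores one sequence name and one setting name per line, separated by whitespace. That layout is an assumption, so check it against the file you download. Under that assumption, the file can be read into a lookup table as shown here. parse_mapping and load_mapping are hypothetical names.

```python
from typing import Dict, Iterable


def parse_mapping(lines: Iterable[str]) -> Dict[str, str]:
    """Build {sequence_name: setting_name} from mapping.txt lines.

    Assumed line format: "<sequence_name> <setting_name>".
    """
    mapping = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        if len(fields) < 2:
            raise ValueError(f"malformed mapping line: {line!r}")
        mapping[fields[0]] = fields[1]
    return mapping


def load_mapping(path: str) -> Dict[str, str]:
    """Read and parse a mapping file from disk."""
    with open(path, "r") as f:
        return parse_mapping(f)
```

Under this assumption, mapping[seq_name] gives the setting name, which identifies the setting_<suffix>.json file to load for that sequence.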

Each camera has the following attributes:

name: The camera name corresponding to the AIST database.
size: Image resolution as [width, height]. It is [1920, 1080] for all AIST 1080p videos.
matrix: The 3x3 camera intrinsic matrix.*
rotation: The global rotation vector in Rodrigues format.*
translation: The global translation vector.*
distortions: Distortion coefficients.*

*See here for detailed definition.
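To show how matrix, rotation, and translation fit together, here is a minimal NumPy sketch that projects world-space 3D points into one camera. It makes two assumptions. First, each setting_<suffix>.json file is a JSON list of camera dicts with the attributes listed above. Second, rotation and translation follow the OpenCV world-to-camera convention, x_cam = R X + t. The sketch ignores the distortions coefficients, so its results are only approximate, especially near the image borders. All function names are hypothetical.

```python
import json
from typing import Dict, List

import numpy as np


def rodrigues_to_matrix(rvec) -> np.ndarray:
    """Convert a Rodrigues (axis-angle) rotation vector to a 3x3 rotation matrix."""
    r = np.asarray(rvec, dtype=np.float64).reshape(3)
    theta = np.linalg.norm(r)
    if theta < 1e-12:
        return np.eye(3)
    k = r / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    # Rodrigues' formula: R = I + sin(theta) K + (1 - cos(theta)) K^2
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def index_cameras(params: List[dict]) -> Dict[str, dict]:
    """Index a list of camera dicts (assumed setting file layout) by camera name."""
    return {cam["name"]: cam for cam in params}


def load_setting(path: str) -> Dict[str, dict]:
    """Load one setting_<suffix>.json file, assumed to hold a list of cameras."""
    with open(path, "r") as f:
        return index_cameras(json.load(f))


def project_points(camera: dict, points3d) -> np.ndarray:
    """Project world-space points (N, 3) to pixels (N, 2), ignoring lens distortion."""
    K = np.asarray(camera["matrix"], dtype=np.float64).reshape(3, 3)
    R = rodrigues_to_matrix(camera["rotation"])
    t = np.asarray(camera["translation"], dtype=np.float64).reshape(3)
    X = np.asarray(points3d, dtype=np.float64).reshape(-1, 3)
    X_cam = X @ R.T + t               # world -> camera (assumed OpenCV convention)
    uvw = X_cam @ K.T                 # apply the intrinsic matrix
    return uvw[:, :2] / uvw[:, 2:3]   # perspective divide
```

For pixel-accurate results, the distortion coefficients also have to be applied. Under the same OpenCV assumption, cv2.projectPoints(points3d, rvec, tvec, camera_matrix, dist_coeffs) performs the full projection, including distortion.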

Human Motion Sequence

Each SMPL-format human motion sequence is stored in a .pkl file with the following attributes, where N is the number of frames:

smpl_poses: Per-frame SMPL pose parameters. Array shape is (N, 24, 3).
smpl_scaling: Human body scaling factor, a single scalar value per sequence.
smpl_trans: Per-frame 3D trajectory of the body. Array shape is (N, 3).
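A common first step with these files is to split the pose array into the root orientation and the 23 body-joint rotations, because SMPL treats joint 0 as the global orientation. The sketch below does this with pickle and NumPy. The function names are hypothetical. The reshape lets the code work whether poses are stored as (N, 24, 3) or flattened to (N, 72). The API code defines how smpl_scaling and smpl_trans are combined when posing a SMPL body model, and that step is not reproduced here.

```python
import pickle
from typing import Dict, Tuple

import numpy as np


def split_smpl_motion(motion: Dict) -> Tuple[np.ndarray, np.ndarray, np.ndarray, float]:
    """Split a motion dict into (global_orient, body_pose, transl, scaling).

    global_orient: (N, 3)  axis-angle rotation of the root joint (SMPL joint 0)
    body_pose:     (N, 69) axis-angle rotations of the other 23 joints, flattened
    transl:        (N, 3)  per-frame trajectory
    scaling:       float   per-sequence body scale
    """
    poses = np.asarray(motion["smpl_poses"], dtype=np.float32)
    n = poses.shape[0]
    poses = poses.reshape(n, 24, 3)
    trans = np.asarray(motion["smpl_trans"], dtype=np.float32).reshape(n, 3)
    # The scale may be stored as a bare scalar or a length-1 array; normalize it.
    scaling = float(np.asarray(motion["smpl_scaling"]).reshape(-1)[0])
    return poses[:, 0, :], poses[:, 1:, :].reshape(n, 69), trans, scaling


def load_smpl_motion(path: str) -> Tuple[np.ndarray, np.ndarray, np.ndarray, float]:
    """Load one motion .pkl file and split it."""
    with open(path, "rb") as f:
        return split_smpl_motion(pickle.load(f))
```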

3D & 2D Keypoint Annotations

We provide COCO-format keypoint annotations in both 2D and 3D. Each keypoint sequence is stored in a .pkl file with the following attributes:

keypoints3d: Sequence of 3D keypoints reconstructed frame-by-frame. Array shape is (N, 17, 3).
keypoints3d_optim: Sequence of 3D keypoints reconstructed with temporal smoothness and constraints.
keypoints2d: Multi-view, frame-by-frame 2D keypoint detection results. Array shape is (9, N, 17, 3), where the last dimension is (x, y, confidence).
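The snippet below sketches two common operations on these arrays: looking up a joint by name, and masking low-confidence 2D detections in one camera view. It assumes the 17 joints follow the standard COCO keypoint order. The 0.5 confidence threshold and the helper names are arbitrary choices for illustration.

```python
import numpy as np

# Standard COCO order of the 17 body keypoints (assumed to match these files).
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


def confident_2d_keypoints(keypoints2d, view: int, min_conf: float = 0.5) -> np.ndarray:
    """Return (N, 17, 2) pixel coordinates for one view, with NaN wherever
    the detection confidence is below min_conf.

    keypoints2d: array of shape (9, N, 17, 3), last axis = (x, y, confidence).
    """
    kp = np.asarray(keypoints2d, dtype=np.float64)[view]  # (N, 17, 3)
    xy = kp[..., :2].copy()
    xy[kp[..., 2] < min_conf] = np.nan  # mask unreliable detections
    return xy


def joint_track(keypoints3d, joint_name: str) -> np.ndarray:
    """Return the (N, 3) trajectory of one named joint from an (N, 17, 3) array."""
    return np.asarray(keypoints3d)[:, COCO_KEYPOINTS.index(joint_name), :]
```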

Timestamps

Our annotations are frame-by-frame at exactly 60 FPS. Some videos in the AIST Dance Video DB have slightly different frame rates. To align frames across the multi-view videos, we hard-coded 60 FPS when converting the videos to images.
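Because annotation frames are exactly 1/60 s apart, converting between frame indices and time is simple arithmetic. The sketch below also maps a frame to an audio sample index. That mapping assumes the music track starts at the same instant as frame 0. The function names are hypothetical.

```python
FPS = 60  # annotation frame rate


def frame_to_seconds(frame_idx: int, fps: int = FPS) -> float:
    """Timestamp in seconds of a 0-based annotation frame index."""
    return frame_idx / fps


def seconds_to_frame(t: float, fps: int = FPS) -> int:
    """Nearest annotation frame index for a time in seconds."""
    return int(round(t * fps))


def frame_to_audio_sample(frame_idx: int, sample_rate: int, fps: int = FPS) -> int:
    """Audio sample index aligned with a frame (assumes audio starts at frame 0)."""
    return int(round(frame_idx * sample_rate / fps))
```

For instance, frame 120 corresponds to 2.0 s. At a 44.1 kHz sample rate, each frame spans 735 audio samples.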

Filter List

We manually checked our annotations and ran numerical analyses on them. We provide a filter list (ignore_list.txt) of sequences whose 3D keypoints and human motion annotations are poorly reconstructed. We recommend excluding these sequences from your research or study.
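Here is a minimal sketch for applying the filter list. It assumes ignore_list.txt contains one sequence name per line and skips blank lines. The helper names are hypothetical.

```python
from typing import Iterable, List, Set


def parse_ignore_list(lines: Iterable[str]) -> Set[str]:
    """Collect sequence names from filter-list lines (assumed: one per line)."""
    return {line.strip() for line in lines if line.strip()}


def load_ignore_list(path: str) -> Set[str]:
    """Read and parse ignore_list.txt from disk."""
    with open(path, "r") as f:
        return parse_ignore_list(f)


def filter_sequences(seq_names: Iterable[str], ignored: Set[str]) -> List[str]:
    """Keep only sequences not on the filter list, preserving their order."""
    return [name for name in seq_names if name not in ignored]
```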