AIST++ Dataset

Trouble downloading the dataset? Let us know.

Download the videos/images/music

If you're interested in downloading the full set of AIST Dance Videos (with music), please visit their website. If you only need to download the subset of AIST Dance Videos that we annotated, we provide a Python script that downloads the videos (with music) from their website.
  1. Download the file (press Ctrl + S), or directly run:
  2. Run the following script:
    python --download_folder=$DOWNLOAD_FOLDER --num_processes=5
    For help, run:
    python -h
  3. Convert the videos to images at exactly 60 FPS rather than at the raw frame rate. Our Python API provides an ffmpeg-based function for this, but you are welcome to use other tools.
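The conversion in step 3 can be sketched as follows. This is a minimal illustration and not the function shipped in the API repository: the names build_ffmpeg_command and video_to_images, the output file pattern, and the paths in the usage note are all assumptions. It relies only on ffmpeg's standard fps video filter and requires ffmpeg to be on the PATH.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path, image_dir, fps=60):
    """Build an ffmpeg command that resamples a video to a fixed frame rate.

    The file pattern (%08d.jpg) is an illustrative choice, not the
    dataset's official naming scheme.
    """
    pattern = str(Path(image_dir) / "%08d.jpg")
    # "-vf fps=N" makes ffmpeg emit frames at exactly N frames per second,
    # duplicating or dropping source frames as needed.
    return ["ffmpeg", "-i", str(video_path), "-vf", f"fps={fps}", pattern]


def video_to_images(video_path, image_dir, fps=60):
    """Extract frames from one video into image_dir at a fixed frame rate."""
    Path(image_dir).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(build_ffmpeg_command(video_path, image_dir, fps), check=True)
```

For example, video_to_images("videos/example.mp4", "images/example") would write numbered JPEG frames into images/example; both paths are placeholders.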

Download the annotations

Data API

We provide Python API code for loading and visualizing the AIST++ annotations. The API repository also contains the code we used to construct those annotations, so they can be reproduced: camera parameter estimation, 3D keypoint reconstruction, and 3D human motion fitting.

Data Formats

Sequence Names

Annotations in AIST++ are indexed by sequence name. Each sequence name corresponds to a set of multi-view video names in the AIST Dance Video DB.

For example, gBR_sBM_cAll_d04_mBR0_ch01 in our annotations corresponds to gBR_sBM_{c01, ..., c09}_d04_mBR0_ch01 in the AIST database.
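The mapping between a sequence name and its multi-view video names can be sketched as below. The helper names are hypothetical and are not part of the AIST++ API. The sketch assumes the camera field is written as cAll in annotations and as c01 through c09 in video names, as in the example above.

```python
import re


def sequence_to_video_names(seq_name, num_cameras=9):
    """Expand an AIST++ sequence name into its per-camera video names."""
    parts = seq_name.split("_")
    cam_index = parts.index("cAll")  # raises ValueError if the field is absent
    names = []
    for cam in range(1, num_cameras + 1):
        fields = list(parts)
        fields[cam_index] = f"c{cam:02d}"  # c01, c02, ..., c09
        names.append("_".join(fields))
    return names


def video_to_sequence_name(video_name):
    """Collapse a per-camera video name back to its AIST++ sequence name."""
    parts = video_name.split("_")
    # Only the camera field has the form "c" + two digits ("ch01" does not match).
    fields = ["cAll" if re.fullmatch(r"c\d{2}", p) else p for p in parts]
    return "_".join(fields)
```

Calling sequence_to_video_names("gBR_sBM_cAll_d04_mBR0_ch01") returns nine names, from gBR_sBM_c01_d04_mBR0_ch01 to gBR_sBM_c09_d04_mBR0_ch01.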

Camera Parameters

The camera intrinsic and extrinsic information is stored in two files.

Each camera has the following attributes:

*See here for detailed definition.

Human Motion Sequence

Each SMPL-format human motion sequence is stored in a .pkl file with the following attributes:

3D&2D Keypoints Annotation

We provide COCO-format keypoint annotations in both 2D and 3D. Each keypoint sequence is stored in a .pkl file with the following attributes:
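One way to inspect a .pkl annotation file (motion or keypoints) without going through the API is sketched below. It assumes the file unpickles to a dict of arrays, which should be verified against the actual files. The function names are hypothetical, and no attribute names are assumed. Files that contain NumPy arrays need NumPy installed to unpickle.

```python
import pickle


def describe_value(value):
    """Return a short description of one attribute: its type plus shape or length."""
    shape = getattr(value, "shape", None)  # present on NumPy arrays
    if shape is not None:
        return f"{type(value).__name__}{tuple(shape)}"
    if isinstance(value, (list, tuple, dict, str, bytes)):
        return f"{type(value).__name__}(len={len(value)})"
    return type(value).__name__


def summarize_pickle(path):
    """Load a pickle file and describe each top-level attribute."""
    # Only unpickle files from a trusted source: pickle can execute code on load.
    with open(path, "rb") as f:
        data = pickle.load(f)
    if not isinstance(data, dict):
        return {"<root>": describe_value(data)}
    return {key: describe_value(value) for key, value in data.items()}
```

Printing the result of summarize_pickle on one file gives a quick overview of which attributes it holds and how many frames each covers.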


Our annotations are frame-by-frame at exactly 60 FPS. Some videos in the AIST Dance Video DB have slightly different frame rates. To keep frames aligned across the multi-view videos, we hard-coded 60 FPS when converting videos into images.
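The arithmetic behind this alignment can be illustrated with a short sketch. The function names are hypothetical, and 59.94 FPS in the usage note is only an example of a "slightly different" rate, not a value taken from the dataset. The sketch shows why annotation frame indices only line up with images extracted at a fixed 60 FPS; it does not reproduce how ffmpeg selects frames.

```python
def frame_to_seconds(frame_index, fps=60.0):
    """Timestamp in seconds of a frame index at a given frame rate."""
    return frame_index / fps


def nearest_raw_frame(frame_index, raw_fps, target_fps=60.0):
    """Index of the raw-video frame closest in time to a 60 FPS annotation frame."""
    return round(frame_to_seconds(frame_index, target_fps) * raw_fps)
```

With a hypothetical raw rate of 59.94 FPS, annotation frame 600 (10 seconds in) falls nearest to raw frame 599, so indexing the raw video directly would drift by one frame after 10 seconds.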

Filter List

We manually checked our annotations and also analyzed them numerically. The filter list (ignore_list.txt) contains the sequences whose 3D keypoints and human motion annotations were poorly reconstructed. We recommend excluding those sequences from your research or study.
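Applying the filter list might look like the sketch below. It assumes ignore_list.txt contains one sequence name per line, which should be checked against the actual file. The function names are hypothetical and are not part of the AIST++ API.

```python
def load_ignore_list(path):
    """Read sequence names to exclude, assuming one name per line."""
    with open(path, "r", encoding="utf-8") as f:
        # Skip blank lines and strip surrounding whitespace.
        return {line.strip() for line in f if line.strip()}


def filter_sequences(seq_names, ignored):
    """Keep only the sequences that are not on the ignore list, preserving order."""
    return [name for name in seq_names if name not in ignored]
```

A typical use is ignored = load_ignore_list("ignore_list.txt") followed by filter_sequences(all_names, ignored), where all_names is your own list of sequence names.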