Overview of AIST++


The AIST++ Dance Motion Dataset is constructed from the AIST Dance Video DB. From its multi-view videos, an elaborate pipeline estimates the camera parameters, 3D human keypoints, and 3D human dance motion sequences.

With those annotations, AIST++ is designed to support tasks including:


The following paper describes the AIST++ dataset in depth, from data processing to detailed statistics. If you use the AIST++ dataset in your work, please cite it.

Ruilong Li*, Shan Yang*, David A. Ross, Angjoo Kanazawa.
Learn to Dance with AIST++: Music Conditioned 3D Dance Generation.
arXiv, 2021.

Dataset organization

The dataset is split into training, validation, and test sets in two different ways, each serving a different purpose.

Table 1: Data Splits based on Subjects.

Field        Train        Validation   Test
Images       6,420,059    508,234      3,179,722
Sequences    868          70           470
Subjects     20*          20*          10
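
To make the relative size of the subject-based splits easier to read, the counts in Table 1 can be turned into shares of the total. The following is a minimal sketch rather than part of any AIST++ tooling: the names TABLE_1 and split_shares are hypothetical, and it assumes that image and sequence counts do not overlap between splits (only the Subjects row is marked as shared).

```python
# Counts copied from Table 1 (subject-based splits).
# TABLE_1 and split_shares are illustrative names, not part of any AIST++ API.
TABLE_1 = {
    "train": {"images": 6_420_059, "sequences": 868},
    "validation": {"images": 508_234, "sequences": 70},
    "test": {"images": 3_179_722, "sequences": 470},
}


def split_shares(table, field):
    """Return each split's fraction of the total for one field.

    Assumes the counts for this field are disjoint across splits.
    """
    total = sum(row[field] for row in table.values())
    return {name: row[field] / total for name, row in table.items()}


image_shares = split_shares(TABLE_1, "images")
sequence_shares = split_shares(TABLE_1, "sequences")

for name in TABLE_1:
    print(
        f"{name:<10} images: {image_shares[name]:6.1%}  "
        f"sequences: {sequence_shares[name]:6.1%}"
    )
```

The Subjects row is left out on purpose: the training and validation splits share subjects, so their counts cannot be summed into a meaningful total.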

Table 2: Data Splits based on Music-Choreography.

Field            Train       Validation   Test
Seconds          13,963.6    187.6        187.6
Sequences        980         20           20
Choreographies   420         20           20
Music            50          10*          10*

* Splits marked with an asterisk share data in this field.
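
As a quick worked example on Table 2, dividing the Seconds row by the Sequences row gives the mean sequence length in each split. This is a minimal sketch under the assumption that Seconds is the total duration of all sequences in a split; TABLE_2 and mean_sequence_seconds are hypothetical names used only for illustration.

```python
# Values copied from Table 2 (music-choreography splits).
# Assumption: "seconds" is the total duration of all sequences in the split.
TABLE_2 = {
    "train": {"seconds": 13_963.6, "sequences": 980},
    "validation": {"seconds": 187.6, "sequences": 20},
    "test": {"seconds": 187.6, "sequences": 20},
}


def mean_sequence_seconds(row):
    """Mean duration of one sequence in a split, in seconds."""
    return row["seconds"] / row["sequences"]


mean_seconds = {name: mean_sequence_seconds(row) for name, row in TABLE_2.items()}

for name, seconds in mean_seconds.items():
    print(f"{name:<10} mean sequence length: {seconds:.2f} s")
```

Under that assumption, training sequences average a little over 14 seconds, while validation and test sequences average a little over 9 seconds.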


The annotations are licensed by Google LLC under the CC BY 4.0 license.