Visualization of a trajectory from a camera flying above a house, derived from a CC-BY video from YouTube user SonaVisual.
RealEstate10K is a large dataset of camera poses corresponding to 10 million frames derived from about 80,000 video clips, gathered from about 10,000 YouTube videos. For each clip, the poses form a trajectory, with each pose specifying the camera's position and orientation at that point in the video. These poses were derived by running SLAM and bundle adjustment algorithms on a large set of videos.
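To make the idea of a pose concrete, here is a minimal Python/NumPy sketch using the common world-to-camera convention, where a pose is a 3x4 extrinsic matrix [R | t] combining orientation and position. The matrix layout, intrinsics, and numbers below are illustrative assumptions for this sketch, not a restatement of the dataset's documented file format.

```python
import numpy as np

# A camera pose couples an orientation (3x3 rotation R) with a position,
# commonly stored as a world-to-camera extrinsic [R | t] so that
# x_cam = R @ x_world + t. The camera's position in world coordinates
# is recovered as c = -R.T @ t.

def camera_center(pose):
    """Camera position in world coordinates from a 3x4 extrinsic [R | t]."""
    R, t = pose[:, :3], pose[:, 3]
    return -R.T @ t

def project(pose, K, x_world):
    """Project a 3D world point to pixel coordinates using the pose and intrinsics K."""
    x_cam = pose[:, :3] @ x_world + pose[:, 3]
    x_img = K @ x_cam
    return x_img[:2] / x_img[2]

if __name__ == "__main__":
    # Hypothetical pose: identity orientation, world origin 2 units in front of the camera.
    pose = np.hstack([np.eye(3), np.array([[0.0], [0.0], [2.0]])])
    # Illustrative pinhole intrinsics (focal length 500 px, principal point at 320, 240);
    # real per-frame intrinsics would come from the dataset itself, not these numbers.
    K = np.array([[500.0,   0.0, 320.0],
                  [  0.0, 500.0, 240.0],
                  [  0.0,   0.0,   1.0]])
    print("camera center:", camera_center(pose))               # -> [ 0.  0. -2.]
    print("projection of world origin:", project(pose, K, np.zeros(3)))  # -> [320. 240.]
```

A clip's trajectory is then simply an ordered sequence of such poses, one per frame timestamp.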
This dataset helped power a SIGGRAPH 2018 paper from Google, Stereo Magnification: Learning view synthesis using multiplane images, which learns to convert a narrow-baseline stereo pair into a mini-lightfield, using training data such as RealEstate10K. The dataset is intended to aid researchers working on view synthesis, 3D computer vision, and beyond.
Ready to start using RealEstate10K?