Augmented Reality (AR) technology creates fun, engaging, and immersive user experiences. The ability to perform AR tracking across devices and platforms, without initialization, remains important to power AR applications at scale.
MediaPipe Instant Motion Tracking provides AR tracking across devices and platforms without initialization or calibration. It is built upon the MediaPipe Box Tracking solution. With Instant Motion Tracking, you can easily place virtual 2D and 3D content on static or moving surfaces, allowing them to seamlessly interact with the real-world environment.
Fig 1. Instant Motion Tracking is used to augment the world with a 3D sticker.
We first use a StickerManagerCalculator to prepare the individual sticker data for the rest of the application. This information is then sent to the RegionTrackingSubgraph, which performs 3D region tracking for sticker placement and rendering. Once acquired, the tracked sticker regions are sent, together with user transformations (i.e., gestures from the user to rotate and zoom the sticker) and IMU data, to the MatricesManagerCalculator, which turns all of the sticker transformation data into a set of model matrices. The GlAnimationOverlayCalculator receives these matrices as an input stream and renders the provided texture and object file according to them. The output of GlAnimationOverlayCalculator is a video stream depicting the virtual 3D content rendered on top of the real world, creating immersive AR experiences for users.
With the Instant Motion Tracking MediaPipe graph, an application can create an interactive and realistic AR experience by specifying the required input streams, side packets, and output streams. The input streams are the following:
- Input Video (GpuBuffer): Video frames to render augmented stickers onto.
- Rotation Matrix (9-element Float Array): The 3x3 row-major rotation matrix from the device IMU to determine proper orientation of the device.
- Sticker Proto String (String): A string representing the serialized sticker buffer protobuf message, containing a list of all stickers and their attributes.
  - Each sticker in the protobuf message has a unique ID used to find its associated anchors and transforms, an initial anchor placement in a normalized [0.0, 1.0] 3D space, a user rotation and a user scaling transform, and an integer indicating which type of object to render for the sticker (e.g. 3D asset or GIF).
- Sticker Sentinel (Integer): When an anchor must be initially placed or repositioned, this value must be changed to the ID of the anchor to reset from the sticker buffer protobuf message. If no valid ID is provided, the system will simply maintain tracking.
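As a rough illustration of the stream payloads described above, the following Python sketch flattens a 3x3 rotation matrix into a 9-element row-major array and chooses a sentinel value. The `Sticker` class, its field names, the rotation units, and the `NO_RESET` value of -1 are assumptions made for this example; they are not taken from the actual sticker buffer protobuf definition.

```python
from dataclasses import dataclass
from typing import List, Sequence

NO_RESET = -1  # assumed placeholder meaning "no valid ID"; not from the source


@dataclass
class Sticker:
    """Hypothetical mirror of the per-sticker attributes listed above."""
    sticker_id: int
    anchor: Sequence[float]  # (x, y, z), each normalized to [0.0, 1.0]
    rotation: float          # user rotation (units assumed to be radians)
    scale: float             # user scaling factor
    render_type: int         # which kind of object to render

    def anchor_is_normalized(self) -> bool:
        """Check that the anchor has three components inside [0.0, 1.0]."""
        return len(self.anchor) == 3 and all(0.0 <= v <= 1.0 for v in self.anchor)


def flatten_row_major(matrix: Sequence[Sequence[float]]) -> List[float]:
    """Flatten a 3x3 matrix into 9 floats, one row after another."""
    if len(matrix) != 3 or any(len(row) != 3 for row in matrix):
        raise ValueError("expected a 3x3 matrix")
    return [float(v) for row in matrix for v in row]


def sentinel_for(stickers: Sequence[Sticker], reset_id: int) -> int:
    """Return reset_id if it names a known sticker, otherwise NO_RESET."""
    known_ids = {s.sticker_id for s in stickers}
    return reset_id if reset_id in known_ids else NO_RESET
```

In this sketch, passing an ID that is not in the sticker list yields `NO_RESET`, which stands in for the "no valid ID" case in which tracking is simply maintained.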
Side packets are also an integral part of the Instant Motion Tracking solution to provide device-specific information for the rendering system:
- Field of View (Float): The field of view of the camera in radians.
- Aspect Ratio (Float): The aspect ratio (width / height) of the camera frames (this ratio corresponds to the image frames themselves, not necessarily the screen bounds).
- Object Asset (String): The GlAnimationOverlayCalculator must be provided with an associated asset file name pointing to the 3D model to render in the viewfinder.
- (Optional) Texture (ImageFrame on Android, GpuBuffer on iOS): Textures for the GlAnimationOverlayCalculator can be provided either via an input stream (dynamic texturing) or as a side packet (unchanging texture).
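The two numeric side packets can be derived from basic camera parameters. The sketch below uses the standard pinhole-camera relation to compute a field of view in radians and computes the aspect ratio from the frame dimensions. The function names and the choice of millimetres are assumptions for illustration, and the text above does not say whether the graph expects the field of view along the horizontal or the vertical sensor dimension, so the sketch leaves that choice to the caller.

```python
import math


def field_of_view_radians(sensor_size_mm: float, focal_length_mm: float) -> float:
    """Pinhole-camera field of view along one sensor dimension, in radians."""
    if sensor_size_mm <= 0 or focal_length_mm <= 0:
        raise ValueError("sensor size and focal length must be positive")
    return 2.0 * math.atan(sensor_size_mm / (2.0 * focal_length_mm))


def aspect_ratio(frame_width_px: int, frame_height_px: int) -> float:
    """Width divided by height of the camera frames (not the screen bounds)."""
    if frame_width_px <= 0 or frame_height_px <= 0:
        raise ValueError("frame dimensions must be positive")
    return frame_width_px / frame_height_px
```

For example, `aspect_ratio(1280, 720)` describes a 16:9 frame regardless of the size of the view it is displayed in.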
The rendering system for Instant Motion Tracking is powered by OpenGL. For more information regarding the structure of model matrices and OpenGL rendering, please see the OpenGL Wiki. With the specifications above, the Instant Motion Tracking capabilities can be adapted to any device that can run the MediaPipe framework and has a working IMU system and a connected camera.
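To make the notion of a model matrix more concrete, here is a generic Python sketch that composes a translation, a rotation, and a uniform scale into a single 4x4 matrix and flattens it in the column-major order that OpenGL conventionally uses. This is a textbook construction, not the MatricesManagerCalculator's actual code; the function names, the rotation about the Y axis, and the translate-rotate-scale order are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 4x4 matrices stored as nested row lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def translation(x: float, y: float, z: float) -> Matrix:
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]


def rotation_y(angle: float) -> Matrix:
    """Rotation about the Y axis by `angle` radians (axis choice is arbitrary)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s, 0.0], [0.0, 1.0, 0.0, 0.0],
            [-s, 0.0, c, 0.0], [0.0, 0.0, 0.0, 1.0]]


def uniform_scale(f: float) -> Matrix:
    return [[f, 0.0, 0.0, 0.0], [0.0, f, 0.0, 0.0],
            [0.0, 0.0, f, 0.0], [0.0, 0.0, 0.0, 1.0]]


def model_matrix(position: Sequence[float], angle: float, scale: float) -> Matrix:
    """Compose T * R * S so a vertex is scaled, then rotated, then translated."""
    return matmul(translation(*position),
                  matmul(rotation_y(angle), uniform_scale(scale)))


def to_column_major(m: Matrix) -> List[float]:
    """Flatten a 4x4 matrix column by column into 16 floats."""
    return [m[r][c] for c in range(4) for r in range(4)]
```

With this layout, the translation components end up at indices 12 to 14 of the flattened array.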
Please first see general instructions for Android on how to build MediaPipe examples.
Android target (or download prebuilt ARM64 APK):
Assets rendered by the GlAnimationOverlayCalculator must be preprocessed into an OpenGL-ready custom .uuu format. This can be done for user assets as follows:
./mediapipe/graphs/object_detection_3d/obj_parser/obj_cleanup.sh [INPUT_DIR] [INTERMEDIATE_OUTPUT_DIR]
and then run
bazel run -c opt mediapipe/graphs/object_detection_3d/obj_parser:ObjParser -- input_dir=[INTERMEDIATE_OUTPUT_DIR] output_dir=[OUTPUT_DIR]
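INPUT_DIR should be the folder containing the initial asset .obj files to be processed, INTERMEDIATE_OUTPUT_DIR is the folder that the first command writes to and the second command reads from, and OUTPUT_DIR is the folder where the processed asset .uuu file will be placed.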
Note: ObjParser combines all .obj files found in the given directory into a single .uuu animation file, using the order given by sorting the filenames alphanumerically. Also, the ObjParser directory inputs must be given as absolute paths, not relative paths. See the parser utility library at mediapipe/graphs/object_detection_3d/obj_parser/ for more details.
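Before running the parser, it can help to preview the order in which the .obj files would be combined and to resolve the directory to an absolute path. The Python sketch below does both; the function names are hypothetical, and the plain lexicographic sort is an assumption about what "alphanumerically" means here, so it may not match ObjParser's ordering exactly.

```python
import os
from typing import Iterable, List


def sort_obj_names(names: Iterable[str]) -> List[str]:
    """Keep only .obj filenames and sort them lexicographically."""
    return sorted(n for n in names if n.lower().endswith(".obj"))


def planned_obj_order(input_dir: str) -> List[str]:
    """Return absolute paths of the .obj files in input_dir, in sorted order."""
    abs_dir = os.path.abspath(input_dir)  # the parser needs absolute paths
    return [os.path.join(abs_dir, n) for n in sort_obj_names(os.listdir(abs_dir))]
```

Note that under a lexicographic sort `frame10.obj` comes before `frame2.obj`; zero-padding the frame numbers (for example `frame02.obj`) removes that ambiguity whichever ordering the parser applies.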
- Google Developers Blog: Instant Motion Tracking With MediaPipe
- Google AI Blog: The Instant Motion Tracking Behind Motion Stills AR
- Paper: Instant Motion Tracking and Its Applications to Augmented Reality