Dessie: Disentanglement for Articulated 3D Horse Shape and Pose Estimation from Images

Abstract

In recent years, 3D parametric animal models have been developed to aid in estimating 3D shape and pose from images and video. While progress has been made for humans, it's more challenging for animals due to limited annotated data. To address this, we introduce the first method using synthetic data generation and disentanglement to learn to regress 3D shape and pose. Focusing on horses, we use text-based texture generation and a synthetic data pipeline to create varied shapes, poses, and appearances, learning disentangled spaces. Our method, Dessie, surpasses existing 3D horse reconstruction methods and generalizes to other large animals like zebras, cows, and deer.

Video

Single-Image 3D Reconstruction

Given a single image of a horse in any image style, Dessie reconstructs the horse's articulated 3D shape and pose.

Dessie captures pose from single images of horse-like species.

Dessie captures pose from single images of other four-legged species.

Video Frame Reconstruction

We reconstruct horses from video by processing each frame individually.
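Per-frame processing can be sketched as follows. This is a minimal illustration, not Dessie's actual code: it assumes that frames are handled independently, with no state carried between them. The names `reconstruct_video`, `reconstruct_frame`, and `dummy_reconstruct` are hypothetical, and the stand-in reconstructor only marks where a single-image model would be called.

```python
from typing import Callable, Iterable, List, TypeVar

Frame = TypeVar("Frame")
Result = TypeVar("Result")


def reconstruct_video(
    frames: Iterable[Frame],
    reconstruct_frame: Callable[[Frame], Result],
) -> List[Result]:
    """Apply a single-image reconstructor to each frame, in order.

    Assumption: no information is shared between frames, so each
    output depends only on its own input frame.
    """
    return [reconstruct_frame(frame) for frame in frames]


def dummy_reconstruct(frame: str) -> dict:
    """Hypothetical stand-in for a single-image model.

    A real reconstructor would return estimated shape and pose
    parameters; here they are left as None placeholders.
    """
    return {"frame": frame, "shape": None, "pose": None}


# Usage: two placeholder frame identifiers instead of real images.
results = reconstruct_video(["frame_000", "frame_001"], dummy_reconstruct)
```

Because each frame is processed on its own in this sketch, the output list has one entry per input frame and preserves frame order.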

BibTeX

@inproceedings{li2024dessie,
  title={Dessie: Disentanglement for Articulated 3D Horse Shape and Pose Estimation from Images},
  author={Li, Ci and Yang, Yi and Weng, Zehang and Hernlund, Elin and Zuffi, Silvia and Kjellstr{\"o}m, Hedvig},
  booktitle={Asian Conference on Computer Vision},
  year={2024}
}