Animate3D: Animating Any 3D Model with Multi-view Video Diffusion

Authors: Yanqin Jiang, Chaohui Yu, Chenjie Cao, Fan Wang, Weiming Hu, Jin Gao (CASIA, DAMO Academy, Alibaba Group, Hupan Lab)

[Project Page] [Paper]

Despite extensive research on dynamic 3D content generation (4D generation), there is still no single foundation model for the task. Learning spatial structure from 3D models and temporal motion from video models separately (e.g., SVD + Zero-123) results in quality degradation, and animating existing 3D objects this way usually fails to preserve their multi-view attributes.

Animate3D proposes to animate any static 3D model with unified, spatiotemporally consistent supervision. The pipeline starts with MV-VDM, a multi-view video diffusion model built on MVDream and extended with a spatiotemporal motion module that learns natural dynamic motion. An MV2V-Adapter, adapted from I2V-Adapter, handles the multi-view image conditions rendered from the given 3D model. To animate the 3D object itself, a 4D Gaussian Splatting (4DGS) representation is jointly optimized with a reconstruction loss and 4D Score Distillation Sampling (4D-SDS). For training, the authors also build MV-Video, a large-scale dataset of about 1.8M multi-view videos.
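The key architectural idea, attention across views within a frame (multi-view consistency) plus attention across frames at each spatial location (temporal consistency), can be sketched roughly as below. This is a minimal PyTorch illustration under stated assumptions, not the official implementation; the tensor layout, the module name `SpatioTemporalBlock`, and all hyperparameters are made up for the example.

```python
# Minimal sketch (NOT the official Animate3D code) of a spatiotemporal motion
# block: spatial attention mixes tokens across views within each frame,
# temporal attention mixes tokens across frames at each spatial position.
import torch
import torch.nn as nn


class SpatioTemporalBlock(nn.Module):
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, views * tokens, dim) latent features (assumed layout)
        b, f, s, d = x.shape

        # Spatial (multi-view) attention: attend over all view tokens per frame.
        xs = self.norm1(x).reshape(b * f, s, d)
        xs, _ = self.spatial_attn(xs, xs, xs)
        x = x + xs.reshape(b, f, s, d)

        # Temporal attention: attend over frames for each spatial token.
        xt = self.norm2(x).permute(0, 2, 1, 3).reshape(b * s, f, d)
        xt, _ = self.temporal_attn(xt, xt, xt)
        x = x + xt.reshape(b, s, f, d).permute(0, 2, 1, 3)
        return x


# Example: 2 samples, 8 frames, 4 views x 16 tokens each, 64-dim features.
feats = torch.randn(2, 8, 4 * 16, 64)
out = SpatioTemporalBlock(dim=64)(feats)
print(out.shape)  # torch.Size([2, 8, 64, 64])
```

In the second stage, the 4DGS representation is then optimized against both a reconstruction loss on the generated multi-view videos and the 4D-SDS gradient distilled from MV-VDM, as described above.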

Citation

@article{jiang2024animate3d,
	title={Animate3D: Animating Any 3D Model with Multi-view Video Diffusion},
	author={Yanqin Jiang and Chaohui Yu and Chenjie Cao and Fan Wang and Weiming Hu and Jin Gao},
	journal={arXiv preprint},
	year={2024},
}