AnimaX – A 3D Animation Generation Framework Jointly Developed by Beihang University, Tsinghua University, and Others

What Is AnimaX?

AnimaX is an efficient 3D animation generation framework jointly developed by Beihang University, Tsinghua University, the University of Hong Kong, and others. It combines the motion priors of video diffusion models with the structure of skeleton-based animation, transferring motion knowledge learned from video to the 3D domain and supporting diverse articulated meshes with arbitrary skeletons. AnimaX represents 3D motion as multi-view, multi-frame 2D pose maps and uses a joint video-pose diffusion model. Shared positional encoding and modality-aware embeddings keep the video and pose sequences aligned in space and time, enabling high-quality 3D animation generation. Trained on a dataset of 160,000 rigged sequences, AnimaX achieves state-of-the-art generalization, motion fidelity, and efficiency, providing a scalable solution for category-agnostic 3D animation.

Main Features of AnimaX

Support for Arbitrary Skeletal Structures: AnimaX can handle 3D models with different skeletal topologies, so it applies to a wide range of characters and objects, including humans, animals, and furniture.
Text-Driven Animation: Users can specify animation content with simple text descriptions, and AnimaX generates corresponding animation sequences based on the textual prompts.
Multi-View Consistency: Generated animations maintain consistency across multiple viewpoints, ensuring coherence and realism when viewed from different angles.
Efficient Generation: AnimaX uses a feed-forward approach, so it produces high-quality 3D animations in a short time and greatly improves animation production efficiency.

Technical Principles of AnimaX

Joint Video-Pose Diffusion Model: AnimaX represents 3D motion as multi-view, multi-frame 2D pose maps. It leverages the powerful motion priors of video diffusion models while remaining compatible with 3D skeletal animations. The joint video-pose diffusion model simultaneously generates videos and corresponding 2D pose sequences. This joint generation strategy ensures spatiotemporal alignment between videos and poses.
Shared Positional Encoding and Modality-Aware Embeddings: To keep video and pose sequences aligned in space and time, AnimaX applies the same positional encoding to both modalities (RGB videos and pose maps), so that corresponding locations in a video and its pose map carry the same position signal. Modality-aware embeddings mark which modality each input belongs to, allowing the model to treat RGB videos and pose maps differently where needed.
Multi-View Consistency: AnimaX employs multi-view attention mechanisms and camera-conditioned embeddings to ensure generated videos remain consistent across multiple viewpoints. This allows the model to learn spatial correspondences between different views and produce coherent multi-view videos.
3D Motion Reconstruction and Animation: The generated multi-view pose sequences are converted into 3D joint positions by triangulation. Inverse kinematics then converts the 3D joint positions into joint angles that drive the animation of the 3D model.
Large-Scale Dataset Training: AnimaX is trained on a newly curated dataset containing 160,000 rigged sequences. The data covers diverse categories such as humans, animals, and furniture, which helps the model generalize.
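
The shared positional encoding and modality-aware embeddings described above can be illustrated with a small sketch. This is not AnimaX's actual implementation: the sinusoidal encoding, the function names (sinusoidal_encoding, build_joint_tokens), the token shapes, and the random stand-in for learned modality embeddings are all assumptions chosen for illustration. The point it shows is that video and pose tokens at the same position receive an identical position signal and differ only by a per-modality vector.

```python
import numpy as np

def sinusoidal_encoding(positions, dim):
    """Sinusoidal encoding of integer positions; dim must be even."""
    positions = np.asarray(positions, dtype=np.float64)[:, None]
    freqs = np.exp(-np.log(10000.0) * np.arange(0, dim, 2) / dim)
    angles = positions * freqs[None, :]
    enc = np.zeros((positions.shape[0], dim))
    enc[:, 0::2] = np.sin(angles)
    enc[:, 1::2] = np.cos(angles)
    return enc

def build_joint_tokens(video_tokens, pose_tokens, modality_table):
    """Add one shared positional encoding plus a per-modality embedding."""
    assert video_tokens.shape == pose_tokens.shape
    n, dim = video_tokens.shape
    pos = sinusoidal_encoding(np.arange(n), dim)     # shared by both modalities
    video = video_tokens + pos + modality_table[0]   # row 0: RGB video
    pose = pose_tokens + pos + modality_table[1]     # row 1: pose map
    # Concatenate so a transformer could attend across both modalities.
    return np.concatenate([video, pose], axis=0)

rng = np.random.default_rng(0)
num_tokens, dim = 6, 8                               # hypothetical sizes
video_tokens = rng.normal(size=(num_tokens, dim))
pose_tokens = rng.normal(size=(num_tokens, dim))
modality_table = rng.normal(size=(2, dim))           # stand-in for learned embeddings

joint = build_joint_tokens(video_tokens, pose_tokens, modality_table)
print(joint.shape)  # (12, 8)
```

Because the positional term is identical for both halves of the sequence, attention can match a pose token to the video token at the same location, while the modality vector tells the model which kind of content it is reading.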
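
The triangulation step in 3D motion reconstruction can be sketched as follows. The example uses the standard linear (DLT) triangulation method, which is one common choice and not necessarily the one AnimaX uses. The camera setup (orbit_camera, four views around the object, the focal length and image center) and the joint coordinates are hypothetical values chosen for illustration.

```python
import numpy as np

def orbit_camera(angle_deg, radius=3.0, focal=500.0, center=256.0):
    """Hypothetical pinhole camera orbiting the origin; returns a 3x4 projection matrix."""
    a = np.deg2rad(angle_deg)
    R = np.array([[np.cos(a), 0.0, np.sin(a)],
                  [0.0, 1.0, 0.0],
                  [-np.sin(a), 0.0, np.cos(a)]])
    t = np.array([[0.0], [0.0], [radius]])
    K = np.array([[focal, 0.0, center],
                  [0.0, focal, center],
                  [0.0, 0.0, 1.0]])
    return K @ np.hstack([R, t])

def project(P, X):
    """Project a 3D point X to pixel coordinates with projection matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def triangulate_point(projections, points_2d):
    """Linear (DLT) triangulation of one 3D point from two or more views."""
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

cameras = [orbit_camera(angle) for angle in (0, 90, 180, 270)]
joint_3d = np.array([0.2, -0.1, 0.4])                # ground-truth joint position
observations = [project(P, joint_3d) for P in cameras]
recovered = triangulate_point(cameras, observations)
print(recovered)
```

In a full pipeline this would run once per joint and per frame, using the 2D joint locations extracted from the generated pose maps; the resulting 3D joint positions are then passed to inverse kinematics.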

Project Links

Official Website: https://anima-x.github.io/
GitHub Repository: https://github.com/anima-x/anima-x
arXiv Paper: https://arxiv.org/pdf/2506.19851

Application Scenarios of AnimaX

Game Development: Rapidly generate animations for characters and environments, improving development efficiency and content richness.
Film Production: Produce animation for animated films, visual effects, and virtual characters, enhancing visual quality.
Virtual Reality (VR) and Augmented Reality (AR): Generate animations for virtual characters and dynamic environments, boosting immersion and interactive experiences.
Advertising and Marketing: Create dynamic ads and product showcase animations to attract audience attention and increase engagement.
Education and Training: Produce animations for virtual experiments and simulation training to enhance teaching and training effectiveness.