LHM – An open-source model by Alibaba Tongyi that generates animatable 3D human bodies from a single image

AI Tools · posted 4w ago by dongdong

What is LHM?

LHM (Large Animatable Human Reconstruction Model) is a model developed by Alibaba's Tongyi Lab that reconstructs an animatable 3D human from a single image. Built on a multimodal Transformer architecture, it fuses 3D geometric features with 2D image features and uses an attention mechanism to preserve the geometric and textural details of clothing. A head feature pyramid encoding scheme further improves the restoration of facial detail. LHM represents the reconstructed model as 3D Gaussian splats (Gaussian Splatting), supporting real-time rendering and pose-controlled animation. It can generate a high-quality animatable 3D human model within seconds, making it suitable for immersive applications such as AR/VR.

The main functions of LHM

  • Rapid Reconstruction: Converts a single image into an animatable 3D model within seconds, with no complex post-processing required.
  • High-Fidelity Details: Precisely preserves key information such as clothing textures and facial details, producing high-quality 3D models.
  • Real-Time Animation: Supports real-time, pose-controlled animation rendering, suitable for immersive applications (e.g., AR/VR).
  • Strong Generalization: Performs well on in-the-wild images, adapting to a wide range of scenes and poses.

The technical principle of LHM

  • Multimodal Transformer Architecture: Integrates 3D geometric features (surface points sampled from the SMPL-X template) and 2D image features (extracted by a pre-trained vision Transformer) within a single Transformer framework, so geometric and visual information are processed jointly. A multi-scale feature-extraction scheme designed for the head region aggregates features at different levels to improve the restoration of facial detail.
  • 3D Gaussian Point Cloud Representation: The 3D model is represented as 3D Gaussian splats (Gaussian Splatting), enabling real-time, high-quality rendering. The network directly predicts the parameters of each Gaussian (position, rotation, scale, color, and so on), allowing fast conversion from the input image to a 3D model.
  • Self-Supervised Learning: The model is trained on large-scale video data and optimized with a rendering loss plus regularization terms, removing the need for scarce 3D scan data. Regularizers such as "as close as possible" (keeping Gaussians near the template surface) and "as spherical as possible" (discouraging overly anisotropic splats) maintain the geometric plausibility of the 3D model.
  • Real-Time Animation Support: The reconstructed model is deformed to a target pose via SMPL-X skeleton parameters, supporting real-time pose-controlled animation. Reconstruction and animation run in a single forward pass, making the pipeline suitable for real-time applications.
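To make the Gaussian-splat representation above concrete, here is a minimal Python sketch (not the released LHM code; the 14-dimensional feature layout, field names, and activation choices are assumptions for illustration) of decoding one per-point feature vector into splat parameters anchored at a point on the SMPL-X template surface:

```python
import math
from dataclasses import dataclass
from typing import List

@dataclass
class Gaussian3D:
    """One splat of the 3D Gaussian point cloud (illustrative field set)."""
    position: List[float]   # 3D center (x, y, z)
    rotation: List[float]   # unit quaternion (w, x, y, z)
    scale: List[float]      # per-axis standard deviations
    color: List[float]      # RGB in [0, 1]
    opacity: float          # in [0, 1]

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def decode_gaussian(anchor: List[float], feats: List[float]) -> Gaussian3D:
    """Decode a 14-dim feature vector (assumed layout: offset 3, quaternion 4,
    log-scale 3, RGB 3, opacity 1) into splat parameters anchored at a point
    sampled from the SMPL-X template surface."""
    assert len(feats) == 14
    offset, quat = feats[0:3], feats[3:7]
    log_scale, rgb, alpha = feats[7:10], feats[10:13], feats[13]
    # Center = template anchor + predicted offset; the "as close as possible"
    # regularizer keeps this offset small during training.
    position = [a + o for a, o in zip(anchor, offset)]
    # Normalize the quaternion so it encodes a valid rotation.
    n = math.sqrt(sum(q * q for q in quat)) or 1.0
    rotation = [q / n for q in quat]
    # exp makes scales positive; sigmoid squashes color/opacity into [0, 1].
    return Gaussian3D(position, rotation,
                      [math.exp(s) for s in log_scale],
                      [sigmoid(c) for c in rgb],
                      sigmoid(alpha))

g = decode_gaussian([0.0, 1.5, 0.1],
                    [0.01, -0.02, 0.0,  1.0, 0.0, 0.0, 0.0,
                     -3.0, -3.0, -3.0,  0.2, 0.1, 0.0,  2.0])
print(g.position)  # [0.01, 1.48, 0.1]
```

Because every parameter is predicted in one pass, the whole point cloud can be decoded as fast as the network runs, which is what enables reconstruction in seconds.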
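The two regularizers named above can be sketched in a simple assumed form (plain mean-squared penalties; the exact formulation in LHM may differ):

```python
from typing import List

Point = List[float]

def as_close_as_possible(centers: List[Point], anchors: List[Point]) -> float:
    """Mean squared distance between each Gaussian center and its anchor
    point on the SMPL-X template surface (assumed form of the regularizer)."""
    total = 0.0
    for c, a in zip(centers, anchors):
        total += sum((ci - ai) ** 2 for ci, ai in zip(c, a))
    return total / len(centers)

def as_spherical_as_possible(log_scales: List[Point]) -> float:
    """Penalize anisotropic splats: the variance of a Gaussian's per-axis
    log-scales is zero exactly when the splat is a sphere."""
    total = 0.0
    for s in log_scales:
        mean = sum(s) / len(s)
        total += sum((si - mean) ** 2 for si in s) / len(s)
    return total / len(log_scales)

# A perfectly spherical splat contributes zero penalty.
print(as_spherical_as_possible([[-2.0, -2.0, -2.0]]))  # 0.0
# An elongated splat is penalized.
print(as_spherical_as_possible([[-1.0, -3.0, -3.0]]) > 0)  # True
```

Both terms only constrain geometry; the photometric signal still comes from the rendering loss against video frames, which is why no 3D scan data is needed.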
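The pose-controlled animation step amounts to linear blend skinning of the Gaussian centers with the SMPL-X skeleton. A minimal sketch, using hypothetical 4×4 bone transforms and per-point skinning weights:

```python
from typing import List

Mat4 = List[List[float]]  # 4x4 row-major rigid transform

def transform(m: Mat4, p: List[float]) -> List[float]:
    """Apply a 4x4 transform to a 3D point (homogeneous coordinate = 1)."""
    return [sum(m[i][j] * (p[j] if j < 3 else 1.0) for j in range(4))
            for i in range(3)]

def lbs_deform(point: List[float], weights: List[float],
               bones: List[Mat4]) -> List[float]:
    """Linear blend skinning: blend each bone's transform of the point by
    its skinning weight. LHM reuses the SMPL-X skeleton and weights, so the
    same Gaussians can be re-posed every frame (illustrative implementation)."""
    out = [0.0, 0.0, 0.0]
    for w, m in zip(weights, bones):
        q = transform(m, point)
        out = [o + w * qi for o, qi in zip(out, q)]
    return out

identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
shift_x = [[1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]  # +1 in x
# Half-weighted between a fixed bone and a shifted bone: point moves 0.5 in x.
print(lbs_deform([0.0, 0.0, 0.0], [0.5, 0.5], [identity, shift_x]))
# [0.5, 0.0, 0.0]
```

Since skinning is a fixed linear operation on the predicted centers, it adds essentially no cost on top of the single forward pass, which is what makes real-time pose control feasible.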

The project address of LHM

Application scenarios of LHM

  • Virtual Reality (VR) and Augmented Reality (AR): Quickly transform photos into animatable 3D virtual characters, enhancing immersion and interactivity.
  • Game Development: Rapidly generate high-quality 3D character models with support for real-time animation, improving development efficiency and gaming experiences.
  • Film and Television Production: Utilized in special effects and animated films to quickly create character models, enhancing production efficiency and quality.
  • Social Media and Content Creation: Users can generate 3D virtual avatars for social media, while creators can quickly produce 3D characters for short videos and more.
  • Education and Training: Create virtual teachers or teaching assistants for online education, and generate 3D models for simulation training in fields such as healthcare and military.
