Hi3DGen – A 3D Geometry Generation Framework Jointly Developed by CUHK, ByteDance, and Tsinghua University

What is Hi3DGen?

Hi3DGen is a 3D geometry generation framework jointly developed by researchers from The Chinese University of Hong Kong (Shenzhen), ByteDance, and Tsinghua University. It generates high-fidelity 3D models from 2D images. By using normal maps as an intermediate representation between the input image and the output geometry, Hi3DGen produces rich geometric detail, and its authors report that it outperforms existing methods in geometric fidelity. The framework consists of three key components: an image-to-normal estimator, a normal-to-geometry learning method, and a 3D data synthesis pipeline.
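To make the idea of a normal map concrete, here is a minimal sketch of the common convention for storing a surface normal as an RGB pixel, where each component in [-1, 1] is mapped to [0, 255]. This is a general graphics convention, not code from Hi3DGen; the function names encode_normal and decode_normal are hypothetical and chosen only for illustration.

```python
import math

def normalize(v):
    """Scale a 3D vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    if length == 0.0:
        raise ValueError("zero-length vector has no direction")
    return tuple(c / length for c in v)

def encode_normal(n):
    """Map a normal with components in [-1, 1] to an 8-bit RGB triple."""
    return tuple(int(round((c * 0.5 + 0.5) * 255)) for c in normalize(n))

def decode_normal(rgb):
    """Invert encode_normal, up to 8-bit quantization error."""
    return normalize(tuple(c / 255 * 2.0 - 1.0 for c in rgb))

# A surface patch facing straight along +Z.
facing_z = encode_normal((0.0, 0.0, 1.0))
recovered = decode_normal(facing_z)
print(facing_z, recovered)
```

Because every pixel stores a direction rather than a color, a normal map carries surface orientation at pixel resolution, which is why it is a convenient bridge between an image and fine geometry.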

Main functions of Hi3DGen

Image-to-3D Generation: Converts 2D images into high-fidelity 3D geometric models with rich detail.
Image-to-Normal Estimation: Decouples low-frequency and high-frequency image patterns through noise injection and dual-stream training, achieving generalizable, stable, and sharp normal estimation.
Normal-to-Geometry Learning: Enhances the fidelity of 3D geometry generation through normal regularization-based latent diffusion learning.
3D Data Synthesis: Constructs high-quality 3D datasets to support training.
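The low-frequency versus high-frequency distinction mentioned for normal estimation can be illustrated with a toy decomposition. The sketch below splits a 1D signal into a smoothed part (coarse shape) and a residual (fine detail) with a simple box filter. It only illustrates what the two bands mean; it is not Hi3DGen's noise-injection or dual-stream training, and the names box_blur and split_frequencies are hypothetical.

```python
def box_blur(signal, radius):
    """Low-pass filter: average each sample with its neighbours.

    The averaging window shrinks at the borders of the signal.
    """
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        window = signal[lo:hi]
        out.append(sum(window) / len(window))
    return out

def split_frequencies(signal, radius=2):
    """Return (low, high) where low is the blurred signal and high the residual."""
    low = box_blur(signal, radius)
    high = [s - l for s, l in zip(signal, low)]
    return low, high

# A smooth ramp (coarse shape) plus a fine alternating ripple (detail).
signal = [0.1 * i + (0.05 if i % 2 else -0.05) for i in range(16)]
low, high = split_frequencies(signal)

# Adding the two bands back together recovers the original signal.
recombined = [l + h for l, h in zip(low, high)]
```

In this toy case the low band follows the ramp while most of the ripple ends up in the high band, which mirrors the split between overall structure and sharp detail described above.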

Technical principles of Hi3DGen

Image-to-Normal Estimator: Decouples the low-frequency and high-frequency patterns of an image through noise injection and dual-stream training. The low-frequency pattern carries the overall shape and structure, while the high-frequency pattern captures fine details and textures. The result is a generalizable, stable, and sharp normal map that serves as a high-quality intermediate representation for subsequent 3D geometry generation.
Normal-to-Geometry Learning: Trains a latent diffusion model with the normal map as a regularization signal. This improves the fidelity of the generated geometry and helps the resulting 3D models retain more detail.
3D Data Synthesis Pipeline: Builds high-quality 3D datasets for model training, which support the model in learning the mapping from 2D images to 3D geometry.
Two-Stage Generation Process: Hi3DGen adopts a two-stage generation process:

Stage 1: Basic Multi-View Generation: Fine-tune a pre-trained video diffusion model with additional camera pose conditions to convert a single-view image into a low-resolution, 3D-aware image sequence (an orbital video).
Stage 2: 3D-Aware Multi-View Refinement: Feed the low-resolution multi-view images generated in the first stage into a 3D-aware video-to-video refiner to further enhance the resolution and texture details.
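The camera pose conditions behind an orbital video can be pictured as viewpoints evenly spaced on a circle around the object. The following is a minimal sketch of such an orbit under assumed values: the view count, radius, and elevation are illustrative, and the function names are hypothetical rather than taken from any released code.

```python
import math

def orbit_camera_positions(num_views, radius=2.0, elevation_deg=10.0):
    """Camera positions evenly spaced in azimuth on a circle around the origin.

    Y is treated as the up axis; every camera sits at the same elevation.
    """
    elev = math.radians(elevation_deg)
    positions = []
    for k in range(num_views):
        azim = 2.0 * math.pi * k / num_views
        x = radius * math.cos(elev) * math.cos(azim)
        y = radius * math.sin(elev)
        z = radius * math.cos(elev) * math.sin(azim)
        positions.append((x, y, z))
    return positions

def look_direction(position):
    """Unit vector pointing from a camera position toward the origin."""
    length = math.sqrt(sum(c * c for c in position))
    return tuple(-c / length for c in position)

# Assumed example: 16 views on a full turn around the object.
cameras = orbit_camera_positions(16)
directions = [look_direction(p) for p in cameras]
```

Each position and viewing direction pair describes one frame of the orbit, so a sequence of such poses is one plausible form for the per-frame conditioning signal.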

3D Gaussian Splatting (3DGS): Fit a 3DGS model to the generated high-resolution multi-view images and render additional interpolated views from it.
SDF-Based Reconstruction: Extract high-quality 3D meshes from the enhanced dense views using an SDF (Signed Distance Function)-based reconstruction method.
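To show how a signed distance function relates to both a surface and its normals, here is a minimal sketch using an analytic sphere in place of a learned SDF. The surface is the zero level set, the normal is the normalized SDF gradient, and the cosine-based discrepancy is one plausible form of quantity a normal-based regularizer could penalize. All names, the sphere itself, and the numeric settings are assumptions for illustration, not Hi3DGen's implementation.

```python
import math

def sphere_sdf(p, radius=1.0):
    """Signed distance to a sphere at the origin: negative inside, positive outside."""
    return math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2) - radius

def sdf_normal(sdf, p, eps=1e-4):
    """Approximate the unit normal as the normalized SDF gradient (central differences)."""
    grad = []
    for axis in range(3):
        hi, lo = list(p), list(p)
        hi[axis] += eps
        lo[axis] -= eps
        grad.append((sdf(hi) - sdf(lo)) / (2.0 * eps))
    length = math.sqrt(sum(g * g for g in grad))
    return tuple(g / length for g in grad)

def find_surface(sdf, origin, direction, t_near, t_far, steps=60):
    """Bisect along a ray for the zero crossing of the SDF between t_near and t_far."""
    def point_at(t):
        return tuple(o + t * d for o, d in zip(origin, direction))
    if sdf(point_at(t_near)) * sdf(point_at(t_far)) > 0.0:
        raise ValueError("SDF has the same sign at both ends of the interval")
    for _ in range(steps):
        t_mid = 0.5 * (t_near + t_far)
        # Keep the half-interval whose endpoints still have opposite signs.
        if sdf(point_at(t_near)) * sdf(point_at(t_mid)) <= 0.0:
            t_far = t_mid
        else:
            t_near = t_mid
    return point_at(0.5 * (t_near + t_far))

def normal_discrepancy(n_a, n_b):
    """1 - cosine similarity between two unit normals (0 when they agree)."""
    return 1.0 - sum(a * b for a, b in zip(n_a, n_b))

# Ray starting outside the sphere at z = 3 and ending at its centre.
hit = find_surface(sphere_sdf, (0.0, 0.0, 3.0), (0.0, 0.0, -1.0), 0.0, 3.0)
normal = sdf_normal(sphere_sdf, hit)
mismatch = normal_discrepancy(normal, (0.0, 0.0, 1.0))
```

A mesh extractor applies the same zero-crossing idea over a whole grid of samples instead of a single ray, and comparing SDF-derived normals with a reference normal map gives a per-point measure of how well fine surface orientation is preserved.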

Project links for Hi3DGen

Project official website: https://stable-x.github.io/Hi3DGen/
GitHub repository: https://github.com/Stable-X/Hi3DGen

Application scenarios of Hi3DGen

Game Development: Quickly generate high-quality 3D game assets, such as characters, props, and scenes.
Film and Television Production: Create realistic 3D special effects and animations, saving time and cost compared with traditional modeling.
3D Visualization: View and analyze 3D models from different angles, suitable for fields such as architectural design and industrial design.
Virtual Photography: Generate high-quality images from different perspectives for online display and marketing.
Cultural Relics Protection: Reconstruct 3D models of cultural relics from a single photograph for digital preservation and research.
Medical Imaging: Generate 3D models from medical images (such as X-rays and CT scans) to assist in diagnosis and treatment.