Step1X-3D: A 3D asset generation framework jointly open-sourced by StepFun and LightIllusions
What is Step1X-3D?
Step1X-3D is a high-fidelity, controllable 3D asset generation framework jointly developed by StepFun and LightIllusions. Its data curation pipeline filters more than 5 million 3D assets down to 2 million high-quality samples, yielding a dataset with standardized geometry and texture attributes. Step1X-3D supports multimodal conditional inputs such as text and semantic labels, and enables flexible geometry control through Low-Rank Adaptation (LoRA) fine-tuning. The technical report positions it as state of the art among open-source 3D generation methods.
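To make the LoRA idea concrete, here is a minimal sketch of the low-rank update that LoRA fine-tuning relies on: a frozen weight matrix W is combined with a trainable product B·A, scaled by alpha / r. This is a generic illustration, not Step1X-3D's training code; the matrix sizes, the rank, the `alpha` value, and the helper names (`matmul`, `lora_effective_weight`) are all assumptions made for the example.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, lora_a, lora_b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank.

    w:      d_out x d_in frozen base weight
    lora_a: r x d_in trainable matrix A
    lora_b: d_out x r trainable matrix B
    """
    rank = len(lora_a)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


random.seed(0)
d_out, d_in, rank = 16, 16, 2  # illustrative sizes only

# Frozen base weight (stands in for one layer of a pretrained model).
w = [[random.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(d_out)]

# Common LoRA initialisation: A is small random, B is zero,
# so the adapter is a no-op before any fine-tuning step.
lora_a = [[random.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
lora_b = [[0.0] * rank for _ in range(d_out)]

w_eff = lora_effective_weight(w, lora_a, lora_b, alpha=4.0)

full_params = d_out * d_in           # parameters touched by full fine-tuning
lora_params = rank * (d_in + d_out)  # parameters trained by the adapter
print(f"full fine-tuning: {full_params} params, LoRA: {lora_params} params")
```

Only A and B are trained, so each control signal (for example a symmetry label) can be stored as a small adapter on top of the same base model.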
Key Features of Step1X-3D
- High-fidelity and controllable 3D asset generation: Generates 3D assets with high-fidelity geometry and diverse texture maps, ensuring excellent alignment between surface geometry and texture mapping.
- Support for multiple conditional inputs: Accepts various inputs such as multi-view images, bounding boxes, and skeletons for more flexible 3D asset generation.
- Open-source: Provides access to the technical report, inference code, model weights, and training code.
Technical Foundations of Step1X-3D
- Data Curation: Applies multi-dimensional filtering criteria to select high-quality 3D assets. Winding number techniques improve the success rate of mesh-to-SDF conversion, which in turn provides accurate geometric supervision.
- Geometry Generation: Uses a perceiver-based latent encoder and a sharp-edge sampling strategy to generate high-fidelity TSDF representations. The diffusion model is a transformer trained with a rectified flow objective, which keeps training stable and efficient.
- Texture Generation: Builds on a pretrained multi-view image generation model, enhanced with geometric guidance to produce multi-view-consistent textures. A texture-space synchronization module aligns the latent spaces of geometry and texture. Texture inpainting removes UV mapping artifacts, enabling seamless texture synthesis.
- Controllability: Uses LoRA-based fine-tuning for flexible geometric control, including symmetry and level-of-detail adjustments. Compatibility with multimodal inputs improves both control and diversity in asset generation.
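The data curation item above mentions winding number techniques for mesh-to-SDF conversion. One common reading, assumed here, is the generalized winding number: sum the signed solid angles that every triangle subtends at a query point and divide by 4π. The result is about 1 inside a closed, outward-oriented mesh and about 0 outside, and it degrades gracefully when the mesh has holes, which is what makes the SDF sign recoverable for imperfect assets. The sketch below is a generic illustration of that idea, not the authors' pipeline; the tetrahedron, the 0.5 threshold, and all function names are assumptions.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def norm(a):
    return math.sqrt(dot(a, a))


def solid_angle(p, tri):
    """Signed solid angle of triangle `tri` seen from point `p`
    (Van Oosterom-Strackee formula)."""
    a, b, c = (sub(v, p) for v in tri)
    la, lb, lc = norm(a), norm(b), norm(c)
    num = dot(a, cross(b, c))
    den = la * lb * lc + dot(a, b) * lc + dot(b, c) * la + dot(c, a) * lb
    return 2.0 * math.atan2(num, den)


def winding_number(p, triangles):
    """Generalized winding number of a triangle soup at point `p`."""
    return sum(solid_angle(p, t) for t in triangles) / (4.0 * math.pi)


def sdf_sign(p, triangles, threshold=0.5):
    """Negative inside, positive outside (threshold is an assumed convention)."""
    return -1.0 if winding_number(p, triangles) > threshold else 1.0


# Unit tetrahedron with outward-facing triangles.
v0, v1, v2, v3 = (0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)
closed_mesh = [(v0, v2, v1), (v0, v1, v3), (v0, v3, v2), (v1, v2, v3)]
open_mesh = closed_mesh[:3]  # same mesh with one face missing

inside, outside = (0.1, 0.1, 0.1), (1.0, 1.0, 1.0)
w_in = winding_number(inside, closed_mesh)
w_out = winding_number(outside, closed_mesh)
w_open = winding_number(inside, open_mesh)

print(f"closed mesh: inside={w_in:.3f}, outside={w_out:.3f}")
print(f"open mesh:   inside={w_open:.3f}")
```

Even with a face removed, the inside point keeps a winding number above 0.5, so a threshold still classifies it as interior, whereas a ray-casting parity test could fail on the same open mesh.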
Project Links for Step1X-3D
- GitHub Repository: https://github.com/stepfun-ai/Step1X-3D
- HuggingFace Model Hub: https://huggingface.co/stepfun-ai/Step1X-3D
- arXiv Technical Paper: https://arxiv.org/pdf/2505.07747
- Online Demo: https://huggingface.co/spaces/stepfun-ai/Step1X-3D
Application Scenarios for Step1X-3D
- Game Development: Generate high-fidelity 3D models to speed up prototyping, support personalized content, and enhance visual quality and player experience.
- Film Production: Create virtual scenes, characters, and visual effects to accelerate production workflows and improve visual realism.
- Virtual Reality (VR) and Augmented Reality (AR): Build immersive 3D environments and interactive content to enhance user experience.
- Architectural Design: Generate virtual buildings and interior design models for urban planning and enhanced design presentations.
- Education and Training: Construct virtual labs, cultural heritage reconstructions, and skill training environments to deliver interactive and intuitive learning experiences.