SceneGen – 3D Scene Generation Framework Developed by Shanghai Jiao Tong University


What is SceneGen?

SceneGen is an efficient open-source 3D scene generation framework developed by a research team at Shanghai Jiao Tong University. Starting from a single scene image and its corresponding object segmentation mask, SceneGen can directly generate a complete 3D scene—including geometry, texture, and spatial layout—through a single forward pass. Its innovation lies in an end-to-end generation pipeline that eliminates the need for time-consuming optimization or asset retrieval and assembly, greatly improving generation efficiency.
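
To make the single-pass workflow concrete, here is a minimal usage sketch in Python. Note that the `scenegen` package, the `SceneGen` class, the `from_pretrained` loader, and the `generate`/`export` methods are all illustrative assumptions, not the project's published API:

```python
# Hypothetical usage sketch -- every name below is an assumption,
# not SceneGen's actual API.
from PIL import Image
from scenegen import SceneGen  # hypothetical package name

# Load a pretrained model (loader and checkpoint names are assumed).
model = SceneGen.from_pretrained("scenegen-base")

image = Image.open("living_room.jpg")      # single RGB scene image
mask = Image.open("living_room_mask.png")  # matching object segmentation mask

# One forward pass: textured per-object geometry plus their relative layout,
# with no iterative optimization and no asset-retrieval step.
scene = model.generate(image=image, mask=mask)
scene.export("living_room.glb")            # assumed export helper
```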

At its core, SceneGen integrates local and global scene aggregation modules and introduces a position prediction head that simultaneously predicts 3D assets and their relative spatial positions, ensuring both physical plausibility and visual consistency. The framework is designed for applications in VR/AR, embodied AI, game development, and interior design, offering a powerful solution for rapidly constructing realistic virtual environments.

Key Features of SceneGen

  • Single-Image-to-3D Scene Generation:
    Generates a complete 3D scene (geometry, texture, spatial layout) from a single scene image and its segmentation mask.

  • Efficient End-to-End Generation:
    Produces full 3D scenes in one forward pass without iterative optimization or asset retrieval, significantly boosting efficiency.

  • Local and Global Information Aggregation:
    Incorporates aggregation modules during feature extraction to effectively combine local details with global context, ensuring realistic and consistent scene generation.

  • Joint Asset and Position Prediction:
    Uses a dedicated position head to jointly predict both 3D assets and their precise spatial positions within the scene (a minimal sketch follows this list).

  • High Accuracy and Realism:
    Outperforms prior methods in geometric precision, texture fidelity, and overall visual quality on both synthetic and real-world datasets.
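
As a rough illustration of the joint-prediction idea, the following PyTorch sketch shows a position head that regresses a relative pose for each object from its fused features. The layer sizes and the 7-value pose parameterization (translation, scale, yaw) are assumptions chosen for illustration; the paper's exact formulation may differ.

```python
import torch
import torch.nn as nn

class PositionHead(nn.Module):
    """Illustrative position head: one relative pose per object.

    The 7-value output (3 translation + 3 scale + 1 yaw) is an assumed
    parameterization, not necessarily SceneGen's actual design.
    """

    def __init__(self, feat_dim: int = 768):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 256),
            nn.GELU(),
            nn.Linear(256, 7),  # tx, ty, tz, sx, sy, sz, yaw
        )

    def forward(self, object_feats: torch.Tensor) -> torch.Tensor:
        # object_feats: (num_objects, feat_dim) fused per-object features
        return self.mlp(object_feats)

# Example: pose predictions for 5 objects.
poses = PositionHead()(torch.randn(5, 768))  # -> shape (5, 7)
```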


Technical Overview

  • Input Processing and Feature Extraction:
    Takes a single scene image and its object segmentation mask as input. Visual and geometric encoders extract object-level and global scene-level features respectively.

  • Local Texture Refinement:
    A pretrained local attention module enhances and refines object texture details to ensure visual realism.

  • Global Feature Fusion:
    A global attention (aggregation) module fuses object-level and scene-level information, capturing spatial relationships and contextual dependencies between objects.

  • Joint Decoding and Generation:
    A structure decoder processes the fused features while the position head predicts the relative spatial positions of assets, enabling simultaneous generation of geometry, texture, and layout.

  • Single-Pass End-to-End Inference:
    The entire process completes in a single forward pass without iterative optimization or external asset retrieval, achieving high efficiency and realism across datasets (a schematic sketch of the full pipeline follows this list).
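
Putting these steps together, a heavily simplified PyTorch sketch of the single-pass pipeline might look as follows. Every submodule is a placeholder standing in for a component named above, and all names, shapes, and dimensions are assumptions rather than the authors' implementation:

```python
import torch
import torch.nn as nn

class SceneGenPipeline(nn.Module):
    """Schematic single-pass pipeline; every submodule is a placeholder."""

    def __init__(self, dim: int = 768, heads: int = 8):
        super().__init__()
        self.visual_encoder = nn.Identity()    # stands in for the visual encoder
        self.geometry_encoder = nn.Identity()  # stands in for the geometric encoder
        self.local_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.global_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.structure_decoder = nn.Linear(dim, dim)  # placeholder decoder
        self.position_head = nn.Linear(dim, 7)        # relative pose per object

    def forward(self, obj_tokens: torch.Tensor, scene_tokens: torch.Tensor):
        # obj_tokens:   (num_objects, tokens_per_object, dim)
        # scene_tokens: (1, num_scene_tokens, dim), shared global context
        obj = self.visual_encoder(obj_tokens)
        scene = self.geometry_encoder(scene_tokens).expand(obj.size(0), -1, -1)

        # 1) Local refinement: each object's tokens attend to themselves.
        local, _ = self.local_attn(obj, obj, obj)
        # 2) Global fusion: object tokens attend to scene-level context.
        fused, _ = self.global_attn(local, scene, scene)
        # 3) Joint decoding: asset latents and poses in one pass.
        assets = self.structure_decoder(fused)         # per-object asset latents
        poses = self.position_head(fused.mean(dim=1))  # (num_objects, 7)
        return assets, poses

pipe = SceneGenPipeline()
assets, poses = pipe(torch.randn(5, 16, 768), torch.randn(1, 64, 768))
```

The point the sketch mirrors is that asset latents and relative poses fall out of one forward call, which is what removes the per-scene optimization loop described above.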


Application Scenarios

  • Game and Film Production:
    Rapidly generates production-ready 3D environments from concept art or reference photos, reducing scene modeling time—particularly valuable for indie developers and small studios.

  • Virtual and Augmented Reality (VR/AR):
    Efficiently creates realistic, interactive 3D worlds to support applications in VR/AR and embodied AI, where large-scale high-fidelity environments are essential.

  • Real Estate and Interior Design:
    Converts 2D floor plans or real-world photos into interactive 3D walkthroughs, helping developers, agents, and clients visualize spatial layouts and design aesthetics.

  • Simulation and Training Environments:
    Provides efficient scene generation for applications such as autonomous driving or robot navigation that require large quantities of realistic virtual training environments.
