WonderPlay – A Dynamic 3D Scene Generation Framework Developed Jointly by Stanford University and the University of Utah
What is WonderPlay?
WonderPlay is a novel framework jointly developed by Stanford University and the University of Utah that generates dynamic 3D scenes from a single image and user-defined actions. It couples physical simulation with video generation: a physics solver first simulates coarse 3D dynamics, which then drive a video generator to synthesize more realistic videos. These videos are in turn used to update the dynamic 3D scene, forming a closed loop between simulation and generation. WonderPlay supports a wide range of physical materials (e.g., rigid bodies, cloth, liquid, gas) and actions (e.g., gravity, wind, point forces), allowing users to interact with the scene through simple inputs and produce rich, diverse dynamic effects.
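To make that pipeline concrete, here is a minimal Python sketch of the closed loop. All names (Action, the solver, generator, and reconstructor objects and their methods) are hypothetical stand-ins for illustration; WonderPlay's actual interfaces are not published in this form.

```python
# Minimal sketch of WonderPlay's closed simulation-generation loop.
# All class names and methods below are hypothetical stand-ins;
# the paper does not publish this API.

from dataclasses import dataclass

@dataclass
class Action:
    kind: str      # e.g. "wind", "gravity", "point_force"
    params: dict   # e.g. {"direction": (1.0, 0.0, 0.0), "strength": 5.0}

def wonderplay_step(image, action, solver, generator, reconstructor):
    # 1. Reconstruct a static 3D scene (objects + background) from one image.
    scene = reconstructor.reconstruct(image)

    # 2. The physics solver produces coarse 3D dynamics under the user action.
    coarse_dynamics = solver.simulate(scene, action)

    # 3. The coarse dynamics condition the video generator (motion + appearance),
    #    yielding a more realistic video of the same interaction.
    video = generator.generate(image, coarse_dynamics)

    # 4. The generated video is lifted back to 3D to update the dynamic scene,
    #    closing the loop between simulation and generation.
    dynamic_scene = scene.update_from_video(video, coarse_dynamics)
    return dynamic_scene
```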
Key Features of WonderPlay
- Dynamic Scene Generation from a Single Image: Generates a dynamic 3D scene from a single image and user-defined actions, visualizing the physical consequences of those actions.
- Multi-Material Support: Covers a wide array of physical materials, including rigid bodies, cloth, liquid, gas, elastomers, and particles, to meet diverse scene requirements.
- Action Responsiveness: Supports inputs such as gravity, wind, and point forces, letting users interact with the scene intuitively and generate varied dynamic effects (see the action sketch after this list).
- Visual and Physical Realism: Combines the accuracy of physical simulation with the richness of video generation to produce dynamic scenes that are both physically plausible and visually compelling.
- Interactive Experience: Ships with an interactive viewer that lets users freely explore the generated dynamic 3D scenes for a more immersive experience.
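Continuing the hypothetical sketch above, user-defined actions could be expressed as simple structured inputs like the following. The field names, units, and the "mug" target are illustrative assumptions, not WonderPlay's actual input format.

```python
# Hypothetical examples of user-defined actions, reusing the illustrative
# Action type from the earlier sketch. Values and field names are assumptions.

wind = Action(kind="wind", params={"direction": (1.0, 0.0, 0.0), "strength": 4.0})
gravity = Action(kind="gravity", params={"direction": (0.0, -1.0, 0.0), "strength": 9.8})
poke = Action(kind="point_force", params={"target": "mug",
                                          "position": (0.2, 0.4, 0.1),
                                          "impulse": 2.5})
```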
Technical Principles of WonderPlay
- Hybrid Generative Simulator: Integrates a physics solver with a video generator. The physics solver simulates coarse 3D dynamics, which drive the video generator to synthesize realistic videos; these videos are then used to update the dynamic 3D scene, closing the loop between simulation and generation.
- Dual-Modality Spatiotemporal Control: During video generation, the system conditions the generator on two signals, motion (flow fields) and appearance (RGB), and dynamically adjusts the generator's responsibility across different regions of the scene so that the generated video stays closely aligned with the physical simulation in both dynamics and appearance (a minimal illustrative sketch follows this list).
- 3D Scene Reconstruction: Reconstructs both the background and the objects from the input image. The background is represented with Fast Layered Gaussian Surfels (FLAGS), while objects are built as Gaussian surfels with connected topology, and their material properties are estimated as the foundation for simulation and generation.
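The dual-modality control described above could be organized roughly as follows. This is a minimal sketch under assumptions: the function, the per-pixel confidence weighting, and the conditioning dictionary are illustrative choices, not the paper's implementation.

```python
import numpy as np

def build_control_signals(sim_rgb, sim_flow, confidence):
    """Illustrative region-wise weighting of dual-modality control signals.

    sim_rgb:    (T, H, W, 3) appearance frames rendered from the coarse simulation
    sim_flow:   (T, H, W, 2) optical-flow fields derived from the simulation
    confidence: (T, H, W)    per-pixel trust in the simulation (1 = follow the
                             solver closely, 0 = let the video generator deviate)

    Returns a dict of conditioning tensors for a (hypothetical) controllable
    video generator. The weighting scheme here is an assumption for illustration.
    """
    w = confidence[..., None]          # broadcast weight to the channel dimension
    return {
        "appearance": w * sim_rgb,     # RGB guidance, downweighted where unreliable
        "motion": w * sim_flow,        # flow guidance, downweighted where unreliable
        "mask": confidence,            # tells the generator where it may deviate
    }
```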
Project Links for WonderPlay
- Project Website: https://kyleleey.github.io/WonderPlay/
- arXiv Technical Paper: https://arxiv.org/pdf/2505.18151
Application Scenarios for WonderPlay
- AR/VR Scene Construction: Enables the creation of immersive virtual environments with dynamic user-scene interaction.
- Film and Visual Effects Production: Rapidly generates dynamic scene prototypes to assist in special effects design and enhance visual impact.
- Education and Professional Training: Simulates physical phenomena and work environments to strengthen practical teaching and training.
- Game Development: Creates dynamic scenes and interactive effects, enhancing the realism and engagement of games.
- Advertising and Marketing: Produces dynamic and interactive ad content to increase audience engagement.