HoloTime – A panoramic 4D scene generation framework jointly launched by Peking University and Peng Cheng Laboratory
What is HoloTime?
HoloTime is a panoramic 4D scene generation framework jointly developed by the Shenzhen Graduate School of Peking University and the Peng Cheng Laboratory. It leverages video diffusion models to transform a single panoramic image into a panoramic video with realistic dynamic effects, and further reconstructs it into an immersive 4D scene.
HoloTime introduces the 360World dataset, which includes a large number of panoramic videos captured by static cameras. This dataset is used to train the Panoramic Animator to generate high-quality panoramic videos.
HoloTime also introduces a Panoramic Space-Time Reconstruction technique that uses spatiotemporal depth estimation to convert panoramic videos into 4D point clouds. These are then optimized into consistent 4D Gaussian Splatting representations, enabling immersive virtual reality experiences.
Key Features of HoloTime
- Panoramic video generation from a single image: converts static panoramic images into dynamic panoramic videos containing rich motion cues such as object movement and scene changes.
- Panoramic video to 4D scene reconstruction: transforms generated panoramic videos into 4D point clouds and optimizes them into consistent 4D scene representations for virtual roaming and multi-view exploration.
- Immersive experience support: the generated 4D scenes provide immersive interactive experiences for VR (Virtual Reality) and AR (Augmented Reality) applications, allowing users to move freely and explore within the scene.
Technical Principles Behind HoloTime
Panoramic Animator
- Two-stage generation strategy: first generates a low-resolution coarse video to guide global motion, then enhances local details with a high-resolution refinement model.
- Hybrid Data Fine-tuning (HDF): combines panoramic video data with similar scene content from regular video datasets to bridge data distribution gaps and improve model generalization.
- Panoramic Circular Techniques (PCT): creates blending regions at the left and right ends of the video to ensure horizontal continuity in panoramic videos and eliminate visible seams.
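The idea behind edge blending can be sketched in a few lines of NumPy. The sketch below assumes the generator emits frames slightly wider than 360°, so the last `overlap` columns duplicate the longitudes of the first `overlap` columns; cross-fading that duplicated band and cropping yields a panorama whose left and right edges meet without a seam. The function name and the overlap assumption are illustrative, not HoloTime's actual implementation.

```python
import numpy as np

def blend_wraparound(frame: np.ndarray, overlap: int = 32) -> np.ndarray:
    """Cross-fade a duplicated edge band so a panorama wraps seamlessly.

    Assumes `frame` (H x W+overlap x C) was generated with its last
    `overlap` columns covering the same longitudes as its first
    `overlap` columns. Illustrative sketch only, not HoloTime's code.
    """
    h, w_ext, c = frame.shape
    w = w_ext - overlap
    out = frame[:, :w].astype(np.float32).copy()
    # Weights ramp 0 -> 1 across the band: at column 0 the duplicated
    # right-end content dominates, so the seam between column W-1 and
    # column 0 corresponds to adjacent columns of the generated frame.
    alpha = np.linspace(0.0, 1.0, overlap, dtype=np.float32)[None, :, None]
    left = frame[:, :overlap].astype(np.float32)   # start of the panorama
    right = frame[:, w:].astype(np.float32)        # duplicated band
    out[:, :overlap] = alpha * left + (1.0 - alpha) * right
    return out.astype(frame.dtype)
```

Applying the same blend to every frame of a video keeps the wrap-around continuity consistent over time as well as across the seam.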
Panoramic Space-Time Reconstruction
- Combines panoramic optical flow estimation with narrow field-of-view depth estimation models to estimate depth for every frame of the panoramic video, ensuring spatial and temporal continuity of the depth information.
- Converts the panoramic video and its depth maps into a 4D point cloud with temporal attributes, which serves as the initial representation of the 4D scene.
- Optimizes this point cloud into a spatially and temporally consistent 4D scene reconstruction that supports efficient rendering and dynamic view synthesis.
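The step from a panoramic depth map to a point cloud is straightforward geometry: each equirectangular pixel maps to a longitude/latitude direction on the sphere, and scaling that unit direction by the estimated depth gives a 3D point; stacking per-frame clouds with a time coordinate gives the 4D initialization. A minimal NumPy sketch, assuming depth is measured along the viewing ray and a y-up camera convention (function names are illustrative, not HoloTime's API):

```python
import numpy as np

def equirect_depth_to_points(depth: np.ndarray) -> np.ndarray:
    """Back-project an H x W equirectangular depth map (depth measured
    along each viewing ray) into an (H*W) x 3 point cloud centered on
    the camera. Illustrative sketch of the back-projection step.
    """
    h, w = depth.shape
    # Pixel-center longitude in [-pi, pi) and latitude in (-pi/2, pi/2).
    lon = (np.arange(w) + 0.5) / w * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (np.arange(h) + 0.5) / h * np.pi
    lon, lat = np.meshgrid(lon, lat)                 # both H x W
    # Unit viewing directions on the sphere (y up, z forward).
    dirs = np.stack([np.cos(lat) * np.sin(lon),
                     np.sin(lat),
                     np.cos(lat) * np.cos(lon)], axis=-1)
    return (dirs * depth[..., None]).reshape(-1, 3)

def frames_to_4d_cloud(depths: np.ndarray, times: np.ndarray) -> np.ndarray:
    """Stack per-frame clouds into an N x 4 array of (x, y, z, t),
    the kind of time-tagged point set a 4D scene can start from."""
    clouds = []
    for d, t in zip(depths, times):
        pts = equirect_depth_to_points(d)
        clouds.append(np.hstack([pts, np.full((pts.shape[0], 1), t)]))
    return np.vstack(clouds)
```

In the full pipeline, a point set like this would serve only as the initialization; the subsequent optimization enforces the spatial and temporal consistency described above.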
360World Dataset
- A large-scale dataset of panoramic videos captured by static cameras, used to train the Panoramic Animator.
- Rich in scene types and dynamic content, enabling the model to learn the patterns needed to generate panoramic videos.
Project Links
- Official Website: https://zhouhyocean.github.io/holotime/
- GitHub Repository: https://github.com/PKU-YuanGroup/HoloTime
- HuggingFace Model Hub: https://huggingface.co/Marblueocean/HoloTime
- arXiv Technical Paper: https://arxiv.org/pdf/2504.21650
Application Scenarios of HoloTime
- Virtual Reality (VR) and Augmented Reality (AR): delivers immersive 4D scenes that users can freely explore in virtual environments, enhancing the sense of presence.
- Virtual tourism and online exhibitions: creates panoramic 4D scenes for remote exploration of landmarks or exhibits, offering an "as if you were there" experience.
- Film production: quickly generates high-quality panoramic backgrounds and effects, reducing shooting costs and enhancing visual quality.
- Game development: builds dynamic game environments to boost immersion and visual richness.
- Architecture and urban planning: produces panoramic 4D scenes that help designers visually present and evaluate design proposals in advance.