4D-LRM: a 4D reconstruction model jointly developed by Adobe, the University of Michigan, and other institutions

What is 4D-LRM?

4D-LRM (Large Space-Time Reconstruction Model) is a 4D reconstruction model jointly developed by Adobe Research, the University of Michigan, and other institutions. Given sparse input views captured at arbitrary times, it quickly reconstructs a high-quality dynamic scene that can then be rendered at any combination of viewpoint and time. The model is built on a Transformer architecture and predicts 4D Gaussian primitives for each pixel, giving a unified representation of space and time. 4D-LRM is efficient and generalizes well across camera configurations. It performs especially well in the alternating canonical view and frame interpolation settings, where it produces high-quality, temporally interpolated reconstructions.

Key Features of 4D-LRM

Efficient 4D Reconstruction:
Given sparse input views at arbitrary time points, 4D-LRM quickly and accurately reconstructs the dynamic scene for any view-time combination. It reconstructs a 24-frame sequence in a single forward pass in under 1.5 seconds on one A100 GPU, demonstrating its speed.
Strong Generalization:
The model generalizes well to novel objects and scenes and remains strong across different camera settings. It is particularly good at frame interpolation in the alternating canonical view setup, where it produces high-quality reconstructions.
Arbitrary View-Time Combinations:
Renders a dynamic scene from any viewpoint at any moment, opening new possibilities for understanding and generating complex space-time content.
Broad Applicability:
It can be extended to 4D content generation: combined with models such as SV3D, it produces higher-fidelity 4D content.

Technical Foundations of 4D-LRM

4D Gaussian Representation (4DGS):
4D-LRM represents dynamic scene elements using sets of 4D Gaussian distributions. These distributions capture both the spatial location and appearance of objects, as well as their temporal changes. Each 4D Gaussian is defined by spatial and temporal centers, spatial and temporal scales, a rotation matrix, and color parameters.
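The text does not spell out how a 4D Gaussian yields the scene at one particular moment. A common approach in 4D Gaussian splatting, which we assume here, is to condition each 4D Gaussian on a time t: the result is an ordinary 3D Gaussian whose center shifts with t (this is how motion is encoded) plus a temporal weight that fades it in and out. The sketch below is pure Python; `Gaussian4D`, `covariance`, and `slice_at_time` are hypothetical names, and the rotation is passed as a plain 4x4 matrix rather than whatever parameterization 4D-LRM actually uses.

```python
import math
from dataclasses import dataclass
from typing import List

Matrix = List[List[float]]


@dataclass
class Gaussian4D:
    """Illustrative container for one 4D Gaussian primitive (field names are hypothetical)."""
    mean: List[float]    # (x, y, z, t): spatial center followed by temporal center
    scale: List[float]   # (sx, sy, sz, st): spatial and temporal scales
    rotation: Matrix     # 4x4 orthonormal rotation acting on (x, y, z, t)
    color: List[float]   # RGB, kept as a plain color for simplicity


def covariance(g: Gaussian4D) -> Matrix:
    """Full 4x4 covariance Sigma = R diag(s^2) R^T."""
    r, s2 = g.rotation, [s * s for s in g.scale]
    return [[sum(r[i][k] * s2[k] * r[j][k] for k in range(4)) for j in range(4)]
            for i in range(4)]


def slice_at_time(g: Gaussian4D, t: float):
    """Condition the 4D Gaussian on time t.

    Returns the 3D mean and 3x3 covariance of the spatial Gaussian at time t,
    plus the marginal temporal weight exp(-0.5 * (t - mu_t)^2 / Sigma_tt).
    """
    cov = covariance(g)
    var_t = cov[3][3]                          # Sigma_tt
    cross = [cov[i][3] for i in range(3)]      # Sigma_{xyz,t}
    dt = t - g.mean[3]
    mean3 = [g.mean[i] + cross[i] / var_t * dt for i in range(3)]
    cov3 = [[cov[i][j] - cross[i] * cross[j] / var_t for j in range(3)]
            for i in range(3)]
    weight = math.exp(-0.5 * dt * dt / var_t)
    return mean3, cov3, weight


# Usage: a Gaussian stretched along a tilted x-t axis drifts in x as t advances.
c = s = math.sqrt(0.5)
g = Gaussian4D(mean=[0.0, 0.0, 0.0, 0.0], scale=[2.0, 1.0, 1.0, 1.0],
               rotation=[[c, 0, 0, -s], [0, 1, 0, 0], [0, 0, 1, 0], [s, 0, 0, c]],
               color=[1.0, 0.5, 0.2])
mean3, cov3, w = slice_at_time(g, 1.0)
print(mean3, cov3[0][0], w)  # mean3[0] ≈ 0.6, cov3[0][0] ≈ 1.6, w ≈ 0.819
```

Because the rotation mixes x with t, the spatial and temporal axes are correlated, so the conditioned center moves along x as time advances while its x-variance shrinks from 2.5 to 1.6.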
Transformer Architecture:
4D-LRM is built on a Transformer. Input images are divided into patches and encoded as token vectors, which are processed by multi-head self-attention and MLP layers to predict 4D Gaussian primitives for each pixel.
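To make the patch-to-pixel bookkeeping concrete, here is a toy forward pass in pure Python. It is not 4D-LRM's network: it uses one attention head with identity projections, skips the token embedding layer and the MLP blocks, uses random weights in place of trained ones, and omits the camera and timestamp conditioning a real model would need. The names `patchify`, `unpatchify`, `predict_pixel_gaussians`, and `n_params` are hypothetical. What it shows is that each p x p patch becomes one token, and the output head emits p*p*n_params values per token, which are scattered back so that every pixel gets its own Gaussian parameter vector.

```python
import math
import random
from typing import List

Vec = List[float]
Image = List[List[Vec]]   # H x W grid of per-pixel feature vectors


def patchify(img: Image, p: int) -> List[Vec]:
    """Split an H x W x C image into non-overlapping p x p patches, one flat token per patch."""
    h, w = len(img), len(img[0])
    return [[v for i in range(pi, pi + p) for j in range(pj, pj + p) for v in img[i][j]]
            for pi in range(0, h, p) for pj in range(0, w, p)]


def unpatchify(tokens: List[Vec], h: int, w: int, p: int, c: int) -> Image:
    """Inverse of patchify: split each token back into p*p per-pixel vectors of length c."""
    out: Image = [[[] for _ in range(w)] for _ in range(h)]
    per_row = w // p
    for n, tok in enumerate(tokens):
        pi, pj = (n // per_row) * p, (n % per_row) * p
        for k in range(p * p):
            out[pi + k // p][pj + k % p] = tok[k * c:(k + 1) * c]
    return out


def self_attention(x: List[Vec]) -> List[Vec]:
    """Single-head scaled dot-product attention with identity Q/K/V maps and a residual add."""
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        top = max(scores)                      # subtract the max for a stable softmax
        weights = [math.exp(s - top) for s in scores]
        z = sum(weights)
        mixed = [sum(wt * k[i] for wt, k in zip(weights, x)) / z for i in range(d)]
        out.append([qi + mi for qi, mi in zip(q, mixed)])
    return out


def linear(x: List[Vec], weight: List[Vec]) -> List[Vec]:
    """Token-wise linear layer: (N x d_in) @ (d_in x d_out)."""
    d_out = len(weight[0])
    return [[sum(xi * weight[i][o] for i, xi in enumerate(row)) for o in range(d_out)]
            for row in x]


def predict_pixel_gaussians(img: Image, p: int, n_params: int, seed: int = 0) -> Image:
    """Image -> patch tokens -> attention -> linear head -> one parameter vector per pixel."""
    h, w, c_in = len(img), len(img[0]), len(img[0][0])
    tokens = self_attention(patchify(img, p))
    rng = random.Random(seed)  # random stand-in for learned weights
    head = [[rng.gauss(0.0, 0.02) for _ in range(p * p * n_params)]
            for _ in range(p * p * c_in)]
    return unpatchify(linear(tokens, head), h, w, p, n_params)


# Usage: a 4x4 RGB image, 2x2 patches, 12 hypothetical Gaussian parameters per pixel.
img = [[[float(i), float(j), 0.5] for j in range(4)] for i in range(4)]
params = predict_pixel_gaussians(img, p=2, n_params=12)
print(len(params), len(params[0]), len(params[0][0]))  # 4 4 12
```

Predicting parameters per pixel rather than per patch keeps the output resolution tied to the input images, so the number of Gaussians grows with the number and size of the input views.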
Pixel-Aligned Gaussian Rendering:
Predicted 4D Gaussians are projected onto the image plane using pixel-aligned Gaussian rendering. Alpha blending is applied to synthesize the final image from the projected primitives.
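The compositing step can be sketched for a single pixel as follows. It assumes the Gaussians have already been sliced at the target time and projected to 2D splats with a depth value (the projection itself is left out), and it applies the standard front-to-back alpha-compositing rule used in Gaussian splatting. `Splat2D`, `splat_alpha`, and `composite_pixel` are illustrative names, not 4D-LRM's code.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Splat2D:
    """A Gaussian already projected onto the image plane (hypothetical field names)."""
    mean: Tuple[float, float]          # pixel-space center
    cov: Tuple[float, float, float]    # (a, b, c) of the 2x2 covariance [[a, b], [b, c]]
    color: Tuple[float, float, float]  # RGB
    opacity: float                     # in [0, 1]; could already include a temporal weight
    depth: float                       # view-space depth, used only for sorting


def splat_alpha(s: Splat2D, px: float, py: float) -> float:
    """Opacity of splat s at pixel (px, py): opacity * exp(-0.5 * Mahalanobis^2)."""
    a, b, c = s.cov
    det = a * c - b * b
    dx, dy = px - s.mean[0], py - s.mean[1]
    # d^T Sigma^{-1} d, using the closed-form inverse of a 2x2 matrix
    m2 = (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return s.opacity * math.exp(-0.5 * m2)


def composite_pixel(splats: List[Splat2D], px: float, py: float,
                    background: Sequence[float] = (0.0, 0.0, 0.0)) -> List[float]:
    """Front-to-back alpha blending: C = sum_i c_i a_i prod_{j<i} (1 - a_j)."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for s in sorted(splats, key=lambda sp: sp.depth):   # nearest first
        alpha = splat_alpha(s, px, py)
        for k in range(3):
            color[k] += transmittance * alpha * s.color[k]
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:   # stop once the pixel is effectively opaque
            break
    return [color[k] + transmittance * background[k] for k in range(3)]


# Usage: a half-transparent red splat in front of a half-transparent blue one.
red = Splat2D((0.0, 0.0), (1.0, 0.0, 1.0), (1.0, 0.0, 0.0), 0.5, depth=1.0)
blue = Splat2D((0.0, 0.0), (0.0, 0.0, 1.0) and (1.0, 0.0, 1.0), (0.0, 0.0, 1.0), 0.5, depth=2.0)
print(composite_pixel([blue, red], 0.0, 0.0))  # [0.5, 0.0, 0.25]
```

Sorting by depth matters: the nearer red splat absorbs half of the light, so the blue one behind it contributes only a quarter, even though it is listed first.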
Training and Optimization:
The model is trained on large-scale datasets by minimizing the reconstruction error between rendered and ground-truth images. In the process, it learns generalizable space-time representations, which enable high-quality reconstruction from sparse inputs and on unseen scenes or objects.
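The training objective can be sketched as follows. The paragraph only states that reconstruction error is minimized, so the rest is assumed for illustration: we suppose, as is common for large reconstruction models, that each training sequence is split into a few input (view, time) pairs and held-out target pairs, and that renders at the targets are compared with ground truth using MSE plus an optional perceptual term. `split_views`, `reconstruction_loss`, and `perceptual_weight` are hypothetical names, and images are single-channel to keep the example short.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

Image = List[List[float]]   # single-channel H x W image
Key = Tuple[int, int]       # (camera index, frame index)


def mse(pred: Image, target: Image) -> float:
    """Mean squared error over all pixels."""
    sq = [(p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    return sum(sq) / len(sq)


def split_views(keys: List[Key], n_input: int, seed: int = 0) -> Tuple[List[Key], List[Key]]:
    """Randomly pick n_input (view, time) pairs as inputs; the rest become supervision targets."""
    shuffled = list(keys)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_input], shuffled[n_input:]


def reconstruction_loss(render: Callable[[Key], Image], gt: Dict[Key, Image],
                        targets: List[Key],
                        perceptual: Optional[Callable[[Image, Image], float]] = None,
                        perceptual_weight: float = 0.0) -> float:
    """Average over target views of MSE, plus an optional weighted perceptual term."""
    total = 0.0
    for key in targets:
        pred = render(key)
        total += mse(pred, gt[key])
        if perceptual is not None:
            total += perceptual_weight * perceptual(pred, gt[key])
    return total / len(targets)


# Usage with stand-in data: 2 cameras x 3 frames of 2x2 images.
gt = {(cam, f): [[0.1 * f, 0.2], [0.3, 0.1 * cam]] for cam in range(2) for f in range(3)}
inputs, targets = split_views(sorted(gt), n_input=2)


def render(key: Key) -> Image:
    """Stand-in renderer that is off by 0.1 everywhere.

    A real renderer would predict Gaussians from `inputs` and splat them at `key`.
    """
    return [[v + 0.1 for v in row] for row in gt[key]]


print(len(inputs), len(targets), reconstruction_loss(render, gt, targets))  # loss is about 0.01
```

Supervising at (view, time) pairs that were not given as input is what pushes the model to interpolate across space and time rather than copy its inputs.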

Project Links

Project Website: https://4dlrm.github.io/
GitHub Repository: https://github.com/Mars-tin/4D-LRM
HuggingFace Paper Page: https://huggingface.co/papers/2506.18890
arXiv Paper: https://arxiv.org/pdf/2506.18890

Application Scenarios

Video Games and Film Production:
Efficient reconstruction and rendering of dynamic scenes for complex settings such as character animations and scene changes. Enhances visual effects in games and films, supporting real-time rendering and multi-view content to boost immersion.
Augmented Reality (AR) and Virtual Reality (VR):
Provides realistic and immersive environments for AR/VR applications, enabling real-time interaction where users can move freely and observe dynamic changes.
Robotics and Autonomous Driving:
Helps robots and autonomous systems better understand and predict environmental dynamics, enabling more accurate path planning and navigation.
Digital Content Creation:
Reduces manual modeling and animation workload, offering enhanced editing capabilities in video post-production.
Scientific Research:
Applicable to reconstructing and analyzing biomedical imaging data, such as heartbeats or respiratory motion, helping researchers study dynamic biological processes inside the body more effectively.