WorldMem – A world generation model jointly launched by Nanyang Technological University, Peking University and Shanghai AI Laboratory

AI Tools updated 2m ago dongdong

What is WorldMem

WorldMem is an AI world-generation model developed by Nanyang Technological University, Peking University, and Shanghai AI Laboratory. It adds a memory mechanism to address a key limitation of traditional world-generation models: they struggle to stay consistent over long temporal sequences. In WorldMem, agents can freely explore diverse environments, and the generated world stays geometrically consistent across viewpoint and position changes. The model also handles temporal consistency and simulates dynamic changes, such as how objects influence their surroundings. Trained on a Minecraft dataset and validated on real-world scenes, WorldMem offers a technical pathway for building realistic, persistent, and interactive virtual worlds.


Key Features of WorldMem

  • Maintains Consistency: Keeps the virtual world coherent during long-term generation, so a previously seen scene still looks the same when the agent returns to it.

  • Simulates Dynamic Changes: Models changes over time, such as the impact of objects on the environment (e.g., light melting snow).

  • Supports Interaction: Users can place objects or perform actions in the virtual world, and those interactions affect the frames generated afterward.

  • Diverse Scene Generation: Enables exploration across various virtual environments, including plains, deserts, and icy terrains.

  • Real-World Applicability: Validated on real-world datasets to demonstrate consistent generation capability.


Technical Principles Behind WorldMem

  • Conditional Generation Module: Built upon a Conditional Diffusion Transformer architecture and trained using a Diffusion Forcing strategy, it supports autoregressive long-term generation. The module is guided by external action signals (e.g., movement, viewpoint control, object placement) to generate first-person view sequences.

  • Memory Read/Write Module: A memory bank stores key historical information from the generation process. Each memory unit includes an image frame and its associated state (e.g., camera pose and timestamp). A greedy matching algorithm retrieves memory entries by scoring each one on field-of-view overlap and time difference with the current view, efficiently identifying the memories most relevant to the current scene.

  • Memory Fusion Module: Combines the current frame with memory states (pose + time) using attention mechanisms to extract contextually relevant information. Fused features then guide the generation of the current frame. Poses are represented using Plücker coordinates, while timestamps are mapped through an MLP with relative embedding mechanisms, improving spatial understanding and detail preservation.
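The retrieval step of the Memory Read/Write Module can be illustrated with a small sketch. This is not WorldMem's actual code: the 2D view cone, the grid of sample points, the linear score (overlap minus a time penalty), the redundancy filter, and every name and weight below (MemoryUnit, retrieve, w_overlap, w_time, redundancy) are assumptions chosen only to show how a greedy selection based on field-of-view overlap and time difference might work.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryUnit:
    """Hypothetical 2D stand-in for a stored frame's state (pose + timestamp)."""
    x: float
    y: float
    yaw: float  # viewing direction in radians
    t: int      # frame index at which the unit was written


# Assumed sampling grid used to estimate overlap; poses should lie near the origin.
SAMPLE_POINTS = [(i * 0.5, j * 0.5) for i in range(-40, 41) for j in range(-40, 41)]


def sees(unit, px, py, fov=math.radians(90), max_range=10.0):
    """Return True if point (px, py) lies inside the unit's 2D view cone."""
    dx, dy = px - unit.x, py - unit.y
    dist = math.hypot(dx, dy)
    if dist == 0.0 or dist > max_range:
        return False
    # Smallest signed angle between the bearing to the point and the yaw.
    diff = (math.atan2(dy, dx) - unit.yaw + math.pi) % (2 * math.pi) - math.pi
    return abs(diff) <= fov / 2


def fov_overlap(a, b, points=SAMPLE_POINTS):
    """Fraction of the sample points seen by `a` that `b` also sees."""
    seen_by_a = [p for p in points if sees(a, *p)]
    if not seen_by_a:
        return 0.0
    return sum(sees(b, *p) for p in seen_by_a) / len(seen_by_a)


def retrieve(query, bank, k=2, w_overlap=1.0, w_time=0.01, redundancy=0.9):
    """Greedily pick up to k memory units for the current (query) state."""
    def score(m):
        # Assumed score: reward shared view, penalize age.
        return w_overlap * fov_overlap(query, m) - w_time * abs(query.t - m.t)

    chosen = []
    for cand in sorted(bank, key=score, reverse=True):
        if len(chosen) == k:
            break
        # Skip candidates whose view is almost entirely covered by a chosen unit.
        if any(fov_overlap(cand, c) > redundancy for c in chosen):
            continue
        chosen.append(cand)
    return chosen
```

In this sketch, an old frame that shares a pose with a more recent one is skipped as redundant, so the selected set covers more of the scene than a plain top-k by score would.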

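For readers unfamiliar with the Plücker coordinates mentioned under the Memory Fusion Module, the following minimal sketch shows the standard definition for a single camera ray: a unit direction d together with the moment o × d, where o is any point on the ray. How WorldMem arranges these values per pixel or feeds them into attention is not described here, so the helper names (cross, normalize, plucker_ray) and the 6-tuple layout are illustrative assumptions.

```python
import math


def cross(a, b):
    """Cross product of two 3D vectors given as tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def normalize(v):
    """Scale a 3D vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def plucker_ray(origin, direction):
    """Return (d, m): unit direction d and moment m = origin x d, as a 6-tuple."""
    d = normalize(direction)
    m = cross(origin, d)
    return d + m


# Two descriptions of the same ray: different points on it, different direction lengths.
ray_a = plucker_ray((1.0, 2.0, 3.0), (0.0, 0.0, 2.0))
ray_b = plucker_ray((1.0, 2.0, 8.0), (0.0, 0.0, 5.0))
print(ray_a)
print(ray_a == ray_b)
```

One reason this representation suits pose conditioning is that it depends only on the ray itself, not on which point along the ray is used as the origin, and the moment is always perpendicular to the direction.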

Project Links for WorldMem


Application Scenarios for WorldMem

  • Virtual Games: Generates long-term consistent virtual game worlds that support free exploration and environmental interaction.

  • VR/AR: Builds persistent and dynamically evolving virtual environments, enhancing immersion.

  • Autonomous Driving: Simulates realistic traffic scenes for testing autonomous driving systems.

  • Architectural Design: Creates virtual architectural environments to aid in design evaluation.

  • Education: Builds interactive learning environments for student experimentation and exploration.
