What is Dreamer 4?
Dreamer 4 is a new type of agent developed by DeepMind that solves complex control tasks by training on imagined experience inside a fast and accurate world model. In Minecraft, Dreamer 4 reached the milestone of obtaining diamonds from offline data alone, representing a major breakthrough in the field.
Instead of interacting with the environment online, it learns behaviors through reinforcement learning inside its world model. This makes it safer and more efficient for real-world applications such as robotics, where online interaction can be risky and slow.
The Dreamer 4 world model is built on an efficient Transformer architecture combined with a new shortcut forcing objective, enabling real-time interactive inference on a single GPU. It learns general action conditioning from a small amount of action-labeled data while absorbing most of its knowledge from large-scale unlabeled video.
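The interactive inference described above can be pictured as an autoregressive loop: at every step the world model consumes the current state and an action and predicts the next state, feeding its own prediction back in. The sketch below illustrates only this loop; the linear map standing in for the learned Transformer and all names are illustrative, not the paper's implementation.

```python
import numpy as np

# Toy sketch of interactive world-model inference (illustrative names, not
# the paper's API): each step maps (current latent, action) -> next latent.

rng = np.random.default_rng(0)
LATENT_DIM, ACTION_DIM = 8, 3

# Stand-in for the learned Transformer dynamics: a fixed random linear map.
W_state = rng.normal(size=(LATENT_DIM, LATENT_DIM)) * 0.1
W_action = rng.normal(size=(ACTION_DIM, LATENT_DIM)) * 0.1

def predict_next(latent, action):
    """One interactive step: next latent from current latent and action."""
    return np.tanh(latent @ W_state + action @ W_action)

def rollout(initial_latent, actions):
    """Autoregressive rollout: feed each prediction back as the next input."""
    latent, trajectory = initial_latent, []
    for action in actions:
        latent = predict_next(latent, action)
        trajectory.append(latent)
    return np.stack(trajectory)

traj = rollout(np.zeros(LATENT_DIM), rng.normal(size=(16, ACTION_DIM)))
print(traj.shape)  # (16, 8)
```

Because each frame depends only on the previous state and the incoming action, the loop can run in real time: the user supplies actions and the model answers with predicted frames, one step at a time.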
Key Features of Dreamer 4
- Solving complex tasks through imagination training: Learns and optimizes behaviors entirely within its internal world model, without online interaction. In Minecraft, it obtained diamonds using only offline data.
- Efficient real-time interactive inference: A Transformer architecture and the shortcut forcing objective enable real-time generation on a single GPU, making the model practical for responsive applications.
- Learning action conditioning from limited data: Learns from a small amount of action-labeled data while leveraging large amounts of unlabeled video, paving the way for acquiring general world knowledge from diverse internet video.
- Support for diverse tasks and generalization: Learns specific tasks but generalizes to unseen scenarios. For example, it was trained only on Minecraft Overworld data yet generalized to the Nether and End environments it had never encountered.
- Potential for general world knowledge learning: Provides a foundation for future research, such as pretraining on large-scale internet video, integrating long-term memory, language understanding, and few-shot online corrections.
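The shortcut forcing objective mentioned above builds on the idea of step-size-conditioned (shortcut) models: a model conditioned on step size is trained so that one large jump agrees with two chained smaller jumps, which lets inference use very few steps. The toy below sketches only that self-consistency idea in one dimension; it is a simplified illustration, not the paper's objective.

```python
import numpy as np

# Illustrative sketch of the shortcut-model self-consistency idea
# (simplified, not the paper's implementation): the model is conditioned
# on a step size d, and one jump of size 2d should match two jumps of d.

def jump(model, x, d):
    """Advance the sample by step size d using the step-conditioned model."""
    return x + d * model(x, d)

def self_consistency_loss(model, x, d):
    """Mean squared gap between one 2d jump and two chained d jumps."""
    one_big = jump(model, x, 2 * d)
    two_small = jump(model, jump(model, x, d), d)
    return float(np.mean((one_big - two_small) ** 2))

constant_model = lambda x, d: np.ones_like(x)  # constant velocity field
state_model = lambda x, d: x                   # velocity depends on x

x0 = np.linspace(-1.0, 1.0, 5)
print(self_consistency_loss(constant_model, x0, 0.25))  # 0.0: consistent
print(self_consistency_loss(state_model, x0, 0.25))     # > 0: penalized
```

Minimizing this kind of gap during training is what allows sampling with a handful of large steps instead of many small ones, which is how real-time generation on a single GPU becomes feasible.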
Technical Principles of Dreamer 4
- World model & imagination training: Builds a predictive model of environment dynamics and trains policies entirely inside this model, eliminating risky or costly online interaction while improving sample efficiency.
- Efficient Transformer architecture: Provides strong parallel processing and sequence modeling, enabling accurate predictions from complex video inputs and action sequences.
- Shortcut forcing objective: A training objective that conditions generation on the sampling step size, so rollouts need only a few forward passes per frame. This reduces error accumulation during generation, improves stability and output quality, and enables fast inference.
- Masked autoencoding & action conditioning: Trains the tokenizer with masked autoencoding for robust visual representations, and learns action conditioning from limited labeled data so that knowledge can be extracted from large unlabeled video corpora.
- Multi-task learning & policy optimization: Injects task inputs into the world model to support multi-task training, and uses reinforcement learning within imagination to optimize policies, allowing better adaptation to diverse tasks and environments.
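The imagination-training loop described in these principles can be condensed to: roll a policy out inside the learned dynamics model, score it by the imagined return, and improve it, never touching the real environment. The sketch below uses a deliberately tiny stand-in world model and a one-parameter policy search; all of it is illustrative, not the paper's actor-critic algorithm.

```python
import numpy as np

# Minimal sketch of imagination training (illustrative, not the paper's
# algorithm): the policy is evaluated and improved purely on rollouts
# inside a learned dynamics model. Here the "world model" is a toy:
# the state moves by the chosen action, and imagined reward is higher
# the closer the state is to zero.

def world_model(state, action):
    """Learned dynamics stand-in: predict next state and reward."""
    next_state = state + action
    reward = -abs(next_state)  # imagined reward: stay near zero
    return next_state, reward

def policy(state, k):
    """Proportional policy: push the state back toward zero with gain k."""
    return float(np.clip(-k * state, -1.0, 1.0))

def imagined_return(k, start_state=3.0, horizon=10, gamma=0.95):
    """Discounted return of a rollout computed entirely in imagination."""
    state, total, discount = start_state, 0.0, 1.0
    for _ in range(horizon):
        state, reward = world_model(state, policy(state, k))
        total += discount * reward
        discount *= gamma
    return total

# Policy improvement purely in imagination: pick the gain with the best
# imagined return; no real-environment interaction is needed.
candidates = [0.0, 0.5, 1.0]
best_k = max(candidates, key=imagined_return)
print(best_k)  # 1.0: the gain that drives the state to zero fastest
```

Dreamer 4 replaces this toy search with reinforcement learning over a Transformer world model, but the structure is the same: candidate behaviors are generated, evaluated, and improved entirely inside the model's predictions.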
Project Links
- Official Website: https://danijar.com/project/dreamer4/
- arXiv Paper: https://arxiv.org/pdf/2509.24527v1
Application Scenarios of Dreamer 4
- Training agents in complex game environments: Demonstrated by reaching the Minecraft diamond milestone from offline data alone, showcasing strong learning and decision-making abilities.
- Robotics: Enables safe and efficient training of robots in simulation, with real-time inference on a single GPU and no risky real-world trial and error.
- Generalization to unseen environments: Learns from limited action data and generalizes to new, unseen tasks and environments, enabling adaptability in dynamic settings.
- Learning general world knowledge: Lays the groundwork for training on diverse internet videos, potentially applicable to domains like autonomous driving or intelligent surveillance, where broad world understanding is required.
- Multi-task learning & flexible strategy optimization: Adapts policies to diverse task demands, beneficial in contexts such as smart homes or intelligent factories.