Lumine – a general-purpose 3D open-world AI agent developed by ByteDance
What is Lumine?
Lumine is a general-purpose AI agent developed by ByteDance, designed to perceive, reason, and act in real time within 3D open-world games. Built on the Qwen2-VL-7B-Base model, Lumine adopts a human-like interaction paradigm that integrates perception, thinking, and action. It perceives the game environment in real time and executes complex tasks including combat, puzzle-solving, NPC interaction, and GUI operations. In Genshin Impact, Lumine completes main quests that span several hours, and it also generalizes to other titles without game-specific fine-tuning. Its autonomous reasoning and operation capabilities are built up through multi-stage training, which offers a new direction for the development of embodied intelligence.

Key Features of Lumine
Task Execution:
Capable of autonomously completing complex, long-horizon tasks, such as finishing multi-hour main story quests in Genshin Impact.
Combat Ability:
Dynamically tracks enemies, performs precise aiming and shooting, switches characters for combo attacks, and efficiently opens chests.
Puzzle-Solving:
Handles various puzzle challenges, such as collecting Anemoculus or activating elemental monuments.
NPC Interaction:
Engages in stable conversations with NPCs and completes missions.
GUI Operations:
Operates 2D interfaces as a human player would, for example to craft items or use teleport waypoints.
Cross-Game Generalization:
Without additional fine-tuning, it can complete tasks in other games such as Honkai: Star Rail and Wuthering Waves.
Technical Principles of Lumine
Perception Space:
Processes one frame of game footage every 200 ms while keeping historical reasoning traces to provide contextual information for decision-making.
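
The perception setup above can be pictured as a rolling buffer of recent frames plus earlier reasoning traces. The following is a minimal Python sketch, not Lumine's actual implementation: the class name `PerceptionContext`, the window sizes, and the helper `sample_times` are hypothetical and chosen only for illustration. Only the 200 ms interval comes from the text.

```python
from collections import deque
from dataclasses import dataclass, field

FRAME_INTERVAL_MS = 200  # from the text: one frame every 200 ms


@dataclass
class PerceptionContext:
    """Rolling context of recent frames and earlier reasoning traces."""
    max_frames: int = 20  # hypothetical window size
    max_traces: int = 5   # hypothetical window size
    frames: deque = field(init=False)
    traces: deque = field(init=False)

    def __post_init__(self):
        # deque(maxlen=...) silently drops the oldest entry when full.
        self.frames = deque(maxlen=self.max_frames)
        self.traces = deque(maxlen=self.max_traces)

    def add_frame(self, timestamp_ms, frame):
        self.frames.append((timestamp_ms, frame))

    def add_trace(self, text):
        self.traces.append(text)

    def snapshot(self):
        """Return what a decision step would see at this moment."""
        return {"frames": list(self.frames), "traces": list(self.traces)}


def sample_times(duration_ms, interval_ms=FRAME_INTERVAL_MS):
    """Timestamps (ms) at which a frame would be captured."""
    return list(range(0, duration_ms, interval_ms))
```

For example, `sample_times(1000)` yields five capture times (0, 200, 400, 600, 800), which corresponds to five frames per second.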
Hybrid Reasoning Strategy:
Performs explicit reasoning during critical scenarios (e.g., environmental changes, plan failure), while directly outputting actions in simple situations to improve efficiency.
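
The control flow of this hybrid strategy can be sketched as follows. This is an illustration under stated assumptions: the text does not say how Lumine decides when to reason, and in the real system that decision is presumably made by the model itself rather than by a hand-written rule. The names `Observation`, `needs_reasoning`, and `step`, and the two boolean flags, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Observation:
    environment_changed: bool  # e.g. a new scene appeared (hypothetical flag)
    plan_failed: bool          # e.g. the last action had no effect (hypothetical flag)


def needs_reasoning(obs: Observation) -> bool:
    """Stand-in for the learned decision: reason only in critical scenarios."""
    return obs.environment_changed or obs.plan_failed


def step(
    obs: Observation,
    reason: Callable[[Observation], str],
    act: Callable[[Observation, Optional[str]], str],
) -> Tuple[Optional[str], str]:
    """Run one decision step, skipping the reasoning call when it is not needed."""
    thought = reason(obs) if needs_reasoning(obs) else None
    return thought, act(obs, thought)
```

The point of the design is that the comparatively expensive `reason` call is skipped in simple situations, so most steps only pay the cost of producing an action.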
Keyboard and Mouse Operation Modeling:
All actions are modeled as mouse movements and keyboard sequences. The model is trained in three stages:
- Pre-training: Learns basic visual-motor abilities.
- Instruction-following Training: Associates language instructions with actions.
- Decision-making & Reasoning Training: Learns autonomous planning, correction, and execution of long-horizon tasks.
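
To make the keyboard-and-mouse action space more concrete, the sketch below represents one action as a relative mouse movement plus a sequence of key groups, and serializes it to text so that a language model could emit it as tokens. The text format (`dx dy ; keys ; keys`) and the names `Action`, `encode`, and `decode` are assumptions made for this example; they are not the format Lumine actually uses.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Action:
    mouse_dx: int                           # relative horizontal mouse movement
    mouse_dy: int                           # relative vertical mouse movement
    key_chunks: Tuple[Tuple[str, ...], ...]  # keys held in each consecutive sub-step


def encode(action: Action) -> str:
    """Serialize an action to a flat string (hypothetical format)."""
    mouse = f"{action.mouse_dx} {action.mouse_dy}"
    chunks = [" ".join(chunk) for chunk in action.key_chunks]
    return " ; ".join([mouse] + chunks)


def decode(text: str) -> Action:
    """Parse a string produced by encode() back into an Action."""
    parts = [p.strip() for p in text.split(";")]
    dx, dy = (int(v) for v in parts[0].split())
    # An empty part means no key was held during that sub-step.
    chunks = tuple(tuple(p.split()) for p in parts[1:])
    return Action(dx, dy, chunks)
```

A round trip such as `decode(encode(Action(10, -5, (("W", "Shift"), (), ("W",)))))` returns the original action, including the empty middle sub-step in which no key is held.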
Real-time Optimization:
Context management and multi-dimensional optimization reduce latency to ensure responsive interaction.
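
One way to reason about the latency requirement: if a new frame arrives every 200 ms, a full perceive-decide-act step presumably needs to fit within that interval to keep up. The sketch below only checks a set of stage timings against that budget. The stage names and numbers are invented for illustration, and treating the frame interval as the latency budget is an assumption, not something the text states.

```python
FRAME_INTERVAL_MS = 200  # frame interval from the text, used here as an assumed budget


def step_latency(stage_ms: dict) -> tuple:
    """Return (total latency in ms, whether it fits within the frame interval)."""
    total = sum(stage_ms.values())
    return total, total <= FRAME_INTERVAL_MS


# Hypothetical stage timings, not measurements from Lumine.
example_stages = {"encode_frame": 40, "generate_action": 120, "send_input": 10}
```

With these made-up numbers, `step_latency(example_stages)` returns `(170, True)`; raising any stage by more than 30 ms would push the step over the assumed budget.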
Project Links
- Official Website: https://www.lumine-ai.org/
- arXiv Technical Paper: https://arxiv.org/pdf/2511.08892
Application Scenarios
Game Development & Testing:
Used for automated game testing to help developers identify bugs, performance issues, and UX problems quickly. It can also assist in generating intelligent NPC behaviors and quest designs, improving development efficiency.
Gaming & Entertainment:
Acts as an AI teammate or opponent, offering more challenging and engaging gameplay. In single-player titles, it can assist players in completing complex tasks and increase game completion rates.
Education & Training:
Applied in virtual training environments to provide mission-oriented training for students or professionals. In educational games, it supports learning by helping students master knowledge and skills through tasks and challenges.
Virtual Worlds & Metaverse:
Serves as a virtual character that interacts with users, offering companionship or services. It can help generate quests, stories, and interactive content within virtual worlds.
Industry & Manufacturing:
Executes tasks and optimizes processes in industrial simulation environments, supports the design of efficient workflows, and trains intelligent robots to enhance their autonomous decision-making and operational abilities.