CombatVLA – a 3D action game–specific VLA model launched by Taotian Group

AI Tools updated 23h ago dongdong

What is CombatVLA?

CombatVLA is an efficient Vision-Language-Action (VLA) model developed by the Future Life Laboratory team at Taotian Group for combat tasks in 3D action role-playing games (ARPGs). The 3B-parameter model is trained on video-action pairs collected with motion trackers, with the data formatted as Action-of-Thought (AoT) sequences. Training follows a three-stage progressive learning paradigm (video-level, then frame-level, then truncated frame-level fine-tuning), which lets the model emit much shorter outputs at inference time. According to the reported results, CombatVLA outperforms existing models on combat understanding benchmarks, runs inference about 50x faster, and achieves a higher task success rate than human players.


Key Features

  • Efficient combat decision-making: Makes real-time combat decisions in complex 3D game environments, such as dodging attacks, casting skills, or restoring health, with decision speeds reported as up to 50x faster than traditional models.

  • Combat understanding and reasoning: Evaluates enemy states, predicts enemy attack intentions, and reasons about the best combat action to take. It is reported to clearly outperform other models on combat understanding.

  • Action command generation: Outputs executable keyboard and mouse operation instructions (e.g., pressing specific keys or performing mouse actions) to control in-game characters.

  • Generalization ability: Demonstrates strong generalization across varying task difficulties and different games, effectively executing combat tasks even in unseen game scenarios.
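
The action command generation feature above can be illustrated with a small sketch. It assumes the model emits named actions that are then translated into keyboard and mouse events. The action names, key bindings, and the InputEvent type below are hypothetical and chosen only for illustration; the source does not specify CombatVLA's actual action vocabulary or bindings.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class InputEvent:
    """One low-level input operation (hypothetical structure)."""
    device: str   # "keyboard" or "mouse"
    control: str  # key name or mouse button
    hold_ms: int  # how long the control is held down


# Hypothetical bindings: real games and the real model may differ.
BINDINGS: Dict[str, List[InputEvent]] = {
    "dodge": [InputEvent("keyboard", "space", 50)],
    "light_attack": [InputEvent("mouse", "left", 50)],
    "heal": [InputEvent("keyboard", "r", 50)],
}


def to_events(actions: List[str]) -> List[InputEvent]:
    """Translate model-level action names into input events.

    Unknown action names are skipped so that a malformed model
    output never triggers an unintended key press.
    """
    events: List[InputEvent] = []
    for name in actions:
        events.extend(BINDINGS.get(name, []))
    return events


if __name__ == "__main__":
    for event in to_events(["dodge", "light_attack"]):
        print(event)
```

In a real setup, each InputEvent would be handed to an input-injection backend; keeping the mapping as plain data makes it easy to swap bindings per game.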


Technical Principles of CombatVLA

  • Motion tracker: Collects human player operation data (keyboard and mouse inputs) synchronized with in-game visuals to generate video-action pair datasets.

  • Action-of-Thought (AoT) sequences: Converts the collected data into AoT sequences, in which each action is paired with a detailed explanation so that the model learns the semantics and logic behind the action.

  • Three-stage progressive learning:

    • Stage 1: Video-level AoT fine-tuning for initial understanding of the combat environment.

    • Stage 2: Frame-level AoT fine-tuning to ensure strict alignment between actions and preceding frames.

    • Stage 3: Frame-level truncated AoT fine-tuning, introducing a special <TRUNC> token to truncate outputs for faster inference.

  • Adaptive action-weighted loss: Combines action-alignment loss and modality-contrastive loss to optimize training and ensure accurate action output.

  • Action execution framework: Converts model-generated action instructions into actual keyboard and mouse operations to control game characters automatically.
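
To make the truncation idea in Stage 3 concrete, here is a minimal sketch. It assumes an AoT target places the action first and its explanation second, with the <TRUNC> token between them, so that decoding can stop as soon as the action has been produced. The "action:" / "explanation:" text layout and the helper names are assumptions for illustration, not the format used by the CombatVLA authors.

```python
from typing import Iterable, List

TRUNC = "<TRUNC>"  # special token named in the article


def build_aot_target(action: str, explanation: str, truncated: bool) -> str:
    """Build a training target string (layout is an assumption).

    With truncated=True, <TRUNC> is placed right after the action so
    that inference can stop there and skip the explanation.
    """
    if truncated:
        return f"action: {action} {TRUNC} explanation: {explanation}"
    return f"action: {action} explanation: {explanation}"


def decode_until_trunc(token_stream: Iterable[str]) -> List[str]:
    """Consume generated tokens and stop at the first <TRUNC>."""
    kept: List[str] = []
    for token in token_stream:
        if token == TRUNC:
            break
        kept.append(token)
    return kept


def extract_action(tokens: List[str]) -> str:
    """Recover the action text from the kept tokens."""
    text = " ".join(tokens)
    prefix = "action:"
    if text.startswith(prefix):
        text = text[len(prefix):]
    return text.strip()


if __name__ == "__main__":
    target = build_aot_target(
        "dodge", "the enemy is winding up a heavy attack", truncated=True
    )
    # Whitespace splitting stands in for a real tokenizer here.
    kept = decode_until_trunc(target.split())
    print(extract_action(kept))
```

The speed-up comes from generating only the tokens before <TRUNC>: the explanation still shapes training, but it is never decoded at inference time.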


Application Scenarios of CombatVLA

  • 3D ARPG gameplay: Real-time control of game characters during combat, enabling efficient decision-making and action execution for enhanced gameplay.

  • Game testing and optimization: Assists developers in testing combat systems, identifying issues, and optimizing game mechanics.

  • Esports training: Provides intelligent opponents for professional players, supporting practice in combat strategies and skill improvement.

  • Game content creation: Helps developers generate combat scenarios and narratives, accelerating the construction of complex levels and missions.

  • Robotics control: Could be extended to real-world robotics, where robots must make rapid decisions and execute actions in dynamic environments.
