InternVLA·M1 – An embodied dual-system operational large model open-sourced by Shanghai AI Lab

AI Tools · Updated 24h ago · by dongdong

What is InternVLA·M1?

InternVLA·M1 is an embodied "brain" for robotic manipulation developed by Shanghai AI Lab. It is a dual-system operational large model for instruction-following tasks, and it forms a complete closed loop of "reasoning → action → autonomous learning." The model handles high-level spatial reasoning and task planning. It is trained in two stages. First, spatial perception pretraining strengthens its spatial reasoning and planning. Second, action post-training that relies on implicit spatial reasoning improves action efficiency. Because this post-training is guided only by "spatial planning prompts," the model can be trained efficiently at significantly lower cost.
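The two-stage training strategy can be illustrated with a deliberately tiny Python sketch. It is not InternVLA·M1's training code, which this article does not show. The data formats and the functions `pretrain_spatial`, `post_train_action`, and `policy` are hypothetical. In stage 1, the sketch learns where each object is from labeled points. In stage 2, it keeps that spatial model fixed and fits a single action gain from demonstrations, using the stage 1 prediction as the spatial cue.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical data formats (illustration only):
#   grounding example: (object_name, (x, y))                -> "where" label for stage 1
#   demonstration:     (object_name, (x0, y0), (dx, dy))    -> start pose and expert action for stage 2


def pretrain_spatial(grounding_data):
    """Stage 1 stand-in: learn where each named object is by averaging its labeled points."""
    points = defaultdict(list)
    for name, point in grounding_data:
        points[name].append(point)
    return {
        name: (fmean(x for x, _ in pts), fmean(y for _, y in pts))
        for name, pts in points.items()
    }


def post_train_action(spatial_model, demos):
    """Stage 2 stand-in: keep the spatial model fixed and fit one gain k so that
    action ~= k * (predicted_target - start), by least squares over all demos."""
    num = den = 0.0
    for name, (x0, y0), (dx, dy) in demos:
        tx, ty = spatial_model[name]   # spatial cue produced by stage 1
        ex, ey = tx - x0, ty - y0      # offset from the start pose to the predicted target
        num += ex * dx + ey * dy
        den += ex * ex + ey * ey
    if den == 0.0:
        raise ValueError("demonstrations give no usable offsets")
    return num / den


def policy(spatial_model, gain, name, start):
    """Combined model after both stages: predict the target, then take a scaled step toward it."""
    tx, ty = spatial_model[name]
    return (gain * (tx - start[0]), gain * (ty - start[1]))


# Usage: two labeled points for "cup", one for "bowl", and a single cup demonstration.
spatial = pretrain_spatial([("cup", (0.2, 0.4)), ("cup", (0.4, 0.6)), ("bowl", (-0.3, 0.1))])
gain = post_train_action(spatial, [("cup", (0.0, 0.0), (0.15, 0.25))])  # approximately 0.5
step = policy(spatial, gain, "bowl", (0.0, 0.0))  # reuses the stage 2 gain on an unseen object
```

A real system would replace both stand-ins with learned neural networks. The sketch only shows the ordering: spatial knowledge is learned first and then reused to condition action learning, so the second stage has less to learn.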

On public benchmarks such as SimplerEnv, InternVLA·M1 achieves internationally leading results, with strong instruction following and generalization to unseen objects. It is pretrained at scale on data from InternData-M1, Shanghai AI Lab's self-developed simulation platform, which makes it suited to complex scenarios and long-horizon tasks.



Key Features of InternVLA·M1

  • High-level spatial reasoning and task planning: Understands instructions and generates the corresponding operation sequences in complex environments.

  • Dual-system operational architecture: Uses two-stage training, with spatial perception pretraining followed by action post-training, to strengthen reasoning and planning.

  • Efficient training and cost control: Uses spatial planning prompts to train efficiently, reducing both training time and cost.

  • Instruction following and generalization: Performs strongly on multiple public benchmarks, particularly in following instructions and generalizing to unseen objects.

  • Autonomous learning and closed-loop control: Implements a complete reasoning → action → autonomous learning loop that continuously refines its operational strategies.

  • Complex scenario adaptability: Performs well in complex real-world scenarios and long-horizon tasks, making it suitable for diverse practical applications.
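The reasoning → action → autonomous learning loop can be pictured with a small Python sketch. This is an illustrative analogy under our own assumptions, not InternVLA·M1's mechanism. A 1-D agent works in repeated cycles. In each cycle it reasons about the remaining distance and acts. It then learns from the observed outcome by updating its estimate of an actuator gain that it cannot see, represented by the hypothetical `true_gain`. The function `closed_loop` and all of its parameters are hypothetical.

```python
def closed_loop(target, true_gain=0.5, steps=20, lr=0.5):
    """Toy reasoning -> action -> learning loop in one dimension.

    true_gain is hidden from the agent: the environment scales every command by it.
    """
    position = 0.0
    gain_estimate = 1.0   # the agent initially assumes commands are executed exactly
    history = []
    for _ in range(steps):
        # Reason: decide how far the agent still needs to move, from the current observation.
        desired = target - position
        # Act: send a command compensated by the current gain estimate.
        command = desired / gain_estimate
        moved = true_gain * command   # the environment executes the command with the hidden gain
        position += moved
        # Learn: compare the observed outcome with the command and refine the gain estimate.
        if abs(command) > 1e-12:
            observed_gain = moved / command
            gain_estimate += lr * (observed_gain - gain_estimate)
        history.append(position)
    return position, gain_estimate, history


final_position, learned_gain, trajectory = closed_loop(target=1.0)
```

The first commands undershoot because the agent starts out assuming a gain of 1.0. As the estimate converges toward the hidden gain, the agent's commands compensate and the position settles on the target. The learning-rate parameter `lr` trades adaptation speed against robustness to noisy observations, which this noise-free toy does not model.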


Technical Principles

  • Dual-system architecture: Combines spatial perception pretraining with action post-training to improve the model's understanding and manipulation of spatial environments.

  • Spatial perception pretraining: Uses large-scale simulation data to train the model to perceive and reason about spatial relationships, which forms the foundation for task planning.

  • Action post-training: Uses implicit spatial reasoning to learn precise action execution efficiently.

  • Spatial planning prompts: Guide the model toward efficient task planning and action generation, reducing training complexity.

  • Closed-loop control: Implements a reasoning → action → autonomous learning loop, enabling the model to continuously learn and optimize in real-world tasks.

  • Large-scale simulation data: Relies on the self-developed InternData-M1 platform to generate high-quality training data for large-scale pretraining.

  • Instruction-driven: Interprets natural-language instructions and generates the corresponding action sequences, enabling instruction following.
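The following toy Python sketch illustrates the dual-system split and the role of a spatial planning prompt. Everything in it is hypothetical: `SpatialPrompt`, `plan`, `act`, `execute`, and the 2-D scene. A slow "planner" grounds the instruction to a target point. A fast "action module" then uses that point as its prompt to emit small, bounded motion steps. In a real model both roles would be learned networks, and simple rules stand in for them here.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class SpatialPrompt:
    """Hypothetical cue handed from the planner (slow system) to the action module (fast system)."""
    target: Point


def plan(instruction: str, scene: Dict[str, Point]) -> SpatialPrompt:
    """Planner stand-in: ground the instruction by matching object names in the scene."""
    text = instruction.lower()
    for name, position in scene.items():
        if name in text:
            return SpatialPrompt(target=position)
    raise ValueError(f"no object in the scene matches {instruction!r}")


def act(gripper: Point, prompt: SpatialPrompt, max_step: float = 0.1) -> Point:
    """Action-module stand-in: one bounded motion step toward the prompted target."""
    dx, dy = prompt.target[0] - gripper[0], prompt.target[1] - gripper[1]
    dist = math.hypot(dx, dy)
    if dist <= max_step:
        return prompt.target            # close enough: land exactly on the target
    scale = max_step / dist
    return (gripper[0] + dx * scale, gripper[1] + dy * scale)


def execute(instruction: str, scene: Dict[str, Point], gripper: Point = (0.0, 0.0),
            max_steps: int = 100) -> List[Point]:
    """Plan once with the slow system, then run the fast system until the target is reached."""
    prompt = plan(instruction, scene)
    trajectory = [gripper]
    for _ in range(max_steps):
        if gripper == prompt.target:
            break
        gripper = act(gripper, prompt)
        trajectory.append(gripper)
    return trajectory


scene = {"red cube": (0.3, 0.4), "blue bowl": (-0.2, 0.1)}
path = execute("Pick up the red cube", scene)
```

Because the planner's output is a plain point, a different low-level controller could be swapped in without retraining the planner. That illustrates why a compact spatial prompt is an attractive interface between the two systems.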



Application Scenarios

  • Industrial automation: Handles complex tasks on factory production lines such as assembly, material handling, and quality inspection, improving efficiency and accuracy.

  • Logistics and warehousing: Optimizes sorting, handling, and storage in logistics centers, enabling automated warehouse management.

  • Service robots: Provides cleaning, food delivery, and caregiving services in homes, hotels, hospitals, and other environments.

  • Intelligent security: Supports anomaly detection, patrolling, and inspection tasks in security systems.

  • Education and research: Acts as a teaching and research tool for robotics, AI, and automation studies.

  • Disaster rescue: Performs search, rescue, and material transport in disaster scenarios such as earthquakes or fires, reducing human risk and improving efficiency.

  • Agricultural automation: Supports crop planting, harvesting, and irrigation tasks, advancing intelligent and automated farming.
