InternVLA·N1 – An open-source, end-to-end dual-system large navigation model from Shanghai AI Lab

AI Tools updated 5h ago dongdong

What is InternVLA·N1?

InternVLA·N1 is an open-source, end-to-end dual-system large navigation model developed by Shanghai Artificial Intelligence Laboratory. In its dual-system architecture, System 2 understands language instructions and plans long-range paths, while System 1 handles high-frequency responses and agile obstacle avoidance. The model is trained entirely on synthetic data, drawing on large-scale digital scene assets and massive multimodal corpora to keep training low-cost and efficient. On multiple mainstream benchmarks, InternVLA·N1 posts leading scores and shows strong zero-shot generalization. In the real world, it can perform “cross-building, long-distance” instruction-following navigation and agile obstacle avoidance in dense environments.

Key Features

  • Language Understanding & Path Planning: System 2 interprets natural language instructions and, based on visual observations, predicts the next target pixel in the image to achieve long-range spatial reasoning and planning.

  • Agile Obstacle Avoidance & Execution: System 1 handles high-frequency environmental responses to enable agile obstacle avoidance, ensuring accurate arrival at target locations.

  • Synthetic Data-Driven Training: Entirely trained on synthetic data, combining large-scale digital assets with vast multimodal corpora to achieve low-cost and efficient training.

  • Zero-shot Generalization: Despite being trained only on synthetic data, the model achieves cross-building instruction-following navigation at 60 Hz and agile obstacle avoidance in dense real-world environments, demonstrating strong generalization.

  • Multi-Scenario Adaptability: Achieves top scores on multiple mainstream benchmarks, making it applicable to diverse and complex scenarios.
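
The first feature above says that System 2 outputs a target pixel in the image. As a rough illustration of how a downstream controller could use such a pixel, the sketch below converts a pixel column into a steering bearing with a simple pinhole camera model. This is an assumption made for illustration, not InternVLA·N1's actual interface: the function name `pixel_to_bearing`, the image width, and the field-of-view parameter are all hypothetical.

```python
import math


def pixel_to_bearing(u: float, image_width: int, horizontal_fov_deg: float) -> float:
    """Return the horizontal angle in radians between the optical axis and pixel column u.

    Hypothetical helper (not part of InternVLA·N1). A positive value means the
    target pixel lies to the right of the image centre.
    """
    # Principal point assumed to sit at the horizontal centre of the image.
    cx = image_width / 2.0
    # Pinhole model: focal length in pixels, f = (W / 2) / tan(FOV / 2).
    focal_px = (image_width / 2.0) / math.tan(math.radians(horizontal_fov_deg) / 2.0)
    return math.atan2(u - cx, focal_px)


if __name__ == "__main__":
    # Assumed camera: 640 px wide, 90 degree horizontal field of view.
    for column in (0, 320, 480):
        angle = math.degrees(pixel_to_bearing(column, 640, 90.0))
        print(f"pixel column {column}: bearing {angle:.1f} deg")
```

A target at the image centre maps to a bearing of zero, and targets toward either edge map to progressively larger left or right turns.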


Technical Principles of InternVLA·N1

  • Dual-System Architecture: System 2 focuses on language understanding and long-range spatial reasoning and planning, while System 1 specializes in high-frequency responses and agile obstacle avoidance. Together, they enable efficient navigation.

  • Asynchronous Inference Mechanism: System 1 and System 2 run asynchronously. System 1 reacts at high frequency to environmental changes for obstacle avoidance, while System 2 concentrates on long-range reasoning and planning; decoupling the two reduces latency and complexity.

  • Fully Synthetic Data Training: Training is entirely synthetic, leveraging large-scale digital scenes and multimodal corpora, combined with efficient data synthesis technologies, to achieve cost-effective training.

  • Two-Stage Curriculum Training: Training proceeds in two stages: a pre-training phase, in which System 2 undergoes supervised fine-tuning for accurate path planning, followed by a joint-tuning phase, in which System 1 and System 2 are optimized together for overall navigation performance.

  • Multimodal Fusion: The model fuses visual and language information through large multimodal models, enhancing its ability to understand complex environments and execute navigation tasks with high accuracy in real-world scenarios.
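
The asynchronous split described above can be sketched as a tick-based simulation: a slow planner takes several ticks to produce each goal, while a fast controller acts on every tick using the most recent goal available. This is a minimal sketch under stated assumptions, not the project's implementation. The names `slow_planner`, `fast_controller`, and `simulate`, the tick model, and the placeholder goals are all hypothetical.

```python
from typing import List, Optional, Set, Tuple

PixelGoal = Tuple[int, int]
Action = Tuple[str, Optional[PixelGoal]]


def slow_planner(tick: int) -> PixelGoal:
    """Hypothetical stand-in for System 2: a placeholder pixel goal for this tick."""
    return (320 + 10 * tick, 240)


def fast_controller(goal: Optional[PixelGoal], obstacle: bool) -> Action:
    """Hypothetical stand-in for System 1: decides an action on every tick."""
    if obstacle:
        return ("avoid", goal)   # obstacle avoidance overrides goal tracking
    if goal is None:
        return ("wait", None)    # no plan has arrived yet
    return ("track", goal)


def simulate(num_ticks: int, plan_latency: int, obstacle_ticks: Set[int]) -> List[Action]:
    """Run the fast loop every tick while plans arrive plan_latency ticks after they start."""
    latest_goal: Optional[PixelGoal] = None   # newest goal the controller may use
    pending_goal: Optional[PixelGoal] = None  # goal still being computed
    ready_at = 0
    actions: List[Action] = []
    for tick in range(num_ticks):
        # A plan that has finished computing becomes visible to the controller.
        if pending_goal is not None and tick >= ready_at:
            latest_goal, pending_goal = pending_goal, None
        # The planner starts a new plan as soon as it is idle.
        if pending_goal is None:
            pending_goal = slow_planner(tick)
            ready_at = tick + plan_latency
        # The controller never waits for the planner; it uses whatever goal it has.
        actions.append(fast_controller(latest_goal, tick in obstacle_ticks))
    return actions


if __name__ == "__main__":
    for tick, action in enumerate(simulate(7, 3, {4})):
        print(tick, action)
```

The point of the design is that the fast loop never blocks on the slow one: a stale goal is still usable for a few ticks, and obstacle avoidance can react immediately regardless of planner latency.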


Application Scenarios of InternVLA·N1

  • Intelligent Robot Navigation: Provides service and logistics robots with efficient navigation, enabling them to follow voice commands, move autonomously, avoid obstacles, and complete tasks in complex environments.

  • Autonomous Driving Assistance: Assists vehicles with path planning and obstacle avoidance to enhance safety and reliability in autonomous driving systems.

  • Virtual & Augmented Reality: Enhances VR and AR applications by enabling natural, immersive interactions such as navigating virtual environments via voice commands.

  • Smart Security Patrols: Supports intelligent surveillance with vision-language fusion, enabling patrols and rapid response to anomalies.

  • Industrial Automation: Provides navigation and operational guidance for automated equipment, improving production efficiency and workplace safety.

  • Smart Tour Guide Services: Delivers personalized navigation and guided explanations in museums, exhibitions, and similar venues, enriching visitor experiences.
