GigaBrain-0 – Open-source VLA Embodied Model, Based on Data Generated by World Models

What is GigaBrain-0？

GigaBrain-0 is a next-generation Vision-Language-Action (VLA) foundational model driven by data generated from world models. By generating large-scale, diverse data, it reduces reliance on real robotic data and significantly improves cross-task generalization. Using RGB-D inputs enhances spatial perception, while supervision through an Embodied Chain-of-Thought (Embodied CoT) strengthens the model’s reasoning capabilities during task execution. GigaBrain-0 demonstrates excellent performance in dexterous manipulation, long-horizon tasks, and mobile manipulation in real-world scenarios. It also shows strong generalization across variations in object appearance, placement, and camera viewpoints. For edge platforms, a lightweight version, GigaBrain-0-Small, has been released, enabling efficient operation on devices like the NVIDIA Jetson AGX Orin.

Key Features of GigaBrain-0

Data Generation and Reduced Dependence: Leverages world models to generate diverse data, including video generation, Real2Real transfer, and human imitation, reducing the need for real robotic data and improving model generalization.
RGB-D Input and Spatial Perception: Enhances spatial awareness through RGB-D input, enabling more accurate understanding of 3D object positions and layouts, improving manipulation precision.
Embodied CoT Supervision and Reasoning: Generates intermediate reasoning steps during training, such as action trajectories and subgoal planning, simulating human thought processes to enhance reasoning for complex tasks.
Task Success Rate and Generalization: Exhibits high success rates and strong generalization across tasks like folding clothes, setting tables, and transporting boxes, adapting to changes in object appearance, placement, and camera viewpoints.
Lightweight Version and Edge Deployment: GigaBrain-0-Small is designed for edge platforms like NVIDIA Jetson AGX Orin, providing efficient inference for real-world deployment.

Technical Principles of GigaBrain-0

World Model-Driven: Uses world models to generate large-scale, diverse data, reducing reliance on real robotic data and enhancing generalization.
RGB-D Input Modeling: Improves spatial perception by allowing the model to accurately sense 3D object positions and layouts.
Embodied CoT Supervision: Generates intermediate reasoning steps such as action trajectories and subgoal plans during training, simulating human thought to improve reasoning in complex tasks.
Knowledge Isolation: Employs knowledge isolation to prevent interference between action prediction and Embodied CoT generation, improving stability and performance.
Reinforcement Learning and World Model Integration: Can integrate world models as interactive environments for reinforcement learning, reducing real-world trial-and-error and improving learning efficiency.
World Model as Policy Generator: World models can learn general representations of physical dynamics and task structures, evolving into “active policy generators” that directly propose feasible action sequences or subgoals.
Closed-Loop Self-Improvement: Through a closed-loop cycle of VLA policies and world models, real-world trajectories continuously improve the world model, which in turn generates higher-quality training data, driving autonomous and lifelong learning robotic systems.

Project Links

Official Website: https://gigabrain0.github.io/
GitHub Repository: https://github.com/open-gigaai/giga-brain-0
HuggingFace Model Hub: https://huggingface.co/open-gigaai
arXiv Paper: https://arxiv.org/pdf/2510.19430

Application Scenarios of GigaBrain-0

Dexterous Manipulation Tasks: Can perform precise operations like folding clothes or preparing tissues, demonstrating strong generalization across different textures and colors.
Long-Horizon Tasks: Capable of sequentially planned tasks like clearing tables or making juice, handling complex, time-extended processes.
Mobile Manipulation Tasks: Combines global navigation and local manipulation for tasks like transporting boxes or laundry baskets, enabling seamless mobile interaction.
Edge Platform Deployment: The lightweight GigaBrain-0-Small version is tailored for edge devices like NVIDIA Jetson AGX Orin, providing efficient performance under limited resources.