RoboBrain: The Embodied Brain Model Open-Sourced by the Beijing Academy of Artificial Intelligence (BAAI)

What is RoboBrain?

RoboBrain is an open-source embodied brain model released by the Beijing Academy of Artificial Intelligence (BAAI), intended to help move embodied AI from single-agent intelligence towards collective intelligence. It consists of three modules: a base model for task planning, an A-LoRA module for affordance (operable region) perception, and a T-LoRA module for trajectory prediction. RoboBrain is trained with a multi-stage strategy and supports memory over long sequences of historical frames as well as high-resolution image perception, which lets it map abstract instructions to concrete actions. It is reported to perform strongly on evaluations of task planning, affordance perception, and trajectory prediction.

The main functions of RoboBrain

Planning Capability: Decompose complex operational instructions into manageable subtasks. For example, break down “lift the kettle and pour water into the cup” into steps such as “approach the kettle and lift it,” “move the kettle to align the spout with the cup,” and “tilt the kettle to pour water.”
Affordance Perception: Identify and interpret the operable areas of interactive objects, such as the handle or spout of a kettle.
Trajectory Prediction: Predict the complete trajectory required to complete an operation, such as the movement path from the current position to the handle of the kettle.
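To make the three functions concrete, here is a minimal Python sketch of what their outputs could look like for the kettle example. The class names, fields, and pixel coordinates are hypothetical and chosen for illustration only. Representing an operable region as a bounding box and a trajectory as 2D waypoints is an assumption of this sketch, not something stated above.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Plan:
    """An instruction and the ordered subtasks it was decomposed into."""
    instruction: str
    subtasks: List[str] = field(default_factory=list)


@dataclass
class Affordance:
    """Operable region as a pixel box (x_min, y_min, x_max, y_max). Assumed format."""
    label: str
    box: Tuple[int, int, int, int]

    def center(self) -> Point:
        x0, y0, x1, y1 = self.box
        return ((x0 + x1) / 2, (y0 + y1) / 2)


@dataclass
class Trajectory:
    """Ordered 2D waypoints in image coordinates. Assumed format."""
    waypoints: List[Point]

    def length(self) -> float:
        # Sum of straight-line distances between consecutive waypoints.
        pairs = zip(self.waypoints, self.waypoints[1:])
        return sum(math.dist(a, b) for a, b in pairs)


# Planning: the kettle instruction and its three subtasks from the text.
plan = Plan(
    instruction="lift the kettle and pour water into the cup",
    subtasks=[
        "approach the kettle and lift it",
        "move the kettle to align the spout with the cup",
        "tilt the kettle to pour water",
    ],
)

# Affordance perception: a made-up box around the kettle handle.
handle = Affordance(label="kettle handle", box=(120, 80, 160, 140))

# Trajectory prediction: a made-up path from the gripper to the handle center.
path = Trajectory(waypoints=[(20.0, 110.0), (80.0, 110.0), handle.center()])

print(plan.subtasks)
print(handle.center(), path.length())
```

The point of the sketch is only that the three capabilities produce different kinds of output: text steps, a region in the image, and a sequence of points.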

The technical principles of RoboBrain

Model Architecture: RoboBrain is based on the LLaVA framework and consists of the following three main modules:
• Visual Encoder: Utilizes the SigLIP model to encode input images into visual features.
• Projector: Maps visual features to the same dimensionality as text embeddings using a two-layer MLP.
• Large Language Model (LLM): Employs the Qwen2.5-7B-Instruct model for understanding and generating text instructions.
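The projector step can be illustrated with a small, dependency-free Python sketch: each visual patch feature passes through two linear layers so that it ends up with the same width as a text embedding. The GELU activation between the layers, the random initialization, and the toy dimensions (8 and 12) are assumptions made for this example. The real feature widths depend on the specific SigLIP and Qwen2.5 checkpoints, and this is not RoboBrain's actual implementation.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def gelu(x: float) -> float:
    """Exact GELU; assumed here as the activation between the two layers."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Compute W x + b, where weight has shape (out_dim, in_dim)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def init_linear(in_dim: int, out_dim: int,
                rng: random.Random) -> Tuple[Matrix, Vector]:
    """Small random weights and zero bias, purely for demonstration."""
    scale = 1.0 / math.sqrt(in_dim)
    weight = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
              for _ in range(out_dim)]
    return weight, [0.0] * out_dim


class Projector:
    """Two-layer MLP mapping vision features to the text embedding width."""

    def __init__(self, vision_dim: int, text_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1, self.b1 = init_linear(vision_dim, text_dim, rng)
        self.w2, self.b2 = init_linear(text_dim, text_dim, rng)

    def __call__(self, patch_features: List[Vector]) -> List[Vector]:
        tokens = []
        for feat in patch_features:
            hidden = [gelu(h) for h in linear(feat, self.w1, self.b1)]
            tokens.append(linear(hidden, self.w2, self.b2))
        return tokens


# Toy sizes (hypothetical): 4 image patches, vision width 8, text width 12.
VISION_DIM, TEXT_DIM, NUM_PATCHES = 8, 12, 4
projector = Projector(VISION_DIM, TEXT_DIM)
patches = [[0.1 * (i + j) for j in range(VISION_DIM)]
           for i in range(NUM_PATCHES)]
visual_tokens = projector(patches)
print(len(visual_tokens), len(visual_tokens[0]))
```

After projection, the visual tokens have the same width as text token embeddings, so they can be concatenated with the embedded instruction and fed to the LLM as one sequence.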
Multi-Stage Training Strategy: RoboBrain is trained in stages to improve its performance on robotic manipulation tasks:
• Generalized Visual Training (OneVision Training): Pre-trains on large-scale general-purpose vision datasets to develop foundational visual and language understanding capabilities.
• Robotics Task Training: Fine-tunes on the ShareRobot dataset to enhance task planning, operable region perception, and trajectory prediction capabilities.
Dataset Support: RoboBrain’s training relies on the ShareRobot dataset, a high-quality heterogeneous dataset with multi-dimensional annotations covering task planning, object operable regions, and end-effector trajectories. The dataset is designed to be diverse and accurately annotated so that it supports the model’s performance on complex tasks.
Inference Process: In practical applications, RoboBrain first perceives the visual input, decomposes the input instruction into a series of executable sub-tasks, and then performs operable region perception and trajectory prediction. This step-by-step approach lets the model translate abstract instructions into specific robotic operations.
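The control flow of this inference process can be sketched in Python as follows. All function names and return formats are hypothetical. The three stages are stand-in stubs that return fixed values, not calls into the real model. Running affordance perception and trajectory prediction once per sub-task is also an assumption of this sketch.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]
Point = Tuple[float, float]


def run_pipeline(
    image: object,
    instruction: str,
    planner: Callable[[object, str], List[str]],
    affordance_head: Callable[[object, str], Box],
    trajectory_head: Callable[[object, str, Box], List[Point]],
) -> List[Dict[str, object]]:
    """Plan first, then ground each sub-task in a region and a path."""
    results = []
    for subtask in planner(image, instruction):
        box = affordance_head(image, subtask)
        path = trajectory_head(image, subtask, box)
        results.append(
            {"subtask": subtask, "affordance": box, "trajectory": path}
        )
    return results


# --- Stubs standing in for the three modules (hypothetical outputs) ---

def stub_planner(image: object, instruction: str) -> List[str]:
    # In RoboBrain this role belongs to the base model (task planning).
    return [
        "approach the kettle and lift it",
        "move the kettle to align the spout with the cup",
        "tilt the kettle to pour water",
    ]


def stub_affordance(image: object, subtask: str) -> Box:
    # In RoboBrain this role belongs to the A-LoRA module.
    return (120, 80, 160, 140)


def stub_trajectory(image: object, subtask: str, box: Box,
                    steps: int = 4) -> List[Point]:
    # In RoboBrain this role belongs to the T-LoRA module.
    # Here: a straight line from the image origin to the box center.
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    return [(cx * i / steps, cy * i / steps) for i in range(steps + 1)]


results = run_pipeline(
    image=None,  # placeholder; a real system would pass camera frames
    instruction="lift the kettle and pour water into the cup",
    planner=stub_planner,
    affordance_head=stub_affordance,
    trajectory_head=stub_trajectory,
)
for step in results:
    print(step["subtask"], step["affordance"], step["trajectory"][-1])
```

Passing the three stages in as callables keeps the orchestration independent of how each stage is implemented, which mirrors the split between the base model and its two LoRA modules.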

Project links for RoboBrain

Project official website: https://superrobobrain.github.io/
GitHub repository: https://github.com/FlagOpen/RoboBrain
Hugging Face model hub: https://huggingface.co/BAAI/RoboBrain
arXiv technical paper: https://arxiv.org/pdf/2502.21257

Application scenarios of RoboBrain

Multi-robot Collaboration: As the core brain model of RoboOS, a cross-embodiment framework for collaboration between the embodied brain and robot bodies, RoboBrain can coordinate multiple robots of different types.
Complex Task Planning: RoboBrain can break down complex operation instructions into manageable subtasks. For example, for tasks such as “Water plants”, “Put the pot in the drawer”, and “Cluster blocks of the same color into different corners”, RoboBrain can generate detailed planning steps.
Operable Area Awareness: RoboBrain can identify and interpret the operable areas of interactive objects. For instance, in the task of “Cluster blocks of the same color into different corners”, RoboBrain can recognize the operable areas of blocks of different colors and plan a reasonable operation path.
Real-time Feedback and Optimization: Leveraging the edge-cloud collaboration capabilities of RoboOS, RoboBrain can receive execution feedback in real time, dynamically adjust strategies based on environmental changes, continuously optimize task planning, and enhance robustness.