rStar2-Agent – Microsoft’s open-source mathematical reasoning model

What is rStar2-Agent?

rStar2-Agent is an open-source mathematical reasoning model from Microsoft with 14 billion parameters. Trained with agentic reinforcement learning, it reaches 80.6% accuracy on the AIME24 math reasoning benchmark, surpassing the 671-billion-parameter DeepSeek-R1. Beyond mathematics, the model also generalizes well to scientific reasoning and agentic tool use. Three innovations (an efficient training infrastructure, a new reinforcement learning algorithm, and a multi-stage training pipeline) let it reach this performance at low computational cost.

Key Features of rStar2-Agent

Efficient Mathematical Reasoning: Achieves 80.6% accuracy on AIME24 with only 14B parameters, outperforming much larger models such as the 671B-parameter DeepSeek-R1 on complex math problems.
Scientific Reasoning: Scores 60.9% accuracy on the GPQA-Diamond benchmark, showing strong ability in understanding and reasoning over scientific knowledge.
Intelligent Tool Use: Automatically invokes appropriate tools (e.g., code execution environments) to improve problem-solving efficiency.
Strong Generalization: Performs well in specialized domains while generalizing its reasoning ability to a wide range of tasks and applications.
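
To make the tool-use feature above more concrete, here is a minimal sketch of a single tool call: code is extracted from the model's output, run in a separate Python process, and the result (or the error) is returned as feedback. The `<code>` tag convention, the function name `run_tool_call`, and the timeout value are assumptions made for illustration; they are not taken from the rStar2-Agent repository, and a plain subprocess is not a secure sandbox.

```python
import re
import subprocess
import sys

# Hypothetical convention: the model wraps code it wants executed in <code> tags.
CODE_PATTERN = re.compile(r"<code>(.*?)</code>", re.DOTALL)


def run_tool_call(model_output, timeout_s=5.0):
    """Run the first <code> block found in `model_output`.

    Returns the program's stdout, an "error: ..." string on failure,
    or None when the output contains no tool call.
    """
    match = CODE_PATTERN.search(model_output)
    if match is None:
        return None  # the model answered without calling the tool

    try:
        # A separate interpreter process keeps crashes away from the caller.
        # This is NOT real isolation; a production service needs a sandbox.
        result = subprocess.run(
            [sys.executable, "-c", match.group(1)],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "error: execution timed out"

    if result.returncode != 0:
        # The error text is fed back so the model can revise its code.
        return "error: " + result.stderr.strip()
    return result.stdout.strip()
```

In an agent loop, the returned string would be appended to the conversation so the model can continue reasoning from the tool's output or correct a failed call.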

Technical Foundations of rStar2-Agent

Agent-Based Reinforcement Learning: The model interacts with tool environments, refining reasoning strategies via feedback and reward mechanisms to improve efficiency and accuracy.
Efficient Training Infrastructure: Pairs an isolated, high-throughput code execution service, which handles massive numbers of concurrent tool calls with stable, fast execution, with a distributed training setup of 64 AMD MI300X GPUs.
GRPO-RoC Algorithm: Adds the Resample-on-Correct (RoC) rollout strategy to GRPO to optimize tool usage. Rollouts are oversampled and then downsampled asymmetrically, so that the correct trajectories kept for training are the high-quality ones with the fewest tool errors.
Multi-Stage Reinforcement Learning Pipeline: Starts with non-reasoning supervised fine-tuning to establish base capabilities, then applies reinforcement learning in stages to strengthen reasoning step by step. Training completes within one week on 64 GPUs, so the model reaches its peak performance at significantly reduced cost.
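
The Resample-on-Correct idea described above can be sketched in a few lines. This is a simplified illustration rather than the project's implementation: the `Rollout` structure, the proportional split between correct and incorrect rollouts, and the use of a tool-error count as the only quality signal are assumptions. The advantage function follows the standard GRPO form, in which each reward is normalized by the mean and standard deviation of its group.

```python
import random
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Rollout:
    """One sampled trajectory (hypothetical structure for illustration)."""
    reward: float     # 1.0 if the final answer is correct, else 0.0
    tool_errors: int  # number of failed tool calls in the trajectory


def resample_on_correct(rollouts, group_size, rng=random):
    """Downsample an oversampled group to at most `group_size` rollouts.

    Assumed policy: incorrect rollouts are drawn uniformly at random,
    while correct rollouts are ranked and those with the fewest tool
    errors are kept. The correct/incorrect ratio of the oversampled
    group is roughly preserved.
    """
    if not rollouts:
        return []
    correct = [r for r in rollouts if r.reward > 0]
    incorrect = [r for r in rollouts if r.reward <= 0]

    n_correct = round(group_size * len(correct) / len(rollouts))
    n_correct = min(n_correct, len(correct))
    n_incorrect = min(group_size - n_correct, len(incorrect))

    # Asymmetric step: quality filter on successes, uniform draw on failures.
    kept_correct = sorted(correct, key=lambda r: r.tool_errors)[:n_correct]
    kept_incorrect = rng.sample(incorrect, n_incorrect)
    return kept_correct + kept_incorrect


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: normalize rewards within one group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

For example, a group of 8 rollouts could be reduced to 4 with `resample_on_correct(rollouts, 4)`, after which `group_relative_advantages([r.reward for r in kept])` gives the per-rollout training signal. Filtering only the correct side keeps failures as diverse negative examples while steering the model away from successes that relied on noisy tool calls.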

Project Resources

GitHub Repository: https://github.com/microsoft/rStar
arXiv Paper: https://www.arxiv.org/pdf/2508.20722

Application Scenarios of rStar2-Agent

Education: Provides personalized tutoring, helps students improve academically, and supports automated grading of homework and exams.
Scientific Research: Assists with analyzing complex datasets, building and optimizing scientific models, and supporting research decisions.
Finance: Supports stock trend forecasting, data-driven investment recommendations, and real-time transaction monitoring to help prevent fraud.
Engineering: Optimizes engineering design solutions, ensures high-quality project delivery, and performs real-time fault diagnosis to enhance efficiency.
Everyday Life: Acts as an intelligent assistant, offering personalized services such as health management plans based on user data.