OpenReasoning-Nemotron – A Series of Open-Source Reasoning Models by NVIDIA
What is OpenReasoning-Nemotron?
OpenReasoning-Nemotron is a series of open-source large language models (LLMs) released by NVIDIA and built for strong reasoning. The models are distilled from the DeepSeek R1 0528 model and come in four parameter sizes: 1.5B, 7B, 14B, and 32B. Specializing in reasoning tasks across mathematics, science, and coding, they are trained with large-scale data distillation and supervised fine-tuning (SFT). They achieve state-of-the-art results on several benchmarks, and on mathematics benchmarks they even surpass OpenAI’s o3 model. The models also support a “heavyweight” reasoning mode that uses the GenSelect algorithm to boost performance through multi-agent collaboration.
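The checkpoints are published on Hugging Face (see Project Links below), so the quickest way to try one is through the transformers library. Below is a minimal sketch, assuming the checkpoint ID nvidia/OpenReasoning-Nemotron-7B from the linked collection and a GPU with bfloat16 support; swap in the 1.5B, 14B, or 32B variant to match your hardware.

```python
# Minimal sketch: run an OpenReasoning-Nemotron checkpoint locally with
# Hugging Face transformers. The model ID follows the naming in the linked
# collection and should be checked against the collection page.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/OpenReasoning-Nemotron-7B"  # assumed ID; also 1.5B/14B/32B

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Reasoning models are typically prompted through the chat template.
messages = [{"role": "user", "content": "Prove that sqrt(2) is irrational."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=1024, do_sample=True, temperature=0.6)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```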
Key Features of OpenReasoning-Nemotron
- High-Performance Reasoning: Excels at tasks in mathematics, science, and programming, generating high-quality reasoning solutions.
- Multiple Model Sizes: Offers 1.5B, 7B, 14B, and 32B parameter variants to suit different computational capacities and task requirements.
- Heavyweight Reasoning Mode: Uses the GenSelect algorithm to combine outputs from multiple reasoning agents, significantly boosting performance on math and code tasks.
- Strong Baseline for RL Research: Serves as a powerful baseline for future research in reinforcement-learning-based reasoning, enabling the development of more efficient techniques.
- Local Deployment Support: Fully runnable locally with tools like LM Studio, ensuring data privacy and user control (see the sketch after this list).
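Because the weights are open, inference can stay entirely on your machine. The sketch below assumes LM Studio is serving a locally loaded Nemotron build through its OpenAI-compatible server on the default port 1234; the model name is a placeholder for whatever you load in LM Studio.

```python
# Sketch: query a locally hosted OpenReasoning-Nemotron model through
# LM Studio's OpenAI-compatible server. No data leaves the machine.
from openai import OpenAI

# LM Studio's local server defaults to http://localhost:1234/v1; the API key
# is unused locally, but the client requires some value.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="openreasoning-nemotron-7b",  # placeholder: match the model you loaded
    messages=[{"role": "user", "content": "How many primes are there below 100?"}],
    temperature=0.6,
)
print(response.choices[0].message.content)
```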
Technical Architecture of OpenReasoning-Nemotron
- Large-Scale Data Distillation: Distilled from outputs of the 671B-parameter DeepSeek R1 0528 model, which generated 5 million high-quality reasoning solutions in math, science, and code. These distilled datasets drive the reasoning capabilities of the OpenReasoning-Nemotron models.
- Supervised Fine-Tuning (SFT): The models are trained with supervised fine-tuning rather than reinforcement learning, showcasing the power of distillation alone and laying a strong foundation for future RL research.
- Multi-Agent Reasoning (GenSelect): Implements the GenSelect algorithm to launch parallel reasoning passes and select the best output among the candidate solutions (see the sketch after this list).
- Model Architecture: Built on the Qwen 2.5 architecture and trained on data generated by the latest R1 series for high efficiency and accuracy in reasoning tasks.
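NVIDIA's exact GenSelect prompts are not reproduced here, but the underlying generate-then-select pattern can be sketched as follows: sample several candidate solutions, then ask the model itself to judge which is best. The generate callable, prompts, and answer parsing below are illustrative assumptions, not the published implementation.

```python
# Conceptual sketch of the generate-then-select pattern behind GenSelect.
# `generate(prompt) -> str` stands in for any completion call to the model.
import re

def gen_select(generate, problem, num_candidates=4):
    # Step 1: sample independent candidate solutions (in practice run in
    # parallel, with temperature > 0 so the candidates differ).
    candidates = [
        generate(f"Solve the following problem step by step:\n{problem}")
        for _ in range(num_candidates)
    ]

    # Step 2: show every candidate to the model and ask it to judge.
    numbered = "\n\n".join(
        f"Solution {i}:\n{c}" for i, c in enumerate(candidates, start=1)
    )
    verdict = generate(
        f"Problem:\n{problem}\n\n{numbered}\n\n"
        "Compare the solutions above and reply with the number of the most "
        "correct and complete one, e.g. 'Best solution: 2'."
    )

    # Step 3: parse the judgment, falling back to the first candidate.
    match = re.search(r"\d+", verdict)
    idx = int(match.group()) - 1 if match else 0
    return candidates[idx] if 0 <= idx < num_candidates else candidates[0]
```

The key idea, as the feature list above notes, is that selecting the best of several candidates can beat any single reasoning pass, which is where the heavyweight mode's gains on math and code come from.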
Project Links
- HuggingFace Model Hub: https://huggingface.co/collections/nvidia/openreasoning-nemotron-687730dae0170059860f1f01
Application Scenarios for OpenReasoning-Nemotron
- Mathematical Problem Solving: Helps solve complex math problems in education, research, and competitions by providing detailed step-by-step reasoning.
- Scientific Reasoning: Supports problem solving for complex questions in physics, chemistry, biology, and environmental science.
- Code Generation and Optimization: Automatically generates code snippets, improves code performance, and assists with debugging to raise software development productivity.
- Multi-Agent Collaboration: Breaks down complex tasks and uses multi-agent coordination to find optimal solutions, improving system-level performance.
- Research and Development: Acts as a foundational model for reinforcement learning research, supporting the exploration of new techniques and advanced reasoning algorithms.