OpenReasoning-Nemotron – A Series of Open-Source Reasoning Models by NVIDIA
What is OpenReasoning-Nemotron?
OpenReasoning-Nemotron is a series of open-source large language models (LLMs) released by NVIDIA and built for strong reasoning. The models are distilled from the DeepSeek R1 0528 model and come in four parameter sizes: 1.5B, 7B, 14B, and 32B. Specializing in reasoning tasks across mathematics, science, and coding, they are trained with large-scale data distillation and supervised fine-tuning (SFT). They achieve state-of-the-art results on several benchmarks, and on mathematics benchmarks they even surpass OpenAI’s o3 model. The models also support a “heavyweight” reasoning mode that uses the GenSelect algorithm to boost performance through multi-agent collaboration.
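The checkpoints are published on Hugging Face (see Project Links below), so the quickest way to try one is through the transformers library. Below is a minimal sketch, assuming the checkpoint ID nvidia/OpenReasoning-Nemotron-7B from the linked collection and a GPU with bfloat16 support; swap in the 1.5B, 14B, or 32B variant to match your hardware.

```python
# Minimal sketch: run an OpenReasoning-Nemotron checkpoint locally with
# Hugging Face transformers. The model ID follows the naming in the linked
# collection and should be checked against the collection page.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/OpenReasoning-Nemotron-7B"  # assumed ID; also 1.5B/14B/32B

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Reasoning models are typically prompted through the chat template.
messages = [{"role": "user", "content": "Prove that sqrt(2) is irrational."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=1024, do_sample=True, temperature=0.6)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```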
Key Features of OpenReasoning-Nemotron
- High-Performance Reasoning: Excels at tasks in mathematics, science, and programming, generating high-quality reasoning solutions.
- Multiple Model Sizes: Offers 1.5B, 7B, 14B, and 32B parameter variants to suit different computational capacities and task requirements.
- Heavyweight Reasoning Mode: Uses the GenSelect algorithm to combine outputs from multiple reasoning agents, significantly boosting performance on math and code tasks.
- Strong Baseline for RL Research: Serves as a powerful baseline for future research in reinforcement-learning-based reasoning, enabling the development of more efficient techniques.
- Local Deployment Support: Fully runnable locally with tools like LM Studio, ensuring data privacy and user control (see the sketch after this list).
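Because the weights are open, inference can stay entirely on your machine. The sketch below assumes LM Studio is serving a locally loaded Nemotron build through its OpenAI-compatible server on the default port 1234; the model name is a placeholder for whatever you load in LM Studio.

```python
# Sketch: query a locally hosted OpenReasoning-Nemotron model through
# LM Studio's OpenAI-compatible server. No data leaves the machine.
from openai import OpenAI

# LM Studio's local server defaults to http://localhost:1234/v1; the API key
# is unused locally, but the client requires some value.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="openreasoning-nemotron-7b",  # placeholder: match the model you loaded
    messages=[{"role": "user", "content": "How many primes are there below 100?"}],
    temperature=0.6,
)
print(response.choices[0].message.content)
```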
Technical Architecture of OpenReasoning-Nemotron
- Large-Scale Data Distillation: Distilled from outputs of the 671B-parameter DeepSeek R1 0528 model, which generated 5 million high-quality reasoning solutions in math, science, and code. These distilled datasets drive the reasoning capabilities of the OpenReasoning-Nemotron models.
- Supervised Fine-Tuning (SFT): The models are trained with supervised fine-tuning rather than reinforcement learning, showcasing the power of distillation alone and laying a strong foundation for future RL research.
- Multi-Agent Reasoning (GenSelect): Implements the GenSelect algorithm to launch parallel reasoning passes and select the best output among the candidate solutions (see the sketch after this list).
- Model Architecture: Built on the Qwen 2.5 architecture and trained on data generated by the latest R1 series for high efficiency and accuracy in reasoning tasks.
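NVIDIA's exact GenSelect prompts are not reproduced here, but the underlying generate-then-select pattern can be sketched as follows: sample several candidate solutions, then ask the model itself to judge which is best. The generate callable, prompts, and answer parsing below are illustrative assumptions, not the published implementation.

```python
# Conceptual sketch of the generate-then-select pattern behind GenSelect.
# `generate(prompt) -> str` stands in for any completion call to the model.
import re

def gen_select(generate, problem, num_candidates=4):
    # Step 1: sample independent candidate solutions (in practice run in
    # parallel, with temperature > 0 so the candidates differ).
    candidates = [
        generate(f"Solve the following problem step by step:\n{problem}")
        for _ in range(num_candidates)
    ]

    # Step 2: show every candidate to the model and ask it to judge.
    numbered = "\n\n".join(
        f"Solution {i}:\n{c}" for i, c in enumerate(candidates, start=1)
    )
    verdict = generate(
        f"Problem:\n{problem}\n\n{numbered}\n\n"
        "Compare the solutions above and reply with the number of the most "
        "correct and complete one, e.g. 'Best solution: 2'."
    )

    # Step 3: parse the judgment, falling back to the first candidate.
    match = re.search(r"\d+", verdict)
    idx = int(match.group()) - 1 if match else 0
    return candidates[idx] if 0 <= idx < num_candidates else candidates[0]
```

The key idea, as the feature list above notes, is that selecting the best of several candidates can beat any single reasoning pass, which is where the heavyweight mode's gains on math and code come from.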
Project Links
- HuggingFace Model Hub: https://huggingface.co/collections/nvidia/openreasoning-nemotron-687730dae0170059860f1f01
Application Scenarios for OpenReasoning-Nemotron
- Mathematical Problem Solving: Helps solve complex math problems in education, research, and competitions by providing detailed step-by-step reasoning.
- Scientific Reasoning: Supports problem solving for complex questions in physics, chemistry, biology, and environmental science.
- Code Generation and Optimization: Automatically generates code snippets, improves code performance, and assists with debugging to raise software development productivity.
- Multi-Agent Collaboration: Breaks down complex tasks and uses multi-agent coordination to find optimal solutions, improving system-level performance.
- Research and Development: Acts as a foundational model for reinforcement learning research, supporting the exploration of new techniques and advanced reasoning algorithms.