DeepCoder-14B-Preview – A code generation model jointly open-sourced by Agentica and Together AI

What is DeepCoder-14B-Preview?

DeepCoder-14B-Preview is a large-scale code generation model jointly open-sourced by Agentica and Together AI, fine-tuned from Deepseek-R1-Distilled-Qwen-14B. Trained with distributed reinforcement learning (RL), DeepCoder-14B-Preview demonstrates excellent performance in code generation tasks. Notably, it achieves an accuracy rate of 60.6% on LiveCodeBench, comparable to OpenAI’s o3-mini. The model open-sources its training dataset, code, training logs, and system optimizations, aiming to advance the application of reinforcement learning (RL) in large language models (LLMs), lower the barrier for RL training, and foster community development.

The main functions of DeepCoder-14B-Preview

High-quality Code Generation: Generate high-quality, runnable code suitable for various programming languages and scenarios.
Code Problem Solving: Solve complex programming problems, including algorithm design, data structure optimization, etc.
Code Completion and Optimization: Provide code completion functionality to help developers quickly complete code writing and optimize existing code to improve efficiency.
Unit Test Generation: Automatically generate unit test code to ensure the accuracy and reliability of the generated code.
Code Debugging Assistance: Help developers locate and fix errors in the code, improving development efficiency.
Cross-platform Applicability: Support various programming environments and platforms, with wide applicability.

The Technical Principles of DeepCoder-14B-Preview

Foundation Model: Based on Deepseek-R1-Distilled-Qwen-14B, a 14-billion-parameter pre-trained model optimized through distillation, equipped with powerful language understanding and generation capabilities.
Reinforcement Learning Fine-Tuning: The foundation model is fine-tuned using distributed reinforcement learning (RL). Reinforcement learning guides the model to generate higher-quality code by leveraging a reward mechanism, ensuring both the accuracy and efficiency of the code.
High-Quality Dataset: Trained on a rigorously curated dataset consisting of 24K verifiable programming problems. Data sources include TACO Verified, PrimeIntellect’s SYNTHETIC-1 dataset, and problems submitted to LiveCodeBench.
Reward Function Design: Utilizes a Sparse Outcome Reward Model (ORM). Rewards are only given when the generated code passes all sampled unit tests, preventing the model from memorizing test cases to gain rewards.
Context Expansion Technique: Employs an iterative context expansion technique. The model starts learning with shorter context lengths and gradually generalizes to longer contexts, ultimately achieving 60.6% accuracy in a 64K context window.
System Optimization: Introduces the verl-pipeline, which leverages pipeline technology to accelerate the training process, reduce training time, and improve overall efficiency.

The project address of DeepCoder-14B-Preview

Project official website: https://pretty-radio-b75.notion.site/DeepCoder-A-Fully-Open-Source-14B-Coder
Hugging Face Model Hub: https://huggingface.co/agentica-org/DeepCoder-14B-Preview

Application Scenarios of DeepCoder-14B-Preview

Code Generation and Automated Programming: Quickly generate high-quality code, reducing the time and effort required for manual coding and improving development efficiency. Suitable for various programming languages and frameworks, it helps developers quickly start projects.
Algorithm Competitions and Problem Solving: In algorithm competitions (such as Codeforces), assist participants in quickly understanding problems and generating efficient solutions, enhancing their competitive performance.
Code Optimization and Refactoring: Optimize and refactor existing code to improve its readability, performance, and maintainability. Help developers identify and fix potential code issues.
Education and Learning Assistance: Serve as a programming education tool to help students understand and practice programming concepts, providing code examples and solutions to support the learning of programming languages and algorithms.
Software Development and Testing: Generate unit test code to ensure software quality; assist in debugging during the development process, helping developers quickly locate and resolve issues, thereby improving the overall efficiency of software development.