DeepSWE – An AI Agent Framework Open-Sourced by Together.ai in Collaboration with Agentica
What is DeepSWE?
DeepSWE is an open-source AI agent framework developed jointly by Together.ai and Agentica. Built on the Qwen3-32B model and trained with reinforcement learning (RL), it reports the following results on the SWE-Bench-Verified benchmark:
- 59.0% accuracy with Test-Time Scaling (TTS)
- 42.2% Pass@1 accuracy without TTS
These results rank it first among open-source agent frameworks on this benchmark at the time of its release. The project open-sources its training data, code, training logs, and evaluation metrics, so developers can study and improve the agent and apply reinforcement learning more widely in software engineering.
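To make the Pass@1 figure concrete, here is a minimal sketch of how such a metric can be computed. It assumes Pass@1 means the fraction of tasks resolved with a single attempt per task, averaged over independent runs when more than one run is available. The function name `pass_at_1` and the toy data are illustrative and are not taken from the DeepSWE code or results.

```python
from statistics import mean


def pass_at_1(runs: list[list[bool]]) -> float:
    """Mean resolve rate over runs; runs[r][t] is True if run r resolved task t."""
    if not runs or any(len(run) == 0 for run in runs):
        raise ValueError("need at least one run and one task per run")
    return mean(sum(run) / len(run) for run in runs)


# Toy data (made up, not DeepSWE results): 2 runs over 4 tasks.
toy_runs = [
    [True, False, False, True],
    [True, True, False, False],
]
print(pass_at_1(toy_runs))  # 0.5
```

The TTS figure is a different quantity: several attempts are generated per task and a verifier picks one to submit, so it is not directly comparable to a single-attempt rate.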
Key Features of DeepSWE
- Code Comprehension & Editing: Analyzes and modifies existing code to resolve specific software issues or implement new features
- Problem Resolution: Solves complex software engineering challenges through environmental interaction, including:
  - GitHub issue resolution
  - New feature implementation
  - Debugging tasks
- Automated Testing & Validation:
  - Executes shell commands for code compilation and testing
  - Verifies solution effectiveness
  - Ensures code modifications preserve existing functionality
- Multi-step Reasoning: Employs iterative reasoning and decision-making to progressively refine solutions until task completion
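The interaction pattern behind these features (choose a command, run it, read the output, repeat until done) can be sketched as a small loop. This is an illustration under stated assumptions, not DeepSWE's implementation: the `Policy` type, `run_command`, `agent_loop`, and `scripted_policy` are hypothetical names, and the scripted policy stands in for the trained model.

```python
import subprocess
import sys
from typing import Callable, Optional

# A step is (command, observed output). A policy maps the history of steps
# to the next command, or None when it considers the task finished.
# Hypothetical stand-in for the trained model.
Step = tuple[list[str], str]
Policy = Callable[[list[Step]], Optional[list[str]]]


def run_command(argv: list[str], timeout: float = 30.0) -> str:
    """Run one command and return stdout, stderr, and the exit code as text."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return f"{proc.stdout}{proc.stderr}[exit {proc.returncode}]"


def agent_loop(policy: Policy, max_steps: int = 10) -> list[Step]:
    """Alternate between choosing a command and observing its result."""
    history: list[Step] = []
    for _ in range(max_steps):
        command = policy(history)
        if command is None:  # the policy declares the task complete
            break
        history.append((command, run_command(command)))
    return history


def scripted_policy(history: list[Step]) -> Optional[list[str]]:
    """Toy policy: one failing check, then one passing check, then stop."""
    script = [
        [sys.executable, "-c", "import sys; print('test failed'); sys.exit(1)"],
        [sys.executable, "-c", "print('test passed')"],
    ]
    return script[len(history)] if len(history) < len(script) else None


if __name__ == "__main__":
    for command, output in agent_loop(scripted_policy):
        print(output)
```

The `max_steps` cap is a design choice for the sketch: it bounds the episode even if the policy never signals completion.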
Technical Architecture
- Reinforcement Learning Training:
  - Trained with RL alone, starting from the Qwen3-32B base model
  - No reliance on proprietary teacher models or supervised fine-tuning (SFT)
  - Learns to make decisions on complex software engineering tasks by interacting with the environment
- rLLM Framework:
  - Uses the rLLM framework for post-training language agents
  - Provides efficient data management and training pipelines
  - Supports large-scale reinforcement learning
- Sparse Reward Model:
  - Uses a sparse, outcome-based reward
  - Awards a positive reward only when the generated code patch passes all tests
  - Pushes the model toward generating high-quality solutions
- Test-Time Scaling (TTS):
  - Generates multiple trajectories at test time
  - Selects the trajectories that verifiers judge successful
  - Combines the strengths of execution-based and execution-free verifiers
  - Raises accuracy on SWE-Bench-Verified from 42.2% Pass@1 to 59.0%
- Kubernetes Integration:
  - Addresses scaling challenges during training
  - Enables elastic container scheduling and auto-scaling
  - Keeps training efficient and stable
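The sparse reward and the test-time selection step described above can be sketched in a few lines. This is a simplified illustration, not the project's code. `PatchResult`, `Candidate`, `sparse_reward`, and `select_best` are hypothetical names. The rule used to combine the two verifiers (rank by the execution-based score, break ties with the execution-free score) is an assumption chosen for clarity; the source does not specify how DeepSWE combines them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PatchResult:
    """Outcome of running the task's tests against a generated patch."""
    passed: int
    total: int


def sparse_reward(result: PatchResult) -> float:
    """Outcome-based reward: 1.0 only if every test passes, else 0.0.

    There is no partial credit, which is what makes the reward sparse.
    """
    all_passed = result.total > 0 and result.passed == result.total
    return 1.0 if all_passed else 0.0


@dataclass
class Candidate:
    """One trajectory's final patch with two verifier scores (both assumed)."""
    patch: str
    exec_score: float   # execution-based: fraction of verification tests passed
    model_score: float  # execution-free: a verifier model's estimate in [0, 1]


def select_best(candidates: list[Candidate]) -> Optional[Candidate]:
    """Pick one candidate: highest exec_score, ties broken by model_score."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: (c.exec_score, c.model_score))


if __name__ == "__main__":
    print(sparse_reward(PatchResult(passed=10, total=10)))  # 1.0
    print(sparse_reward(PatchResult(passed=9, total=10)))   # 0.0
    pool = [
        Candidate("patch-a", exec_score=0.5, model_score=0.9),
        Candidate("patch-b", exec_score=1.0, model_score=0.4),
        Candidate("patch-c", exec_score=1.0, model_score=0.7),
    ]
    print(select_best(pool).patch)  # patch-c
```

During training only `sparse_reward` applies. At test time no ground-truth reward is available, so selection has to rely on verifier scores like the two assumed here.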
Project Resources
- Hugging Face Model Hub: https://huggingface.co/agentica-org/DeepSWE-Preview
Application Scenarios
- Code Optimization:
  - Automated code analysis and modification
  - Rapid vulnerability patching
  - Performance bottleneck optimization
  - Code structure refactoring
  - Significant code quality improvement
- Software Issue Resolution:
  - Efficient GitHub issue handling
  - New feature implementation
  - Complex task decomposition
  - Accelerated problem-solving in software engineering
- Automated Testing:
  - Test case generation
  - Code compilation and automated testing
  - Regression testing
  - Software stability assurance
  - Reduced manual testing workload
- Complex Problem Solving:
  - Multi-step reasoning for intricate problems
  - Iterative solution optimization
  - Knowledge accumulation through problem-solving
  - Enhanced capability for future challenges
- Development Assistance:
  - Real-time code suggestions
  - Intelligent code completion
  - Project management support
  - Task allocation optimization
  - Team collaboration enhancement
  - Overall development efficiency improvement