DeepSWE – An AI Agent Framework Open-Sourced by Together.ai in Collaboration with Agentica

What is DeepSWE?

DeepSWE is an open-source AI agent framework developed jointly by Together.ai and Agentica. It is built on the Qwen3-32B model and trained with reinforcement learning. On the SWE-Bench-Verified benchmark it achieves:

59.0% accuracy with Test-Time Scaling (TTS)
42.2% Pass@1 accuracy without TTS

The project reports these as the best results among open-source agent frameworks on this benchmark. It releases its training data, code, training logs, and evaluation metrics, so developers can study and improve the agent and apply reinforcement learning to further software engineering tasks.

Key Features of DeepSWE

Code Comprehension & Editing: Analyzes and modifies existing code to resolve specific software issues or implement new features
Problem Resolution: Solves complex software engineering challenges through environmental interaction, including:
- GitHub issue resolution
- New feature implementation
- Debugging tasks
Automated Testing & Validation:
- Executes shell commands to compile and test code
- Checks that a patch actually resolves the target issue
- Checks that code modifications preserve existing functionality
Multi-step Reasoning: Employs iterative reasoning and decision-making to progressively refine solutions until task completion
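
The edit, test, and refine cycle described in these features can be sketched as a small loop. This is a minimal illustration under stated assumptions, not DeepSWE's actual implementation: ToyRepo, scripted_policy, and run_episode are hypothetical names, the "repository" is a single in-memory function, and the policy is hard-coded where a real agent would query the language model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Action = Tuple[str, str]  # (kind, payload); a hypothetical action format


@dataclass
class ToyRepo:
    """Hypothetical stand-in for a sandboxed repository with one buggy function."""
    source: str = "def add(a, b):\n    return a - b\n"

    def run_tests(self) -> Tuple[bool, str]:
        # Load the current source and check a single behavioural test.
        namespace: dict = {}
        exec(self.source, namespace)
        ok = namespace["add"](2, 3) == 5
        return ok, ("1 passed" if ok else "FAILED: add(2, 3) != 5")

    def step(self, action: Action) -> str:
        # Apply one action and return the observation the agent sees next.
        kind, payload = action
        if kind == "edit":
            self.source = payload
            return "file updated"
        if kind == "test":
            return self.run_tests()[1]
        return f"unknown action: {kind}"


def scripted_policy(history: List[str]) -> Action:
    """Toy policy: run tests, patch on failure, re-test, then submit."""
    if not history:
        return ("test", "")
    last = history[-1]
    if last.startswith("FAILED"):
        return ("edit", "def add(a, b):\n    return a + b\n")
    if last == "file updated":
        return ("test", "")
    return ("submit", "")


def run_episode(env: ToyRepo,
                policy: Callable[[List[str]], Action],
                max_steps: int = 10) -> Tuple[bool, List[str]]:
    """Alternate between acting and observing until the policy submits."""
    history: List[str] = []
    for _ in range(max_steps):
        action = policy(history)
        if action[0] == "submit":
            break
        history.append(env.step(action))
    # The episode counts as resolved only if the tests pass at the end.
    return env.run_tests()[0], history
```

Calling run_episode(ToyRepo(), scripted_policy) walks through one failing test run, one edit, and one passing test run before submitting.
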

Technical Architecture

Reinforcement Learning Training:
- Trained on top of Qwen3-32B using RL alone, with no intermediate fine-tuning stage
- No dependency on proprietary teacher models or supervised fine-tuning (SFT)
- Learns decision-making for complex software engineering tasks by interacting with the environment
rLLM Framework:
- Uses the rLLM framework for post-training language agents
- Provides efficient data management and training pipelines
- Supports large-scale reinforcement learning
Sparse Outcome Reward:
- Implements sparse outcome-based rewards
- Gives a positive reward only when a generated code patch passes all tests
- Pushes the model toward patches that fully resolve the issue rather than partial fixes
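
The all-or-nothing reward rule can be written in a few lines. This is a sketch under assumptions: the function name sparse_outcome_reward, the dictionary of per-test results, and the concrete values 1.0 and 0.0 are illustrative choices, not taken from the DeepSWE code.

```python
from typing import Dict


def sparse_outcome_reward(test_results: Dict[str, bool]) -> float:
    """Return 1.0 only if every test passed; otherwise return 0.0.

    test_results maps a test name to whether it passed. An empty result
    set earns no reward (an assumption: no evidence means no credit).
    """
    if not test_results:
        return 0.0
    return 1.0 if all(test_results.values()) else 0.0
```

Under this rule a patch that passes nine of ten tests earns the same reward as one that passes none, which is what makes the signal sparse.
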
Test-Time Scaling (TTS):
- Generates multiple candidate trajectories per task at inference time
- Uses verifiers to select the most promising trajectory
- Combines strengths of execution-based and execution-free verifiers
- Raises SWE-Bench-Verified accuracy from 42.2% Pass@1 to 59.0%
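
One way to picture the selection step is a best-of-n choice over candidate patches, scored by a blend of the two verifier types. The weighted sum below is an illustrative combination rule and an assumption, not necessarily how DeepSWE combines its verifiers. Candidate, hybrid_score, and select_best are hypothetical names, and verifier_score stands in for the output of a learned execution-free verifier.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Candidate:
    patch: str
    tests_passed: int      # execution-based signal: tests that passed
    tests_total: int       # execution-based signal: tests that were run
    verifier_score: float  # hypothetical execution-free score in [0, 1]


def execution_based_score(candidate: Candidate) -> float:
    """Fraction of executed tests that passed (0.0 if nothing was run)."""
    if candidate.tests_total == 0:
        return 0.0
    return candidate.tests_passed / candidate.tests_total


def hybrid_score(candidate: Candidate, weight: float = 0.5) -> float:
    """Blend both signals; weight is the share given to executed tests."""
    return (weight * execution_based_score(candidate)
            + (1.0 - weight) * candidate.verifier_score)


def select_best(candidates: Sequence[Candidate], weight: float = 0.5) -> Candidate:
    """Return the highest-scoring candidate among the generated trajectories."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    return max(candidates, key=lambda c: hybrid_score(c, weight))
```

Setting weight to 0.0 reduces this to a purely execution-free selection, and 1.0 to a purely execution-based one, so the same function can be used to compare the two verifier types.
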
Kubernetes Integration:
- Addresses scaling challenges during training
- Enables elastic container scheduling and auto-scaling
- Ensures training efficiency and stability

Project Resources

Hugging Face Model Hub: https://huggingface.co/agentica-org/DeepSWE-Preview

Application Scenarios

Code Optimization:
- Automated code analysis and modification
- Rapid vulnerability patching
- Performance bottleneck optimization
- Code structure refactoring
- Significant code quality improvement
Software Issue Resolution:
- Efficient GitHub issue handling
- New feature implementation
- Complex task decomposition
- Accelerated problem-solving in software engineering
Automated Testing:
- Test case generation
- Code compilation and automated testing
- Regression testing
- Software stability assurance
- Reduced manual testing workload
Complex Problem Solving:
- Multi-step reasoning for intricate problems
- Iterative solution optimization
- Knowledge accumulation through problem-solving
- Enhanced capability for future challenges
Development Assistance:
- Real-time code suggestions
- Intelligent code completion
- Project management support
- Task allocation optimization
- Team collaboration enhancement
- Overall development efficiency improvement