DeepSWE – An AI Agent Framework Open-Sourced by Together.ai in Collaboration with Agentica

AI Tools updated 6d ago dongdong

What is DeepSWE?

DeepSWE is an open-source AI Agent framework developed jointly by Together.ai and Agentica. It is built on the Qwen3-32B model and trained with reinforcement learning (RL). On the SWE-Bench-Verified benchmark, it achieves:

  • 59.0% accuracy with Test-Time Scaling (TTS)

  • 42.2% Pass@1 accuracy without TTS

According to the project, these results made it the top-performing open-source Agent on this benchmark at the time of release. The project open-sources all training data, code, training logs, and evaluation metrics, so developers can reproduce, study, and improve the Agent while advancing the use of reinforcement learning in software engineering.
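
The Pass@1 figure above can be read as the average single-attempt success rate across benchmark tasks. Below is a minimal sketch, assuming each task is attempted one or more times and each attempt is recorded as pass or fail. The function name `pass_at_1` and the input layout are illustrative and are not taken from the DeepSWE codebase.

```python
from typing import List


def pass_at_1(outcomes: List[List[bool]]) -> float:
    """Average single-attempt success rate.

    outcomes[i] holds the pass/fail result of every attempt on task i.
    Each task contributes its own success fraction, and tasks are
    weighted equally.
    """
    if not outcomes or any(len(runs) == 0 for runs in outcomes):
        raise ValueError("every task needs at least one recorded attempt")
    per_task = [sum(runs) / len(runs) for runs in outcomes]
    return sum(per_task) / len(per_task)


# Hypothetical data: three tasks, two attempts each.
example = [[True, False], [False, False], [True, True]]
score = pass_at_1(example)
```

With one attempt per task this reduces to the plain fraction of tasks solved.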


Key features of DeepSWE

  1. Code Comprehension & Editing: Analyzes and modifies existing code to resolve specific software issues or implement new features

  2. Problem Resolution: Solves complex software engineering challenges through environmental interaction, including:

    • GitHub issue resolution

    • New feature implementation

    • Debugging tasks

  3. Automated Testing & Validation:

    • Executes shell commands for code compilation and testing

    • Verifies solution effectiveness

    • Ensures code modifications preserve existing functionality

  4. Multi-step Reasoning: Employs iterative reasoning and decision-making to progressively refine solutions until task completion
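
The multi-step behavior listed above can be pictured as a loop: a policy proposes a shell command, the environment runs it, and the output is fed back into the next decision. The sketch below illustrates that loop under stated assumptions and is not DeepSWE's implementation. `Step`, `run_agent`, `shell_execute`, the `"submit"` action, and the scripted policy are hypothetical names; a real agent would call a language model where the scripted policy sits.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    action: str       # shell command proposed by the policy
    observation: str  # text returned by the environment


def shell_execute(command: str, timeout: int = 60) -> str:
    """Run a shell command and return stdout followed by stderr."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    return result.stdout + result.stderr


def run_agent(
    policy: Callable[[str, List[Step]], str],
    execute: Callable[[str], str],
    task: str,
    max_steps: int = 10,
) -> Tuple[List[Step], bool]:
    """Alternate between choosing an action and observing its result.

    Returns the step history and whether the policy chose to submit
    before the step budget ran out.
    """
    history: List[Step] = []
    for _ in range(max_steps):
        action = policy(task, history)
        if action == "submit":  # assumed sentinel meaning "patch is ready"
            return history, True
        history.append(Step(action, execute(action)))
    return history, False


def scripted_policy(task: str, history: List[Step]) -> str:
    """Stand-in for a model: inspect the code, run the tests, then submit."""
    plan = ["grep -n 'def add' calc.py", "python -m pytest -q"]
    return plan[len(history)] if len(history) < len(plan) else "submit"


def fake_execute(command: str) -> str:
    """Stand-in environment so the sketch runs without a real repository."""
    return f"(output of: {command})"


history, submitted = run_agent(scripted_policy, fake_execute, "fix add()")
```

Passing `shell_execute` instead of `fake_execute` would run the commands for real, which should only be done inside a sandboxed container.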

Technical Architecture

  • Reinforcement Learning Training:

    • Trained with RL alone, starting from the Qwen3-32B base model

    • No dependency on proprietary teacher models or supervised fine-tuning (SFT)

    • Learns optimal decision-making for complex software engineering tasks via environmental interaction

  • rLLM Framework:

    • Uses the rLLM framework for post-training language agents

    • Provides efficient data management and training pipelines

    • Supports large-scale reinforcement learning

  • Sparse Reward Model:

    • Implements sparse outcome-based rewards

    • Awards a positive reward only when the generated code patch passes all tests

    • Drives the model toward high-quality solution generation

  • Test-Time Scaling (TTS):

    • Generates multiple candidate trajectories at inference time

    • Uses verifiers to select the trajectory most likely to be correct

    • Combines strengths of execution-based and execution-free verifiers

    • Raises SWE-Bench-Verified accuracy from 42.2% (Pass@1) to 59.0%

  • Kubernetes Integration:

    • Addresses scaling challenges during training

    • Enables elastic container scheduling and auto-scaling

    • Ensures training efficiency and stability
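
Two of the ideas above, the sparse outcome reward and the selection step in Test-Time Scaling, can be sketched in a few lines. This is an illustration under stated assumptions, not the project's code. `sparse_reward`, `Candidate`, `hybrid_score`, and `select_best` are hypothetical names, and the weighted sum used to combine the execution-based and execution-free signals is an assumption made for the example; the source does not say how DeepSWE combines the two verifiers.

```python
from dataclasses import dataclass
from typing import Dict, List


def sparse_reward(test_results: Dict[str, bool]) -> float:
    """Outcome reward: 1.0 only if at least one test ran and all passed."""
    return 1.0 if test_results and all(test_results.values()) else 0.0


@dataclass
class Candidate:
    patch: str
    tests_passed: int      # execution-based signal: tests the patch passed
    tests_total: int       # number of tests that were run
    verifier_score: float  # execution-free signal in [0, 1], e.g. a model's judgment


def hybrid_score(candidate: Candidate, weight: float = 0.5) -> float:
    """Blend both signals; the linear blend is an assumption of this sketch."""
    if candidate.tests_total:
        exec_score = candidate.tests_passed / candidate.tests_total
    else:
        exec_score = 0.0
    return weight * exec_score + (1.0 - weight) * candidate.verifier_score


def select_best(candidates: List[Candidate], weight: float = 0.5) -> Candidate:
    """Best-of-n selection: return the candidate with the highest blended score."""
    if not candidates:
        raise ValueError("need at least one candidate trajectory")
    return max(candidates, key=lambda c: hybrid_score(c, weight))


# Hypothetical candidates produced by three independent rollouts.
candidates = [
    Candidate("patch-a", tests_passed=2, tests_total=4, verifier_score=0.9),
    Candidate("patch-b", tests_passed=4, tests_total=4, verifier_score=0.3),
    Candidate("patch-c", tests_passed=3, tests_total=4, verifier_score=0.8),
]
best = select_best(candidates)
```

The reward is all-or-nothing by design: a patch that fixes most tests still earns 0.0, so partial fixes are not reinforced during training. At inference time the selection step can afford a softer score, because it only has to rank candidates against each other.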


Application Scenarios

  1. Code Optimization:

    • Automated code analysis and modification

    • Rapid vulnerability patching

    • Performance bottleneck optimization

    • Code structure refactoring

    • Significant code quality improvement

  2. Software Issue Resolution:

    • Efficient GitHub issue handling

    • New feature implementation

    • Complex task decomposition

    • Accelerated problem-solving in software engineering

  3. Automated Testing:

    • Test case generation

    • Code compilation and automated testing

    • Regression testing

    • Software stability assurance

    • Reduced manual testing workload

  4. Complex Problem Solving:

    • Multi-step reasoning for intricate problems

    • Iterative solution optimization

    • Knowledge accumulation through problem-solving

    • Enhanced capability for future challenges

  5. Development Assistance:

    • Real-time code suggestions

    • Intelligent code completion

    • Project management support

    • Task allocation optimization

    • Team collaboration enhancement

    • Overall development efficiency improvement
