What is Skywork-SWE-32B?
Skywork-SWE-32B is an open-source, 32-billion-parameter code agent model for software engineering (SWE) tasks, developed by Kunlun Wanwei. It targets repository-level code repair and is built for scenarios that involve multi-turn interaction and long-context processing. Its developers constructed more than 10,000 verifiable GitHub repository task instances, which they describe as the largest verifiable repository-level code repair dataset to date. On the SWE-bench Verified benchmark the model reaches 38.0% pass@1 accuracy, reported as a new state of the art among models of the same parameter scale. With test-time scaling, accuracy rises to 47.0%, surpassing existing open-source models below 32B parameters and approaching or exceeding some closed-source models.
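To put the two accuracy figures in concrete terms, the short calculation below converts them into counts of resolved tasks. It assumes the standard SWE-bench Verified size of 500 instances, a number the text above does not state; the function name is illustrative only.

```python
# Assumption: SWE-bench Verified contains 500 task instances.
SWE_BENCH_VERIFIED_SIZE = 500


def resolved_count(accuracy_pct: float, total: int = SWE_BENCH_VERIFIED_SIZE) -> int:
    """Convert a percentage accuracy into a number of resolved instances."""
    return round(accuracy_pct * total / 100)


baseline = resolved_count(38.0)  # single rollout (pass@1)
with_tts = resolved_count(47.0)  # with test-time scaling
extra = with_tts - baseline      # additional instances resolved thanks to TTS

print(baseline, with_tts, extra)
```

Under that assumption, the 9-point gain from test-time scaling corresponds to a few dozen additional resolved issues rather than a marginal difference.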
Main Features of Skywork-SWE-32B
- Repository-level Code Repair: Locates issues (such as bugs) in GitHub repositories, generates a fix, and verifies that the fix works, closing the loop from understanding the problem to delivering a solution.
- Multi-turn Interaction Capability: Supports more than 50 rounds of interaction, simulating the repeated debug-and-repair cycles of real development to resolve issues step by step.
- Long-Context Processing: Handles contexts longer than 32k tokens, as required by complex code files and multi-file dependencies.
- Automated Verification: Uses dedicated runtime environments and unit tests to check that generated fixes work when actually executed.
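The multi-turn repair cycle described above can be sketched as a bounded propose-and-test loop. This is a minimal illustration, not the model's or OpenHands' actual interface: `RepairSession`, `propose_patch`, `run_tests`, and the stub functions are hypothetical names, and the cap of 50 rounds is taken loosely from the "more than 50 rounds" figure.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

MAX_ROUNDS = 50  # illustrative cap, not a documented limit


@dataclass
class RepairSession:
    # Hypothetical callables standing in for the model and the test runner.
    propose_patch: Callable[[List[str]], str]  # history -> candidate patch
    run_tests: Callable[[str], str]            # patch -> "" on success, else a failure log
    history: List[str] = field(default_factory=list)

    def solve(self, issue: str, max_rounds: int = MAX_ROUNDS) -> Tuple[Optional[str], int]:
        """Propose, test, and feed failures back until tests pass or rounds run out."""
        self.history.append(f"issue: {issue}")
        for round_no in range(1, max_rounds + 1):
            patch = self.propose_patch(self.history)
            failure_log = self.run_tests(patch)
            if not failure_log:
                return patch, round_no
            # The failure log becomes context for the next attempt.
            self.history.append(f"round {round_no} failed: {failure_log}")
        return None, max_rounds


def stub_propose(history: List[str]) -> str:
    # Stub "model": the patch version simply tracks the history length.
    return f"patch-v{len(history)}"


def stub_tests(patch: str) -> str:
    # Stub "test suite": only the third patch version passes.
    return "" if patch == "patch-v3" else f"{patch} breaks test_example"


result = RepairSession(stub_propose, stub_tests).solve("crash on empty input")
print(result)
```

The point of the design is that every failed attempt enlarges the history, which is why long-context handling and multi-turn support go together.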
Technical Principles of Skywork-SWE-32B
Large-scale Dataset Construction
- Automated Data Collection and Verification: A three-stage automated pipeline (data collection and pre-filtering, execution-based verification, and agent trajectory generation) produced a dataset of 10,169 real Python task instances drawn from 2,531 distinct GitHub repositories.
- Runtime Environment Support: Each task instance comes with a dedicated Docker runtime image, so unit tests can be run automatically to confirm that a generated fix works when executed.
- High-quality Training Trajectories: The multi-turn interaction trajectories recorded while the agent solves tasks serve as high-quality training samples for fine-tuning the model.
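The first two pipeline stages above can be sketched as a simple filter. The acceptance rule used here (tests fail before the reference patch and pass after it) is an assumption borrowed from common SWE-bench-style practice; the text only says that verification is execution-based. The `Candidate` fields and function names are hypothetical, and the boolean test outcomes stand in for real test runs inside each instance's Docker image.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Candidate:
    repo: str
    instance_id: str
    # Stand-ins for real unit-test runs in the instance's runtime image.
    tests_pass_before_patch: bool
    tests_pass_after_patch: bool


def prefilter(c: Candidate) -> bool:
    """Stage 1 (simplified): drop candidates with missing metadata."""
    return bool(c.repo) and bool(c.instance_id)


def is_verified(c: Candidate) -> bool:
    """Stage 2 (assumed rule): the patch must turn failing tests into passing ones."""
    return (not c.tests_pass_before_patch) and c.tests_pass_after_patch


def build_dataset(candidates: Iterable[Candidate]) -> List[Candidate]:
    return [c for c in candidates if prefilter(c) and is_verified(c)]


candidates = [
    Candidate("org/a", "a-1", False, True),   # kept: fail -> pass
    Candidate("org/b", "b-1", True, True),    # dropped: tests never failed
    Candidate("org/c", "c-1", False, False),  # dropped: patch does not fix tests
    Candidate("", "d-1", False, True),        # dropped: missing repository name
]
kept_ids = [c.instance_id for c in build_dataset(candidates)]
print(kept_ids)
```

The third stage, agent trajectory generation, would then run an agent on the kept instances and record its interactions; it is omitted here.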
Model Training and Optimization
- Based on the OpenHands Framework: The model runs within the OpenHands code agent framework, which supports multi-turn interaction and long-context processing and simulates how code repair proceeds in real development.
- Data Scaling Law: Systematic experiments showed model performance continuing to improve as the amount of training data grew, indicating that the data scaling law holds for software engineering tasks.
- Test-Time Scaling (TTS): At inference time, increasing the number of independent rollouts (e.g., N=8) further improves performance by making fuller use of the model’s inference capability.
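Test-time scaling with independent rollouts can be illustrated as best-of-N selection. The sketch below assumes that some scoring function ranks the candidates; the text does not say how the final answer is chosen among the N rollouts, so `score`, `rollout`, and `best_of_n` are hypothetical, and the integer "quality" values are toy stand-ins for real patches.

```python
import random
from typing import Callable, List, Tuple


def best_of_n(
    rollout: Callable[[random.Random], int],
    score: Callable[[int], float],
    n: int = 8,
    seed: int = 0,
) -> Tuple[int, List[int]]:
    """Run n independent rollouts and return the highest-scoring candidate."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    candidates = [rollout(rng) for _ in range(n)]
    best = max(candidates, key=score)
    return best, candidates


def toy_rollout(rng: random.Random) -> int:
    # Toy stand-in: each rollout yields a patch "quality" between 0 and 100.
    return rng.randint(0, 100)


def toy_score(quality: int) -> float:
    # Assumed selector: ranks candidates directly by their quality.
    return float(quality)


best, candidates = best_of_n(toy_rollout, toy_score, n=8)
print(best, candidates)
```

With a reliable selector, the chosen candidate is never worse than the first rollout alone, which is the intuition behind the reported gain; a weak selector would capture less of it.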
Project Links of Skywork-SWE-32B
- HuggingFace Model Hub: https://huggingface.co/Skywork/Skywork-SWE-32B
- Technical Paper: https://huggingface.co/Skywork/Skywork-SWE-32B/resolve/main/assets/Report.pdf
Application Scenarios of Skywork-SWE-32B
- Code Quality Optimization: The model can analyze code for potential problems and suggest improvements, helping developers raise code quality and maintainability.
- Unit Test Automation: With dedicated runtime environments and unit-test verification, Skywork-SWE-32B can run test cases automatically to check whether a generated fix works.
- Teaching Assistance: In software engineering and programming courses, Skywork-SWE-32B can serve as a teaching tool that shows students how code issues are diagnosed and resolved.
- Research Support: Gives researchers an experimental platform for studying large language models on software engineering tasks and for testing hypotheses such as the data scaling law.
- Internal Development Tools: Enterprises can integrate Skywork-SWE-32B into internal development tools to automate the handling of code issues, reducing manual intervention and improving development efficiency and code quality.