Microsoft Launches SWE-bench-Live Code Evaluation Benchmark

AI Daily News updated 1m ago dongdong

Microsoft's newly released SWE-bench-Live is a code repair evaluation benchmark designed to address the outdated data and limited coverage of traditional benchmarks. Using REPOLAUNCH, an agent-based framework, SWE-bench-Live automatically sets up Docker environments and keeps them updated in real time, mitigating risks such as model overfitting and data contamination. Initial evaluation results show that existing large models perform significantly worse in this dynamic setting, which highlights the importance of evaluating on up-to-date and diverse data.
