Microsoft Launches SWE-bench-Live Code Evaluation Benchmark

AI Daily News, by dongdong

Microsoft's newly released SWE-bench-Live is a code repair evaluation benchmark designed to address two weaknesses of traditional benchmarks: outdated data and limited coverage. Using the agent-based framework REPOLAUNCH, SWE-bench-Live automatically sets up Docker environments for new tasks, so the benchmark can be refreshed continuously rather than frozen at a single snapshot. This reduces the risk of model overfitting and data contamination. Initial evaluation results show that existing large language models perform significantly worse on this continuously updated benchmark, which underscores the value of fresh and diverse evaluation data.
