Research · Aug 17, 2025

AI Coding Agent Breaks 70% on SWE-bench: Software Engineering Enters Semi-Automation Era

Several companies have reported major gains on SWE-bench Verified, a benchmark built from real GitHub issue fixes: Claude 3.7 Sonnet reached 62.3%, Devin 2.0 hit 67.5%, and an unnamed startup's agent reached 71.8%. On this benchmark, AI agents now resolve more than 60% of the real-world issues they are given, which suggests software engineering is shifting from 'AI-assisted' work to 'AI-led' handling of specific, well-scoped tasks.

