Research, Aug 17, 2025
AI Coding Agent Tops 70% on SWE-bench Verified: Software Engineering Enters a Semi-Automation Era
Multiple companies have reported major gains on SWE-bench Verified, a benchmark built from fixes to real GitHub issues: Claude 3.7 Sonnet reached 62.3%, Devin 2.0 hit 67.5%, and an unnamed startup's agent reached 71.8%. Leading agents can now resolve more than 60% of the benchmark's tasks, all drawn from real-world repositories, marking a transition in software engineering from "AI-assisted" work to "AI-led" work on specific tasks.