industry-news
AI Coding Agents Hit 50%+ on SWE-Bench: Autonomous Bug Fixing Arrives
Multiple AI coding systems have reached or approached the 50% mark on SWE-Bench Verified, a benchmark for autonomous software engineering. Devin (Cognition AI) scores 53.8%, Claude with Computer Use 49%, and OpenAI's internal system 48.9%. SWE-Bench tests autonomous resolution of real GitHub issues: reading code, understanding context, implementing a fix, and passing tests. Industry analysts note that these systems can now handle 30-40% of straightforward bug fixes autonomously in real production codebases. Several companies report a 25-35% reduction in developer time spent on bug fixes after deploying AI coding agents.
May 2, 2025 | Source: SWE-Bench
AI coding, SWE-Bench, coding agents, autonomous coding, developer tools