Models · Jun 20, 2024

Claude 3.5 Sonnet Tops SWE-bench, Becoming the Strongest Coding AI

Anthropic has released Claude 3.5 Sonnet, which is reported to resolve 49% of the tasks in the SWE-bench Verified benchmark, placing it first and ahead of GPT-4o and Gemini 1.5 Pro. SWE-bench measures whether an AI system can fix real issues drawn from open-source GitHub repositories, with a fix counted as successful only when the project's tests pass. The result suggests that, on autonomous software development tasks of this kind, Claude is approaching the level of a junior human engineer.

