Models · Jun 20, 2024
Claude 3.5 Sonnet Tops SWE-bench, Becoming the Strongest Coding AI
Anthropic has released Claude 3.5 Sonnet, which resolved 49% of the tasks in the SWE-bench Verified benchmark, ranking first and surpassing GPT-4o and Gemini 1.5 Pro. SWE-bench is a widely used benchmark that tests whether an AI system can fix real bugs drawn from GitHub repositories, and the result suggests that Claude performs at roughly the level of a junior human engineer on autonomous software development tasks.