Models | Jun 20, 2024

Claude 3.5 Sonnet Tops SWE-bench, Becoming the Strongest Coding AI

Anthropic has released Claude 3.5 Sonnet, which resolved 49% of the tasks on the SWE-bench Verified benchmark, ranking first and surpassing GPT-4o and Gemini 1.5 Pro. SWE-bench is a widely used benchmark that tests whether an AI system can resolve real issues drawn from open-source GitHub repositories. The result suggests that Claude is approaching the level of a junior human engineer on autonomous software development tasks.

