AI Research | May 27, 2026

New AI Agent Benchmarks Show Rapid Progress on Real-World Tasks

Results on the WebArena and OSWorld benchmarks show AI agents completing more than 40% of complex web navigation and desktop tasks, a dramatic improvement achieved in just 12 months of research.


AI Skill Navigation