AI Research
May 27, 2026
New AI Agent Benchmarks Show Rapid Progress on Real-World Tasks
WebArena and OSWorld benchmarks show AI agents completing over 40% of complex web navigation and desktop tasks, a dramatic improvement in just 12 months.