AI Research
New AI Agent Benchmarks Show Rapid Progress on Real-World Tasks
On the WebArena and OSWorld benchmarks, AI agents now complete over 40% of complex web navigation and desktop tasks, a dramatic improvement after just 12 months of research.
May 27, 2026 | Source: WebArena
ai-agents, benchmark, webarena, evaluation