Renmin University and Microsoft Open-Source the Autonomous Research Framework Arbor: Structured Search over Hypothesis Trees Achieves State-of-the-Art on All Six Tasks
The Gaoling School of Artificial Intelligence at Renmin University of China, in collaboration with Microsoft Research, has open-sourced Arbor, an autonomous research framework. It targets two problems AI agents face in long-horizon research tasks: they struggle to accumulate experience, and they fall back on blind trial and error. At Arbor's core is the Hypothesis Tree Refinement (HTR) mechanism, which organizes the research process as a continuously growing tree in which each node holds a hypothesis, a code version, experimental evidence, and distilled insights. The system uses a two-tier Coordinator-Executor architecture: the Coordinator handles global strategy, maintains the hypothesis tree, and decides which directions to explore, while the Executor runs individual experiments in isolated environments and returns structured reports.

On six real research tasks (covering model training, harness engineering, and data synthesis), Arbor achieved the best results on the held-out test sets, with an average held-out gain more than 2.5 times that of Codex and Claude Code. On MLE-Bench Lite with GPT-5.5, Arbor's Any Medal score reached 86.36%, the highest to date. Ablation studies show that removing the tree structure or disabling insight feedback causes significant performance drops, indicating that the tree and the insights are both necessary. Arbor's token consumption is comparable to that of the baseline methods (approximately 20M–43M tokens), which indicates that the gains come from structured search rather than from additional computation.

The project is open source and includes a standalone CLI and an Agent Skill, so it can be used in environments such as Codex and Claude Code.
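To make the hypothesis tree and the Coordinator-Executor split more concrete, here is a minimal Python sketch. It is not Arbor's actual code or API: every name in it (`Node`, `Report`, `Coordinator`, `toy_executor`, `toy_propose`) is hypothetical, the greedy "expand the best-scoring node" rule is only a stand-in for whatever strategy the real Coordinator uses, and the executor is a lookup table rather than an experiment run in an isolated environment.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Report:
    """Structured result returned by an executor (fields are assumptions)."""
    score: float   # experimental evidence, e.g. a validation metric
    insight: str   # distilled takeaway fed back to the coordinator


@dataclass
class Node:
    """One hypothesis-tree node: hypothesis, code version, evidence, insight."""
    hypothesis: str
    code_version: str
    parent: Optional[Node] = None
    report: Optional[Report] = None
    children: list[Node] = field(default_factory=list)

    def insights(self) -> list[str]:
        """Collect insights along the path from the root down to this node."""
        chain, node = [], self
        while node is not None:
            if node.report is not None:
                chain.append(node.report.insight)
            node = node.parent
        return chain[::-1]


class Coordinator:
    """Owns the tree and decides which node to refine next."""

    def __init__(self, root: Node, executor: Callable[[Node], Report]):
        self.root = root
        self.executor = executor
        root.report = executor(root)  # evaluate the starting point

    def _nodes(self) -> list[Node]:
        stack, found = [self.root], []
        while stack:
            node = stack.pop()
            found.append(node)
            stack.extend(node.children)
        return found

    def best(self) -> Node:
        return max(self._nodes(), key=lambda n: n.report.score)

    def step(self, propose: Callable[[Node, list[str]], str]) -> Node:
        """Greedy stand-in policy: refine the best-scoring node so far."""
        parent = self.best()
        hypothesis = propose(parent, parent.insights())
        version = f"{parent.code_version}.{len(parent.children) + 1}"
        child = Node(hypothesis, version, parent)
        child.report = self.executor(child)  # executor runs the experiment
        parent.children.append(child)
        return child


# Toy stand-ins: fixed scores instead of real experiments (made-up numbers).
SCORES = {
    "baseline": 0.50,
    "add augmentation": 0.60,
    "lower learning rate": 0.55,
    "augmentation + longer training": 0.65,
}


def toy_executor(node: Node) -> Report:
    score = SCORES[node.hypothesis]
    return Report(score, f"'{node.hypothesis}' scored {score:.2f}")


ideas = iter(["add augmentation", "lower learning rate",
              "augmentation + longer training"])


def toy_propose(parent: Node, insights: list[str]) -> str:
    # A real system would condition on `insights`; here we just take the next idea.
    return next(ideas)


coordinator = Coordinator(Node("baseline", "v0"), toy_executor)
for _ in range(3):
    coordinator.step(toy_propose)

winner = coordinator.best()
print(winner.hypothesis, winner.code_version, winner.report.score)
```

In this toy run the second idea scores below its parent, so the next refinement branches from the parent again rather than from the weaker child. The insights returned by `insights()` are the channel through which earlier results could inform later proposals, although this toy `toy_propose` ignores them.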