Multi-Agent System Performance Optimization: A Comprehensive Guide from Topology to Training
Covering topology optimization, pipeline parallelism, RL training frameworks, and market mechanisms to build efficient collaborative multi-agent systems
Introduction: The Optimization Dilemma of Multi-Agent Systems
Multi-agent systems (MAS) decompose complex tasks into collaborations among multiple specialized agents and outperform single models in code generation, mathematical reasoning, question answering, and other tasks. As systems scale, however, performance optimization runs into four obstacles: workflow topologies are often frozen by safety validation and compliance review; serial communication between agents makes latency grow linearly with depth; existing reinforcement learning (RL) frameworks focus on single-policy optimization and cannot directly optimize multi-agent workflows; and centralized coordination mechanisms become performance bottlenecks.
This article explores MAS optimization strategies in four directions: prompt optimization under a fixed topology (MASPOB), streaming communication acceleration (StreamMA), a multi-agent reinforcement learning framework (UnityMAS-O), and a decentralized market mechanism (EoM).
These methods are not mutually exclusive and can be combined. For example, in a fixed topology scenario, MASPOB can first optimize prompts, then StreamMA can be introduced to accelerate communication; if further training is needed, UnityMAS-O can be used for RL optimization.
Prompt Optimization under Fixed Topology: MASPOB
Problem Background
In real deployments, MAS workflow topologies such as medical diagnosis SOPs and financial audit processes are often designed by experts, validated for safety, and reviewed for compliance. Once deployed, they are difficult to modify. At this point, adjusting each agent's prompt becomes a key means to improve system performance. However, prompt optimization for MAS faces three major challenges:
Core Algorithm of MASPOB
The MASPOB framework, proposed by teams including The Chinese University of Hong Kong (Shenzhen), models prompt optimization as a combinatorial black-box optimization problem with a budget, comprising three core components:
Experimental Results
On six benchmarks covering question answering (HotpotQA, DROP), code generation (HumanEval, MBPP), and mathematical reasoning (GSM8K, MATH), MASPOB achieved an average score of 80.58% under a budget of 50 evaluations, improving over the IO baseline, AFlow, and MIPRO by 12.02%, 2.06%, and 1.71%, respectively. Ablation studies show that the GNN module contributes an average improvement of 2.31%, and that coordinate ascent reduces runtime by over 98% at a performance loss of less than 0.5%.
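To make the idea of budgeted coordinate ascent over prompt combinations concrete, here is a minimal sketch. It is not MASPOB's implementation: it applies coordinate ascent directly to a hypothetical `evaluate` function (one call per workflow evaluation) and omits the GNN module entirely. The names `coordinate_ascent`, `CANDIDATES`, `QUALITY`, and `toy_evaluate`, as well as the agent names, are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple


def coordinate_ascent(
    candidates: Dict[str, List[str]],
    evaluate: Callable[[Dict[str, str]], float],
    budget: int,
) -> Tuple[Dict[str, str], float, int]:
    """Change one agent's prompt at a time, holding the others fixed.

    Stops when a full sweep brings no improvement or the evaluation
    budget is spent. Returns (best prompts, best score, evaluations used).
    """
    if budget < 1:
        raise ValueError("budget must allow at least one evaluation")

    choice = {agent: 0 for agent in candidates}  # agent -> candidate index
    cache: Dict[tuple, float] = {}
    used = 0

    def score(assign: Dict[str, int]) -> Optional[float]:
        nonlocal used
        key = tuple(sorted(assign.items()))
        if key not in cache:
            if used >= budget:
                return None  # budget exhausted, cannot evaluate
            prompts = {a: candidates[a][i] for a, i in assign.items()}
            cache[key] = evaluate(prompts)
            used += 1
        return cache[key]

    best = score(choice)
    improved = True
    while improved and used < budget:
        improved = False
        for agent, options in candidates.items():
            for i in range(len(options)):
                if i == choice[agent]:
                    continue
                trial = dict(choice)
                trial[agent] = i
                s = score(trial)
                if s is None:
                    break
                if s > best:
                    choice, best, improved = trial, s, True

    return {a: candidates[a][i] for a, i in choice.items()}, best, used


# Toy stand-in for a real workflow evaluation (hypothetical scores).
QUALITY = {"p0": 0.1, "p1": 0.5, "p2": 0.3, "s0": 0.2, "s1": 0.4}
CANDIDATES = {"planner": ["p0", "p1", "p2"], "solver": ["s0", "s1"]}


def toy_evaluate(prompts: Dict[str, str]) -> float:
    return sum(QUALITY[p] for p in prompts.values())


best_prompts, best_score, used = coordinate_ascent(CANDIDATES, toy_evaluate, budget=50)
print(best_prompts, round(best_score, 2), used)
```

Caching repeated combinations keeps the search inside the budget, and each sweep costs on the order of the sum of candidate counts rather than their product, which is why coordinate-wise search is attractive when every evaluation is expensive.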
Streaming Communication Acceleration: StreamMA
The Cost of Serial Communication
Existing MAS frameworks commonly use a "generate first, then transmit" serial communication method: the upstream agent must generate a complete response before passing it to the downstream. This leads to two problems:
Research shows that in long-chain reasoning, early steps are usually reliable, while later steps are more prone to drift: chain-of-thought (CoT) accuracy degrades once the chain grows past an optimal length.
StreamMA Solution
StreamMA, proposed by teams including The Hong Kong University of Science and Technology (Guangzhou), leverages the model's own streaming output mechanism: each upstream agent forwards a reasoning step to the downstream as soon as it is produced, achieving pipeline parallelism. Core design:
Key insight: Reliable early steps reach the downstream first, allowing the downstream to build independent reasoning trajectories, diluting the impact of erroneous later steps.
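The latency effect of forwarding steps as they are produced can be illustrated with a small timing model. This is a sketch under simplifying assumptions that are not taken from the StreamMA paper: agents form a single chain, every agent emits the same number of steps, every step takes the same time, and step j of a downstream agent needs only step j of its upstream agent. The function names are hypothetical.

```python
from typing import List


def serial_finish_time(depth: int, steps: int, step_time: float) -> float:
    """Generate first, then transmit: each agent waits for the full upstream response."""
    return depth * steps * step_time


def streaming_finish_time(depth: int, steps: int, step_time: float) -> float:
    """Step j of an agent starts once the upstream agent has emitted step j
    and the agent itself has finished step j - 1."""
    upstream_ready = [0.0] * steps  # the first agent has all its inputs at t = 0
    for _ in range(depth):
        finished: List[float] = []
        done = 0.0
        for j in range(steps):
            start = max(upstream_ready[j], done)
            done = start + step_time
            finished.append(done)
        upstream_ready = finished
    return upstream_ready[-1]


serial = serial_finish_time(depth=3, steps=4, step_time=1.0)
streamed = streaming_finish_time(depth=3, steps=4, step_time=1.0)
print(serial, streamed)
```

Under these assumptions the serial chain costs depth × steps time units, while the streamed chain costs depth + steps − 1, the usual pipeline result. Real agents have uneven step times and richer dependencies, so the numbers are only indicative.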
Experimental Results
On eight benchmarks (AIME 2025/2026, HMMT 2026, GPQA-Diamond, HLE, LiveCodeBench) using Claude Opus 4.6 and GPT-5.4, StreamMA outperformed serial and single-model approaches across three DAG topologies, with an average improvement of 7.3 percentage points on Claude and 1.5 percentage points on GPT. Cost analysis shows that due to cache reuse, the total cost of the streaming approach is even lower than serial. Additionally, increasing the number of reasoning steps S per agent leads to continuous improvement in both effectiveness and speed, forming a new scaling law orthogonal to "stacking more agents."
Multi-Agent Reinforcement Learning Framework: UnityMAS-O
Limitations of Existing Frameworks
Most LLM-based MAS cannot be trained: workflows are patched together with prompts, routing rules, and hand-crafted interaction protocols. Even when training is introduced, it often only trains one model or role. Existing RL frameworks (TRL, OpenRLHF, verl, etc.) focus on single-policy optimization and cannot directly express role division, topology structure, and reward distribution in multi-agent workflows.
UnityMAS-O Design
UnityMAS-O, proposed by Renmin University of China and Xiaohongshu, extends verl to elevate the optimization target from "single policy" to "multi-agent workflow." Core abstractions include:
System Implementation and Training Process
The system uses a star-topology runtime: a central controller maintains the global training loop and schedules workflow states; the Ray execution layer provides remote calls and GPU management; LLM worker groups are bound to physical model instances. During training, the controller only transfers lightweight metadata (role identity, routing identifier, output, reward), while heavy tensors (token probabilities, attention masks) remain local to the worker groups.
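The split between lightweight metadata and locally held tensors can be sketched in plain Python. This is an illustrative model only: it does not use Ray or verl, and every class, field, and role name here (`StepMetadata`, `WorkerGroup`, `Controller`, `planner`, `coder`) is an assumption rather than part of UnityMAS-O's API. The two roles share one worker group to mirror the idea of several roles being served by one physical model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class StepMetadata:
    """Lightweight record that travels to the central controller."""
    role: str
    route_id: str
    output: str
    reward: float


@dataclass
class WorkerGroup:
    """Stand-in for one physical model instance; heavy per-token data stays here."""
    model_name: str
    local_tensors: Dict[str, List[float]] = field(default_factory=dict)

    def generate(self, role: str, route_id: str, prompt: str) -> StepMetadata:
        output = f"[{self.model_name}/{role}] {prompt}"
        # Placeholder for token log-probabilities; never sent to the controller.
        self.local_tensors[route_id] = [0.0] * len(output.split())
        return StepMetadata(role=role, route_id=route_id, output=output, reward=0.0)


class Controller:
    """Central hub of the star topology: schedules roles and logs metadata only."""

    def __init__(self, role_to_group: Dict[str, WorkerGroup]) -> None:
        self.role_to_group = role_to_group
        self.log: List[StepMetadata] = []

    def run(self, roles: List[str], task: str) -> str:
        text = task
        for i, role in enumerate(roles):
            meta = self.role_to_group[role].generate(role, f"r{i}", text)
            self.log.append(meta)
            text = meta.output
        return text


shared = WorkerGroup("shared-llm")  # one model instance serving two roles
controller = Controller({"planner": shared, "coder": shared})
final_output = controller.run(["planner", "coder"], "sort a list")
print(final_output)
print(sorted(shared.local_tensors))
```

Keeping the controller's log free of tensors means its traffic scales with the number of workflow steps rather than with sequence length, which is the point of the design described above.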
Experimental Results
On retrieval and code tasks, all workflows and model scales showed improvement after training. Small models benefited significantly: QD-Retrieve-Answer's F1 on NQ rose from 0.022 to 0.445, and on HotpotQA from 0.032 to 0.397. In code tasks, the pass rate after training increased substantially, while the average number of validation rounds decreased, indicating that training improved both accuracy and efficiency. Parameter sharing experiments show that multi-role sharing of physical models can still be effectively trained, reducing the number of model groups in practice.
Decentralized Market Mechanism: EoM
The Drawbacks of Centralized Coordination
Mainstream MAS uses centralized orchestration (e.g., MetaGPT, AutoGen), but suffers from structural drawbacks: planning is bottlenecked at a single gate, and coordination costs grow linearly with scale. EoM, proposed by teams from Harvard University and MIT, draws inspiration from Hayek's market economy theory, designing a set of economic incentives that allow agents to spontaneously form specialization and collaboration without central control.
Core Mechanism
EoM models a group of LLM agents as a "society" with economic interactions. Each agent is defined by its wake condition, action strategy, fixed bid, and current wealth. The system includes two processes:
Experimental Results
In five domains (mathematical reasoning, accelerator design, financial research, scientific research, and distributed system optimization), EoM allowed "crippled" agents (deliberately weakened, e.g., output limited to 128 tokens, only one tool) to band together and outperform fully functional strong agents. Mathematical reasoning accuracy rose from 15.9% to 57.0%, surpassing the complete baseline's 51.9%; in accelerator design, the energy-delay product (EDP) dropped to 39.3, better than the complete ReAct's 43.1. Ablation studies show that removing economic parameters (e.g., rent, reward) or components such as auction, exploitation, and exploration significantly degrades performance, confirming that the economic mechanism is the core engine.
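The agent definition given earlier (wake condition, action strategy, fixed bid, current wealth) together with the auction, rent, and reward elements named in the ablation can be sketched as a single market round. The rules below are assumptions chosen for illustration, not EoM's actual mechanism: the highest-bidding awake agent wins, pays its bid, acts, and collects a reward; every agent then pays rent; agents whose wealth falls to zero or below are eliminated. All names and numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Agent:
    name: str
    wake: Callable[[str], bool]  # wake condition on the current state
    act: Callable[[str], str]    # action strategy: state -> new state
    bid: float                   # fixed bid
    wealth: float                # current wealth


def auction_round(
    agents: List[Agent],
    state: str,
    reward_fn: Callable[[str], float],
    rent: float,
) -> Tuple[str, List[Agent], Optional[Agent]]:
    """Run one round: auction, action, reward, rent, elimination (assumed rules)."""
    awake = [a for a in agents if a.wake(state) and a.wealth >= a.bid]
    winner = max(awake, key=lambda a: a.bid, default=None)
    if winner is not None:
        winner.wealth -= winner.bid         # pay for the right to act
        state = winner.act(state)
        winner.wealth += reward_fn(state)   # reward for useful progress
    for a in agents:
        a.wealth -= rent                    # everyone pays rent each round
    survivors = [a for a in agents if a.wealth > 0]
    return state, survivors, winner


algebra = Agent("algebra", wake=lambda s: "solve" in s, act=lambda s: "x=2",
                bid=3.0, wealth=10.0)
idle = Agent("idle", wake=lambda s: False, act=lambda s: s, bid=1.0, wealth=1.0)

new_state, survivors, winner = auction_round(
    [algebra, idle],
    state="solve 2x=4",
    reward_fn=lambda s: 5.0 if s == "x=2" else 0.0,
    rent=1.0,
)
print(new_state, [a.name for a in survivors], algebra.wealth)
```

In this toy setup, rent drains agents that never contribute while rewards replenish those that do, which gives a feel for why the balance between the two matters.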
Conclusion and Outlook
Multi-agent system optimization is advancing along multiple dimensions:
These methods collectively point to a trend: future MAS optimization will become more systematic and automated, reducing manual intervention. For developers, understanding these techniques helps in selecting appropriate optimization strategies based on actual scenarios. For example, if the workflow is fixed but performance is insufficient, try MASPOB; if latency is a bottleneck, introduce StreamMA; if continuous improvement of system limits is needed, consider UnityMAS-O or EoM.
For a deeper understanding of basic multi-agent system concepts, refer to AI Agent and Multi-Agent; if focusing on workflow design, read Workflow and Orchestration; for reinforcement learning training, explore Fine-tuning and RL.
FAQ
Is MASPOB applicable to workflows with non-DAG topologies? MASPOB models workflows as directed acyclic graphs (DAGs), which is common for most MAS. For topologies with loops, it can theoretically be adapted by unrolling loops or introducing time steps, but the current version is primarily designed for DAGs.
What task types does StreamMA require? StreamMA is suitable for tasks that can be decomposed into steps, such as mathematical reasoning, code generation, and scientific analysis. For open-ended creative writing tasks that are difficult to break into steps, the advantages of streaming communication are less pronounced.
Which RL algorithms does UnityMAS-O support? The current version is based on verl and primarily supports the PPO algorithm. Future extensions could support GRPO, REINFORCE, etc., but the core abstractions (role-model decoupling, workflow graph, role-level rewards) are algorithm-agnostic.
How should economic parameters be set in EoM? Experiments in the paper show that parameters such as rent, reward scaling, and agent count need to be balanced. It is recommended to start with default parameters and adjust the rent multiplier and reward scaling factor based on the task to avoid premature elimination or excessive protection.
Can these methods be combined? Yes. For example, first use MASPOB to optimize prompts under fixed topology, then introduce StreamMA to accelerate communication; if further training is needed, use UnityMAS-O for RL optimization. EoM provides an alternative decentralized organization method that can complement other approaches.
Which method is best for my scenario? It depends on constraints: if the workflow is fixed and evaluation budget is limited, choose MASPOB; if latency-sensitive and tasks are decomposable, choose StreamMA; if continuous training improvement is desired, choose UnityMAS-O; if pursuing decentralization and robustness, choose EoM.