Recursive AI Systems: Advanced Guide
AI systems that improve themselves iteratively
Recursive AI Systems: Advanced Guide
A recursive AI system feeds a model's output back into itself — to decompose a problem, refine an answer, or spawn sub-agents that spawn sub-agents. Recursion is the natural shape for problems whose structure you can't know upfront ("research this topic": you don't know the subtopics until you look). It is also the easiest way to build a system that burns $200 of tokens producing nothing. This guide covers the three patterns that work and the control machinery that keeps them safe.
Pattern 1: Recursive task decomposition
The model splits a task into subtasks; each subtask either gets solved directly or split again.
python
import jsonMAX_DEPTH = 3
def solve(task: str, depth: int = 0, budget: dict = None) -> str:
budget = budget if budget is not None else {'calls': 30}
if budget['calls'] <= 0:
return f'[budget exhausted before solving: {task[:60]}]'
budget['calls'] -= 1
# Force the leaf path at max depth — depth control lives in YOUR code, not the prompt
if depth >= MAX_DEPTH:
return llm(f'Solve directly and concisely: {task}')
plan = json.loads(llm(
f'Task: {task}\n'
'If this is directly answerable in one step, return {"atomic": true}.\n'
'Otherwise return {"atomic": false, "subtasks": ["...", "..."]} — at most 4, '
'each strictly smaller and non-overlapping. JSON only.'
))
if plan['atomic']:
return llm(f'Solve directly and concisely: {task}')
results = [solve(s, depth + 1, budget) for s in plan['subtasks']]
joined = '\n\n'.join(f'## {s}\n{r}' for s, r in zip(plan['subtasks'], results))
return llm(f'Synthesize these subtask results into one answer for: {task}\n\n{joined}')
The failure mode this code defends against: models are bad at deciding when to *stop* decomposing — left unguarded they split "write a haiku" into three subtasks. Hence the two hard limits (depth and call budget) enforced in code, plus prompt-side pressure ("strictly smaller", "at most 4"). Independent subtasks parallelize naturally — swap the list comprehension for asyncio.gather (async patterns).
Pattern 2: Recursive refinement (generate → critique → revise)
Loop a draft through critique-and-revision until it passes a bar:
python
def refine(task: str, max_rounds: int = 3) -> str:
draft = llm(f'Complete this task: {task}')
for _ in range(max_rounds):
verdict = json.loads(llm(
f'Score this against the task (1-10) and list concrete defects. '
f'JSON: {{"score": n, "defects": [...]}}\nTask: {task}\nDraft:\n{draft}'
))
if verdict['score'] >= 8 or not verdict['defects']:
break
draft = llm(f'Fix exactly these defects, change nothing else:\n'
f'{verdict["defects"]}\n\nDraft:\n{draft}')
return draft
Two things make this converge instead of oscillate: the critic must produce concrete defects (a bare score gives revisions nothing to act on), and revisions are scoped to fixing those defects only. Returns diminish fast — rounds 1–2 capture most of the gain, and a model critiquing its own output plateaus on its own blind spots (using a different model as critic measurably helps). This is the same generation/evaluation gap that Constitutional AI exploits at training time, applied at inference.
For code tasks, replace the LLM critic with ground truth: run the tests, feed failures back. Objective signal beats self-assessment every time it's available.
Pattern 3: Recursive agents (agents spawning agents)
An orchestrator delegates to sub-agents, each with its own context window — this is how agent systems scale past one context: the orchestrator holds the plan, workers hold the details, and only summaries flow up. Frameworks (CrewAI vs AutoGen, LangGraph) give you the supervisor/worker machinery. The recursion-specific rules:
budget in Pattern 1, so a runaway worker can't spend the whole allocation.The control plane (non-negotiable)
Every production recursive system needs all four, in code rather than in prompts:
depth parameter, hard cutoff to the leaf pathAnd one quality warning: recursion amplifies errors as readily as quality. A subtly wrong subtask answer gets synthesized upward as fact; a hallucinated critique steers revisions off course. Validate at the boundaries (schema-check every JSON the model returns — Zod vs Pydantic) and ground loops in objective signals (tests, retrieval, calculators) wherever one exists.
FAQ
Is this AGI-style recursive self-improvement? No — these systems improve *outputs* within fixed model weights. Nothing here changes the model itself.
When is recursion the wrong tool? When the workflow's structure is known upfront — a fixed pipeline (extract → transform → summarize) is cheaper, faster, and more debuggable than letting a model rediscover that structure per request. Recursion pays only for unknown structure.
Total cost intuition? A depth-3, branching-4 decomposition is up to 4³ = 64 leaf calls plus synthesis at every level. Always estimate the worst-case tree before shipping the loop — that's what the budget cap is for.
*Last updated: June 2026.*
Also available in 中文.