
Running agents in parallel is the easy part. The hard part is what happens after — when ten different agents hand you ten different blobs of text and you need one clean result.
This post is for anyone who's hit the wall where parallel execution felt like more trouble than it was worth: schema mismatches, broken pipelines, and output you can't merge without manual cleanup. Here's the pattern that fixed it for me.
The Problem: Simple Concatenation Always Breaks
When I first tried running 10 agents in parallel and collecting results, the naive approach was to just append outputs end-to-end. It looked fine on small tests. It fell apart immediately in production.
The failure mode is predictable: one agent returns JSON, another returns Markdown with a JSON block inside it, another returns plain text with a bullet list. Once you try to parse that stream uniformly, you get schema collision errors that kill the whole pipeline.
I ran into this specifically on an n8n 2.8.4 setup (Mac Mini cluster) when merging 10 Webhook responses simultaneously. The result was a full pipeline halt — not a graceful degradation, a complete stop. The fix wasn't retry logic or better error handling downstream. It was fixing the output contract upstream, in the Map stage.
Section 1: Why Concat Fails at Scale
The core problem is that LLM agents are not deterministic about output format unless you force them to be. Even with a prompt that says "return JSON," agents will occasionally add preamble text ("Here is the result:"), wrap JSON in a Markdown code fence, or omit required fields when they "don't apply."
At one agent, this is a nuisance. At ten agents running concurrently, the failure rate compounds. In my testing with a 20-agent cluster, the parse failure rate before fixing this was sitting at 43%. Nearly half of all parallel runs required manual intervention.
The second failure mode is subtler: even when every agent returns valid JSON, their schemas diverge. Agent 3 adds a "confidence" field that agents 1, 2, and 4 don't have. Agent 7 nests its payload one level deeper than the others. Your reducer now has to guess the shape of each document before it can merge anything.
Section 2: The Map Fix — Lock the Output Contract
The insight that moved the needle was moving the schema enforcement from a polite request in the user prompt to a hard constraint in the system prompt.
SYSTEM_PROMPT = """
You are an analysis agent.
You MUST respond using ONLY the following JSON format:
{
"agent_id": "<string>",
"status": "ok" | "error",
"payload": {}
}
No other output format is permitted. No preamble. No Markdown fences.
"""
This goes into the system parameter of every Claude API call in the Map stage — not the messages array. System-level instructions have meaningfully higher compliance rates than user-turn instructions for format constraints.
After switching to this pattern across 20 concurrent agents, the parse failure rate dropped from 43% to under 2%. That delta is the entire argument for the Map-side contract.
Here's what the Map stage looks like when you wire it up:
import anthropic
import asyncio
client = anthropic.Anthropic()
async def run_agent(agent_id: str, task: str) -> dict:
response = client.messages.create(
model="claude-opus-4-5",
max_tokens=1024,
system=SYSTEM_PROMPT,
messages=[{"role": "user", "content": task}]
)
raw = response.content[0].text
try:
result = json.loads(raw)
result["agent_id"] = agent_id # enforce even if agent forgets
return result
except json.JSONDecodeError:
return {"agent_id": agent_id, "status": "error", "payload": {"raw": raw}}
async def map_stage(tasks: list[str]) -> list[dict]:
jobs = [run_agent(f"agent_{i}", task) for i, task in enumerate(tasks)]
return await asyncio.gather(*jobs)
The except block is important. Even with a tight system prompt, you'll get the occasional malformed response. Catch it, wrap it as an error result with the raw output preserved, and let the Reducer handle the accounting. Don't raise here — a single agent failure shouldn't abort the other nine.
Section 3: The Reduce Stage — Merge Without Reasoning
The Reducer has one job: merge the Map outputs into a final report. It does not reason, interpret, or make new inferences. The moment you ask a Reducer agent to also analyze the data it's merging, you've blurred the separation of concerns and made the whole system harder to debug.
Here's what a clean Reducer function looks like in plain Python:
import json
def reduce_results(agent_outputs: list[dict]) -> dict:
ok_results = [r for r in agent_outputs if r.get("status") == "ok"]
error_count = len(agent_outputs) - len(ok_results)
merged_payload = {}
for r in ok_results:
merged_payload.update(r.get("payload", {}))
return {
"total": len(agent_outputs),
"success": len(ok_results),
"errors": error_count,
"merged": merged_payload,
}
A few things worth noting here:
merged_payload.update()is intentionally last-write-wins for conflicting keys. If you need a different conflict resolution strategy (union, average, majority vote), that logic lives here — and only here.- Error results are counted but excluded from the merge. The pipeline completes even if 3 out of 10 agents fail.
- This function is pure Python. No API call required. That's intentional — if your Reducer itself calls an LLM, you've added latency and a new failure surface to the aggregation step.
If the Reducer does need an LLM call (say, to summarize merged results into prose), separate it into a second explicit step after the merge:
def summarize_merged(merged: dict) -> str:
response = client.messages.create(
model="claude-opus-4-5",
max_tokens=512,
system="You are a report summarizer. Summarize the provided data into 3 bullet points.",
messages=[{"role": "user", "content": json.dumps(merged)}]
)
return response.content[0].text
Two functions, two responsibilities. The merge is deterministic. The summarization is the only LLM call, and it only happens once.
Section 4: Real-World Timing — What the Pattern Actually Buys You
On a task running 20 analysis steps (benchmarked against Draw Things at 20 inference steps as the workload proxy):
| Execution Mode | Total Time |
|---|---|
| Single agent, sequential | 6 min 40 sec |
| 8 agents parallel + Reduce | 58 sec |
That's not a small improvement. It's the difference between aworkflow you'd run once and check an hour later versus one you'd actually use in a loop.
The gains hold because the Reducer is cheap — it's mostly Python dictionary operations, not LLM inference. The bottleneck is genuinely the parallel Map stage, and you're saturating that with concurrent API calls rather than waiting for each to finish before starting the next.
One practical gotcha on Mac-based clusters: if you're running multiple Mac Minis and routing requests through a local load balancer, watch for connection pool exhaustion. asyncio.gather() with 20 concurrent Claude API calls can hit default connection limits. Use asyncio.Semaphore to cap concurrent requests:
SEM = asyncio.Semaphore(10)
async def run_agent_guarded(agent_id: str, task: str) -> dict:
async with SEM:
return await run_agent(agent_id, task)
On Linux this is usually fine at 20 concurrent. On macOS the default ulimit for open file descriptors is lower and you'll hit it faster than you expect.
Section 5: Variations and Gotchas
n8n integration: If you're wiring this into n8n, the Map stage maps to parallel Webhook trigger branches or Split In Batches nodes. The schema enforcement still applies — add a Function node after each agent call to validate and re-shape output before it hits the Merge node.
Key collisions in payload: dict.update() silently overwrites. If agents are producing overlapping keys with different values, you need explicit collision handling:
def merge_with_conflict_log(results):
merged = {}
conflicts = {}
for r in results:
for k, v in r.get("payload", {}).items():
if k in merged and merged[k] != v:
conflicts.setdefault(k, []).append(v)
merged[k] = v
return merged, conflicts
Partial failure tolerance: The current pattern counts errors but discards them from the merge. If partial results matter (you need at least 7 of 10 agents to succeed, for example), add a quorum check before the Reduce stage completes:
QUORUM = 0.7
if len(ok_results) / len(agent_outputs) < QUORUM:
raise RuntimeError(f"Quorum not met: {len(ok_results)}/{len(agent_outputs)} succeeded")
Docker vs. bare metal: In containerized environments, each agent subprocess might not share the same filesystem context. If agents need to read shared reference files, mount a shared volume or pass the content directly in the prompt. Don't assume filesystem state is shared across containers.
Closing
Lock the schema in the Map stage. Merge — don't reason — in the Reduce stage. Those two constraints are the whole pattern. Violate either one and parallel execution stops being an optimization and becomes a debugging problem that scales with your agent count.
Before you add more agents to a pipeline, verify that you have a working output contract and a clean Reducer. Add agents after. Not before.
Next up: handling agent output versioning when the Map schema needs to evolve across deployments without breaking existing Reducers.
🐦 Faster updates on X: @baegseungh7061
📚 More in this series: Code Practical
💌 Subscribe: Follow on X or grab the RSS
댓글
댓글 쓰기