Agents Everywhere: Part 7 - Why Current Approaches Break (The Hidden Cost of Multi-Agent Systems)

Agents Everywhere: Why Current Approaches Break — Part 2 of 5

The Hidden Cost of Multi-Agent Systems

There is a pattern that plays out with remarkable consistency across engineering teams building with AI. First comes the prototype: a single agent that handles the entire task. It works, mostly, but it occasionally loses the thread on longer tasks, confuses context from one step to the next, or simply hits the limits of what a single context window can reliably hold. The obvious solution — the one that feels architecturally sound — is decomposition. Break the task into parts. Assign each part to a specialist agent. Let them collaborate.

This is the multi-agent instinct, and it is deeply intuitive. It maps to how human teams work. It suggests modularity and scale. It looks, on a whiteboard, like good engineering.

What the whiteboard doesn't show is the coordination tax.

The Appeal Is Real — and That's the Problem

The case for multi-agent systems isn't manufactured. Parallelism is genuinely valuable when tasks are truly independent. Specialisation can improve output quality when different stages require fundamentally different reasoning styles. Division of cognitive labour makes sense when a single context window genuinely cannot hold everything a complex task requires.

The problem isn't the idea. The problem is that the benefits are visible and the costs are not — at least not until the system is in production and behaving badly in ways that are extremely difficult to diagnose.

Every message that passes between agents in a pipeline is not just a data transfer. It is a token cost, a latency addition, and a new failure surface. When an agent formats its output for the next agent in the chain, that output has to be parsed, interpreted, and acted on. If there is any ambiguity — and there will be — the receiving agent makes a decision about how to handle it. That decision may or may not align with what the sending agent intended. This is coordination overhead, and it is invisible until it becomes catastrophic.
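One way to keep that ambiguity from being resolved silently is to have the receiving side refuse anything that does not match an agreed message shape. The sketch below assumes agents exchange JSON objects with a `status` and a `result` field; that contract and the function name `parse_agent_message` are hypothetical, chosen only for illustration.

```python
import json

# Hypothetical message contract: field name -> expected Python type.
REQUIRED_FIELDS = {"status": str, "result": str}


def parse_agent_message(raw: str) -> dict:
    """Parse an upstream agent's output, refusing anything ambiguous rather
    than letting the downstream agent guess what was meant."""
    try:
        message = json.loads(raw)
    except json.JSONDecodeError as err:
        raise ValueError(f"upstream output is not valid JSON: {err}") from err
    if not isinstance(message, dict):
        raise ValueError("upstream output must be a JSON object")
    for name, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(message.get(name), expected_type):
            raise ValueError(
                f"field {name!r} is missing or not a {expected_type.__name__}"
            )
    return message


# A well-formed message passes; chatty free text would raise ValueError.
message = parse_agent_message('{"status": "ok", "result": "three sources found"}')
```

A strict parser does not remove the coordination cost, but it converts a silent misreading into an error that appears at the boundary where it happened.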

The Three Hidden Cost Categories

Compute and token compounding — Each hop through the pipeline adds tokens: the input context, the task framing, the output formatting, the inter-agent messages. In a five-agent pipeline, you may be spending three to four times the tokens you would spend on a single well-prompted agent doing the same work.
Coordination failures — Agents disagree. They contradict each other. They produce outputs the downstream agent cannot parse. They enter soft loops where each agent's output slightly reframes the task until the final output bears little resemblance to the original goal.
Debugging complexity — A failure at step seven of a ten-agent pipeline is a forensics exercise. Which agent introduced the error? Was it in the output of step four, which step seven consumed? Was it a coordination failure in the handoff at step six? The blast radius of a single agent's bad output expands through every downstream agent.
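
The token compounding in the first category can be made concrete with a back-of-the-envelope model. Everything here is an assumption for illustration: the function names are hypothetical, the token counts are invented rather than measured, and the model assumes every hop re-reads the shared context and produces an output of the same size.

```python
def single_agent_tokens(context: int, framing: int, output: int, steps: int) -> int:
    """One agent reads the context and framing once, then writes the
    output of every step itself."""
    return context + framing + steps * output


def pipeline_tokens(context: int, framing: int, output: int, hops: int) -> int:
    """Linear pipeline: every hop re-reads the shared context, adds its own
    task framing, reads the previous agent's output, and writes a new one."""
    total = 0
    handoff = 0  # the first agent receives no upstream message
    for _ in range(hops):
        total += context + framing + handoff + output
        handoff = output  # this output becomes the next agent's input
    return total


# Illustrative numbers only: 2,000-token context, 300-token framing,
# 500-token output per step, five steps.
single = single_agent_tokens(2000, 300, 500, steps=5)
pipeline = pipeline_tokens(2000, 300, 500, hops=5)
ratio = pipeline / single
print(single, pipeline, round(ratio, 2))
```

Under these assumed numbers the pipeline spends 16,000 tokens against 4,800 for the single agent, a ratio of about 3.3. The exact figure depends entirely on the inputs; the point is that the shared context is paid for once per hop rather than once per task.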

The Illusion of Parallelism

One of the most persistent misconceptions about multi-agent systems is that they are inherently parallel. In practice, most multi-agent pipelines are sequential with extra steps. The orchestrator calls Agent A, waits for the result, passes it to Agent B, waits for that result, and so on. The only difference from a single-agent loop is that each step now involves a separate model call with its own context and its own opportunity to go wrong.

[Figure: The Coordination Tax: Where Time Actually Goes in Multi-Agent Systems]

True parallelism requires genuinely independent tasks: tasks that don't depend on each other's outputs and can be executed concurrently. In most real workflows, this is rarer than it appears. Research, drafting, and review, for example, look like separable tasks until you realise that the drafter needs the researcher's output and the reviewer needs the drafter's output. The pipeline is sequential. The bottleneck doesn't disappear — it moves to whichever agent is slowest or most likely to fail.
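
The dependency argument can be checked mechanically: the best wall-clock time any amount of parallel infrastructure can deliver is the longest chain of dependent tasks. The following is a minimal sketch under assumed inputs; the task names mirror the example above, while the durations and the function name `critical_path` are hypothetical.

```python
from functools import lru_cache

# Hypothetical per-task durations in seconds, and what each task depends on.
DURATIONS = {"research": 8.0, "draft": 12.0, "review": 6.0}
DEPENDS_ON = {"research": [], "draft": ["research"], "review": ["draft"]}


def critical_path(durations: dict[str, float], depends_on: dict[str, list[str]]) -> float:
    """Best-case wall-clock time with unlimited parallelism: the longest
    chain of dependent tasks. Assumes the dependency graph has no cycles."""

    @lru_cache(maxsize=None)
    def finish_time(task: str) -> float:
        # A task can start only once all of its prerequisites have finished.
        start = max((finish_time(dep) for dep in depends_on[task]), default=0.0)
        return start + durations[task]

    return max(finish_time(task) for task in durations)


sequential = sum(DURATIONS.values())
best_case = critical_path(DURATIONS, DEPENDS_ON)

# Contrast: the same three durations with no dependencies between them.
independent = critical_path(DURATIONS, {name: [] for name in DURATIONS})
print(sequential, best_case, independent)
```

For the research, draft, review chain the best case equals the sequential total, so splitting the work across agents buys no latency. Only when the dependencies are removed does the best case drop to the duration of the slowest task.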

The latency numbers bear this out. Measured end-to-end latency in multi-agent pipelines routinely runs two to five times longer than equivalent single-agent implementations, even when the underlying model calls are no slower. The overhead is in the coordination: the serialisation, deserialisation, context reconstruction, and error handling that happen at every boundary.

The Specialisation Trap

Specialised sub-agents feel like good software design. A research agent, a writing agent, a fact-checking agent, a formatting agent. Each does one thing well. The system is modular and composable.

What this architecture hides is that the routing logic — the system that decides which agent gets which task, with which context, in which order — becomes the most complex component of the entire system. You have offloaded complexity from the agents themselves onto the orchestration layer, and that layer is typically less transparent, harder to test, and more brittle than any individual agent.

When the routing logic fails, it fails silently. A task goes to the wrong agent. An agent receives insufficient context. A handoff drops a critical piece of state. The output looks plausible but is subtly wrong in ways that may not be caught until they cause a downstream problem.
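
A handoff that drops state does not have to fail silently. One option is to declare, per agent, the state it needs and to reject any handoff that lacks it. The sketch below is an assumption-laden illustration: the `Handoff` type, the `REQUIRED_STATE` table, the agent names, and the state keys are all hypothetical.

```python
from dataclasses import dataclass, field


class HandoffError(ValueError):
    """Raised when a handoff is missing state the receiving agent needs."""


@dataclass
class Handoff:
    target_agent: str
    task: str
    state: dict[str, object] = field(default_factory=dict)


# Hypothetical contract: the state keys each agent requires before it runs.
REQUIRED_STATE = {
    "writer": {"sources", "outline"},
    "fact_checker": {"draft", "sources"},
}


def validate_handoff(handoff: Handoff) -> Handoff:
    """Reject a handoff instead of letting the agent guess at missing context."""
    if handoff.target_agent not in REQUIRED_STATE:
        raise HandoffError(f"unknown agent: {handoff.target_agent!r}")
    missing = REQUIRED_STATE[handoff.target_agent] - set(handoff.state)
    if missing:
        raise HandoffError(
            f"handoff to {handoff.target_agent!r} is missing: {sorted(missing)}"
        )
    return handoff


# Complete state passes through unchanged.
ok = validate_handoff(
    Handoff("writer", "draft section 1", {"sources": [], "outline": "intro, body"})
)

# A handoff that dropped "sources" is rejected at the boundary.
try:
    validate_handoff(Handoff("fact_checker", "check claims", {"draft": "text"}))
except HandoffError as err:
    rejected = str(err)
```

This only catches missing keys, not wrong or stale values, so it narrows the silent-failure surface rather than closing it.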

[Figure: Multi-agent coordination: where complexity multiplies and costs compound]

Do Not Reach for Multi-Agent When You Cannot Justify Why a Single Agent Won't Work

If your task fits within a single context window, if your steps are sequential, and if your sub-agents would need to share significant state — you are almost certainly adding cost and fragility without adding capability. The rule of thumb is blunt: if you cannot clearly articulate why a single well-designed agent cannot do this work, you are not solving a real architectural constraint. You are solving a whiteboard problem that does not exist in production.
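
The three conditions in this rule of thumb can be written down as an explicit checklist, which makes it harder to skip one during a design review. This is a minimal sketch that encodes only those three conditions; the `TaskProfile` type and its field names are hypothetical, and it is no substitute for the judgment the rule calls for.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    fits_single_context: bool       # the whole task fits in one context window
    steps_are_sequential: bool      # each step needs the previous step's output
    shares_significant_state: bool  # sub-agents would need much of the same state


def single_agent_is_default(profile: TaskProfile) -> bool:
    """True when all three warning signs apply, i.e. splitting the task is
    likely to add cost and fragility without adding capability."""
    return (
        profile.fits_single_context
        and profile.steps_are_sequential
        and profile.shares_significant_state
    )


# Example: a research, draft, review workflow over a short brief.
short_brief = TaskProfile(True, True, True)
stay_single = single_agent_is_default(short_brief)
```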

When Multi-Agent Actually Earns Its Cost

There are genuine use cases where multi-agent architecture is the right answer, and they are worth naming precisely — because knowing when it is right makes it easier to recognise when it isn't.

Pattern 1

Genuinely Concurrent Tasks

Tasks with no data dependency between them
Work that can be parallelised at the infrastructure level, not just the conceptual level
Pipelines where total latency is constrained by wall-clock time, not sequential logic

Best for: Large-scale data processing, multi-document analysis, batch generation workloads where true parallel execution is both achievable and measurable
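
As a sketch of what parallelism at the infrastructure level can look like, the example below fans out independent per-document calls with Python's `asyncio`. The function `analyse_document` is a hypothetical stand-in for a real model call, and the concurrency cap of four is an arbitrary assumption.

```python
import asyncio


async def analyse_document(doc_id: str) -> str:
    """Stand-in for one independent model call. A real implementation would
    await a network request here; the return value is made up."""
    await asyncio.sleep(0.01)  # simulate I/O latency
    return f"summary of {doc_id}"


async def analyse_all(doc_ids: list[str], max_concurrent: int = 4) -> list[str]:
    """Run independent analyses concurrently, capped by a semaphore so the
    fan-out stays within whatever rate limit the backend imposes."""
    limit = asyncio.Semaphore(max_concurrent)

    async def bounded(doc_id: str) -> str:
        async with limit:
            return await analyse_document(doc_id)

    # gather preserves input order, so results line up with doc_ids.
    return await asyncio.gather(*(bounded(d) for d in doc_ids))


results = asyncio.run(analyse_all(["doc-1", "doc-2", "doc-3"]))
```

This works only because no call reads another call's output. As soon as one document's analysis depends on another's, the fan-out collapses back into a chain.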

[Figure: Error Propagation: How Silent Failures Spread Across Agents]

Pattern 2

Adversarial Review Pairs

A generator agent and a critic agent operating on the same artefact
Red/blue agent architectures where one agent's job is to find flaws in the other's output
Structured disagreement as a quality mechanism, not a bug

Best for: High-stakes content generation, code review, argument stress-testing, compliance checking where a second independent perspective adds genuine signal
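
A generator and critic pair can be sketched as a bounded loop over one artefact. The code below is an illustration under stated assumptions: `review_loop` and the two toy functions are hypothetical, the toy functions stand in for model calls, and the cap of three rounds is arbitrary.

```python
from typing import Callable, Optional

Generator = Callable[[str, Optional[str]], str]  # (task, feedback) -> draft
Critic = Callable[[str], Optional[str]]          # draft -> objection, or None if accepted


def review_loop(
    task: str, generate: Generator, critique: Critic, max_rounds: int = 3
) -> tuple[str, bool]:
    """Alternate generator and critic on the same artefact. Returns the last
    draft and whether the critic accepted it. The round cap keeps the pair
    from looping indefinitely when they never agree."""
    feedback: Optional[str] = None
    draft = ""
    for _ in range(max_rounds):
        draft = generate(task, feedback)
        feedback = critique(draft)
        if feedback is None:
            return draft, True
    return draft, False


# Toy stand-ins for model calls, for illustration only.
def toy_generate(task: str, feedback: Optional[str]) -> str:
    return f"{task} [cited]" if feedback else task


def toy_critique(draft: str) -> Optional[str]:
    return None if "[cited]" in draft else "add a citation"


final, accepted = review_loop("claim about latency", toy_generate, toy_critique)
```

Returning the acceptance flag alongside the draft matters: a draft that merely survived the round cap should not be treated the same as one the critic approved.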

Pattern 3

Context Window Overflow

Tasks where the full input genuinely cannot fit in a single context window
Long-document processing where chunking is unavoidable and coordination is the only option
State that must be maintained across sessions or interactions that exceed model memory limits

Best for: Very long document synthesis, multi-session research tasks, workflows operating on corpora too large for any single model call

The Discipline of Restraint

The engineering teams that build the most reliable AI systems are not the ones that reach for multi-agent architectures first. They are the ones that exhaust the single-agent design space before introducing coordination. They prompt more carefully. They structure outputs more deliberately. They test what a single well-designed agent can actually do before assuming it cannot.

Multi-agent systems are not inherently bad. They are inherently more expensive, more fragile, and harder to debug than single-agent systems. That cost is worth paying — but only when the benefit is real, measurable, and genuinely unavailable through simpler means.

If you are building a multi-agent system right now, ask yourself one question: what would break if you replaced this with a single agent and a better prompt? If the honest answer is "not much," you have your answer.

[Figure: When to Split: Single vs Multi-Agent Decision Framework]
