Agents Everywhere: Part 3 - Why Agentic Systems Break (And What That Tells Us)

Agents Everywhere Series — Part 3 of 5

The Part We Don't Talk About Enough

There's a lot of excitement around agentic systems right now. And rightly so — they open up genuinely new ways of thinking about automation, interaction, and system design.

But if you spend any time actually working with these systems, or closely observing teams that do, you start to notice something else: things break. Often in unexpected ways. Not catastrophically — not always visibly — but enough to shape how systems are designed, deployed, and trusted.

More importantly: the nature of these failures reveals what this space is really about.

AI agents don't fail the way we expected.

They don't crash dramatically with clear error messages. They fail quietly — through small misinterpretations that compound, tool calls that succeed but return wrong data, or coordination issues between agents that produce outputs which look plausible but are subtly wrong. These are the hardest failures to detect and debug.

Figure: Architectural failure modes in agentic systems

Failure Mode 1: Interpretation Drift

An agent receives an instruction. It interprets it — reasonably well, in most cases. But not exactly as intended. That slight deviation carries forward through each subsequent step, compounds across the chain, and eventually produces something that is technically within bounds but entirely off-target.

The insidious part: the system is still functioning. No errors are raised, so there are no exceptions to catch. The output arrives on time. It's just wrong in ways that are difficult to spot without a deep understanding of what the correct output should have looked like.

Why Interpretation Drift Is Hard to Prevent

Natural language instructions are inherently ambiguous — agents must infer intent
Small deviations don't trigger alerts; they accumulate silently
Multi-step chains amplify initial misinterpretations at every hop
Without ground-truth validation at each step, drift goes undetected
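A minimal sketch of per-step validation, assuming each hop in a chain can be paired with a cheap check of its output. `Step`, `run_chain`, and the example steps are hypothetical names for illustration, not part of any particular agent framework. The point is structural: the chain halts at the hop where a check fails, so the deviation is reported where it happened instead of compounding downstream.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class Step:
    """One hop in an agent chain (hypothetical structure for illustration)."""
    name: str
    run: Callable[[Any], Any]     # produces this hop's output from the previous one
    check: Callable[[Any], bool]  # True if the output still matches the intent


def run_chain(steps: List[Step], initial: Any) -> Tuple[Any, List[str]]:
    """Run steps in order, validating each output before passing it on.

    Returns the most recent accepted value (the initial input if the first hop
    fails) plus a trace of each hop. Stops at the first failed check instead of
    letting the deviation compound.
    """
    value = initial
    trace: List[str] = []
    for step in steps:
        output = step.run(value)
        if not step.check(output):
            trace.append(f"{step.name}: failed check")
            return value, trace
        trace.append(f"{step.name}: ok")
        value = output
    return value, trace


# Example: the instruction was "keep the top three items", but one hop drifts.
steps = [
    Step("split", lambda s: s.split(","), lambda xs: len(xs) > 0),
    Step("top_three", lambda xs: sorted(xs)[:4], lambda xs: len(xs) <= 3),  # drifted
    Step("join", lambda xs: "; ".join(xs), lambda s: isinstance(s, str)),
]
result, trace = run_chain(steps, "d,a,c,b,e")
print(trace)  # ['split: ok', 'top_three: failed check']
```

The checks here are deliberately simple; in a real chain they might compare against a schema, a reference answer, or a second model's judgement.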

Failure Mode 2: Tool Misalignment

Agents rely heavily on tools: APIs, databases, external services. Problems arise when the tool behaves differently than expected, when the agent uses it incorrectly, or when the output isn't validated properly before being passed downstream.

The classic pattern: the agent calls a tool, gets a response, assumes the response is correct, and continues. There's no verification step. The error doesn't stop the system — it propagates.
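A sketch of the missing verification step, assuming tool responses arrive as Python dictionaries. `validate_tool_output`, `ToolOutputError`, and the customer-lookup payload are hypothetical. The function checks shape (fields present, expected types) and, optionally, that identifying values match what was requested, which catches the case where a call succeeds but returns the wrong record.

```python
from typing import Any, Dict, Optional


class ToolOutputError(ValueError):
    """Raised when a tool response doesn't match what downstream steps expect."""


def validate_tool_output(
    response: Dict[str, Any],
    schema: Dict[str, type],
    expected_values: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Verify a tool response before it is passed downstream.

    A call that returned without error is not treated as proof of correctness:
    fields, types, and (optionally) identifying values are all checked.
    """
    missing = [key for key in schema if key not in response]
    if missing:
        raise ToolOutputError(f"missing fields: {missing}")

    wrong_types = [key for key, expected in schema.items()
                   if not isinstance(response[key], expected)]
    if wrong_types:
        raise ToolOutputError(f"wrong types: {wrong_types}")

    # Catches "succeeded but returned the wrong record", e.g. a different customer.
    mismatched = [key for key, value in (expected_values or {}).items()
                  if response.get(key) != value]
    if mismatched:
        raise ToolOutputError(f"unexpected values: {mismatched}")

    return response


# Hypothetical lookup for customer C-101 that quietly returned C-102 instead.
response = {"customer_id": "C-102", "email": "c102@example.com", "balance": 12.5}
schema = {"customer_id": str, "email": str, "balance": float}

error_message = None
try:
    validate_tool_output(response, schema, expected_values={"customer_id": "C-101"})
except ToolOutputError as err:
    error_message = str(err)
print(error_message)  # unexpected values: ['customer_id']
```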

The risk is greatest when tools have side effects: writing to databases, sending emails, modifying records. An agent that misuses a tool with side effects can cause real-world damage before anyone realises something went wrong.
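One common mitigation for side-effect risk is to mark which tools have side effects and route those through an approval step before they execute. In the sketch below, `Tool`, `call_tool`, the `approve` callback, and the example tools are all hypothetical; in practice `approve` might be a human review queue or a stricter automated policy.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Tool:
    name: str
    fn: Callable[..., Any]
    has_side_effects: bool  # e.g. writes to a database, sends an email


def call_tool(
    tool: Tool,
    args: Dict[str, Any],
    approve: Callable[[str, Dict[str, Any]], bool],
) -> Dict[str, Any]:
    """Run read-only tools directly; side-effecting tools run only if approved."""
    if tool.has_side_effects and not approve(tool.name, args):
        return {"status": "blocked", "tool": tool.name}
    return {"status": "ok", "tool": tool.name, "result": tool.fn(**args)}


outbox: List[Tuple[str, str]] = []  # stands in for a real email service


def lookup_order(order_id: str) -> Dict[str, str]:
    return {"order_id": order_id, "state": "shipped"}


def send_email(to: str, body: str) -> str:
    outbox.append((to, body))
    return "queued"


def reject_all(tool_name: str, args: Dict[str, Any]) -> bool:
    # Stand-in for a reviewer who has not approved anything yet.
    return False


lookup = Tool("lookup_order", lookup_order, has_side_effects=False)
email = Tool("send_email", send_email, has_side_effects=True)

print(call_tool(lookup, {"order_id": "A1"}, reject_all)["status"])                  # ok
print(call_tool(email, {"to": "c@example.com", "body": "Hi"}, reject_all)["status"])  # blocked
print(outbox)  # []
```

The read-only lookup is unaffected by the reviewer; only the call that would change something outside the system waits for a decision.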

Failure Mode 3: Over-Autonomy

There's a natural temptation when building agentic systems to increase autonomy, reduce human involvement, and let the system "figure things out." But autonomy without structure tends to produce unpredictable behaviour, inefficient loops, and outcomes that no one explicitly approved.

Figure: Control vs efficiency in AI agent systems

Autonomy isn't binary — it's a spectrum. The most successful production systems today sit deliberately in the middle of that spectrum. They're autonomous enough to be useful, constrained enough to be trustworthy.
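Autonomy with structure can start with hard limits on the agent loop. This sketch assumes a hypothetical `next_action` callable that returns the agent's next action name (or `None` when finished); `run_with_budget` stops the loop on completion, when a step budget runs out, or when the same action repeats several times in a row, which is one crude signal of an unproductive loop.

```python
from typing import Callable, List, Optional, Tuple


def run_with_budget(
    next_action: Callable[[List[str]], Optional[str]],
    max_steps: int = 10,
    max_repeats: int = 3,
) -> Tuple[List[str], str]:
    """Drive an agent loop under hard limits.

    next_action receives the history so far and returns the next action name,
    or None when the agent considers the task complete.
    """
    history: List[str] = []
    for _ in range(max_steps):
        action = next_action(history)
        if action is None:
            return history, "done"
        history.append(action)
        # The same action several times in a row is one crude sign of a loop.
        recent = history[-max_repeats:]
        if len(recent) == max_repeats and len(set(recent)) == 1:
            return history, "stopped: repeated action"
    return history, "stopped: step budget exhausted"


def stuck_agent(history: List[str]) -> Optional[str]:
    return "search_docs"  # hypothetical agent that keeps retrying one search


def finishing_agent(history: List[str]) -> Optional[str]:
    return None if len(history) == 2 else f"step_{len(history)}"


print(run_with_budget(stuck_agent))
# (['search_docs', 'search_docs', 'search_docs'], 'stopped: repeated action')
print(run_with_budget(finishing_agent))
# (['step_0', 'step_1'], 'done')
```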

~43% of enterprise AI failures traced to insufficient human oversight controls (Deloitte AI Survey, 2024)

~4× higher incident rate in fully autonomous vs human-in-the-loop agent setups

Failure Mode 4: Coordination Breakdowns

When multiple agents are involved, a new class of problems appears that has no equivalent in single-agent systems. Agents produce conflicting decisions. They duplicate work. They misunderstand each other's outputs. They enter deadlock states where each is waiting for the other.

The issue isn't in any single agent — it's in the interaction between them. And that interaction is notoriously hard to test, because the failures are often emergent: they only appear when agents are running together under real load, not in isolated test environments.
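The deadlock case ("each is waiting for the other") can be detected mechanically if agents report what they are blocked on. A sketch, assuming that information is available as a wait-for graph (`waits_on`, a hypothetical mapping from each agent to the agents it is waiting on): a depth-first search finds a cycle, which is exactly a set of agents that can never proceed. This covers deadlocks only; conflicting decisions and duplicated work need other checks.

```python
from typing import Dict, List, Optional


def find_wait_cycle(waits_on: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle in a wait-for graph, or None if there is no deadlock."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        path.append(node)
        for dep in waits_on.get(node, []):
            state = color.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path: slice out the cycle.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for agent in list(waits_on):
        if color.get(agent, WHITE) == WHITE:
            cycle = visit(agent)
            if cycle:
                return cycle
    return None


waits = {
    "planner": ["researcher"],
    "researcher": ["reviewer"],
    "reviewer": ["planner"],  # waits for a plan that never arrives
    "writer": ["reviewer"],
}
print(find_wait_cycle(waits))  # ['planner', 'researcher', 'reviewer', 'planner']
```

Note that `writer` is blocked too but is not part of the cycle; breaking the cycle (for example by timing out one of its members) is what unblocks it.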

Failure Mode 5: Lack of Observability

This is one of the most important — and most frequently overlooked — issues in agentic systems. When something goes wrong, teams ask: what happened? Why did it happen? Where did it go wrong? And the answer is far too often: "We're not entirely sure."

Without observability, every debugging session is guesswork.

You can't fix what you can't see. And in agentic systems, what you can't see is often the decision logic: why the agent chose action A over action B, what context influenced that choice, which tool call triggered the downstream failure. Without logging these decisions as first-class data, the system is a black box that you can only understand by its outputs.
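A sketch of logging decisions as first-class data, assuming each decision point can report the action it chose, the alternatives it considered, and which inputs were in scope. `DecisionRecord`, `log_decision`, and `decisions_for` are hypothetical names, and the in-memory list stands in for whatever log store a team actually uses. Storing records as JSON lines keeps them queryable: the trail for one run can be rebuilt after the fact.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class DecisionRecord:
    """One agent decision, captured as data rather than as free-form log text."""
    run_id: str
    step: int
    agent: str
    chosen_action: str
    alternatives: List[str]
    rationale: str
    context_keys: List[str]  # which inputs were in scope for the decision
    tool_call: Dict[str, Any] = field(default_factory=dict)
    timestamp: float = field(default_factory=time.time)


def log_decision(record: DecisionRecord, sink: List[str]) -> None:
    """Append the record as one JSON line; sink stands in for a file or log store."""
    sink.append(json.dumps(asdict(record), sort_keys=True))


def decisions_for(sink: List[str], run_id: str) -> List[Dict[str, Any]]:
    """Rebuild the ordered decision trail for a single run."""
    rows = [json.loads(line) for line in sink]
    return sorted((r for r in rows if r["run_id"] == run_id), key=lambda r: r["step"])


log: List[str] = []
log_decision(DecisionRecord(
    run_id="run-7", step=2, agent="triage", chosen_action="issue_refund",
    alternatives=["ask_human"], rationale="order marked undelivered",
    context_keys=["order_status"]), log)
log_decision(DecisionRecord(
    run_id="run-7", step=1, agent="triage", chosen_action="lookup_order",
    alternatives=["escalate"], rationale="order id found in message",
    context_keys=["message"],
    tool_call={"tool": "lookup_order", "args": {"order_id": "A1"}}), log)

for row in decisions_for(log, "run-7"):
    print(row["step"], row["chosen_action"], row["alternatives"])
# 1 lookup_order ['escalate']
# 2 issue_refund ['ask_human']
```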

What These Failures Are Really Telling Us

Looking across these five failure modes, a clear pattern emerges. The challenge isn't intelligence — it's reliability. And reliability in agentic systems is less about better models and more about better system design.

Better models will help with some of these problems. But many of them are not model problems at all — they're design problems, architecture problems, and system-thinking problems.

Need 1: Better Control Layers

Not to restrict systems completely — but to guide behaviour, manage decision boundaries, and provide meaningful intervention points when execution deviates.
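A control layer can sit between the agent's proposal and its execution. The sketch below assumes a hypothetical policy of allowed actions plus per-action numeric limits, and returns one of three verdicts rather than a yes/no, so that legitimate but out-of-bounds requests become intervention points instead of silent failures or hard stops. `Boundaries`, `check_action`, `Verdict`, and the refund limit are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Set


class Verdict(Enum):
    ALLOW = "allow"        # within boundaries: proceed automatically
    ESCALATE = "escalate"  # permitted action, but outside limits: pause for a human
    DENY = "deny"          # not an action this agent may take at all


@dataclass
class Boundaries:
    allowed_actions: Set[str]
    limits: Dict[str, float]  # per-action numeric ceiling, e.g. refund amount


def check_action(action: str, amount: float, bounds: Boundaries) -> Verdict:
    """Classify a proposed action before it is executed."""
    if action not in bounds.allowed_actions:
        return Verdict.DENY
    limit = bounds.limits.get(action)
    if limit is not None and amount > limit:
        return Verdict.ESCALATE
    return Verdict.ALLOW


bounds = Boundaries(allowed_actions={"issue_refund", "send_reply"},
                    limits={"issue_refund": 100.0})
print(check_action("issue_refund", 40.0, bounds))   # Verdict.ALLOW
print(check_action("issue_refund", 250.0, bounds))  # Verdict.ESCALATE
print(check_action("delete_account", 0.0, bounds))  # Verdict.DENY
```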

Need 2: Observability as a First-Class Feature

Not an afterthought bolted on after deployment. Every decision, every tool call, every agent interaction should be logged, queryable, and reviewable. Build this before you need it.

Need 3: Rethinking Autonomy

Instead of asking "how autonomous can we make this?", ask "where should autonomy exist — and where shouldn't it?" The answer is different for every step of your workflow.
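One way to make that question concrete is to record an autonomy level per step rather than a single global setting. The workflow steps and `Autonomy` levels below are hypothetical; the design choice worth noting is the default, where any step nobody has classified falls back to the most restrictive level.

```python
from enum import IntEnum
from typing import Dict, List


class Autonomy(IntEnum):
    HUMAN_ONLY = 0      # a person performs the step
    NEEDS_APPROVAL = 1  # the agent proposes, a person approves
    AUTONOMOUS = 2      # the agent acts on its own


# Hypothetical support workflow: each step is classified separately.
POLICY: Dict[str, Autonomy] = {
    "classify_ticket": Autonomy.AUTONOMOUS,
    "draft_reply": Autonomy.AUTONOMOUS,
    "send_reply": Autonomy.NEEDS_APPROVAL,
    "close_account": Autonomy.HUMAN_ONLY,
}


def autonomy_for(step: str, policy: Dict[str, Autonomy] = POLICY) -> Autonomy:
    """Unclassified steps fall back to the most restrictive level."""
    return policy.get(step, Autonomy.HUMAN_ONLY)


def steps_needing_people(workflow: List[str]) -> List[str]:
    """List the steps where a person must act or approve."""
    return [step for step in workflow if autonomy_for(step) < Autonomy.AUTONOMOUS]


workflow = ["classify_ticket", "draft_reply", "send_reply", "merge_duplicates"]
print(steps_needing_people(workflow))  # ['send_reply', 'merge_duplicates']
```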

A Familiar Pattern

We've been here before. Early distributed systems were hard to debug. Microservices required observability tooling before they became manageable. Cloud systems needed orchestration before they became reliable at scale. Agentic systems are following the same trajectory — and the solutions will look similar: coordination, visibility, and structured control.

The question isn't whether to address these failures. It's how quickly organisations will build the infrastructure to manage them systematically, rather than firefighting each incident as it appears.

The Solution: Orchestration

Part 4 explores the emerging answer to these failure modes — why orchestration is becoming the critical layer, what it looks like in practice, and why it might be the most important architectural decision you make.

Read Part 4: The Rise of Orchestration →

References

Deloitte AI Institute: State of Generative AI in the Enterprise, 2024 · deloitte.com
Anyscale: Common Failure Patterns in LLM Applications · anyscale.com
Braintrust: Evaluating LLM-Powered Systems · braintrustdata.com
Stanford HAI: Risks of Agentic AI Systems · hai.stanford.edu