Your AI Workflow Has Too Many Agents
Six months ago every node in our content workflow was an agent. It cost $8 a run and produced different output every time. Today it costs $0.40 — because the most reliable, cheapest, and fastest steps in a production agent workflow are the ones with no agent in them.

Six months ago we shipped a content research workflow where every single node was an agent. Topic discovery? Agent. Source validation? Agent. Output formatting? Another agent. It felt right — this was autonomy. The workflow cost $8 per run, took 4–7 minutes, and produced different research briefs from identical prompts. Worse, it never failed — it just delivered subtly wrong outputs that looked correct enough to propagate downstream.
Today that same workflow costs $0.40, runs in under a minute, and produces consistent, verifiable outputs. The difference wasn’t better prompts or a smarter model. We simply stopped paying Claude to do things that don’t require judgment.
The autonomy trap
There’s a seductive failure mode in multi-agent design: the belief that more agents equals more autonomy equals better results. This is cargo-cult autonomy. Real autonomy is the system’s ability to complete its task despite variance in inputs and environment — not the density of LLM calls in its orchestration graph.
Our first research workflow had eight agent nodes arranged in a DAG. Each node was a sophisticated prompt with tool access — search, browse, summarize. The topology was correct, the prompts were iterated, the model was top-tier. And it was worse in every dimension than the system it replaced:
- Cost: $6–8 per run versus $0.15 for the previous human-in-the-loop process.
- Latency: 4–7 minutes blocking time versus 30 seconds for the human version.
- Reliability: run-to-run variance on identical inputs exceeded 40% in our evaluation set.
- Observability: no node could falsify another node’s claim of success.
The last point is subtle and fatal. In an all-agent pipeline, completion is a claim, not a fact. When Node A reports “I have validated the sources,” there’s no downstream node whose job is to check that claim — the next agent just assumes it is true. The workflow had no immune system against its own errors.
The node taxonomy that actually ships
The agent-swarm workflow engine defines five first-class step types. This isn’t post-hoc categorization — it’s enforced in the type system and the execution model:
// Workflow step taxonomy
export type StepType =
  | 'deterministic' // Pure function, no LLM
  | 'litmus'        // Pass/fail gate with threshold
  | 'validation'    // Schema/shape enforcement
  | 'context'       // Input marshalling & transformation
  | 'agent';        // LLM call with optional tools

export interface WorkflowStep {
  id: string;
  type: StepType;
  // Agent steps carry model config, deterministic steps carry function refs
  config: AgentStepConfig | DeterministicStepConfig;
}

The critical insight: only agent steps are nondeterministic, expensive, and latency-heavy. The other four are code, and in a mature workflow they should outnumber agents by at least 4:1 (an agent density of 20% or less).
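One way to make that split visible in the types is a discriminated union, so that only agent steps can carry model config. This is a sketch, not the engine’s actual definitions: the `model`, `prompt`, and `run` fields and the `describeStep` helper are assumptions made for illustration.

```typescript
// Hypothetical config shapes; the real ones are not shown in this post.
interface AgentStepConfig {
  model: string;
  prompt: string;
}

interface DeterministicStepConfig {
  run: (input: unknown) => unknown;
}

// The `type` field discriminates the union, so the compiler knows which
// config shape is present once `type` has been checked.
type TypedWorkflowStep =
  | { id: string; type: 'agent'; config: AgentStepConfig }
  | {
      id: string;
      type: 'deterministic' | 'litmus' | 'validation' | 'context';
      config: DeterministicStepConfig;
    };

function describeStep(step: TypedWorkflowStep): string {
  if (step.type === 'agent') {
    // Narrowed: `step.config.model` is only accessible on agent steps.
    return `${step.id}: LLM call (${step.config.model})`;
  }
  return `${step.id}: code (${step.type})`;
}

const parseBrief: TypedWorkflowStep = {
  id: 'parse_brief',
  type: 'deterministic',
  config: { run: (input) => input },
};

console.log(describeStep(parseBrief));
```

With this shape, giving a `litmus` step a model name is a compile error rather than a review comment.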
Here’s the actual blog-research workflow that generated this post, with node types annotated:
Node-by-node: the blog-research workflow
| Step | Type | Why this type |
|---|---|---|
| parse_brief | deterministic | Extract known fields from input JSON |
| research_topic | agent | Genuine discovery: web search, synthesis, novelty |
| validate_output_schema | validation | Hard reject if JSON does not match ResearchOutputSchema |
| check_source_quality | litmus | Pass/fail: ≥3 primary sources above the domain-authority threshold |
| assemble_draft_context | deterministic | Marshal validated research into prompt context |
| generate_draft | agent | Creative composition from verified inputs |
| score_completeness | litmus | Programmatic check: all required sections present |
Agent density: 2 / 7 = 29%. Target for mature workflows: <20%.
The research step genuinely requires an agent — there’s no deterministic function that discovers novel information about an arbitrary topic. But everything else is code. The schema validator doesn’t “understand” the output; it checks that required fields exist and types match. The litmus gate doesn’t “evaluate” source quality; it counts domain authority scores above a threshold. These are not lesser versions of agent steps. They are different primitives entirely — faster, cheaper, and provably correct.
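To show how little a litmus gate has to do, here is a minimal sketch of a gate like `check_source_quality`. The `Source` and `LitmusResult` shapes, the function name, and the default threshold of 0.7 are assumptions for illustration, not values taken from the engine.

```typescript
// Hypothetical shape of one research source; only `authority` matters here.
interface Source {
  url: string;
  authority: number; // 0..1, assumed to be computed outside the agent
}

interface LitmusResult {
  passed: boolean;
  qualifying: number;
  required: number;
}

// Pass/fail gate: count the sources at or above the authority threshold.
// Both defaults are illustrative assumptions.
function checkSourceQuality(
  sources: Source[],
  minAuthority = 0.7,
  minSources = 3,
): LitmusResult {
  const qualifying = sources.filter((s) => s.authority >= minAuthority).length;
  return { passed: qualifying >= minSources, qualifying, required: minSources };
}

const gateResult = checkSourceQuality([
  { url: 'https://a.example/report', authority: 0.9 },
  { url: 'https://b.example/paper', authority: 0.8 },
  { url: 'https://c.example/post', authority: 0.4 },
]);

// Only two of the three sources clear 0.7, so the gate fails.
console.log(gateResult);
```

The gate returns the counts alongside the verdict so that a failed run says how far short it fell.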
The hard-reject mechanism
The most important pattern in the taxonomy: the validation step can hard-reject an agent’s output, preventing workflow progression. This is not “asking another agent to check the work.” This is schema-level enforcement that the agent literally cannot bypass.
// Validation step definition from the workflow manifest
{
  "id": "validate_output_schema",
  "type": "validation",
  "config": {
    "schema": "ResearchOutputSchema",
    "onFailure": "RETRY_WITH_ESCALATION",
    "maxRetries": 2
  }
}

// The schema (Zod in practice, shown for clarity)
import { z } from 'zod';

const ResearchOutputSchema = z.object({
  sources: z.array(z.object({
    url: z.string().url(),
    authority: z.number().min(0).max(1),
    summary: z.string().min(50),
  })).min(3),
  key_findings: z.array(z.string()).min(1),
  confidence: z.enum(['high', 'medium', 'low']),
});

// If the agent returns malformed JSON, the store-progress call fails.
// The agent cannot "complete" the step with bad output.

This changes the game. In the old all-agent workflow, an agent could return malformed JSON and the next agent would try to parse it, producing garbage-in-garbage-out chains that were hard to trace. Now the failure is immediate and at the boundary. The agent must fix its output, or the workflow halts with a clear error at the validation node, not a mysterious degradation three steps downstream.
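The retry-then-halt behaviour can be sketched without any schema library. What follows is an illustration under assumptions, not the engine’s implementation: `validateResearchOutput` hand-checks only two of the schema’s fields, `runWithHardReject` is a hypothetical runner, the agent is modelled as a function that receives the previous errors as feedback, and whatever "escalation" means in `RETRY_WITH_ESCALATION` is left out.

```typescript
type ValidationResult<T> =
  | { ok: true; value: T }
  | { ok: false; errors: string[] };

// Subset of the research output, kept small for the sketch.
interface ResearchOutputSubset {
  key_findings: string[];
  confidence: 'high' | 'medium' | 'low';
}

// Hand-written stand-in for a schema check: parse, then verify shape.
function validateResearchOutput(raw: string): ValidationResult<ResearchOutputSubset> {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, errors: ['output is not valid JSON'] };
  }
  if (typeof data !== 'object' || data === null) {
    return { ok: false, errors: ['output is not an object'] };
  }
  const obj = data as Record<string, unknown>;
  const errors: string[] = [];
  const findings = obj.key_findings;
  if (
    !Array.isArray(findings) ||
    findings.length < 1 ||
    !findings.every((f) => typeof f === 'string')
  ) {
    errors.push('key_findings must be a non-empty array of strings');
  }
  const confidence = obj.confidence;
  if (
    typeof confidence !== 'string' ||
    ['high', 'medium', 'low'].indexOf(confidence) === -1
  ) {
    errors.push('confidence must be high, medium, or low');
  }
  return errors.length === 0
    ? { ok: true, value: data as ResearchOutputSubset }
    : { ok: false, errors };
}

// Call the agent, validate, and retry with the errors as feedback.
// After maxRetries failed retries the step throws: the workflow halts here.
async function runWithHardReject(
  agent: (feedback: string[]) => Promise<string>,
  maxRetries = 2,
): Promise<ResearchOutputSubset> {
  let feedback: string[] = [];
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const result = validateResearchOutput(await agent(feedback));
    if (result.ok) return result.value;
    feedback = result.errors;
  }
  throw new Error(`validate_output_schema rejected output: ${feedback.join('; ')}`);
}
```

The design point is that the agent never decides whether its output is acceptable. It only gets the error list back and another attempt.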
What doesn’t work: the self-grading agent
Early in our iteration, we tried having the research agent grade its own work. The prompt included: “Evaluate your sources for authority and completeness, and only return the result if quality is high.” This failed in production in ways we didn’t catch for weeks.
The agent was consistently optimistic about its own outputs. Not maliciously: a self-assessment is just another sample from the same model that produced the work, so it inherits the same blind spots, whereas an external validator applies objective criteria. We saw sources labeled “high authority” that were Medium posts. We saw “complete” research that missed obvious major sources because the agent didn’t know what it didn’t know.
The fix wasn’t a better prompt. It was removing judgment from the agent entirely and moving it to a litmus step that checks domain authority scores against a whitelist, with no room for interpretation. The litmus step can’t be optimistic — it either passes the threshold or it doesn’t.
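A sketch of what "no room for interpretation" can look like: authority comes from a lookup keyed on the URL’s hostname, and the agent’s own label is ignored. The whitelist entries, their scores, the `selfLabel` field, and both function names are invented for illustration.

```typescript
// Hypothetical whitelist of hostnames with authority scores in [0, 1].
// The entries and numbers are invented for illustration.
const DOMAIN_AUTHORITY = new Map<string, number>([
  ['journal.example', 0.95],
  ['standards.example', 0.9],
]);

// Score a source from its URL alone. Unknown or unparseable hosts score 0.
function scoreAuthority(url: string): number {
  try {
    const host = new URL(url).hostname.replace(/^www\./, '');
    return DOMAIN_AUTHORITY.get(host) ?? 0;
  } catch {
    return 0; // not a parseable URL
  }
}

// The agent's self-assigned label is carried along but never consulted.
interface ClaimedSource {
  url: string;
  selfLabel: 'high' | 'medium' | 'low';
}

function meetsAuthority(source: ClaimedSource, threshold = 0.7): boolean {
  return scoreAuthority(source.url) >= threshold;
}

// A Medium post the agent called "high authority" still fails the check.
console.log(
  meetsAuthority({ url: 'https://medium.com/@someone/post', selfLabel: 'high' }),
);
```

Using the parsed hostname rather than a substring match means a URL that merely mentions a trusted domain in its path does not inherit that domain’s score.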
The irreducible-judgment audit
Here’s the method we now apply to every workflow node:
The test
“Does this step require irreducible judgment, or am I paying an LLM to do something a 20-line function does deterministically?”
“Irreducible judgment” means tasks where the correct output genuinely depends on context that can’t be encoded in advance: novel research, creative synthesis, ambiguous classification, trade-off analysis. These are rare. Most workflow nodes fall into categories that almost always convert to code:
- Shape-checking: does this JSON have the expected fields and types? (→ validation step)
- Threshold gates: is the confidence score above 0.85? Are there ≥3 sources? (→ litmus step)
- Format conversion: transform Markdown to HTML, extract URLs from text (→ deterministic step)
- Finite-branch routing: if source count < 3, go to the expansion branch (→ deterministic step)
- Input marshalling: aggregate outputs from three parallel nodes into one context blob (→ context step)
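Three of these categories fit in a few lines each. The sketches below assume hypothetical function and branch names (`extractUrls`, `routeBySourceCount`, `assembleContext`, `expand_sources`) and a simple heading-per-part context format; none of them come from the engine.

```typescript
// Format conversion: pull URLs out of free text with a regex, no LLM involved.
// Simplification: a URL directly followed by a period will keep the period.
function extractUrls(text: string): string[] {
  return text.match(/https?:\/\/[^\s)>\]]+/g) ?? [];
}

// Finite-branch routing: the set of outcomes is known in advance.
type Branch = 'expand_sources' | 'continue';

function routeBySourceCount(sourceCount: number, minSources = 3): Branch {
  return sourceCount < minSources ? 'expand_sources' : 'continue';
}

// Input marshalling: merge named outputs from parallel nodes into one blob.
function assembleContext(parts: Record<string, string>): string {
  return Object.keys(parts)
    .map((name) => `## ${name}\n${parts[name].trim()}`)
    .join('\n\n');
}

const urls = extractUrls('See https://a.example/x and (https://b.example/y) for details');
console.log(urls, routeBySourceCount(urls.length));
console.log(assembleContext({ research: ' three findings ', brief: 'topic: agents' }));
```

Each function is pure: the same input gives the same output on every run, which is exactly what the agent versions of these nodes could not promise.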
When we applied this audit to our workflows, we found that roughly 60% of agent nodes could be converted to deterministic steps. The remaining 40% genuinely needed judgment — but they now ran faster because their inputs were pre-validated and their outputs were post-checked.
Why determinism upstream multiplies quality downstream
This is the counterintuitive finding: reducing the number of agents in a workflow improves the quality of the remaining agent outputs. Two mechanisms:
First, schema-validated inputs. When an agent receives data from a validation step, it receives a guarantee about shape and presence. The agent doesn’t need to handle missing fields, malformed JSON, or type mismatches — those were filtered upstream. The agent’s context window and reasoning budget go entirely toward the actual task, not defensive parsing.
Second, falsifiable completions. When an agent’s output passes through a litmus gate, the next agent knows the previous output met objective criteria. This is different from hoping the previous agent did good work. It’s a verified fact that can be relied upon, enabling deeper chains of reasoning without compounding uncertainty.
In our evaluation set, the draft-generation agent’s output quality (measured by human evaluators on a 5-point rubric) improved from 2.8 to 4.1 when we switched from unvalidated research inputs to schema-guaranteed, litmus-gated research. Same model, same prompt — the only change was input hygiene from upstream deterministic steps.
The load-bearing claim
In a production agent workflow, the default node type should be deterministic code. An agent node must justify its existence with irreducible judgment — the burden of proof runs opposite to how most teams build today.
Is this just moving complexity around?
No. Deterministic nodes are simpler than agent nodes — they’re pure functions with no model dependency, no prompt versioning, no temperature tuning, no retry logic for hallucinations. The complexity you remove from agent management vastly exceeds the complexity you add in schema definitions.
What’s the right agent density target?
For workflows we’ve seen in production, 10–20% agent nodes is the sustainable range. Below 10% and you’re probably forcing judgment into code that can’t handle it. Above 30% and you’re paying for nondeterminism where determinism would suffice. Measure it: agent density = agent steps / total steps. Track it in CI.
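Tracking it in CI can be as small as the check below. This is a sketch: `checkAgentDensity` is a hypothetical helper, and the 20% warn and 30% fail defaults simply mirror the figures above rather than any built-in engine setting.

```typescript
type StepKind = 'deterministic' | 'litmus' | 'validation' | 'context' | 'agent';

interface DensityReport {
  density: number;
  status: 'ok' | 'warn' | 'fail';
}

// Classify a workflow by the fraction of its steps that are agent steps.
function checkAgentDensity(
  kinds: StepKind[],
  warnAbove = 0.2,
  failAbove = 0.3,
): DensityReport {
  if (kinds.length === 0) {
    throw new Error('workflow has no steps');
  }
  const density = kinds.filter((k) => k === 'agent').length / kinds.length;
  const status = density > failAbove ? 'fail' : density > warnAbove ? 'warn' : 'ok';
  return { density, status };
}

// The seven-step blog-research workflow from the table above.
const blogResearch: StepKind[] = [
  'deterministic', 'agent', 'validation', 'litmus',
  'deterministic', 'agent', 'litmus',
];

const densityReport = checkAgentDensity(blogResearch);
console.log(densityReport.status, densityReport.density.toFixed(2));

// In a CI script, a failing status would abort the build.
if (densityReport.status === 'fail') {
  throw new Error('agent density over budget');
}
```

At 2 of 7 steps the blog-research workflow lands in the warn band: over the 20% target, under the 30% ceiling.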
The compiler and the factory floor
Think of your workflow like a compiler pipeline. A modern compiler has dozens of passes: lexing, parsing, type checking, optimization, code generation. The optimization passes use heuristics — judgment about trade-offs. The rest are deterministic transformations. You don’t put a neural net at every pass; you put heuristics only where heuristics are needed, and you verify their outputs with subsequent deterministic passes.
Or think of a factory line. You don’t staff every station with a human — humans go where judgment is irreducible, machines everywhere else. An agent workflow is the same: the agent is the human-judgment station. Putting one at every step is the equivalent of paying a craftsman to tighten every bolt.
The economic frame is clarifying: agentic nodes are a scarce, expensive input; deterministic nodes are commodity. A workflow’s cost structure is set by its agent-to-deterministic ratio, and most teams ship a ratio that bankrupts them — financially in cloud spend, operationally in debugging time, and strategically in the opacity of their own systems.
The prediction: agent density becomes a metric
Within 18 months, “agent density” — the fraction of workflow nodes that are LLM calls — will be a tracked architecture metric in mature organizations, alongside latency and error rate. “An agent at every node” will join “a microservice for every function” as a recognized over-engineering anti-pattern, the mark of a team that confused activity with value.
The winning pattern is the inverse: default to deterministic code, and force every agent node to justify its existence with irreducible judgment. The burden of proof runs opposite to how most teams build today. Fix that, and you get three wins simultaneously: lower cost, lower latency, and higher quality — because your remaining agents finally have the clean inputs and verified context they need to do their actual job.
Putting it into practice
If you’re auditing an existing workflow, start here:
// Calculate agent density from a workflow manifest.
// Workflow, AgentStep, and ConversionCandidate are types not shown here.
const calculateDensity = (workflow: Workflow) => {
  const total = workflow.steps.length;
  const agents = workflow.steps.filter((s) => s.type === 'agent').length;
  return {
    // Guard against an empty manifest so density is never NaN
    density: total === 0 ? 0 : agents / total,
    agents,
    // Every non-agent step: deterministic, litmus, validation, and context
    deterministic: total - agents,
  };
};

// Then walk the agent nodes with the irreducible-judgment test.
// checkIrreducibleJudgment and inferDeterministicType are helpers not shown here.
const auditNode = (step: AgentStep): ConversionCandidate | null => {
  const judgmentRequired = checkIrreducibleJudgment(step);
  if (!judgmentRequired) {
    return {
      stepId: step.id,
      suggestedType: inferDeterministicType(step),
      rationale: step.routineDescription,
    };
  }
  return null;
};

The agent-swarm workflow engine encodes these patterns natively. Every workflow is a DAG of typed steps, and the type system enforces that deterministic, litmus, validation, and context steps run as code while agent steps run as LLM calls. The framework doesn’t prevent you from building an all-agent workflow, but it makes the cost visible, and it provides the escape hatch of deterministic alternatives.
Our research workflow now runs with 2 agent nodes and 5 code nodes (2 deterministic, 1 validation, 2 litmus). The agents handle genuine discovery and creative synthesis. Everything else is code: fast, cheap, and provably correct. That’s not less autonomous. That’s autonomy that actually works.
FAQ
What is agent density in a workflow?
Agent density is the fraction of workflow nodes that are LLM calls. High density means most steps use agents; low density means most steps are deterministic code. Production systems should target low agent density — in our experience, 10 to 20 percent.
When should a workflow step use an agent vs deterministic code?
Use an agent only when the step contains irreducible judgment — tasks requiring interpretation, discovery, or context-dependent reasoning. Everything else (validation, routing, format conversion, input marshalling) should be deterministic code.
Why does schema validation improve agent output quality?
Upstream deterministic validation prevents corrupt data from reaching agent nodes. When agents receive schema-guaranteed inputs, they spend their reasoning budget on the actual task rather than defensively parsing malformed context.
What node types does agent-swarm support?
The workflow engine supports five step types: deterministic (pure code), litmus (pass/fail gates), validation (schema checking), context (input marshalling), and agent (LLM calls). Only agent steps are nondeterministic.
How do I audit an existing workflow for over-agentification?
Apply the irreducible-judgment test to each node: does this step require genuine reasoning, or could a 20-line function handle it deterministically? Convert every node that fails this test to a deterministic, litmus, validation, or context step.