May 13, 2026 · 14 min read

Stop Tuning Prompts. Start Writing Hooks.

Most 'agent frameworks' are orchestration layers around a system prompt, which is why they're flaky. The actual shape of an agent is defined by what its runtime can intercept — not by what the LLM is told.

Tags: lifecycle hooks, agent runtime, PreToolUse, PostToolUse, PreCompact, Claude Code, agent-swarm
[Illustration: two engineers arguing about whether a prompt or a hook should enforce agent behavior]
Three weeks of prompt tuning vs. 47 lines of TypeScript. Guess which one held.

We burned three weeks tuning a system prompt to make our agents check memory before calling tools. It worked beautifully in testing—until production context windows got tight. Then the model “forgot” the invariant and started calling tools blindly. We added examples. We added XML tags. We added ALL CAPS REMINDERS. The drift always returned.

The fix was not in the prompt. It was 47 lines of TypeScript in a PreToolUse hook. That hook now intercepts every tool call before the LLM executes it, checks the memory index, and rejects the call if the context is missing. The model does not need to “remember” anything. The runtime enforces it.

This is the distinction that changes how you build agents. A prompt is a polite request. A hook is a hard contract. Most “agent frameworks” are just orchestration layers around a system prompt, which is why they are flaky under pressure. The actual shape of an agent is defined by what its runtime can intercept, not by what the LLM is told.

Why Prompts Fail at Scale

The seductive failure mode of prompt engineering is that it works in isolation. When you have 2k tokens of context and a simple task, the model does what you ask. But agents do not run in isolation. They run for hours. They accumulate error logs, tool outputs, conversation history, and self-reflection loops. The context window compresses. The system prompt gets drowned out by recent turns.

We watched this happen with our Researcher agent persona. The system prompt instructed it to always check SOUL.md for identity context before answering domain questions. For the first ten turns, compliance was perfect. By turn thirty, with context pressure mounting, the agent started hallucinating its own capabilities—claiming expertise it did not have because it skipped the memory check. The prompt was still there in the history. The model just stopped prioritizing it.

The drift pattern

Prompt-based invariants degrade steadily with context length: the probability of compliance falls as the ratio of recent conversation to system prompt grows. You cannot prompt your way out of this. You can only intercept.

Six Lifecycle Hooks, Six Hard Contracts

Our runtime defines six lifecycle events in src/hooks/hook.ts (~1000 lines). Each one is an interception point where deterministic code runs before the LLM can act. This is where the actual agent behavior is forged.

1. SessionStart — the bootstrap enforcer

This fires before the agent utters its first word. In our swarm, SessionStart injects the agent’s identity boot context—and the critical “you are not registered, use join-swarm to register” gate. If an agent tries to operate without proper swarm credentials, this hook short-circuits every tool call before the LLM has produced a single token of useful output.

// src/hooks/hook.ts — SessionStart pattern
export async function onSessionStart(session: Session): Promise<void> {
  // Identity injection happens here, not in the system prompt
  const identity = await loadIdentityFromDisk(session.agentId);

  if (!identity.registered) {
    // Hard stop before the runner accepts any task. No prompt can override this.
    session.injectSystemMessage(
      "You are not registered in the agent swarm. " +
      "Use the join-swarm tool to register yourself, then check my-agent-info."
    );
  }

  // Boot context is deterministic code, not prompt text
  session.context.bootTime = Date.now();
  session.context.identityHash = hashIdentity(identity);
}

The key insight: the bootstrap message you see at the top of every session is not part of the system prompt. It is injected by code that the model cannot edit, ignore, or rationalize away. A system prompt saying “always check registration” is text the model might respect. SessionStart is text the runtime always produces.

2. UserPromptSubmit — the identity re-anchor

Here is where we solved the Researcher drift problem. UserPromptSubmit fires on every user turn—not just at session start. It re-runs the identity check before the LLM processes the new input, and re-injects boot context so the model never sees a turn where its role has been summarized away.

// src/hooks/hook.ts — anti-drift pattern
export async function onUserPromptSubmit(
  session: Session,
  prompt: string,
): Promise<string> {
  // Re-load identity from disk on EVERY turn (it may have self-modified)
  const currentIdentity = await getCurrentIdentity(session.agentId);

  if (currentIdentity.hash !== session.context.lastIdentityHash) {
    session.reloadIdentityContext(currentIdentity);
    session.context.lastIdentityHash = currentIdentity.hash;
  }

  // Prepend critical context so it survives any context pressure
  return injectContextBanner(prompt, currentIdentity);
}

This hook guarantees that identity cannot drift across the session. Even if the conversation grows to 100k tokens, the critical context is re-injected right before the latest user prompt is processed. The model sees fresh identity context every turn, not just at the beginning of a compressed history. This is a guarantee no system prompt can provide—because by turn thirty, the original prompt has been summarized into a five-line bullet point.
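The injectContextBanner helper referenced in the hook above is not shown in the runtime excerpt. Here is a minimal sketch of what it might look like, assuming a simple Identity shape (both the interface and the banner format are illustrative, not the runtime's actual code):

```typescript
// Hypothetical sketch of injectContextBanner: prepend a compact identity
// banner to the raw prompt so role context survives any later compaction.
interface Identity {
  name: string;
  role: string;
  hash: string;
}

export function injectContextBanner(prompt: string, identity: Identity): string {
  // The banner is regenerated on every turn, so it can never be summarized away.
  const banner = [
    `[identity] ${identity.name} (${identity.role})`,
    `[identity-hash] ${identity.hash}`,
  ].join("\n");
  return `${banner}\n\n${prompt}`;
}
```

Because the banner is prepended to the newest turn rather than living at the top of the history, it sits in the part of the context window least likely to be compressed.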

3. PreToolUse — the safety gate

This is where we moved the “check memory first” invariant. PreToolUse intercepts every tool call before execution. It is also what makes our deferred tool-caching pattern (covered in our May 4 post) actually safe.

Without PreToolUse, deferred discovery surfaces footguns. An agent might discover a tool at runtime and immediately call it—no validation, no safety check. PreToolUse intercepts unknown tool names and validates them against the cached manifest before allowing execution.

// src/hooks/hook.ts — PreToolUse safety gate
export async function onPreToolUse(
  session: Session,
  toolCall: ToolCall,
): Promise<ToolCall | null> {
  // Invariant: check memory before destructive operations
  if (isDestructiveTool(toolCall.name)) {
    const memoryContext = await session.getMemoryContext(toolCall.params);
    if (!memoryContext.recentlyVerified) {
      await session.injectToolResult(
        toolCall.id,
        "BLOCKED: memory checkpoint required. Call memory-search first.",
      );
      return null; // null prevents execution
    }
  }

  // Validate the tool exists in the cached manifest (deferred-load safety)
  if (!session.toolManifest.has(toolCall.name)) {
    await session.logSecurityEvent("unknown_tool_blocked", toolCall);
    return null;
  }

  return toolCall;
}

Notice the return type: null means “do not execute.” This is a hard stop. The LLM requested the tool. The runtime denied it. No amount of prompt engineering can achieve this level of enforcement, because the model does not know it is being blocked—it just sees the injected error response and adapts.
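The isDestructiveTool predicate the gate relies on is also not shown above. A plausible sketch, assuming a deny-by-pattern list (the specific tool names are illustrative):

```typescript
// Hypothetical sketch of the isDestructiveTool predicate used by the gate.
// A static pattern list is deterministic and auditable, unlike asking the
// model to judge destructiveness call by call.
const DESTRUCTIVE_PATTERNS: RegExp[] = [
  /^file-(write|delete|move)$/,
  /^db-(update|delete|migrate)$/,
  /^shell-exec$/,
];

export function isDestructiveTool(toolName: string): boolean {
  return DESTRUCTIVE_PATTERNS.some((p) => p.test(toolName));
}
```

The point of the static list is that adding a new destructive tool is a code-review event, not a prompt edit the model may or may not honor.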

4. PostToolUse — the persistence layer

Tools mutate state. PostToolUse ensures those mutations are captured. In our swarm, this hook runs the auto-memory indexer and syncs SOUL.md, IDENTITY.md, TOOLS.md, and CLAUDE.md to the database whenever the agent edits its own identity files.

When an agent edits one of those files, the edit happens through an Edit or Write tool call. PostToolUse intercepts the result, parses the diff, and commits it to the database as a transaction. Every self-edit is durably recorded—not a free-form text change the model hopes will persist.

// src/hooks/hook.ts — PostToolUse persistence
export async function onPostToolUse(
  session: Session,
  toolCall: ToolCall,
  result: ToolResult,
): Promise<void> {
  // Auto-memory indexing for completed task outputs / file writes
  await indexToolResult(session.agentId, toolCall, result);

  // Identity-file transaction commit (SOUL.md / IDENTITY.md / TOOLS.md / CLAUDE.md)
  if (isIdentityFileEdit(toolCall)) {
    const edit = parseIdentityEdit(toolCall, result);
    await db.identityTransactions.create({
      agentId: session.agentId,
      file: edit.file,
      diff: edit.diff,
      timestamp: new Date(),
      sessionId: session.id,
    });
    session.invalidateIdentityCache();
  }

  // Cost telemetry per tool call
  session.telemetry.addToolCost(toolCall.name, result.tokenCost);
}

This solves the “ghost edit” problem where a model claims to have updated its identity but the change never persisted. The hook ensures the database transaction commits before the success signal returns to the LLM. If the commit fails, the tool result is rewritten to surface the failure, and the model sees an error instead of a false success.
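The commit-before-success ordering can be sketched as a small wrapper (names here are hypothetical; the real hook inlines this logic):

```typescript
// Sketch of the failure-rewrite step: if the identity-transaction commit
// throws, the success result is replaced before the LLM ever sees it, so a
// "ghost edit" cannot masquerade as a real one.
interface ToolResult {
  status: "success" | "error";
  output: string;
}

export async function commitOrRewrite(
  commit: () => Promise<void>,
  result: ToolResult,
): Promise<ToolResult> {
  try {
    await commit();
    return result; // Durably persisted: the success signal is now true.
  } catch (err) {
    // Rewrite the result so the model sees the failure, not a false success.
    return {
      status: "error",
      output:
        `IDENTITY COMMIT FAILED: ${(err as Error).message}. ` +
        "Your edit was NOT persisted. Retry or report the failure.",
    };
  }
}
```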

5. PreCompact — the context preservation hook

Context windows fill up. When they do, something gets summarized or dropped. PreCompact fires right before the conversation is compressed, giving us a chance to reshape the context so critical state survives the lossy summary.

Without this hook, we trusted the compaction model to pick what matters. It usually picked wrong—dropping the task brief but keeping the error logs, or summarizing away the current step while preserving old tool outputs.

// src/hooks/hook.ts — PreCompact preservation
export async function onPreCompact(
  session: Session,
  compactionTarget: number,
): Promise<ConversationSlice[]> {
  // Pin the blocks the runtime knows are critical
  const critical = [
    session.getTaskBrief(),       // Never summarize the goal
    session.getCurrentStep(),     // Preserve where we are in the plan
    session.getIdentitySummary(), // Keep minimal identity context
  ];

  // Mark them "retain_exact" for the compaction algorithm
  const protectedRanges = critical.map((block) => ({
    start: block.index,
    end: block.index + block.length,
    priority: "retain_exact" as const,
  }));

  return session.compactWithProtection(protectedRanges, compactionTarget);
}

This turns context compression from a probabilistic “hope the model picks right” into a deterministic preservation of state. The task brief, current step, and identity survive. The noise gets summarized.

6. Stop — the durable commitment

Sessions end. Crashes happen. Stop is the only point in the lifecycle where success or failure is durably recorded. Without it, every crash is silent—the task stays “in progress” forever, costs are lost to the void, and the next session starts blind.

// src/hooks/hook.ts — Stop durability
export async function onStop(
  session: Session,
  reason: "complete" | "error" | "timeout" | "crash",
): Promise<void> {
  // Session summary -> long-term memory
  const summary = await generateSessionSummary(session);
  await persistToLongTermMemory(session.agentId, summary);

  // Cost telemetry (critical for ROI tracking)
  await telemetry.writeSessionMetrics({
    sessionId: session.id,
    totalCost: session.telemetry.totalCost,
    toolBreakdown: session.telemetry.toolCosts,
    tokenUsage: session.tokenUsage,
    duration: Date.now() - session.context.bootTime,
  });

  // Update task status in project management
  if (session.associatedTaskId) {
    await updateTaskStatus(
      session.associatedTaskId,
      reason === "complete" ? "completed" : "failed",
      summary.nextSteps,
    );
  }

  await session.releaseResourceLocks();
}

The Stop hook turns ephemeral LLM sessions into accountable work units. When this hook runs, the task is either done or documented as blocked. When it does not run (crash), we have a missing telemetry record that triggers an alert. Silent failures become observable.
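The missing-telemetry alert can be sketched as a watchdog query over session records (the SessionRecord shape and threshold are assumptions, not the runtime's actual schema):

```typescript
// Hypothetical crash watchdog: sessions that started but never wrote
// Stop-hook telemetry past a deadline are flagged as silent failures.
interface SessionRecord {
  sessionId: string;
  startedAt: number;     // epoch ms, written at SessionStart
  stopRecorded: boolean; // set by the Stop hook's telemetry write
}

export function findSilentFailures(
  sessions: SessionRecord[],
  now: number,
  maxSessionMs: number,
): string[] {
  return sessions
    .filter((s) => !s.stopRecorded && now - s.startedAt > maxSessionMs)
    .map((s) => s.sessionId);
}
```

Running this on a schedule turns "the Stop hook never fired" from an invisible event into an alert with a session ID attached.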

What Does Not Work: the Prompt Engineering Trap

We tried everything before building the hook layer. Few-shot prompting for tool safety—until the examples got pushed out of context by error logs. Chain-of-thought for memory checking—until the model started hallucinating the thought process without doing the check. XML tags for critical sections—until the model started ignoring the tags when context got tight.

The fundamental error was treating behavioral invariants as communication problems. They are not. They are enforcement problems. You cannot communicate your way to determinism.

We also tried “guardian agents”—smaller LLMs that reviewed the main agent’s output. Latency doubled. Costs tripled. And the guardian was itself a probabilistic system that could be confused by clever prompt injection. Deterministic hooks add microseconds, not seconds, and cannot be prompt-injected because the LLM never sees the hook logic.

The cost of getting it wrong

Before PreToolUse enforcement, we observed agents calling destructive tools without memory checks in roughly 15% of long-running sessions (context length above 50k tokens). After moving the invariant to the hook layer, the rate dropped to zero. The agents did not get “better.” The runtime got stricter.

The Discipline Shift: Debug at the Hook Layer

Here is the actionable change: stop debugging behavior in the system prompt. Debug it at the hook layer where you can write a test.

When an agent exhibits unwanted behavior, ask: can I move this constraint into a lifecycle hook? If the answer is yes, do it. If the answer is no (usually because the constraint requires semantic understanding), keep it in the prompt—but wrap it with a hook that validates the output.

This creates a clean separation: hooks enforce invariants, prompts shape reasoning. The runtime handles what must never happen. The LLM handles what should happen.
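To make the separation concrete, here is what an invariant looks like once it lives in code rather than prose (a simplified, hypothetical stand-in for the real Session and ToolCall types):

```typescript
// Once the constraint is a function, "debug at the hook layer" means
// ordinary unit testing: the invariant is an assertion, not a prompt.
type Gate = (toolName: string, memoryVerified: boolean) => "allow" | "block";

export const memoryGate: Gate = (toolName, memoryVerified) => {
  // Simplified destructiveness check for illustration.
  const destructive = toolName.startsWith("file-") || toolName.startsWith("db-");
  return destructive && !memoryVerified ? "block" : "allow";
};

// The invariant is now one line you can assert in CI forever:
// memoryGate("file-delete", false) must always be "block".
```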

Prediction: the Fragmentation of the “Prompt Engineer”

Within 24 months, the “prompt engineer” job category will fragment into two distinct roles:

  • Hook Engineer. Writes deterministic lifecycle logic, enforces invariants, handles orchestration. This is backend engineering with LLM lifecycle awareness.
  • Context Curator. Decides what the model actually sees—crafting reasoning patterns, task descriptions, and cognitive scaffolds. This is closer to UX writing or instructional design.

The courses still teaching prompt-tuning-as-the-primary-lever will look as dated as “webmaster” job listings. You cannot tune your way out of a missing enforcement layer.

How to Start

If you are building agents today, audit your system prompt for anything that looks like an invariant—“always,” “never,” “must,” “before doing X, do Y.” Move each one into the appropriate lifecycle hook:

  • Session setup → SessionStart
  • Per-turn requirements → UserPromptSubmit
  • Tool safety → PreToolUse
  • State persistence → PostToolUse
  • Context management → PreCompact
  • Cleanup and accounting → Stop
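As a sketch, the six-event mapping above can be expressed as a registration contract (interface names assumed, not the runtime's actual API):

```typescript
// Hypothetical registration contract for the six lifecycle events.
interface LifecycleHooks {
  onSessionStart?: (sessionId: string) => Promise<void>;
  onUserPromptSubmit?: (prompt: string) => Promise<string>;
  onPreToolUse?: (toolName: string) => Promise<boolean>; // false blocks the call
  onPostToolUse?: (toolName: string, result: string) => Promise<void>;
  onPreCompact?: (tokenBudget: number) => Promise<void>;
  onStop?: (reason: string) => Promise<void>;
}

export function registerHooks(hooks: LifecycleHooks): string[] {
  // Report which interception points are actually enforced, so gaps in
  // the contract are visible at startup rather than at failure time.
  return Object.entries(hooks)
    .filter(([, fn]) => typeof fn === "function")
    .map(([name]) => name);
}
```

Logging the registered list at boot makes the audit self-documenting: an empty or short list is a visible warning that invariants are still living in the prompt.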

Start with PreToolUse. It is usually the highest-impact hook because tool calls are where agents break things. Add one invariant this week. Write a test for it. Watch the drift disappear.

The agents will not get more polite. They will get more reliable. That is what production requires.

FAQ

Can I use lifecycle hooks with existing agent frameworks?

Most frameworks wrap the LLM API but do not expose interception points. You need runtime access to the conversation lifecycle, which is why we built our orchestration layer around Claude Code's hook contract. If the framework only gives you a system-prompt slot and a tool list, you cannot enforce invariants deterministically.

How much latency do hooks add per turn?

Microseconds, not milliseconds. Hooks are local TypeScript that runs in the same process as the runner. They are not LLM calls. PreCompact actually reduces latency overall because it prevents expensive re-summarization loops where the model loses critical state and has to rebuild context.

What still belongs in the system prompt?

Voice, tone, reasoning style, and task framing. Invariants — rules that must never break under context pressure — belong in hooks. A reasonable test: if you can write a unit test for the behavior, it belongs in a hook. If you would describe it as 'be more concise,' it belongs in the prompt.

Do hooks work with non-Claude models?

The pattern is universal. The specific six events in our runtime are shaped by Claude Code's lifecycle, but any agent runtime can intercept session start, per-turn input, pre/post tool execution, context compression, and session end. The transferable insight is that deterministic enforcement beats probabilistic requests.

How do you debug hook logic?

Hooks are just code. You write unit tests, you step through with a debugger, you log deterministically. That is the whole point — moving behavior out of prompt-alchemy and into software engineering with reproducible failures and stack traces.

What is the difference between a hook and middleware?

Middleware wraps external API calls — auth, retries, rate limiting. Hooks intercept the agent's internal lifecycle events — session boot, per-turn input, tool execution, context compression, session end — where middleware cannot reach. Middleware lives between your agent and the outside world. Hooks live between the LLM and its own runtime.


Build your swarm tonight.

A 7-day free trial on Cloud, or fork it on GitHub. Either way, your agents start compounding today.