January 9, 2025 · 14 min read

Your Agent's Memory Is a Log File, Not a Lesson: The Prescriptive Memory Problem

The epistemological flaw in agent memory systems: they record what happened instead of what to do when it happens again.

Tags: agent memory, prescriptive memory, descriptive memory, AI agents, agent orchestration, agent-swarm

Figure: Prescriptive memory architecture diagram showing how exception handler patterns differ from stack trace logging. Stack traces tell you what broke. Exception handlers tell you what to do next.

Our agent audited the same stale repository fifteen times across fifteen separate sessions. Each time, it dutifully recorded a task_completion memory: “Pulled from repo, found it was 350 commits behind, audit findings were worthless.” Fifteen indexed memories. Fifteen identical failures. The memories were perfectly accurate. The agent never learned.

Then a human wrote one line in IDENTITY.md: “ALWAYS git fetch before auditing.” The failures stopped immediately.

This is not a story about stale memory or incorrect retrieval. This is about the epistemological shape of memory itself: what it means for a memory to be useful to an agent trying to complete a task. And it reveals why your vector store full of session summaries is mostly dead weight.

The Repo-Clone-Drift Pattern: A Perfect Failure

The pattern emerged in our production system over several weeks. An agent responsible for security auditing would receive a task: “Audit the payment-service repository for dependency vulnerabilities.” The agent would:

1. Navigate to /workspace/repos/payment-service
2. Run git log --oneline -5 to orient itself
3. Execute the audit tooling against the working directory
4. Report findings with confidence

The problem: /workspace/repos/payment-service was a persistent volume, last pulled three months ago. The agent was auditing ancient history while believing it was current. Each session ended with a task_completion memory accurately describing the failure: “repo was 350 commits behind, audit findings worthless”. A new session would begin, retrieve none of those memories, and repeat the identical sequence.

We did not have a memory retrieval problem. We had a memory format problem.

Stack Traces vs. Exception Handlers

Here is the mental model that unlocked everything for us. In production systems, there are two ways to handle failure:

Descriptive: stack trace

Records what broke, when, and the state that produced it. Past-tense, specific, diagnostic.

Prescriptive: exception handler

Encodes what to do the next time similar conditions occur. Future-tense, general, operational.

When your web service throws a database timeout, the stack trace goes to your observability platform. The exception handler (a circuit breaker, a retry with backoff, or a failover to a read replica) goes into your code. You do not re-read three-month-old stack traces to decide what to do. You rely on handlers written in anticipation of the failure mode.

Agent memory systems are almost entirely stack traces. Session summaries. Task completions. Tool call logs. Past-tense incident reports indexed by vector similarity. They tell you what happened. They do not tell you what to do.

Why Embeddings Do Not Bridge the Gap

You might think: “But vector similarity should surface relevant memories. A memory about stale repos should appear when the agent starts auditing a repo.”

This assumes embedding spaces are magic. They are not. Consider the actual text of a descriptive memory:

Task completed 2024-11-14. Audit of payment-service repository.
Repository state: 350 commits behind origin/main.
Impact: Audit findings referred to outdated dependencies,
vulnerabilities already patched in current HEAD were flagged
as active risks. Conclusion: audit results unreliable.

The agent's current query: “I'm about to audit the user-service repository. What preparations should I make?”

The embedding for the descriptive memory clusters with repository audits, payment-service, November tasks, dependency vulnerabilities, and git commit history. The embedding for the query clusters with preparation steps, user-service, and prescriptive guidance. The cosine similarity is weak. The memory does not surface.

Even if it did surface, the agent would need to infer: this stale payment-service repo caused bad audits; user-service is a different repo; both are repos; perhaps I should check if user-service is stale. Current agents can make this leap sometimes. But sometimes is the operative word, and always is the requirement.
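To make "weak cosine similarity" concrete, here is a minimal sketch using hand-made three-dimensional vectors. The vectors and their axis labels are invented for illustration: real embeddings have hundreds or thousands of dimensions and no labelled axes, so treat this as a picture of the geometry, not a measurement.

```typescript
// Cosine similarity between two equal-length, non-zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("vectors must be non-empty and the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("cosine similarity is undefined for a zero vector");
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy axes (hypothetical): [incident detail, repository audits, procedural guidance]
const descriptiveMemory = [0.9, 0.4, 0.1]; // past-tense incident report
const prescriptiveMemory = [0.2, 0.4, 0.9]; // imperative rule about audits
const preparationQuery = [0.1, 0.4, 0.9]; // "What preparations should I make?"

const descriptiveScore = cosineSimilarity(descriptiveMemory, preparationQuery);
const prescriptiveScore = cosineSimilarity(prescriptiveMemory, preparationQuery);

console.log(descriptiveScore.toFixed(2), prescriptiveScore.toFixed(2));
```

Both memories share the "repository audits" component with the query, yet in this toy setup only the rule-shaped vector lands close to it.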

The Embedding Space Argument

The deeper issue is architectural. Incident reports and procedural rules occupy different semantic neighborhoods entirely. A memory saying “I found the repo 350 commits behind on 2024-11-14” surfaces on queries like “what happened in November” or “tell me about payment-service issues.” It does not surface on “should I fetch before auditing” or “what are the rules for repository operations.”

The retrievability of a memory depends entirely on its encoding. Descriptive encodings retrieve on descriptive queries. Prescriptive encodings retrieve on prescriptive queries. When your agent needs to decide what to do next, it is asking a prescriptive question. Your descriptive memories are in the wrong neighborhood to answer.

The ALWAYS/BECAUSE Rewrite

Our intervention was simple in retrospect. We stopped writing task completions as incident reports. We started writing them as procedurally retrievable rules.

The prescriptive rewrite of that stale-repo memory:

ALWAYS: Before auditing any /workspace/repos/<repo>, run
git fetch && git status to verify sync with origin.

BECAUSE: Clone drift is common. On 2024-11-14, payment-service
was 350 commits behind, causing audit findings to reference
already-patched vulnerabilities and producing worthless results.

SCOPE: All repository audit tasks across all services.
SEVERITY: Critical. Silent failure, high confidence in wrong output.
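A rule like this can also be backed by a mechanical preflight check. The sketch below is one hypothetical way to do that, not code from agent-swarm. It swaps git status for git rev-list --count HEAD..@{upstream}, which returns the number of commits the checkout is behind as a plain integer. The CommandRunner type is an assumption: in a real agent it would wrap something like Node's child_process.execFileSync, and here it is injected so the logic can be exercised without a repository.

```typescript
// Runs a command in a directory and returns its stdout (assumed wrapper).
type CommandRunner = (cmd: string, args: string[], cwd: string) => string;

interface SyncStatus {
  behind: number; // commits on the upstream branch that are missing locally
  inSync: boolean;
}

// Hypothetical preflight: fetch, then count how far HEAD is behind upstream.
function checkRepoSync(repoPath: string, run: CommandRunner): SyncStatus {
  run("git", ["fetch"], repoPath);
  // HEAD..@{upstream} selects commits reachable from upstream but not from HEAD.
  const out = run("git", ["rev-list", "--count", "HEAD..@{upstream}"], repoPath);
  const behind = parseInt(out.trim(), 10);
  if (Number.isNaN(behind)) {
    throw new Error(`unexpected git output: ${out}`);
  }
  return { behind, inSync: behind === 0 };
}

// Refuse to audit a stale checkout instead of auditing it with confidence.
function assertFreshBeforeAudit(repoPath: string, run: CommandRunner): void {
  const sync = checkRepoSync(repoPath, run);
  if (!sync.inSync) {
    throw new Error(
      `${repoPath} is ${sync.behind} commits behind upstream; pull before auditing`
    );
  }
}

// Demo with a stubbed runner that reports the drift described in this post.
const stubRunner: CommandRunner = (_cmd, args) =>
  args[0] === "rev-list" ? "350\n" : "";
const drift = checkRepoSync("/workspace/repos/payment-service", stubRunner);
console.log(drift.behind, drift.inSync); // prints: 350 false
```

A check like this turns the silent failure into a loud one, but it only covers the failure mode someone already wrote a check for. The memory rule is what carries the lesson to tasks the check does not know about.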

This single memory now surfaces on four entirely different query vectors:

“What preparations before auditing?” matches “ALWAYS: Before auditing...”
“What are the git fetch rules?” matches “run git fetch && git status”
“Has payment-service had issues?” matches “payment-service was 350 commits behind”
“When did clone drift cause problems?” matches the specific incident in BECAUSE

The prescriptive format creates multiple retrieval vectors where the descriptive format created one. It occupies both the procedural and the historical neighborhoods. It answers both what should I do and what happened.

Production Implementation: Memory Rewriting in agent-swarm

We did not just change our prompt templates. We built prescriptive memory generation into the agent loop itself. Here is the pattern from our codebase:

interface PrescriptiveMemory {
  always: string;           // The imperative rule
  because: string;          // The grounding incident
  scope: string;            // When the rule applies
  severity: "critical" | "high" | "medium" | "low";
  sourceTaskId?: string;    // Link to originating incident
}

async function generatePrescriptiveMemory(
  taskResult: TaskCompletion,
  failurePattern: DetectedPattern
): Promise<PrescriptiveMemory> {
  // Not just storing what happened: extracting the lesson.
  const alwaysClause = await llm.extractRule(taskResult);
  const groundedAlways = await llm.groundWithIncident(
    alwaysClause,
    taskResult.originalError
  );

  return {
    always: groundedAlways.rule,
    because: groundedAlways.incidentReference,
    scope: determineScope(failurePattern),
    severity: assessSeverity(taskResult),
    sourceTaskId: taskResult.taskId
  };
}

The key insight: memory generation is not just summarization. It is rule extraction. We run a separate LLM call specifically to convert incident descriptions into imperative form, grounded in the original failure but lifted to procedural generality.
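Once a rule has been extracted, it still has to be turned into text before it can be embedded. The following is a minimal sketch of that step, assuming the rule is serialized rule-first in the same ALWAYS/BECAUSE/SCOPE/SEVERITY layout shown earlier. The function names renderForEmbedding and missingFields are hypothetical, and the interface is repeated only so the snippet stands alone.

```typescript
type Severity = "critical" | "high" | "medium" | "low";

interface PrescriptiveMemory {
  always: string;
  because: string;
  scope: string;
  severity: Severity;
  sourceTaskId?: string;
}

// Serialize rule-first so the imperative text leads what gets embedded.
function renderForEmbedding(m: PrescriptiveMemory): string {
  return [
    `ALWAYS: ${m.always}`,
    `BECAUSE: ${m.because}`,
    `SCOPE: ${m.scope}`,
    `SEVERITY: ${m.severity}`,
  ].join("\n");
}

// Report which required text fields are blank, so empty drafts are not stored.
function missingFields(m: PrescriptiveMemory): string[] {
  const missing: string[] = [];
  if (m.always.trim() === "") missing.push("always");
  if (m.because.trim() === "") missing.push("because");
  if (m.scope.trim() === "") missing.push("scope");
  return missing;
}

const repoDriftRule: PrescriptiveMemory = {
  always:
    "Before auditing any /workspace/repos/<repo>, run git fetch && git status to verify sync with origin.",
  because:
    "On 2024-11-14, payment-service was 350 commits behind, producing worthless audit findings.",
  scope: "All repository audit tasks across all services.",
  severity: "critical",
};

console.log(missingFields(repoDriftRule).length === 0);
console.log(renderForEmbedding(repoDriftRule));
```

Keeping the rendered layout identical to the "ALWAYS: " and "SCOPE: " prefixes used at query time is a design choice in this sketch: it gives stored rules and prescriptive queries the same surface form.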

async function retrieveRelevantMemories(
  currentIntent: string,
  context: ExecutionContext
): Promise<Memory[]> {
  // Embed all three query phrasings in parallel.
  const queryEmbeddings = await Promise.all([
    // Descriptive query: what happened like this?
    embed(currentIntent),
    // Prescriptive query: what should I do?
    embed("ALWAYS: " + currentIntent),
    // Scope query: what rules apply here?
    embed("SCOPE: " + context.taskType)
  ]);

  // Search with every query vector, then merge and deduplicate.
  const resultSets = await Promise.all(
    queryEmbeddings.map((q) => vectorSearch(q))
  );
  return mergeResults(resultSets);
}

The multi-query retrieval is critical. We are not hoping prescriptive memories incidentally match descriptive queries. We are actively querying the descriptive, prescriptive, and scope neighborhoods.
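The merge step is not shown above, so here is one plausible version as a sketch. The ScoredMemory shape, the id-based deduplication, and the limit parameter are all assumptions: the idea is simply that a memory hit by several query vectors should appear once, carrying its best score.

```typescript
interface ScoredMemory {
  id: string;
  text: string;
  score: number; // similarity to whichever query vector found it
}

// Merge result sets from several query vectors, keeping the best hit per id.
function mergeResults(resultSets: ScoredMemory[][], limit = 5): ScoredMemory[] {
  const best = new Map<string, ScoredMemory>();
  for (const results of resultSets) {
    for (const hit of results) {
      const seen = best.get(hit.id);
      if (seen === undefined || hit.score > seen.score) {
        best.set(hit.id, hit);
      }
    }
  }
  // Highest score first, truncated to the requested number of memories.
  return Array.from(best.values())
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

const merged = mergeResults([
  [{ id: "m1", text: "incident report", score: 0.42 }],
  [
    { id: "m2", text: "ALWAYS: fetch before auditing", score: 0.91 },
    { id: "m1", text: "incident report", score: 0.55 },
  ],
]);
console.log(merged.map((m) => `${m.id}:${m.score}`)); // prints: [ 'm2:0.91', 'm1:0.55' ]
```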

Measured Outcomes: Three Failure Patterns Converted

We applied prescriptive rewriting to three recurring failure patterns in our production system:

Pattern	Failures with descriptive memories	Failures after prescriptive rewrite
Repo clone drift	~5 per 30 days	~1 per 30 days
Wide-audit context budget	~4 per 30 days	~1 per 30 days
Deferred-tool schema errors	~6 per 30 days	~0-1 per 30 days

Rates are based on observed patterns in production workloads. Notably, failures did not drop to zero: prescriptive memories still require retrieval to work, and edge cases in scope matching still occur.

The residual failures in each pattern illuminate the limits. Prescriptive memories require scope to be correctly generalized. The single remaining repo-drift failure occurred on a repository outside our standard naming convention: /workspace/repos/payment-service was covered, but /workspace/custom-audits/vendor-code fell outside the generated rule's scope clause.
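The scope miss can be sketched in a few lines. This is an illustration under assumptions, not the scope-matching code from agent-swarm: it supposes the generated rule keyed its scope on a path prefix, and contrasts that with a scope keyed on a task type. The "repository-audit" label and both function names are hypothetical.

```typescript
interface AuditTask {
  taskType: string; // hypothetical label, e.g. "repository-audit"
  repoPath: string;
}

// Narrow scope: the rule only fires under the standard repo directory.
function pathScopeApplies(task: AuditTask): boolean {
  return task.repoPath.startsWith("/workspace/repos/");
}

// Generalized scope: the rule fires for any repository audit, wherever the checkout lives.
function taskScopeApplies(task: AuditTask): boolean {
  return task.taskType === "repository-audit";
}

const standardAudit: AuditTask = {
  taskType: "repository-audit",
  repoPath: "/workspace/repos/payment-service",
};
const vendorAudit: AuditTask = {
  taskType: "repository-audit",
  repoPath: "/workspace/custom-audits/vendor-code",
};

console.log(pathScopeApplies(standardAudit), pathScopeApplies(vendorAudit)); // prints: true false
console.log(taskScopeApplies(standardAudit), taskScopeApplies(vendorAudit)); // prints: true true
```

The rule text was right in both cases. What failed was the predicate deciding when it applies, which is why scope deserves the same review as the rule itself.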

What Did Not Work: The Pitfalls We Hit

Our first attempt at prescriptive memory failed completely. We simply appended “LESSON: always check repo staleness” to the bottom of existing task_completion memories. The retrieval rate did not improve.

The problem was structural. A memory that is 90% incident report and 10% lesson still embeds primarily as incident report. The lesson is semantic seasoning, not substance. The vector is pulled toward the descriptive neighborhood by the weight of text. Retrieval on procedural queries remained poor.

We also tried generating prescriptive memories only for acknowledged failures. This captured lessons too late, or not at all: an agent would fail once without the failure being acknowledged, succeed the next time by luck, and we would never generate the rule. Successful executions taught nothing. We now generate prescriptive memories on every task completion, with severity modulating prominence, not presence.
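One way to read "prominence, not presence" is that every rule stays retrievable and severity only reorders the results. The sketch below assumes a simple multiplicative weighting. The weights, the RetrievedRule shape, and the example rule ids are illustrative, not tuned values or real entries from our store.

```typescript
type Severity = "critical" | "high" | "medium" | "low";

// Illustrative weights only: they rescale similarity, they never filter.
const SEVERITY_WEIGHT: Record<Severity, number> = {
  critical: 1.0,
  high: 0.85,
  medium: 0.7,
  low: 0.5,
};

interface RetrievedRule {
  id: string;
  similarity: number; // raw vector similarity to the query
  severity: Severity;
}

// Reorder retrieved rules by severity-weighted similarity without dropping any.
function rankBySeverity(rules: RetrievedRule[]): RetrievedRule[] {
  const weighted = (r: RetrievedRule) => r.similarity * SEVERITY_WEIGHT[r.severity];
  return rules.slice().sort((a, b) => weighted(b) - weighted(a));
}

const ranked = rankBySeverity([
  { id: "prefer-verbose-logs", similarity: 0.9, severity: "low" }, // weighted 0.45
  { id: "fetch-before-audit", similarity: 0.6, severity: "critical" }, // weighted 0.60
]);
console.log(ranked.map((r) => r.id)); // prints: [ 'fetch-before-audit', 'prefer-verbose-logs' ]
```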

Finally, we discovered that human-written prescriptive memories outperform LLM-generated ones for novel failure modes. The LLM generalizes too conservatively, capturing the specific incident but missing the broader pattern. Human engineers add the structural insight, such as “this applies to all repos, not just payment-service,” that makes the memory robust. Our current hybrid: LLM generates draft prescriptive memories, humans review and generalize critical ones via the IDENTITY.md interface.

Orthogonality to Memory Poisoning

This post addresses a dimension completely separate from the memory poisoning problem covered elsewhere. Memory poisoning asks: is this memory correct? Prescriptive structure asks: is this memory actionable?

You can have a perfectly accurate, non-poisoned memory store that is 90% useless because it is full of incident reports. You can have prescriptive memories that are entirely fabricated: confident rules about non-existent failure modes. The problems are orthogonal. Production systems need correct memories and prescriptive memories. Fixing one does not fix the other.

Our verification layer checks prescriptive memories against the incidents that generated them. A rule without a grounding incident is flagged for review. An incident without an extracted rule is flagged for prescriptive rewrite. Both pipelines run independently.
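The two flags can be sketched as a cross-check between rules and incidents. This is a simplified illustration that only checks linkage through sourceTaskId. It does not check whether a rule's content actually follows from its incident, which a real verification layer would also need. The Rule and Incident shapes and the example ids are assumptions.

```typescript
interface Rule {
  id: string;
  sourceTaskId?: string; // link to the incident that grounds the rule
}

interface Incident {
  taskId: string;
}

interface VerificationReport {
  ungroundedRules: string[]; // rule ids to send for review
  unconvertedIncidents: string[]; // task ids to send for prescriptive rewrite
}

// Cross-check rules against incidents in both directions.
function verifyGrounding(rules: Rule[], incidents: Incident[]): VerificationReport {
  const incidentIds = new Set(incidents.map((i) => i.taskId));
  const coveredIncidents = new Set<string>();
  const ungroundedRules: string[] = [];

  for (const rule of rules) {
    if (rule.sourceTaskId !== undefined && incidentIds.has(rule.sourceTaskId)) {
      coveredIncidents.add(rule.sourceTaskId);
    } else {
      ungroundedRules.push(rule.id);
    }
  }

  const unconvertedIncidents = incidents
    .filter((i) => !coveredIncidents.has(i.taskId))
    .map((i) => i.taskId);

  return { ungroundedRules, unconvertedIncidents };
}

const report = verifyGrounding(
  [{ id: "fetch-before-audit", sourceTaskId: "task-101" }, { id: "orphan-rule" }],
  [{ taskId: "task-101" }, { taskId: "task-102" }]
);
console.log(report.ungroundedRules, report.unconvertedIncidents); // prints: [ 'orphan-rule' ] [ 'task-102' ]
```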

The Deeper Pattern: Logging vs. Learning

There is a fundamental confusion in how we build agent systems. We borrow infrastructure from observability (log aggregation, vector indexing, similarity search) and assume that searchable logs constitute usable memory. They do not.

Human expertise does not primarily consist of incident recall. It consists of compiled knowledge: heuristics, rules of thumb, and procedures that have been abstracted from specific experiences into general guidance. When a senior engineer says “always check the circuit breaker first,” they are not searching their memory of past outages. They are retrieving a prescriptive rule compiled from dozens of specific incidents over years.

Agent memory systems need the same compilation layer. Raw logs go to the observability stack. Compiled, prescriptive rules go to the agent's working memory. The agent does not need to know that payment-service was stale on November 14. It needs to know that repository staleness is a common failure mode with a specific preventive action.

Implementation Checklist

If you are building agent memory systems today:

1. Audit your current memories. What percentage are past-tense incident reports versus future-tense procedures?
2. For your top three recurring failure patterns, manually rewrite descriptive memories into ALWAYS/BECAUSE format.
3. Measure retrieval: can the prescriptive version be found by queries that do not mention the original incident?
4. Add prescriptive query embeddings to your retrieval pipeline. Do not rely on incidental overlap.
5. Build human-in-the-loop review for novel failure modes. LLMs under-generalize.
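The retrieval measurement in the checklist can be run as a small hit-rate test. The sketch below assumes your store exposes a search function returning memory ids for a query. The SearchFn signature, the probe queries, and the keyword-based stub standing in for a vector store are all hypothetical.

```typescript
// Returns up to k memory ids for a query (assumed interface to your store).
type SearchFn = (query: string, k: number) => string[];

// Fraction of probe queries whose top-k results contain the target memory.
function hitRate(search: SearchFn, queries: string[], targetId: string, k = 5): number {
  if (queries.length === 0) return 0;
  let hits = 0;
  for (const query of queries) {
    if (search(query, k).some((id) => id === targetId)) hits++;
  }
  return hits / queries.length;
}

// Stub search for the demo: a keyword lookup standing in for a vector store.
const stubSearch: SearchFn = (query, k) => {
  const found = /audit|fetch|repo/i.test(query) ? ["fetch-before-audit"] : [];
  return found.slice(0, k);
};

// Probes that never mention payment-service or the original incident date.
const probes = [
  "What preparations should I make before auditing user-service?",
  "What are the git fetch rules?",
  "How should I size the context budget?",
];

const rate = hitRate(stubSearch, probes, "fetch-before-audit");
console.log(rate.toFixed(2)); // prints: 0.67
```

Running the same probes against the descriptive and the prescriptive version of a memory gives a direct before-and-after comparison for that rule.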

The goal is not perfect memory. The goal is memory that changes behavior. Stack traces make great diagnostic tools. They make terrible exception handlers.

FAQ

What's the difference between prescriptive and descriptive memory?

Descriptive memory records what happened: incident reports. Prescriptive memory encodes what to do when similar situations occur: procedural rules. Most agent systems store the former but need the latter.

Why don't incident reports help agents avoid repeat mistakes?

Embedding spaces for past-tense incidents and future-tense procedures do not overlap reliably. A memory about finding a stale repo on November 14 will not necessarily surface when querying whether to fetch before auditing.

How do I convert descriptive memories to prescriptive ones?

Rewrite them with an ALWAYS/BECAUSE structure: ALWAYS run git fetch before auditing. BECAUSE clone drift is common and has made audit findings worthless before.

Isn't this just better prompting?

No. Prompting puts static rules in system prompts. Prescriptive memory is dynamic: rules emerge from failures, accumulate, retrieve contextually, and survive beyond individual sessions.

Does this replace memory poisoning fixes?

No. Memory poisoning fixes correctness: is this memory true? Prescriptive memory fixes actionability: does this memory change behavior? Production systems need both.
