December 19, 2024·13 min read

The success penalty: our swarm got 70× slower because it kept succeeding.

Every task your swarm completes makes the next session slightly slower to start until you stop treating memory like a log file and start treating it like a database.

agent memorySQLite performancedatabase indexingAI agentsagent-swarm

Aaaaand it's gone meme about agent startup performance disappearing as memory grows — Every completed task adds context. Without memory hygiene, it also adds latency.

Six months ago, our agent swarm was completing tasks in under a second. With 14,351 successful tasks in the log, the same workflow took 70 seconds to initialize. We had not added complexity. We had not changed the agents. We had simply succeeded too much.

This is the success penalty: the counterintuitive reality that in long-running agent systems, accumulated memory acts as a drag coefficient on performance. Every completed task, every logged interaction, every stored context window becomes a liability if your architecture treats memory like a write-only log rather than a queryable database.

Why does successful task completion slow down agents?

Agent swarms rely on memory to maintain context across sessions. When an agent wakes up, it reconstructs its working state by querying historical interactions. In our implementation, this meant executing a straightforward query: retrieve all memories for this agent, ordered by recency. Simple enough until the mathematics of scale intervened.

We measured the damage: 138 milliseconds per lookup across 14,351 tasks. With agents requiring 50-100 context retrievals per session initialization, startup times ballooned from under one second to over 70 seconds. The database was not failing. It was faithfully executing O(n) scans across an unindexed table that grew linearly with our success.

The performance degradation curve

Task count	Avg lookup time	Session init	Degradation
1,000	2ms	0.8s	1x baseline
5,000	15ms	4.2s	5.2x
14,351	138ms	71s	70x
50,000 projected	~450ms	~240s	300x

The root cause was architectural debt. We had designed our memory layer using SQLite defaults: perfectly reasonable for prototyping, catastrophic for production accumulation. The memories table had primary keys but no operational indexes. Queries performed full table scans, and every inserted task made the hot path a little worse.

What did not work

Before finding the solution, we pursued several dead ends that taught us about the specific nature of our bottleneck.

Vertical scaling failed immediately. We upgraded from standard SSDs to NVMe drives and doubled RAM. Startup times improved by 8%. The bottleneck was not hardware throughput. It was query path complexity. iostat revealed 90% I/O wait on random reads, indicating the database was seeking across the disk rather than streaming from it.

Single-column indexing provided false hope. We added an index on agent_id, which reduced lookup time to 45ms. But the query planner still required an explicit sort for our ORDER BY timestamp DESC clause. The improvement came from filtering, but the sort operation remained O(n log n) on the filtered set.

Redis caching introduced memory pressure. We attempted to cache recent memories in Redis, hoping to bypass SQLite for hot data. This worked for individual agents but failed for swarm coordination. Redis consumed 12GB within a week storing redundant context, and cache invalidation occasionally served stale agent priorities.

How do you fix a 70× slowdown without ditching SQLite?

The solution required treating agent memory as a production database rather than an append-only log. We implemented a three-tier strategy: aggressive indexing, query restructuring, and active memory hygiene.

1. Composite indexes for access patterns

Our queries always filtered by agent_id and sorted by timestamp. A composite index covering both columns eliminates the sort operation and enables index-only scans:

-- Before: table scan + sort (138ms)
-- After: index-only scan (2ms)

CREATE INDEX idx_memories_agent_time_type
ON memories(agent_id, timestamp DESC, memory_type)
WHERE archived = 0;

-- Partial index keeps hot data fast by excluding cold
-- historical data from the active index entirely.

The partial index clause is critical. By excluding archived memories from the active index, we ensure that index depth remains constant regardless of total historical volume. Only hot memories from active sessions are indexed for fast retrieval.

2. Query surgery: SELECT * is the enemy

Our original query retrieved full memory objects, including large text content that was not needed for the initial context build:

-- Anti-pattern: retrieving everything
SELECT * FROM memories
WHERE agent_id = ?
ORDER BY timestamp DESC
LIMIT 100;

-- Optimized: targeted retrieval with key-set pagination
SELECT id, memory_type, created_at, content_preview
FROM memories
WHERE agent_id = ?
  AND timestamp < ?
  AND memory_type IN ('goal', 'observation', 'decision')
ORDER BY timestamp DESC
LIMIT 50;

Key-set pagination, using timestamp < ? instead of OFFSET, is essential. OFFSET requires the database to count and discard rows, making it O(offset). With key-set pagination, lookups stay consistent even when retrieving the 10,000th memory in a sequence.

3. WAL mode for concurrent microservices

Agent swarms write memories constantly while reading context simultaneously. SQLite's default rollback journal mode forces readers to wait for writers, creating head-of-line blocking during high-throughput periods:

const db = new Database("swarm.db", {
  pragmas: [
    "journal_mode = WAL",
    "synchronous = NORMAL",
    "cache_size = -64000",
    "temp_store = memory",
    "mmap_size = 268435456"
  ]
});

WAL mode separates reads and writes into different files, allowing our orchestrator to query agent state while agents are actively logging new observations. This eliminated the sporadic 2-3 second stalls we observed during swarm coordination events.

4. Active memory hygiene

Technical optimization buys time, but architecture determines the ceiling. We implemented a memory tiering system that automatically summarizes and archives old interactions:

-- Automated maintenance routine
INSERT INTO memory_summaries (agent_id, period_start, period_end, summary)
SELECT
  agent_id,
  MIN(timestamp) as period_start,
  MAX(timestamp) as period_end,
  ai_summarize_group(content) as summary
FROM memories
WHERE timestamp < date('now', '-7 days')
  AND archived = 0
GROUP BY agent_id, strftime('%Y-%W', timestamp);

INSERT INTO archived_memories SELECT * FROM memories
WHERE timestamp < date('now', '-7 days');

DELETE FROM memories
WHERE timestamp < date('now', '-7 days');

VACUUM;

This is the crucial architectural insight: production agent swarms need active memory hygiene, not just memory accumulation. Biological agents forget; artificial agents should summarize. By compressing week-old interactions into semantic summaries, we retain the learning while dropping the lookup cost.

The results: from 70× degradation to baseline

After implementing these changes, we measured the new performance characteristics:

Performance recovery

Metric	Before	After	Improvement
Single lookup	138ms	1.9ms	72x
Session initialization	71s	0.95s	74x
Concurrent agent limit	12	150+	12x
Database size, hot	4.2GB	180MB	23x

Most importantly, as our task count approaches 50,000, startup times remain pinned to the day-one baseline of under one second. The success penalty has been neutralized not by fighting growth, but by architecting for it.

The architectural stance: memory is not storage

If you are building agent systems, you need to take a position on memory architecture early. The default assumption that more data equals smarter agents is false. More data equals slower agents, and slower agents equal fewer tasks completed per watt, per dollar, per user patience threshold.

Treat agent memory as a cache with lifecycle rules. Recent memories need millisecond access. Old memories need semantic compression. Ancient memories need archiving or deletion. If your agents cannot start because they are parsing every interaction since deployment day, you have not built artificial intelligence. You have built artificial bureaucracy.

SQLite is sufficient for production swarms serving tens of thousands of tasks. PostgreSQL is not a magic bullet. If your query patterns are O(n), a bigger database just delays the inevitable. The fix is indexing discipline, query hygiene, and accepting that forgetting is a feature, not a bug.

Implementation checklist

Analyze EXPLAIN QUERY PLAN for every memory retrieval in your hot path.
Create composite indexes matching your exact filter and sort patterns.
Enable WAL mode if you have concurrent readers and writers.
Implement time-based partitioning or archiving for hot and cold separation.
Replace SELECT * with column-specific retrieval.
Use key-set pagination, never OFFSET for large datasets.
Schedule weekly VACUUM operations to prevent fragmentation.

The success penalty is avoidable, but only if you recognize that completion and accumulation are orthogonal. Optimize for the former; architect against the latter. Your future self, watching that sub-second startup at task 100,000, will thank you.

FAQ

How do I detect this performance penalty in my agent swarm?

Profile session startup times. If initialization duration correlates with total historical task count rather than concurrent load, you have accumulated memory debt. Use SQLite EXPLAIN QUERY PLAN to identify table scans on memory retrieval queries.

Should I switch to PostgreSQL instead of optimizing SQLite?

Not necessarily. SQLite handles 50,000+ tasks with proper indexing and has lower operational overhead. Only consider migrating if you need horizontal write scaling across multiple nodes or complex multi-agent concurrent writes.

How frequently should I archive agent memories?

For high-frequency agents, archive weekly; for batch agents, monthly. The critical threshold is keeping active context tables under 10,000 rows per agent to maintain sub-5ms lookup performance on standard SSDs.

Will these optimizations affect real-time agent responsiveness?

These changes reduce p99 latency by preventing context rebuild stalls. Active memory stays hot for immediate access, while historical queries hit optimized archived indexes without blocking current task execution.

What about using vector databases for agent memory instead?

Use vector DBs for semantic similarity search, but keep structured metadata in SQLite. A hybrid approach works best: vector stores for fuzzy recall, SQLite for deterministic lookups and agent state management.

/ keep reading

All posts

June 10, 2026 / 12 min read

The success penalty: our swarm got 70× slower because it kept succeeding.

Why does successful task completion slow down agents?

The performance degradation curve

What did not work

How do you fix a 70× slowdown without ditching SQLite?

1. Composite indexes for access patterns

2. Query surgery: SELECT * is the enemy

3. WAL mode for concurrent microservices

4. Active memory hygiene

The results: from 70× degradation to baseline

Performance recovery

The architectural stance: memory is not storage

Implementation checklist

FAQ

How do I detect this performance penalty in my agent swarm?

Should I switch to PostgreSQL instead of optimizing SQLite?

How frequently should I archive agent memories?

Will these optimizations affect real-time agent responsiveness?

What about using vector databases for agent memory instead?

Is Grep All You Need? What a New Paper Taught Us About Agent Memory

Your Agent's Memory Is a Log File, Not a Lesson: The Prescriptive Memory Problem

Nobody Prompt-Injected Our Agents — They Escalated Their Own Privileges

Build your swarm tonight.

Why does successful task completion slow down agents?

The performance degradation curve

What did not work

How do you fix a 70× slowdown without ditching SQLite?

1. Composite indexes for access patterns

2. Query surgery: SELECT * is the enemy

3. WAL mode for concurrent microservices

4. Active memory hygiene

The results: from 70× degradation to baseline

Performance recovery

The architectural stance: memory is not storage

Implementation checklist

FAQ

How do I detect this performance penalty in my agent swarm?

Should I switch to PostgreSQL instead of optimizing SQLite?

How frequently should I archive agent memories?

Will these optimizations affect real-time agent responsiveness?

What about using vector databases for agent memory instead?

Related field notes

Is Grep All You Need? What a New Paper Taught Us About Agent Memory

Your Agent's Memory Is a Log File, Not a Lesson: The Prescriptive Memory Problem

Nobody Prompt-Injected Our Agents — They Escalated Their Own Privileges

Build your swarm tonight.