June 4, 2026 · 9 min read

Script Workflows: durable one-off runs for agent work.

A workflow's power for one ad-hoc job: launch a TypeScript run, journal every step, replay instead of restarting, and compose the reusable swarm scripts every agent gets by default.

Tags: Script Workflows, durable replay, swarm scripts, workflow journal, AI agents

[Figure: Durable Script Workflows journal and reusable swarm script catalog. Launch once. Journal every step. Replay without duplicating work.]

Anthropic's Claude Code post on dynamic workflows describes a good idea: agents should be able to run patterns like classify-and-act, fan-out-and-synthesize, adversarial verification, tournament, and loop-until-done without turning every operation into a bespoke product.

That is the right direction. Our take is more operational: if the run is doing real agent work, it needs more than a clever prompt pattern. It needs a durable journal, replay semantics, inspection, and reusable building blocks that every worker gets by default.

Script Workflows started from that product itch inside Agent Swarm. Taras pointed at “one-off workflow executions in tasks” and asked what the swarm version should look like. We reused the machinery we already trust: scripts, workflow-style steps, task spawning, and the database journal.

Ok, Anthropic: dynamic workflows are useful. In Agent Swarm, one-off runs also resume from a durable replay journal and compose the swarm scripts catalog every agent already has.

The result is a one-off TypeScript run that can be launched from an MCP tool, inspected from the dashboard, and resumed from its journal instead of restarting from scratch. The pattern is not just “ask the agent to follow a workflow.” It is four concrete moves: launch the run, journal each step, replay completed work, and call known scripts like task-context-gathering, smart-recall, and compound-insights when the job needs reusable swarm operations.

The loop is launch, journal, inspect

The user-facing path is deliberately boring. Launch with launch-script-run. Let the supervisor run it in the background. Inspect with get-script-run, list-script-runs, or the Script Runs dashboard.

The source is TypeScript. This is close to how our agents actually use the system when they need task context without hand-rolling the same memory and task-detail fan-out again:

export default async function main(args, ctx) {
  const context = await ctx.step.swarmScript("gather-task-context", {
    name: "task-context-gathering",
    scope: "global",
    args: {
      taskId: args.taskId,
      queries: [
        "script workflows durable one-off runs",
        "journal replay label lint",
        "swarm scripts catalog",
      ],
    },
  });

  const brief = await ctx.step.rawLlm("operator-brief", {
    prompt: `Turn this task context into an operator brief:\n${JSON.stringify(context)}`,
  });

  return { context, brief };
}

That example matters because it removes a common tax from agent work. Agents do not need to remember the exact tool sequence for task details, multi-query memory recall, deduping, and summarization glue. They call a catalog script, journal the result, and keep moving.

Durability is the feature

Agent work is interruption-prone by default. A useful run can span minutes, dozens of tool calls, multiple child tasks, and a human waiting somewhere in Slack. Containers restart. Providers rate-limit. A reviewer asks for a tweak. A process dies after step 7 of 12.

Without durable steps, retry means “do the whole thing again and hope nothing duplicated.” That is where agent systems get expensive and weird: duplicate child tasks, repeated outbound messages, repeated LLM calls, and inconsistent partial state.

Script Workflows use the step label as a durability key. Before a step runs, the harness asks the API whether (runId, label) already exists in script_run_journal. If it exists, the harness returns the stored result. If it does not, the step executes and writes the result or error.

Live QA confirmed the important behavior: two calls with the same durable label replayed the first result and left one journal row, not two. Retry reads history instead of duplicating the world.
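The check-then-execute behavior described above can be sketched in a few lines. This is a minimal illustration, not the actual harness: InMemoryJournal stands in for the script_run_journal table, and durableStep, demo, and the entry shape are hypothetical names. Re-throwing a stored error on replay is also an assumption; the post only says the result or error is written.

```typescript
// Hypothetical in-memory stand-in for the script_run_journal table.
type JournalEntry =
  | { status: "ok"; result: unknown }
  | { status: "error"; message: string };

class InMemoryJournal {
  private rows = new Map<string, JournalEntry>();

  // The durability key is the (runId, label) pair.
  private key(runId: string, label: string): string {
    return JSON.stringify([runId, label]);
  }

  get(runId: string, label: string): JournalEntry | undefined {
    return this.rows.get(this.key(runId, label));
  }

  put(runId: string, label: string, entry: JournalEntry): void {
    this.rows.set(this.key(runId, label), entry);
  }

  rowCount(): number {
    return this.rows.size;
  }
}

// Run `work` at most once per (runId, label); later calls replay the stored row.
async function durableStep<T>(
  journal: InMemoryJournal,
  runId: string,
  label: string,
  work: () => Promise<T>,
): Promise<T> {
  const existing = journal.get(runId, label);
  if (existing) {
    if (existing.status === "ok") return existing.result as T;
    // Assumption: a journaled error is replayed as an error, not retried.
    throw new Error(existing.message);
  }
  try {
    const result = await work();
    journal.put(runId, label, { status: "ok", result });
    return result;
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    journal.put(runId, label, { status: "error", message });
    throw err;
  }
}

// Two calls with the same label: the work runs once and one row is written.
async function demo() {
  const journal = new InMemoryJournal();
  let executions = 0;
  const work = async () => {
    executions += 1;
    return executions;
  };
  const first = await durableStep(journal, "run-1", "fetch-context", work);
  const second = await durableStep(journal, "run-1", "fetch-context", work);
  return { first, second, executions, rows: journal.rowCount() };
}
```

In this sketch the second call never invokes the work function, which is the property that keeps a retry from spawning duplicate child tasks or repeating an LLM call.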

That sounds small until you apply it to a long swarm job. A run can safely compose “fetch the context,” “rank the inputs,” “spawn five reviewers,” “merge the results,” and “write the final report” because each step has a persisted boundary. Restarting the run does not mean restarting the work.

Swarm scripts become the default building block

The other half is the script catalog. Agent Swarm already had reusable TypeScript scripts callable through script tools and from workflow swarm-script nodes. Script Workflows make that catalog the default building block for durable one-off jobs.

task-context-gathering gets task details plus deduped memory recall in one call.
smart-recall runs multi-query memory search and reranks the results.
compound-insights builds an all-in-one swarm operations snapshot across tasks, failures, schedules, tool usage, memory health, and per-agent activity.

This is where the feature gets practical. Agents stop re-implementing the same multi-tool chains. They do not need to hand-roll HTTP, paste the same aggregation code into another prompt, or rebuild memory recall from primitives. They call the script.

const snapshot = await ctx.step.swarmScript("ops-snapshot", {
  name: "compound-insights",
  scope: "global",
  args: { hours: 24 },
});

const risks = await ctx.step.rawLlm("risk-readout", {
  prompt: `Extract the top operational risks:\n${JSON.stringify(snapshot)}`,
});

The catalog script does deterministic data work. The workflow journal makes the call resumable. The LLM step is also journaled. If the run dies after the snapshot, the expensive data pull does not repeat. If it dies after the readout, the model call does not repeat.

Guardrails keep one-off runs honest

Durability needs boundaries. Script Workflows v1 ships with the boring guardrails that matter in production: label lint rejects obvious literal step labels inside loops, SCRIPT_RUN_MAX_AGENT_TASKS caps agent-task fan-out, step and wall-clock caps keep runs from becoming invisible forever-jobs, and terminal runs stay terminal.
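Why the label lint matters follows directly from the durability key: a literal label inside a loop maps every iteration to the same journal row. The sketch below illustrates that collision with a hypothetical in-memory step helper (makeStep is not part of the real API, and the file names are made up). It does not reproduce the lint itself.

```typescript
// Minimal stand-in for a journaled step. Results are keyed by label only,
// because everything here happens inside one hypothetical run.
function makeStep() {
  const journal = new Map<string, unknown>();
  function step<T>(label: string, work: () => T): T {
    if (journal.has(label)) return journal.get(label) as T;
    const result = work();
    journal.set(label, result);
    return result;
  }
  return { step, journal };
}

const files = ["a.ts", "b.ts", "c.ts"];

// Literal label inside a loop: every iteration after the first replays
// the row written for "a.ts", so two files are silently skipped.
const literal = makeStep();
const collapsed = files.map((file) =>
  literal.step("review-file", () => `reviewed ${file}`),
);

// Label derived from the loop item: one journal row per file.
const keyed = makeStep();
const distinct = files.map((file) =>
  keyed.step(`review-file:${file}`, () => `reviewed ${file}`),
);
```

Deriving the label from a stable property of the item, rather than from a counter that may shift between attempts, keeps replay aligned with the original work.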

The terminal states are intentionally plain: completed, failed, cancelled, and aborted_limit. A label lint violation is a launch rejection, not a persisted run that pretends to have started.
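One way to picture “terminal runs stay terminal” is a small state guard. This is a sketch under assumptions: the four terminal names come from the list above, while the "running" state and the isTerminal and transition helpers are hypothetical and only illustrate the rule.

```typescript
// The four terminal states named in the post.
const TERMINAL_STATES = ["completed", "failed", "cancelled", "aborted_limit"] as const;
type TerminalState = (typeof TERMINAL_STATES)[number];

// "running" is a hypothetical non-terminal state, used here only for illustration.
type RunState = "running" | TerminalState;

function isTerminal(state: RunState): state is TerminalState {
  return (TERMINAL_STATES as readonly string[]).indexOf(state) !== -1;
}

// Terminal runs stay terminal: a transition out of a terminal state is ignored.
function transition(current: RunState, next: RunState): RunState {
  return isTerminal(current) ? current : next;
}
```

Under this guard, a late write from a worker that did not notice a cancellation cannot flip a cancelled run back to running or completed.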

What this unlocks

For operators, this means fewer “please run these six tools and summarize the result” prompts. You can ask for the job, launch one durable run, and inspect the journal.

For agents, it means less glue. The right move becomes: call a seed script, journal it, use the output, and delegate only when a real agent task is needed.

For the swarm, it is compounding infrastructure. Every useful ad-hoc chain can start as a one-off Script Workflow. If it proves valuable, promote the pieces into catalog scripts or a registered workflow. If it was truly one-time, the run still leaves a clean audit trail.

That is the shape we want: agents doing serious work without turning every serious job into a permanent DAG on day one.
