Agent Snapshots

Immutable, versioned bundles of your agent code. Upload once, reference in any spec. The full snapshot SDK.

An agent snapshot is a versioned, content-addressed bundle of your agent's code. Upload it once with ks.agents.upload(), reference it in any spec with agent.type: snapshot, and Keystone will fetch and run that exact version inside the sandbox.

Why bother? Because the answer to "did v3 of my agent regress against v2?" requires both versions to exist as first-class entities. Snapshots give you that.

When to use snapshots

| Scenario | Use snapshots? |
| --- | --- |
| Hello-world / prototyping | No. Use agent.type: paragon or cli |
| Agent code lives in the same repo as the spec | Maybe. cli works if your binary is built locally |
| Agent has its own deployment lifecycle | Yes |
| You need to compare v2 vs. v3 | Yes |
| You want every trace tagged with the agent that produced it | Yes |
| Agent has many runtime dependencies | Yes (snapshots are tarballs, so bundle deps inside) |

agents.upload(opts)

Upload an agent snapshot. Version is auto-assigned by the server (incremented from the last upload of the same name).

import { readFileSync } from "fs";
 
const snap = await ks.agents.upload({
  name: "email-agent",                              // logical name
  entrypoint: ["python", "main.py"],                // exec form
  runtime: "python3.12",                            // optional hint
  tag: "v2.1",                                      // optional human label
  bundle: readFileSync("dist/email-agent.tar.gz"),  // Uint8Array of the tarball
});
// Returns:
// {
//   id: "snap_a1b2c3...",                          // immutable content hash
//   name: "email-agent",
//   version: 5,
//   tag: "v2.1",
//   digest: "sha256:abc123...",
//   size_bytes: 2_457_600,
//   storage_path: "/agents/email-agent/v5",
//   runtime: "python3.12",
//   entrypoint: ["python", "main.py"],
//   created_at: "2026-04-28T22:00:00Z",
// }

What this does: sends a multipart form to POST /v1/agents with two parts, metadata (JSON) and bundle (binary). The server validates the tarball, computes its sha256, writes it to storage, and creates a snapshot row with the next version number for that name.

The bundle is immutable and content-addressed: its digest is the sha256 of the tarball bytes. If you upload the exact same bytes twice, you get the same digest but a new version row.
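To make the content-addressing concrete, here is a minimal sketch that computes a digest locally so it can be compared with the digest field returned by upload. It assumes the server hashes the raw tarball bytes and hex-encodes the result in the "sha256:<hex>" shape shown above; bundleDigest is a hypothetical helper, not part of the SDK.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: digest string in the "sha256:<hex>" shape shown in the
// upload response. Assumes the server hashes the raw tarball bytes.
function bundleDigest(bundle: Uint8Array): string {
  const hex = createHash("sha256").update(bundle).digest("hex");
  return `sha256:${hex}`;
}

// Byte-identical bundles always produce the same digest string.
const first = bundleDigest(new TextEncoder().encode("same bytes"));
const second = bundleDigest(new TextEncoder().encode("same bytes"));
console.log(first === second); // true
```

If the assumption holds, comparing bundleDigest(bytes) with snap.digest after an upload is a cheap integrity check in CI.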

Tarball format

Standard .tar.gz rooted at the agent's working directory. Example layout for a Python agent:

email-agent.tar.gz/
├── main.py             # entrypoint references this
├── requirements.txt
├── lib/
│   └── helpers.py
└── prompts/
    └── system.txt

When the snapshot runs, the server extracts to /agent inside the sandbox and runs the entrypoint command from there. Anything in the tarball is available as a relative path.

For a Node agent:

email-agent.tar.gz/
├── package.json
├── package-lock.json
├── dist/
│   └── main.js
└── node_modules/       # optional but recommended — avoids npm install at runtime
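Because the entrypoint runs relative to the extracted bundle root, a quick local check that the files it names are actually in the tarball can catch a bad bundle before upload. The sketch below is a heuristic under stated assumptions: missingEntrypointFiles is a hypothetical helper, it treats any non-flag argument containing a "." or "/" as a file path, and it expects the member list in the form printed by tar -tzf.

```typescript
// Hypothetical pre-upload check: which path-like entrypoint arguments are
// absent from the tarball's member list (e.g. the output of `tar -tzf`)?
function missingEntrypointFiles(entrypoint: string[], members: string[]): string[] {
  // Treat "./main.py" and "main.py" as the same relative path.
  const normalize = (p: string): string => p.replace(/^\.\//, "");
  // Heuristic: flags start with "-", file paths contain "." or "/".
  const looksLikePath = (arg: string): boolean =>
    !arg.startsWith("-") && /[.\/]/.test(arg);

  const present = new Set(members.map(normalize));
  return entrypoint
    .filter(looksLikePath)
    .map(normalize)
    .filter((path) => !present.has(path));
}

// A Node bundle that forgot to include dist/:
console.log(missingEntrypointFiles(["node", "dist/main.js"], ["./package.json"]));
// [ 'dist/main.js' ]
```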

For a Docker-image-style agent, see agent.type: image in the Spec Reference — that pulls from a registry instead of unpacking a tarball.

Resolving a snapshot

agents.get(name, opts?)

const latest = await ks.agents.get("email-agent");                   // latest version
const tagged = await ks.agents.get("email-agent", { tag: "v2.1" });  // by tag
const v3 = await ks.agents.get("email-agent", { version: 3 });       // pin a version

These calls map to GET /v1/agents/<name>/latest, GET /v1/agents/<name>/tags/<tag>, and GET /v1/agents/<name>/versions/<n>, respectively.

agents.getById(id)

const exact = await ks.agents.getById("snap_abc123...");

GET /v1/snapshots/<id> — fetch the immutable record by its content-addressed ID.

agents.list(opts?) / agents.listVersions(name, opts?)

// Every snapshot
const page = await ks.agents.list({ limit: 50 });
// { items: AgentSnapshot[], next_cursor?: string }
 
// Every version of one agent
const versions = await ks.agents.listVersions("email-agent");

GET /v1/agents and /v1/agents/<name>/versions — paginated. Pass cursor: page.next_cursor to fetch subsequent pages.
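The cursor loop described above can be wrapped in a small async generator. This is a sketch, not SDK code: Page mirrors the shape in the list() comment, while listAll and the fetchPage callback are hypothetical names. It assumes pagination ends when next_cursor is absent.

```typescript
// Page shape taken from the list() comment above; T stands in for AgentSnapshot.
interface Page<T> {
  items: T[];
  next_cursor?: string;
}

// Walk every page by feeding next_cursor back in as the cursor until it is absent.
async function* listAll<T>(
  fetchPage: (cursor?: string) => Promise<Page<T>>,
): AsyncGenerator<T> {
  let cursor: string | undefined;
  do {
    const page = await fetchPage(cursor);
    yield* page.items;
    cursor = page.next_cursor;
  } while (cursor);
}

// Usage with the SDK, assuming list() accepts `cursor` as described above:
// for await (const snap of listAll((cursor) => ks.agents.list({ limit: 50, cursor }))) {
//   console.log(snap.name, snap.version);
// }
```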

agents.delete(snapshot)

const snap = await ks.agents.get("email-agent", { version: 1 });
await ks.agents.delete(snap);

Pass the full snapshot object (TS/Python) or pointer (Go), not just the ID. The call issues DELETE /v1/snapshots/<id>: storage is freed and the version row is removed.

Referencing a snapshot in a spec

agent:
  type: snapshot
  snapshot: email-agent          # latest version
  timeout: 5m

Or pin a specific version:

agent:
  type: snapshot
  snapshot_id: snap_abc123       # exact content hash
  timeout: 5m

snapshot: is the friendly form (resolves to latest); snapshot_id: pins an exact version. Use the latter when you want full reproducibility — even if a teammate uploads a new version, your spec keeps using the pinned digest.

Override the entrypoint

agent:
  type: snapshot
  snapshot: email-agent
  entrypoint: ["python", "main.py", "--mode=eval"]   # override the bundled entrypoint

Useful when one snapshot has multiple modes.

Tags

Tags are human-readable labels. Common patterns:

  • latest — set automatically by Keystone after every upload.
  • stable — manually applied when an agent passes regression tests.
  • v2.1, v2.0: semver-style labels for releases.

Specs cannot reference tags directly:

agent:
  type: snapshot
  snapshot: email-agent
  # there's no `tag:` field — `snapshot:` always resolves to latest. Pin via snapshot_id.

If you need tag-driven resolution, fetch programmatically before creating the experiment:

const stable = await ks.agents.get("email-agent", { tag: "stable" });
// Write stable.id into the spec's agent.snapshot_id (via spec mutation or a
// templating layer) before creating the experiment that uses that spec.
const exp = await ks.experiments.create({ name: "...", spec_id: "..." });
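One way to do the spec mutation mentioned above is a small pure function over the parsed spec. This is only a sketch: Spec, AgentBlock, and pinSnapshot are hypothetical names, and the spec shape is reduced to the agent fields shown in this section.

```typescript
// Minimal spec shape for this sketch; real specs carry more fields.
interface AgentBlock {
  type: string;
  snapshot?: string;
  snapshot_id?: string;
  [key: string]: unknown;
}

interface Spec {
  agent: AgentBlock;
  [key: string]: unknown;
}

// Return a copy of the spec whose agent block is pinned to an exact snapshot ID.
// The floating `snapshot:` name is dropped so only one reference remains.
function pinSnapshot(spec: Spec, snapshotId: string): Spec {
  const { snapshot: _floating, ...rest } = spec.agent;
  return { ...spec, agent: { ...rest, type: "snapshot", snapshot_id: snapshotId } };
}

const floating: Spec = {
  agent: { type: "snapshot", snapshot: "email-agent", timeout: "5m" },
};
console.log(pinSnapshot(floating, "snap_abc123").agent);
// { type: 'snapshot', timeout: '5m', snapshot_id: 'snap_abc123' }
```

With a tag lookup in hand, pinSnapshot(spec, stable.id) yields the spec to register before creating the experiment. The input spec is left untouched, so the floating version can still be reused.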

Querying agent traces

Every trace event the agent emits is tagged with the snapshot that produced it. Query by name:

GET /v1/agents/email-agent/traces
GET /v1/agents/email-agent/traces?version=3
GET /v1/agents/email-agent/traces?limit=100

Returns the trace events plus computed metrics (tool success rate, latency p50/p95, per-tool breakdown).

This is the "compare two versions" workflow:

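const v2Res = await fetch(`${baseUrl}/v1/agents/email-agent/traces?version=2`);
const v3Res = await fetch(`${baseUrl}/v1/agents/email-agent/traces?version=3`);
const v2Traces = await v2Res.json();  // parsed body: trace events plus computed metrics
const v3Traces = await v3Res.json();
// Compare metrics, latency, tool calls.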
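The comparison step itself might look like the sketch below. The response schema is not specified in this section, so the TraceMetrics field names (tool_success_rate, latency_p50_ms, latency_p95_ms) are purely illustrative assumptions, and compareMetrics is a hypothetical helper; adapt both to the actual payload.

```typescript
// Hypothetical metrics shape: these field names are illustrative, not the API's.
interface TraceMetrics {
  tool_success_rate: number; // fraction between 0 and 1
  latency_p50_ms: number;
  latency_p95_ms: number;
}

interface MetricDelta {
  metric: keyof TraceMetrics;
  before: number;
  after: number;
  delta: number;
  regressed: boolean;
}

// Compare two versions metric by metric and flag the ones that got worse.
function compareMetrics(before: TraceMetrics, after: TraceMetrics): MetricDelta[] {
  // Higher is better for success rate; lower is better for latency.
  const higherIsBetter: Record<keyof TraceMetrics, boolean> = {
    tool_success_rate: true,
    latency_p50_ms: false,
    latency_p95_ms: false,
  };
  return (Object.keys(higherIsBetter) as (keyof TraceMetrics)[]).map((metric) => {
    const delta = after[metric] - before[metric];
    const regressed = higherIsBetter[metric] ? delta < 0 : delta > 0;
    return { metric, before: before[metric], after: after[metric], delta, regressed };
  });
}

const report = compareMetrics(
  { tool_success_rate: 0.9, latency_p50_ms: 120, latency_p95_ms: 400 },
  { tool_success_rate: 0.8, latency_p50_ms: 100, latency_p95_ms: 450 },
);
console.log(report.filter((r) => r.regressed).map((r) => r.metric));
// [ 'tool_success_rate', 'latency_p95_ms' ]
```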

AgentAuth — declaring what the agent needs

Snapshots can declare what they require at runtime:

const snap = await ks.agents.upload({
  name: "email-agent",
  entrypoint: ["python", "main.py"],
  runtime: "python3.12",
  bundle: tarballBytes,
  auth: {
    required_env: ["ANTHROPIC_API_KEY", "STRIPE_KEY"],
    config_files: [
      { path: ".env", template: "ANTHROPIC_API_KEY={{secrets.ANTHROPIC_API_KEY}}\nSTRIPE_KEY={{secrets.STRIPE_KEY}}" },
    ],
    egress: {
      "api.anthropic.com": ["443"],
    },
  },
});

| Field | Meaning |
| --- | --- |
| required_env | Env vars the agent needs. Sandbox boot fails if any are missing. |
| config_files | Files to template into the sandbox at boot, with secret substitution. |
| egress | Hosts the agent expects to reach. They are auto-added to network.egress.allow if compatible with the spec's network policy. |

auth is enforced at sandbox boot. A spec that creates a sandbox from a snapshot whose required_env is unmet fails before the agent runs.
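Since a missing variable only surfaces at sandbox boot, it can help to run the same kind of check locally first. The sketch below is a local preflight under stated assumptions, not Keystone's implementation: missingEnv and renderTemplate are hypothetical helpers, and the template syntax handled is only the {{secrets.NAME}} form shown in the example above.

```typescript
// Names listed in required_env that are absent (or empty) in the given environment.
function missingEnv(
  required: string[],
  env: Record<string, string | undefined>,
): string[] {
  return required.filter((name) => !env[name]);
}

// Substitute {{secrets.NAME}} placeholders, mirroring the template syntax above.
// Throws on an unknown secret rather than silently writing a blank value.
function renderTemplate(template: string, secrets: Record<string, string>): string {
  return template.replace(
    /\{\{secrets\.([A-Za-z_][A-Za-z0-9_]*)\}\}/g,
    (_match: string, name: string): string => {
      if (!Object.prototype.hasOwnProperty.call(secrets, name)) {
        throw new Error(`missing secret: ${name}`);
      }
      return secrets[name];
    },
  );
}

console.log(missingEnv(["ANTHROPIC_API_KEY", "STRIPE_KEY"], { STRIPE_KEY: "sk_test" }));
// [ 'ANTHROPIC_API_KEY' ]
```

Calling missingEnv(required_env, process.env) in CI before upload gives an earlier, cheaper failure than a sandbox boot error.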

Patterns

Bundle workflow

For Python:

cd email-agent/
mkdir -p dist    # tar does not create the output directory
tar -czf dist/email-agent.tar.gz \
    --exclude='__pycache__' \
    --exclude='*.pyc' \
    --exclude='.git' \
    --exclude='dist' \
    .

For Node:

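cd email-agent/
npm ci --omit=dev
mkdir -p dist
# Keep dist/ in the bundle (it holds dist/main.js). Write the archive outside
# the tree first so tar does not try to archive its own output.
tar -czf /tmp/email-agent.tar.gz \
    --exclude='*.log' \
    --exclude='.git' \
    .
mv /tmp/email-agent.tar.gz dist/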

For Go:

cd email-agent/
go build -o agent ./cmd/agent
mkdir -p dist    # tar does not create the output directory
tar -czf dist/email-agent.tar.gz agent
# entrypoint: ["./agent"]

CI: upload on tag

# .github/workflows/release.yml
- name: Build and upload snapshot
  run: |
    # Write the archive outside the directory being archived.
    tar -czf /tmp/agent.tar.gz .
    npx ks agents upload email-agent /tmp/agent.tar.gz \
      --entrypoint "node dist/main.js" \
      --tag "$GITHUB_REF_NAME"

After every release tag, a fresh snapshot is uploaded. Specs that use snapshot: email-agent automatically pick up the latest.

Per-PR ephemeral snapshots

# .github/workflows/pr.yml
- name: Build and run regression eval
  run: |
    SNAP_NAME="email-agent-pr-${{github.event.number}}"
    tar -czf /tmp/agent.tar.gz .
    npx ks agents upload "$SNAP_NAME" /tmp/agent.tar.gz --entrypoint "node dist/main.js"
    
    cat > spec.yaml <<EOF
    version: 1
    id: pr-eval
    base: ubuntu:24.04
    agent: { type: snapshot, snapshot: $SNAP_NAME }
    invariants: { ... }
    EOF
    
    npx ks eval run spec.yaml

Spin up a snapshot per PR, run regression evals, throw it away on merge or close.
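The "throw it away" step could be scripted with listVersions() and delete(). The sketch below takes the agents client as a parameter so it stays self-contained. AgentsApi and SnapshotRef are hypothetical minimal types, and the page shape for listVersions() is an assumption borrowed from list().

```typescript
// Minimal slice of the client used here. The page shape for listVersions() is
// an assumption borrowed from list(): { items, next_cursor? }.
interface SnapshotRef {
  id: string;
  name: string;
  version: number;
}

interface AgentsApi {
  listVersions(
    name: string,
    opts?: { cursor?: string },
  ): Promise<{ items: SnapshotRef[]; next_cursor?: string }>;
  delete(snapshot: SnapshotRef): Promise<void>;
}

// Delete every version of an ephemeral snapshot; returns how many were removed.
async function deleteAllVersions(agents: AgentsApi, name: string): Promise<number> {
  // Collect first, then delete, so cursors are not invalidated mid-walk.
  const all: SnapshotRef[] = [];
  let cursor: string | undefined;
  do {
    const page = await agents.listVersions(name, { cursor });
    all.push(...page.items);
    cursor = page.next_cursor;
  } while (cursor);

  for (const snap of all) {
    await agents.delete(snap); // delete() takes the full object, not just the ID
  }
  return all.length;
}

// Usage in a PR-closed workflow step (hypothetical PR number):
// await deleteAllVersions(ks.agents, "email-agent-pr-123");
```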

Pinning for reproducibility

# specs/regression-v2.yaml
agent:
  type: snapshot
  snapshot_id: snap_abc123def456...      # exact bytes — never changes

Useful for archival regression specs that should produce identical results six months from now.

Storage limits

| Limit | Default | Configurable on self-hosted? |
| --- | --- | --- |
| Snapshot size | 100 MB | Yes (agents.max_size) |
| Versions per agent | 100 | Yes (agents.max_versions) |
| Storage retention | indefinite | Yes (TTL via retention.snapshots) |

Old versions can be deleted with agents.delete() — you can't undelete, but you can re-upload the same bytes (it gets the same digest, new version row).
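A size check before upload avoids sending a bundle the server will reject. This is a minimal sketch: assertBundleFits is a hypothetical helper, and it assumes "100 MB" means 100 * 1024 * 1024 bytes, which this page does not state. Pass your own limit on self-hosted installs that change agents.max_size.

```typescript
// Default limit from the table above; assumes "100 MB" means 100 * 1024 * 1024 bytes.
const DEFAULT_MAX_BUNDLE_BYTES = 100 * 1024 * 1024;

// Throw before upload if the tarball exceeds the configured snapshot size limit.
function assertBundleFits(
  bundle: Uint8Array,
  maxBytes: number = DEFAULT_MAX_BUNDLE_BYTES,
): void {
  if (bundle.byteLength > maxBytes) {
    throw new Error(`bundle is ${bundle.byteLength} bytes, limit is ${maxBytes}`);
  }
}

// Usage: assertBundleFits(readFileSync("dist/email-agent.tar.gz"));
assertBundleFits(new Uint8Array(1024)); // passes silently
```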