Agent Snapshots

Immutable, versioned bundles of your agent code. Upload once, reference in any spec. The full snapshot SDK.

An agent snapshot is a versioned, content-addressed bundle of your agent's code. Upload it once with ks.agents.upload(), reference it in any spec with agent.type: snapshot, and Keystone will fetch and run that exact version inside the sandbox.

Why bother? Because the answer to "did v3 of my agent regress against v2?" requires both versions to exist as first-class entities. Snapshots give you that.

When to use snapshots

Scenario	Use snapshots?
Hello-world / prototyping	No — use `agent.type: paragon` or `cli`
Agent code lives in same repo as the spec	Maybe — `cli` works if your binary is built locally
Agent has its own deployment lifecycle	Yes
You need to compare v2 vs. v3	Yes
You want every trace tagged with the agent that produced it	Yes
Agent has many runtime dependencies	Yes (snapshots are tarballs — bundle deps inside)

`agents.upload(opts)`

Upload an agent snapshot. Version is auto-assigned by the server (incremented from the last upload of the same name).

import { readFileSync } from "fs";
 
const snap = await ks.agents.upload({
  name: "email-agent",                              // logical name
  entrypoint: ["python", "main.py"],                // exec form
  runtime: "python3.12",                            // optional hint
  tag: "v2.1",                                      // optional human label
  bundle: readFileSync("dist/email-agent.tar.gz"),  // Uint8Array of the tarball
});
// Returns:
// {
//   id: "snap_a1b2c3...",                          // immutable content hash
//   name: "email-agent",
//   version: 5,
//   tag: "v2.1",
//   digest: "sha256:abc123...",
//   size_bytes: 2_457_600,
//   storage_path: "/agents/email-agent/v5",
//   runtime: "python3.12",
//   entrypoint: ["python", "main.py"],
//   created_at: "2026-04-28T22:00:00Z",
// }

What this does: posts a multipart form to POST /v1/agents with metadata (JSON) + bundle (binary). The server validates the tarball, computes its sha256, writes it to storage, and creates a snapshot row with the next version number for that name.

The bundle is immutable and content-addressed: its digest is computed from the tarball bytes. If you upload the exact same bytes twice, you get the same digest but a new version row.
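
Because the digest comes from the tarball bytes, it can be recomputed locally to check that the server stored what you built. The sketch below assumes the digest is a plain SHA-256 of the raw bytes, formatted as sha256:<hex> like the example response above. localDigest is a hypothetical helper, not part of the SDK.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: formats a SHA-256 digest the way the example
// upload response does ("sha256:<hex>"). Assumes the server hashes the
// raw tarball bytes with no extra framing.
function localDigest(bundle: Uint8Array): string {
  const hex = createHash("sha256").update(bundle).digest("hex");
  return `sha256:${hex}`;
}

// Identical bytes always produce an identical digest.
const first = localDigest(new TextEncoder().encode("abc"));
const second = localDigest(new TextEncoder().encode("abc"));
console.log(first === second); // true
```

If that assumption holds, comparing localDigest(bytes) with snap.digest after an upload would catch a truncated or corrupted transfer.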

Tarball format

Standard .tar.gz rooted at the agent's working directory. Example layout for a Python agent:

email-agent.tar.gz/
├── main.py             # entrypoint references this
├── requirements.txt
├── lib/
│   └── helpers.py
└── prompts/
    └── system.txt

When the snapshot runs, the server extracts to /agent inside the sandbox and runs the entrypoint command from there. Anything in the tarball is available as a relative path.

For a Node agent:

email-agent.tar.gz/
├── package.json
├── package-lock.json
├── dist/
│   └── main.js
└── node_modules/       # optional but recommended — avoids npm install at runtime

For a Docker-image-style agent, see agent.type: image in the Spec Reference — that pulls from a registry instead of unpacking a tarball.

Resolving a snapshot

`agents.get(name, opts?)`

const latest = await ks.agents.get("email-agent");                       // latest version
const tagged = await ks.agents.get("email-agent", { tag: "v2.1" });      // by tag
const v3      = await ks.agents.get("email-agent", { version: 3 });       // pin a version

With no options this calls GET /v1/agents/<name>/latest; passing tag or version swaps the final segment for /tags/<tag> or /versions/<n>.
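
The mapping from options to endpoint can be sketched as a small pure function. agentPath is a hypothetical helper, and the precedence when both tag and version are supplied is an assumption, since the source does not say which one wins.

```typescript
// Options mirror the ones agents.get() accepts in the examples above.
interface GetOpts {
  tag?: string;
  version?: number;
}

// Hypothetical helper: builds the request path for a lookup.
// Assumption: when both are given, version wins over tag; the SDK's
// real precedence is not documented here.
function agentPath(name: string, opts: GetOpts = {}): string {
  const base = `/v1/agents/${encodeURIComponent(name)}`;
  if (opts.version !== undefined) return `${base}/versions/${opts.version}`;
  if (opts.tag !== undefined) return `${base}/tags/${encodeURIComponent(opts.tag)}`;
  return `${base}/latest`;
}

console.log(agentPath("email-agent"));                  // /v1/agents/email-agent/latest
console.log(agentPath("email-agent", { tag: "v2.1" })); // /v1/agents/email-agent/tags/v2.1
console.log(agentPath("email-agent", { version: 3 }));  // /v1/agents/email-agent/versions/3
```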

`agents.getById(id)`

const exact = await ks.agents.getById("snap_abc123...");

GET /v1/snapshots/<id> — fetch the immutable record by its content-addressed ID.

`agents.list(opts?)` / `agents.listVersions(name, opts?)`

// Every snapshot
const page = await ks.agents.list({ limit: 50 });
// { items: AgentSnapshot[], next_cursor?: string }
 
// Every version of one agent
const versions = await ks.agents.listVersions("email-agent");

GET /v1/agents and /v1/agents/<name>/versions — paginated. Pass cursor: page.next_cursor to fetch subsequent pages.
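
A minimal sketch of walking every page with the cursor, written against a narrow interface so it does not depend on the real SDK types. AgentsLister and allSnapshots are hypothetical names, and the loop assumes the final page omits next_cursor.

```typescript
// Minimal shapes taken from the list() example above; the real SDK
// types may carry more fields.
interface AgentSnapshot { id: string; name: string; version: number }
interface Page { items: AgentSnapshot[]; next_cursor?: string }
interface AgentsLister {
  list(opts?: { limit?: number; cursor?: string }): Promise<Page>;
}

// Walks every page by feeding next_cursor back in as cursor.
// Assumes the last page omits next_cursor (or leaves it empty).
async function* allSnapshots(
  agents: AgentsLister,
  limit = 50,
): AsyncGenerator<AgentSnapshot> {
  let cursor: string | undefined;
  do {
    const page = await agents.list({ limit, cursor });
    yield* page.items;
    cursor = page.next_cursor;
  } while (cursor);
}
```

With a real client this would be consumed as for await (const snap of allSnapshots(ks.agents)) { ... }, which keeps only one page in memory at a time.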

`agents.delete(snapshot)`

const snap = await ks.agents.get("email-agent", { version: 1 });
await ks.agents.delete(snap);

Pass the full snapshot object (TS/Python) or pointer (Go), not just the ID. This calls DELETE /v1/snapshots/<id>, which frees the storage and removes the version row.

Referencing a snapshot in a spec

agent:
  type: snapshot
  snapshot: email-agent          # latest version
  timeout: 5m

Or pin a specific version:

agent:
  type: snapshot
  snapshot_id: snap_abc123       # exact content hash
  timeout: 5m

snapshot: is the friendly form (resolves to latest); snapshot_id: pins an exact version. Use the latter when you want full reproducibility — even if a teammate uploads a new version, your spec keeps using the pinned digest.
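
One way to move from the friendly form to the pinned form is to resolve the name once and write the returned id into the spec. The sketch below only renders the YAML fragment. pinnedAgentBlock is a hypothetical helper, and its input is whatever agents.get() returned.

```typescript
// Only the fields this sketch needs from a resolved snapshot.
interface ResolvedSnapshot { id: string; name: string; version: number }

// Hypothetical helper: renders a pinned agent block for a spec file,
// with the name and version kept as a trailing comment for readers.
function pinnedAgentBlock(snap: ResolvedSnapshot, timeout = "5m"): string {
  return [
    "agent:",
    "  type: snapshot",
    `  snapshot_id: ${snap.id}   # ${snap.name} v${snap.version}`,
    `  timeout: ${timeout}`,
  ].join("\n");
}

console.log(pinnedAgentBlock({ id: "snap_abc123", name: "email-agent", version: 3 }));
```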

Override the entrypoint

agent:
  type: snapshot
  snapshot: email-agent
  entrypoint: ["python", "main.py", "--mode=eval"]   # override the bundled entrypoint

Useful when one snapshot has multiple modes.

Querying agent traces

Every trace event the agent emits is tagged with the snapshot that produced it. Query by name:

GET /v1/agents/email-agent/traces
GET /v1/agents/email-agent/traces?version=3
GET /v1/agents/email-agent/traces?limit=100

Returns the trace events plus computed metrics (tool success rate, latency p50/p95, per-tool breakdown).

This is the "compare two versions" workflow:

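// fetch() resolves to a Response, so parse the JSON body before comparing.
// Add whatever auth headers your deployment requires.
const v2Traces = await fetch(`${baseUrl}/v1/agents/email-agent/traces?version=2`).then((r) => r.json());
const v3Traces = await fetch(`${baseUrl}/v1/agents/email-agent/traces?version=3`).then((r) => r.json());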
// Compare metrics, latency, tool calls.
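
The comparison step itself might look like the sketch below. The source only says the response includes tool success rate and latency p50/p95, so the field names here (tool_success_rate, latency_p50_ms, latency_p95_ms) are assumptions, and diffMetrics and regressed are hypothetical helpers.

```typescript
// Hypothetical metric shape; the field names are assumptions, not the
// documented response format.
interface TraceMetrics {
  tool_success_rate: number; // fraction between 0 and 1
  latency_p50_ms: number;
  latency_p95_ms: number;
}

// Returns candidate minus baseline for each metric.
function diffMetrics(baseline: TraceMetrics, candidate: TraceMetrics): TraceMetrics {
  return {
    tool_success_rate: candidate.tool_success_rate - baseline.tool_success_rate,
    latency_p50_ms: candidate.latency_p50_ms - baseline.latency_p50_ms,
    latency_p95_ms: candidate.latency_p95_ms - baseline.latency_p95_ms,
  };
}

// Flags a regression when the success rate drops or p95 latency grows
// by more than the allowed tolerance.
function regressed(delta: TraceMetrics, maxP95IncreaseMs = 0): boolean {
  return delta.tool_success_rate < 0 || delta.latency_p95_ms > maxP95IncreaseMs;
}
```

Keeping the diff and the pass/fail rule separate lets a CI job print the raw deltas while the tolerance stays a per-spec choice.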

AgentAuth — declaring what the agent needs

Snapshots can declare what they require at runtime:

const snap = await ks.agents.upload({
  name: "email-agent",
  entrypoint: ["python", "main.py"],
  runtime: "python3.12",
  bundle: tarballBytes,
  auth: {
    required_env: ["ANTHROPIC_API_KEY", "STRIPE_KEY"],
    config_files: [
      { path: ".env", template: "ANTHROPIC_API_KEY={{secrets.ANTHROPIC_API_KEY}}\nSTRIPE_KEY={{secrets.STRIPE_KEY}}" },
    ],
    egress: {
      "api.anthropic.com": ["443"],
    },
  },
});

Field	Meaning
`required_env`	Env vars the agent needs. Sandbox boot fails if any are missing.
`config_files`	Files to template into the sandbox at boot, with secret substitution.
`egress`	Hosts the agent expects to reach — auto-added to `network.egress.allow` if compatible with the spec's network policy.

auth is enforced at sandbox boot. A spec that creates a sandbox from a snapshot whose required_env is unmet fails before the agent runs.
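
A local preflight can surface the same failure before any sandbox is created. This is a sketch under assumptions: missingEnv and renderTemplate are hypothetical helpers, and the server's real templating rules (escaping, whitespace, handling of unknown secrets) may differ from this approximation.

```typescript
// Shape of the auth block from the upload example above.
interface AgentAuth {
  required_env?: string[];
  config_files?: { path: string; template: string }[];
}

// Names from required_env that are absent (or empty) in the given environment.
function missingEnv(auth: AgentAuth, env: Record<string, string | undefined>): string[] {
  return (auth.required_env ?? []).filter((name) => !env[name]);
}

// Local approximation of {{secrets.NAME}} substitution.
// Assumption: an unknown secret is an error; the server may behave differently.
function renderTemplate(template: string, secrets: Record<string, string>): string {
  const pattern = /\{\{\s*secrets\.([A-Za-z_][A-Za-z0-9_]*)\s*\}\}/g;
  return template.replace(pattern, (_match, key: string) => {
    const value = secrets[key];
    if (value === undefined) throw new Error(`missing secret: ${key}`);
    return value;
  });
}
```

Running missingEnv(auth, process.env) before starting an eval reports unmet requirements without waiting for sandbox boot.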

Patterns

Bundle workflow

For Python:

cd email-agent/
mkdir -p dist    # tar fails if the output directory does not exist
tar -czf dist/email-agent.tar.gz \
    --exclude='__pycache__' \
    --exclude='*.pyc' \
    --exclude='.git' \
    --exclude='dist' \
    .

For Node:

cd email-agent/
npm ci --omit=dev
# Keep dist/ in the bundle (it holds main.js); exclude only the tarball itself.
tar -czf dist/email-agent.tar.gz \
    --exclude='*.log' \
    --exclude='.git' \
    --exclude='*.tar.gz' \
    .

For Go:

cd email-agent/
mkdir -p dist
go build -o agent ./cmd/agent
tar -czf dist/email-agent.tar.gz agent
# entrypoint: ["./agent"]

CI: upload on tag

# .github/workflows/release.yml
- name: Build and upload snapshot
  run: |
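    # Write the tarball outside the tree being archived so tar never reads its own output.
    tar -czf "$RUNNER_TEMP/agent.tar.gz" .
    npx ks agents upload email-agent "$RUNNER_TEMP/agent.tar.gz" \
      --entrypoint "node dist/main.js" \
      --tag "$GITHUB_REF_NAME"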

After every release tag, a fresh snapshot is uploaded. Specs that use snapshot: email-agent automatically pick up the latest.

Per-PR ephemeral snapshots

# .github/workflows/pr.yml
- name: Build and run regression eval
  run: |
    SNAP_NAME="email-agent-pr-${{github.event.number}}"
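    # Write the tarball outside the tree being archived so tar never reads its own output.
    tar -czf "$RUNNER_TEMP/agent.tar.gz" .
    npx ks agents upload "$SNAP_NAME" "$RUNNER_TEMP/agent.tar.gz" --entrypoint "node dist/main.js"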
    
    cat > spec.yaml <<EOF
    version: 1
    id: pr-eval
    base: ubuntu:24.04
    agent: { type: snapshot, snapshot: $SNAP_NAME }
    invariants: { ... }
    EOF
    
    npx ks eval run spec.yaml

Spin up a snapshot per PR, run regression evals, throw it away on merge or close.

Pinning for reproducibility

# specs/regression-v2.yaml
agent:
  type: snapshot
  snapshot_id: snap_abc123def456...      # exact bytes — never changes

Useful for archival regression specs that should produce identical results six months from now.

Storage limits

Limit	Default	Configurable on self-hosted?
Snapshot size	100 MB	Yes (`agents.max_size`)
Versions per agent	100	Yes (`agents.max_versions`)
Storage retention	indefinite	Yes (TTL via `retention.snapshots`)

Old versions can be deleted with agents.delete() — you can't undelete, but you can re-upload the same bytes (it gets the same digest, new version row).
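
To stay under the per-agent version limit, a pruning job can keep the newest few versions and delete the rest. A minimal sketch: pruneVersions and AgentsPruner are hypothetical names, and it assumes listVersions() returns the same { items } page shape as list(), handling only the first page.

```typescript
interface SnapshotRef { id: string; name: string; version: number }
interface AgentsPruner {
  // Assumption: same page shape as list(); pagination is ignored here.
  listVersions(name: string): Promise<{ items: SnapshotRef[] }>;
  delete(snapshot: SnapshotRef): Promise<void>;
}

// Deletes all but the `keep` highest-numbered versions and returns
// the version numbers that were removed, newest first.
async function pruneVersions(
  agents: AgentsPruner,
  name: string,
  keep: number,
): Promise<number[]> {
  const { items } = await agents.listVersions(name);
  const newestFirst = [...items].sort((a, b) => b.version - a.version);
  const stale = newestFirst.slice(Math.max(keep, 0));
  for (const snap of stale) {
    await agents.delete(snap); // deletion cannot be undone
  }
  return stale.map((s) => s.version);
}
```

Calling it with keep set to 0 removes every version, which is one way to discard a per-PR snapshot after merge or close. Specs pinned to a deleted snapshot_id would presumably stop resolving, so check for references before pruning.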
