Agent Snapshots
Immutable, versioned bundles of your agent code. Upload once, reference in any spec. The full snapshot SDK.
An agent snapshot is a versioned, content-addressed bundle of your agent's code. Upload it once with ks.agents.upload(), reference it in any spec with agent.type: snapshot, and Keystone will fetch and run that exact version inside the sandbox.
Why bother? Because the answer to "did v3 of my agent regress against v2?" requires both versions to exist as first-class entities. Snapshots give you that.
When to use snapshots
| Scenario | Use snapshots? |
|---|---|
| Hello-world / prototyping | No — use agent.type: paragon or cli |
| Agent code lives in same repo as the spec | Maybe — cli works if your binary is built locally |
| Agent has its own deployment lifecycle | Yes |
| You need to compare v2 vs. v3 | Yes |
| You want every trace tagged with the agent that produced it | Yes |
| Agent has many runtime dependencies | Yes (snapshots are tarballs — bundle deps inside) |
agents.upload(opts)
Upload an agent snapshot. Version is auto-assigned by the server (incremented from the last upload of the same name).
import { readFileSync } from "fs";
const snap = await ks.agents.upload({
  name: "email-agent",                               // logical name
  entrypoint: ["python", "main.py"],                 // exec form
  runtime: "python3.12",                             // optional hint
  tag: "v2.1",                                       // optional human label
  bundle: readFileSync("dist/email-agent.tar.gz"),   // Uint8Array of the tarball
});
// Returns:
// {
//   id: "snap_a1b2c3...",            // immutable content hash
//   name: "email-agent",
//   version: 5,
//   tag: "v2.1",
//   digest: "sha256:abc123...",
//   size_bytes: 2_457_600,
//   storage_path: "/agents/email-agent/v5",
//   runtime: "python3.12",
//   entrypoint: ["python", "main.py"],
//   created_at: "2026-04-28T22:00:00Z",
// }
What this does: posts a multipart form to POST /v1/agents with metadata (JSON) + bundle (binary). The server validates the tarball, computes its sha256, writes it to storage, and creates a snapshot row with the next version number for that name.
The bundle is immutable and content-addressed by its digest. If you upload the exact same bytes twice, you get the same digest but a new version row.
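Because the digest is derived from the bundle bytes, it can be computed locally and compared with the digest field of the upload response. The following is a minimal sketch, assuming the server hashes the tarball bytes exactly as uploaded and formats the result as sha256: followed by lowercase hex (the shape suggested by the example response). bundleDigest is a hypothetical helper, not part of the SDK.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper (not part of the Keystone SDK).
// Assumption: digest = "sha256:" + lowercase hex of the raw tarball bytes.
function bundleDigest(bundle: Uint8Array): string {
  const hex = createHash("sha256").update(bundle).digest("hex");
  return `sha256:${hex}`;
}
```

If these assumptions hold, bundleDigest(bytes) === snap.digest after an upload. A mismatch would suggest either a different hashing convention on the server or a bundle that changed between build and upload.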
Tarball format
Standard .tar.gz rooted at the agent's working directory. Example layout for a Python agent:
email-agent.tar.gz/
├── main.py # entrypoint references this
├── requirements.txt
├── lib/
│ └── helpers.py
└── prompts/
    └── system.txt
When the snapshot runs, the server extracts the tarball to /agent inside the sandbox and runs the entrypoint command from there. Anything in the tarball is available as a relative path.
For a Node agent:
email-agent.tar.gz/
├── package.json
├── package-lock.json
├── dist/
│ └── main.js
└── node_modules/ # optional but recommended — avoids npm install at runtime
For a Docker-image-style agent, see agent.type: image in the Spec Reference — that pulls from a registry instead of unpacking a tarball.
Resolving a snapshot
agents.get(name, opts?)
const latest = await ks.agents.get("email-agent"); // latest version
const tagged = await ks.agents.get("email-agent", { tag: "v2.1" }); // by tag
const v3 = await ks.agents.get("email-agent", { version: 3 }); // pin a version
GET /v1/agents/<name>/latest (or /tags/<tag> or /versions/<n>).
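The mapping from agents.get() options to REST paths can be sketched as a small function. This is an illustration only: snapshotPath is a hypothetical helper, and both the precedence of version over tag and the URL-encoding of path segments are assumptions, not documented SDK behavior.

```typescript
// Selector mirroring the options shown for agents.get().
type SnapshotSelector = { tag?: string; version?: number };

// Hypothetical helper: builds the REST path for a name plus optional selector.
// Assumption: if both are given, version wins over tag.
function snapshotPath(name: string, sel: SnapshotSelector = {}): string {
  const base = `/v1/agents/${encodeURIComponent(name)}`;
  if (sel.version !== undefined) return `${base}/versions/${sel.version}`;
  if (sel.tag !== undefined) return `${base}/tags/${encodeURIComponent(sel.tag)}`;
  return `${base}/latest`;
}
```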
agents.getById(id)
const exact = await ks.agents.getById("snap_abc123...");
GET /v1/snapshots/<id>: fetch the immutable record by its content-addressed ID.
agents.list(opts?) / agents.listVersions(name, opts?)
// Every snapshot
const page = await ks.agents.list({ limit: 50 });
// { items: AgentSnapshot[], next_cursor?: string }
// Every version of one agent
const versions = await ks.agents.listVersions("email-agent");
GET /v1/agents and GET /v1/agents/<name>/versions, both paginated. Pass cursor: page.next_cursor to fetch subsequent pages.
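Walking every page means repeating the call until no next_cursor comes back. A minimal sketch of that loop, written against a generic page fetcher so it does not depend on SDK types. collectAll is a hypothetical helper, and the Page shape is taken from the list() example above.

```typescript
// Page shape from the list() example; T stands in for AgentSnapshot.
interface Page<T> {
  items: T[];
  next_cursor?: string;
}

// Hypothetical helper: follows next_cursor until the server stops returning one.
// An empty-string cursor is treated as "no more pages".
async function collectAll<T>(
  fetchPage: (cursor?: string) => Promise<Page<T>>,
): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | undefined;
  do {
    const page = await fetchPage(cursor);
    all.push(...page.items);
    cursor = page.next_cursor;
  } while (cursor);
  return all;
}
```

With the SDK, a call might look like collectAll((cursor) => ks.agents.list({ limit: 50, cursor })). For large accounts, processing each page as it arrives avoids holding every snapshot record in memory.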
agents.delete(snapshot)
const snap = await ks.agents.get("email-agent", { version: 1 });
await ks.agents.delete(snap);
Pass the full snapshot object (TS/Python) or pointer (Go), not just the ID. DELETE /v1/snapshots/<id>. Storage is freed; the version row is removed.
Referencing a snapshot in a spec
agent:
  type: snapshot
  snapshot: email-agent # latest version
  timeout: 5m
Or pin a specific version:
agent:
  type: snapshot
  snapshot_id: snap_abc123 # exact content hash
  timeout: 5m
snapshot: is the friendly form (resolves to latest); snapshot_id: pins an exact version. Use the latter when you want full reproducibility: even if a teammate uploads a new version, your spec keeps using the pinned digest.
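Pinning can also be done by a script that rewrites the agent block of an already-parsed spec. The sketch below assumes the spec has been loaded into a plain object; SnapshotAgent and pinAgent are hypothetical names, and dropping the name-based snapshot field is a choice made here so that the two references cannot disagree.

```typescript
// Minimal shape of the agent block for type: snapshot (fields from the spec examples on this page).
interface SnapshotAgent {
  type: "snapshot";
  snapshot?: string;
  snapshot_id?: string;
  timeout?: string;
  entrypoint?: string[];
}

// Hypothetical helper: returns a copy of the agent block pinned to an exact snapshot ID.
// The input object is left unchanged.
function pinAgent(agent: SnapshotAgent, snapshotId: string): SnapshotAgent {
  const pinned: SnapshotAgent = { ...agent, snapshot_id: snapshotId };
  delete pinned.snapshot; // assumption: name-based reference is removed once pinned
  return pinned;
}
```

The ID would typically come from a resolved snapshot, for example the id field of the object returned by ks.agents.get("email-agent", { version: 3 }).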
Override the entrypoint
agent:
  type: snapshot
  snapshot: email-agent
  entrypoint: ["python", "main.py", "--mode=eval"] # override the bundled entrypoint
Useful when one snapshot has multiple modes.
Tags
Tags are human-readable labels. Common patterns:
- latest: set automatically by Keystone after every upload.
- stable: manually applied when an agent passes regression tests.
- v2.1, v2.0: semver labels for major releases.
Specs cannot reference tags directly:
agent:
  type: snapshot
  snapshot: email-agent
  # there's no `tag:` field; `snapshot:` always resolves to latest. Pin via snapshot_id.
If you need tag-driven resolution, fetch programmatically before creating the experiment:
const stable = await ks.agents.get("email-agent", { tag: "stable" });
const exp = await ks.experiments.create({ name: "...", spec_id: "..." });
// Override agent.snapshot_id (with stable.id) at create time via spec mutation, or use a templating layer.
Querying agent traces
Every trace event the agent emits is tagged with the snapshot that produced it. Query by name:
GET /v1/agents/email-agent/traces
GET /v1/agents/email-agent/traces?version=3
GET /v1/agents/email-agent/traces?limit=100
Returns the trace events plus computed metrics (tool success rate, latency p50/p95, per-tool breakdown).
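To cross-check the computed metrics, or to derive them from raw events client-side, success rate and latency percentiles can be recomputed locally. The sketch below is illustrative: the ToolEvent shape (tool, ok, latency_ms) is hypothetical, and the nearest-rank percentile used here may differ from whatever method the server applies.

```typescript
// Hypothetical event shape for this sketch; the real trace schema may differ.
interface ToolEvent {
  tool: string;
  ok: boolean;
  latency_ms: number;
}

// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("percentile of empty list");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Summarizes one version's events; throws if there are no events.
function summarize(events: ToolEvent[]) {
  const latencies = events.map((e) => e.latency_ms);
  return {
    success_rate: events.filter((e) => e.ok).length / events.length,
    p50: percentile(latencies, 50),
    p95: percentile(latencies, 95),
  };
}
```

Running summarize over the events of two versions gives two small objects that can be diffed field by field.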
To compare two versions, fetch the traces for each and diff the metrics:
// fetch() resolves to a Response, so parse the JSON body before comparing.
const v2Traces = await fetch(`${baseUrl}/v1/agents/email-agent/traces?version=2`).then((r) => r.json());
const v3Traces = await fetch(`${baseUrl}/v1/agents/email-agent/traces?version=3`).then((r) => r.json());
// Compare metrics, latency, tool calls.
AgentAuth: declaring what the agent needs
Snapshots can declare what they require at runtime:
const snap = await ks.agents.upload({
  name: "email-agent",
  entrypoint: ["python", "main.py"],
  runtime: "python3.12",
  bundle: tarballBytes,
  auth: {
    required_env: ["ANTHROPIC_API_KEY", "STRIPE_KEY"],
    config_files: [
      { path: ".env", template: "ANTHROPIC_API_KEY={{secrets.ANTHROPIC_API_KEY}}\nSTRIPE_KEY={{secrets.STRIPE_KEY}}" },
    ],
    egress: {
      "api.anthropic.com": ["443"],
    },
  },
});
| Field | Meaning |
|---|---|
| required_env | Env vars the agent needs. Sandbox boot fails if any are missing. |
| config_files | Files to template into the sandbox at boot, with secret substitution. |
| egress | Hosts the agent expects to reach; auto-added to network.egress.allow if compatible with the spec's network policy. |
auth is enforced at sandbox boot. A spec that creates a sandbox from a snapshot whose required_env is unmet fails before the agent runs.
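The boot-time checks can be pictured as two small steps: verify that every name in required_env is present, then substitute {{secrets.NAME}} placeholders in each config_files template. The sketch below illustrates those semantics only. missingEnv and renderTemplate are hypothetical helpers, and details such as treating empty values as missing or failing on unknown secrets are assumptions rather than documented Keystone behavior.

```typescript
// Returns the names from required_env that are absent (or empty) in env.
function missingEnv(required: string[], env: Record<string, string | undefined>): string[] {
  return required.filter((name) => !env[name]);
}

// Replaces {{secrets.NAME}} placeholders with values from `secrets`.
// Throws on an unknown secret rather than silently emitting an empty value.
function renderTemplate(template: string, secrets: Record<string, string>): string {
  return template.replace(
    /\{\{secrets\.([A-Za-z_][A-Za-z0-9_]*)\}\}/g,
    (_match: string, name: string): string => {
      if (!Object.prototype.hasOwnProperty.call(secrets, name)) {
        throw new Error(`unknown secret: ${name}`);
      }
      return secrets[name];
    },
  );
}
```

Running the same checks locally before upload can catch a misspelled secret name earlier than a failed sandbox boot would.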
Patterns
Bundle workflow
For Python:
cd email-agent/
mkdir -p dist   # tar does not create the output directory
tar -czf dist/email-agent.tar.gz \
  --exclude='__pycache__' \
  --exclude='*.pyc' \
  --exclude='.git' \
  --exclude='dist' \
  .
For Node:
cd email-agent/
npm ci --omit=dev
# dist/ holds the compiled entrypoint (dist/main.js), so exclude only the tarball itself
tar -czf dist/email-agent.tar.gz \
  --exclude='*.log' \
  --exclude='.git' \
  --exclude='dist/*.tar.gz' \
  .
For Go:
cd email-agent/
mkdir -p dist
go build -o agent ./cmd/agent
tar -czf dist/email-agent.tar.gz agent
# entrypoint: ["./agent"]
CI: upload on tag
# .github/workflows/release.yml
- name: Build and upload snapshot
  run: |
    # write the archive outside the directory being archived
    tar -czf "$RUNNER_TEMP/agent.tar.gz" .
    npx ks agents upload email-agent "$RUNNER_TEMP/agent.tar.gz" \
      --entrypoint "node dist/main.js" \
      --tag "$GITHUB_REF_NAME"
After every release tag, a fresh snapshot is uploaded. Specs that use snapshot: email-agent automatically pick up the latest.
Per-PR ephemeral snapshots
# .github/workflows/pr.yml
- name: Build and run regression eval
  run: |
    SNAP_NAME="email-agent-pr-${{ github.event.number }}"
    # write the archive outside the directory being archived
    tar -czf "$RUNNER_TEMP/agent.tar.gz" .
    npx ks agents upload "$SNAP_NAME" "$RUNNER_TEMP/agent.tar.gz" --entrypoint "node dist/main.js"
    cat > spec.yaml <<EOF
    version: 1
    id: pr-eval
    base: ubuntu:24.04
    agent: { type: snapshot, snapshot: $SNAP_NAME }
    invariants: { ... }
    EOF
    npx ks eval run spec.yaml
Spin up a snapshot per PR, run regression evals, throw it away on merge or close.
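The "throw it away" step could be a short script run when the PR closes. A minimal sketch, assuming listVersions returns the same { items, next_cursor } page shape as list and accepts a cursor option. AgentsApi and deletePrSnapshots are hypothetical names; only the method names listVersions and delete come from this page.

```typescript
// Minimal slice of the SDK surface used here; the exact types are assumptions.
interface Snapshot {
  id: string;
  name: string;
  version: number;
}
interface AgentsApi {
  listVersions(
    name: string,
    opts?: { cursor?: string },
  ): Promise<{ items: Snapshot[]; next_cursor?: string }>;
  delete(snapshot: Snapshot): Promise<void>;
}

// Deletes every version of the per-PR snapshot (e.g. "email-agent-pr-123")
// and returns how many were removed.
async function deletePrSnapshots(agents: AgentsApi, prNumber: number): Promise<number> {
  const name = `email-agent-pr-${prNumber}`;
  const doomed: Snapshot[] = [];
  let cursor: string | undefined;
  // Collect first, delete afterwards, so deletions cannot disturb pagination.
  do {
    const page = await agents.listVersions(name, { cursor });
    doomed.push(...page.items);
    cursor = page.next_cursor;
  } while (cursor);
  for (const snap of doomed) {
    await agents.delete(snap); // pass the full object, not just the ID
  }
  return doomed.length;
}
```

In a workflow triggered on PR close, this might be called as deletePrSnapshots(ks.agents, prNumber).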
Pinning for reproducibility
# specs/regression-v2.yaml
agent:
  type: snapshot
  snapshot_id: snap_abc123def456... # exact bytes, never changes
Useful for archival regression specs that should produce identical results six months from now.
Storage limits
| Limit | Default | Configurable on self-hosted? |
|---|---|---|
| Snapshot size | 100 MB | Yes (agents.max_size) |
| Versions per agent | 100 | Yes (agents.max_versions) |
| Storage retention | indefinite | Yes (TTL via retention.snapshots) |
Old versions can be deleted with agents.delete() — you can't undelete, but you can re-upload the same bytes (it gets the same digest, new version row).
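To stay under the versions-per-agent limit, a script can decide which versions to delete before uploading more. The sketch below only selects candidates; versionsToPrune is a hypothetical helper, and "keep the highest version numbers" is one possible policy rather than anything Keystone prescribes.

```typescript
// Returns the entries to delete so that only the `keep` highest version numbers remain.
// The input array is not modified.
function versionsToPrune<T extends { version: number }>(versions: T[], keep: number): T[] {
  if (!Number.isInteger(keep) || keep < 0) {
    throw new RangeError("keep must be a non-negative integer");
  }
  const newestFirst = [...versions].sort((a, b) => b.version - a.version);
  return newestFirst.slice(keep);
}
```

Each returned snapshot object can then be passed to ks.agents.delete(). Before deleting, it is worth checking that no archived spec still pins one of those snapshots by snapshot_id.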