Sandboxes

Create, interact with, snapshot, and destroy isolated sandbox environments via the Keystone SDK.

A sandbox is the isolated environment your agent runs inside. Each one gets its own filesystem, its own private Docker network, its own backing services, and its own audit log. Nothing leaks between sandboxes.

This page covers the sandbox SDK — the methods you call to manage one directly. For the spec fields that describe a sandbox at create time, see Spec Reference.

Lifecycle

creating → ready → running → stopped
        ↘ error (on failure)
  • creating — runtime is booting, services starting, fixtures applying. Not callable yet.
  • ready — services are up, fixtures seeded, snapshot captured. Agent can connect.
  • running — first command has been issued. Stays in this state until the sandbox is destroyed or its timeout expires.
  • stopped — destroyed cleanly.
  • error — boot failed (service didn't come up, fixture errored, etc.).

A sandbox auto-destroys when its timeout (from resources.timeout, default 10 min) expires. Calling destroy() short-circuits the timer and runs teardown immediately.

sandboxes.create(opts)

Creates a fully initialized sandbox from a spec. The full boot pipeline:

  1. Resolve secrets (Dashboard-stored + spec-declared, spec wins on collision).
  2. Start the isolation runtime (Firecracker VM, Docker container, or Nomad alloc).
  3. Start backing services on a private Docker network keystone-<id>.
  4. Apply fixtures (git clone, SQL seeds, directory copies, drift injection).
  5. Apply setup (files, commands, env).
  6. Configure network policy + determinism.
  7. Start audit capture.
  8. Take the before-run snapshot.
  9. Mark ready.
const sb = await ks.sandboxes.create({
  spec_id: "fix-failing-test",          // required — must match a spec id
  timeout: "10m",                        // optional auto-destroy timer
  metadata: { run: "ci-7821" },          // optional key-value tags
});
 
// Returns:
// {
//   id: "sb-a1b2c3d4e5f6",
//   spec_id: "fix-failing-test",
//   state: "creating" | "ready" | "running" | "stopped" | "error",
//   path: "/var/keystone/workspaces/sb-a1b2...",
//   url: "https://keystone.polarity.so/v1/sandboxes/sb-a1b2...",
//   created_at: "2026-04-28T22:00:00Z",
//   metadata: { run: "ci-7821" },
//   services: {
//     db: { host: "db", port: 5432, ready: true },
//     cache: { host: "cache", port: 6379, ready: true },
//   },
// }

What this does: posts the JSON body to POST /v1/sandboxes. The server resolves the spec, boots the runtime, starts services, runs fixtures, applies setup, and returns the sandbox object once it's ready. Failure of any step destroys partial state and returns an error.

Auto-forwarding secrets

If you pass specPath (TS) / spec_path (Python), the SDK reads your spec's secrets: block, resolves each declared source on the caller's machine, and forwards the resolved {name: value} map alongside the create request. Server-side these merge with Dashboard-stored secrets — the request values win.

await ks.sandboxes.create({ spec_id: "...", specPath: "./specs/scenario-1.yaml" });

See Secrets for the full source-type table.
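The merge rule ("the request values win") can be sketched with plain object spreads. This is an illustrative model of the semantics, not the server's actual code; mergeSecrets and the secret names are hypothetical.

```typescript
type SecretMap = Record<string, string>;

// Dashboard-stored secrets form the base; secrets forwarded with the
// create request override them on name collision.
function mergeSecrets(dashboard: SecretMap, request: SecretMap): SecretMap {
  // Spread order matters: later (request) entries win.
  return { ...dashboard, ...request };
}

const merged = mergeSecrets(
  { DB_PASSWORD: "dashboard-value", API_TOKEN: "abc" },
  { DB_PASSWORD: "request-value" },
);
// merged.DB_PASSWORD is "request-value"; merged.API_TOKEN is still "abc"
```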

sandboxes.get(id)

Fetches a sandbox by ID. Returns the same shape as the create response.

const sb = await ks.sandboxes.get("sb-a1b2c3d4e5f6");
console.log(sb.state);   // "creating" | "ready" | "running" | "stopped" | "error"

What this does: GET /v1/sandboxes/:id. Returns 404 if the ID isn't known to the server.

sandboxes.list()

Returns every active sandbox visible to the caller's API key.

const all = await ks.sandboxes.list();
// Sandbox[]

What this does: GET /v1/sandboxes. Returns the array of sandboxes scoped to the API key's billing owner — you cannot see another tenant's sandboxes. Stale stopped entries are filtered out.

sandboxes.destroy(id)

Tears down a sandbox immediately: runs teardown exports, closes the audit log, stops backing services, halts the isolation runtime, kills any straggler processes, removes the workspace directory.

await ks.sandboxes.destroy("sb-a1b2c3d4e5f6");

What this does: DELETE /v1/sandboxes/:id. Idempotent — calling on an already-stopped sandbox is fine. Teardown runs even on failed sandboxes when teardown.always_run: true.

Running commands

sandboxes.runCommand(id, opts)

Executes a shell command inside the sandbox. Captures stdout, stderr, exit code, and duration. Records the command in the audit log automatically.

const result = await ks.sandboxes.runCommand("sb-abc", {
  command: "npm test",
  timeout: "2m",       // optional — kills the command after this
  background: false,    // optional — true = fire-and-forget
});
 
// Returns:
// {
//   command: "npm test",
//   stdout: "PASS  src/api.test.ts\n...",
//   stderr: "",
//   exit_code: 0,
//   duration_ms: 12340,
// }

What this does: POST /v1/sandboxes/:id/commands. The command runs as sh -c "<command>" from the workspace directory with secrets and determinism env vars injected. Records audit.process_spawn and (if audit.stdout_capture: true) the truncated stdout. Triggers a checkpoint snapshot if snapshots.checkpoints: per_action is set.

File operations

The SDK surfaces three file primitives. All three normalize paths — /workspace/foo.txt and foo.txt resolve to the same place.

sandboxes.readFile(id, path)

const content = await ks.sandboxes.readFile("sb-abc", "src/main.ts");
// Returns the file content as a string.

What this does: GET /v1/sandboxes/:id/files/:path. Records an audit.file_read event with the byte count.

sandboxes.writeFile(id, path, content)

await ks.sandboxes.writeFile("sb-abc", "config.json", '{"x": 1}');

What this does: POST /v1/sandboxes/:id/files. Creates parent directories as needed, writes the file with mode 0644, records audit.file_write. Path traversal (..) is rejected.

sandboxes.deleteFile(id, path)

await ks.sandboxes.deleteFile("sb-abc", "tmp/cache.bin");

What this does: DELETE /v1/sandboxes/:id/files/:path. Records audit.file_delete.

State & diffing

sandboxes.state(id)

Captures the current filesystem state — every file, with size, mode, and SHA-256 checksum. Skips node_modules, .git, __pycache__, .keystone. Files larger than 1 MiB get their checksum omitted (size + mode still tracked).

const snap = await ks.sandboxes.state("sb-abc");
// {
//   captured_at: "2026-04-28T22:00:00Z",
//   files: {
//     "src/main.ts":   { size: 1234, mode: "-rw-r--r--", checksum: "a1b2c3..." },
//     "package.json":  { size:  890, mode: "-rw-r--r--", checksum: "d4e5f6..." },
//   }
// }

What this does: GET /v1/sandboxes/:id/state. Useful when you want a portable, hashed view of the sandbox to compare or archive.

sandboxes.diff(id)

Returns the difference between the before-run snapshot (taken automatically when the sandbox went ready) and the current state — i.e., everything the agent changed.

const diff = await ks.sandboxes.diff("sb-abc");
// {
//   added:    ["src/utils.ts", "config.json"],
//   modified: ["src/main.ts"],
//   removed:  ["src/old.ts"],
// }

What this does: GET /v1/sandboxes/:id/diff. Compares checksums between baseline and current. Output is structurally identical to a git diff --name-status summary.
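The same comparison can be approximated client-side from two state() captures — useful if you want to diff against your own baseline rather than the before-run snapshot. A sketch assuming the file-map shape shown under sandboxes.state(); diffStates is a hypothetical helper, not SDK API.

```typescript
interface FileEntry { size: number; mode: string; checksum?: string }
type FileMap = Record<string, FileEntry>;

// Presence in one map but not the other decides added/removed;
// a checksum change decides modified — mirroring the server's logic.
function diffStates(before: FileMap, after: FileMap) {
  const added    = Object.keys(after).filter((p) => !(p in before));
  const removed  = Object.keys(before).filter((p) => !(p in after));
  const modified = Object.keys(after).filter(
    (p) => p in before && before[p].checksum !== after[p].checksum,
  );
  return { added, modified, removed };
}
```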

Trace ingestion

sandboxes.ingestTrace(id, events)

Posts trace events to a sandbox. The SDK's wrap() and traced() helpers do this for you automatically — call this directly only if you want to hand-author events from a non-Keystone framework.

await ks.sandboxes.ingestTrace("sb-abc", [
  {
    ts: new Date().toISOString(),
    event_type: "tool_call",
    tool: "write_file",
    phase: "end",
    status: "ok",
    duration_ms: 120,
  },
]);
// Returns: { ingested: 1 }

What this does: POST /v1/sandboxes/:id/trace. Server validates each event, stamps it with the sandbox's tenant info, and stores it in the trace table. Up to 1,000 events per request; larger batches are split client-side.
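The client-side batch split is simple to reason about: slice the array into chunks of at most 1,000 and post each chunk. A sketch of that behavior; chunkEvents is an illustrative name, not the SDK's internal function.

```typescript
// Split an event array into batches no larger than `limit`,
// mirroring what the SDK does before each POST.
function chunkEvents<T>(events: T[], limit = 1000): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < events.length; i += limit) {
    batches.push(events.slice(i, i + limit));
  }
  return batches;
}

// 2,500 events → three requests: 1000 + 1000 + 500.
const batches = chunkEvents(new Array(2500).fill({ event_type: "tool_call" }));
```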

sandboxes.getTrace(id)

Reads back trace events plus computed metrics for one sandbox.

const trace = await ks.sandboxes.getTrace("sb-abc");
// {
//   events: TraceEvent[],
//   metrics: {
//     total_tool_calls: 12,
//     tool_success_rate: 1.0,
//     mean_duration_ms: 230,
//     p95_duration_ms: 1100,
//     tool_breakdown: { write_file: { count: 5, mean_ms: 120, error_rate: 0 } }
//   }
// }

What this does: GET /v1/sandboxes/:id/trace. Includes both LLM call events (cost + tokens) and tool spans (duration + status). Use this to verify your wrapped agent is actually emitting events.
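As a sanity check on the server's numbers, two of those metrics can be recomputed locally from the raw tool spans. The sketch below uses nearest-rank p95 — the server's exact percentile method is not documented — and summarize is a hypothetical helper.

```typescript
interface ToolSpan { status: "ok" | "error"; duration_ms: number }

// Recompute tool_success_rate and p95_duration_ms from raw spans.
function summarize(spans: ToolSpan[]) {
  const ok = spans.filter((s) => s.status === "ok").length;
  const sorted = spans.map((s) => s.duration_ms).sort((a, b) => a - b);
  // Nearest-rank percentile: smallest value covering 95% of samples.
  const rank = Math.max(Math.ceil(0.95 * sorted.length) - 1, 0);
  return {
    tool_success_rate: ok / spans.length,
    p95_duration_ms: sorted[rank],
  };
}
```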

Real-time events (SSE)

For dashboards, progress indicators, or live debugging, stream lifecycle events as they happen:

GET /v1/sandboxes/:id/events
Accept: text/event-stream

Events include status changes (creating, ready, running, destroyed), service startup, fixture application, command execution, and warnings. There's no SDK helper — just use the standard EventSource API:

const es = new EventSource(`${baseUrl}/v1/sandboxes/sb-abc/events`);
es.onmessage = (e) => {
  const event = JSON.parse(e.data);
  console.log(`[${event.event_type}]`, event.data);
};
es.onerror = () => es.close();

The stream closes when the sandbox is destroyed.

Inside the sandbox: Keystone.fromSandbox()

When your agent runs inside a Keystone sandbox, it can introspect its own environment with one call. This is the one place where the SDK does setup for you instead of with you:

const { client, sandbox } = await Keystone.fromSandbox();
 
// client: a fully-configured Keystone instance, scoped to this sandbox
// sandbox.services.db:   { host: "db", port: 5432, ready: true }
// sandbox.services.cache: { host: "cache", port: 6379, ready: true }
 
const db = new pg.Client({
  host: sandbox.services!.db.host,
  port: sandbox.services!.db.port,
  password: process.env.DB_PASSWORD,
});

What this does: reads KEYSTONE_BASE_URL, KEYSTONE_API_KEY, and KEYSTONE_SANDBOX_ID from the env (Keystone injects all three at sandbox boot), constructs a client, and calls sandboxes.get() for the current sandbox.

The injected KEYSTONE_API_KEY is a sandbox-scoped token (format ks_sb_<hex>) — authorized only for this sandbox's resources. Safe to log; it can't read other tenants' resources. This is why agent code never needs your ks_live_ key.

Service discovery env vars

Keystone also injects per-service env vars at boot, so agents can use discovery instead of hardcoded names:

KEYSTONE_SERVICE_DB_HOST=db
KEYSTONE_SERVICE_DB_PORT=5432
KEYSTONE_SERVICE_CACHE_HOST=cache
KEYSTONE_SERVICE_CACHE_PORT=6379

The naming is KEYSTONE_SERVICE_<NAME>_HOST / _PORT where <NAME> is the spec's services[].name upper-snake-cased (so name: http-mock → KEYSTONE_SERVICE_HTTP_MOCK_HOST).
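The upper-snake-casing rule is mechanical enough to sketch; serviceEnvVars below is an illustrative helper, not part of the SDK.

```typescript
// Derive the injected env var names from a spec's services[].name:
// uppercase it, replace hyphens with underscores, prefix KEYSTONE_SERVICE_.
function serviceEnvVars(name: string): { host: string; port: string } {
  const upper = name.toUpperCase().replace(/-/g, "_");
  return {
    host: `KEYSTONE_SERVICE_${upper}_HOST`,
    port: `KEYSTONE_SERVICE_${upper}_PORT`,
  };
}
```

For example, serviceEnvVars("http-mock").host yields "KEYSTONE_SERVICE_HTTP_MOCK_HOST".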

Path normalization

The SDK accepts two path conventions interchangeably:

  • Workspace-relative — src/main.ts (what most callers use).
  • Container-absolute — /workspace/src/main.ts (what your agent sees inside the sandbox).

Both resolve to the same physical path on the host. The server strips a leading /workspace/ if present. Path traversal (..) is rejected on every file API.
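The two rules above (strip a leading /workspace/, reject traversal) can be modeled in a few lines. This is a sketch of the described behavior, not the server's implementation; normalizeSandboxPath is a hypothetical name.

```typescript
// Normalize a caller-supplied path per the rules above: reject any ".."
// segment, strip a leading "/workspace/", return a workspace-relative path.
function normalizeSandboxPath(path: string): string {
  if (path.split("/").includes("..")) {
    throw new Error("path traversal rejected");
  }
  return path.startsWith("/workspace/")
    ? path.slice("/workspace/".length)
    : path.replace(/^\//, "");
}
// Both "/workspace/src/main.ts" and "src/main.ts" normalize to "src/main.ts".
```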

Errors

All SDK methods raise KeystoneError (TS/Python) or return *APIError (Go) on non-2xx responses:

Status   Meaning        Common cause
400      Bad request    Malformed body, unknown spec id, validation failure
401      Unauthorized   Missing/invalid API key
403      Forbidden      API key doesn't own this resource
404      Not found      Sandbox/spec ID doesn't exist
409      Conflict       Sandbox in wrong state for the operation
429      Rate limited   Tenant exceeded concurrent-sandbox quota
500      Server error   Bug or infrastructure failure
503      Capacity       Server has no free sandbox slots — retry with backoff

The error message includes a human-readable reason. For 503s in particular, a short retry loop with jitter is the right move.
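A minimal retry-with-jitter loop might look like the following. It assumes the thrown error exposes a numeric status property; withRetry, the attempt count, and the backoff constants are all illustrative choices, not SDK API.

```typescript
// Retry a call on 503, with exponential backoff and full jitter.
// Any other status (or exhausted attempts) rethrows immediately.
async function withRetry<T>(fn: () => Promise<T>, attempts = 5): Promise<T> {
  for (let i = 0; ; i++) {
    try {
      return await fn();
    } catch (err: any) {
      if (err?.status !== 503 || i >= attempts - 1) throw err;
      // Full jitter: random delay in [0, min(cap, base * 2^i)).
      const delay = Math.random() * Math.min(10_000, 250 * 2 ** i);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

// Usage: const sb = await withRetry(() => ks.sandboxes.create({ spec_id: "..." }));
```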