REST API

Every Keystone HTTP endpoint — what it does, what it accepts, what it returns.

The Keystone server exposes a REST API at https://keystone.polarity.so (or wherever your self-hosted deployment lives). All endpoints are versioned under /v1/. The SDKs are thin wrappers around these endpoints — anything the SDKs do, you can do with curl and jq.

Authentication

Every request requires an Authorization: Bearer <api-key> header:

curl https://keystone.polarity.so/v1/sandboxes \
  -H "Authorization: Bearer ks_live_..."

Key prefix   Means
ks_live_     Tenant-issued production key
ks_sb_       Sandbox-scoped token (auto-injected into agent processes)

Get a ks_live_ key from app.paragon.run/app/keystone/settings → API Keys.
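As a minimal sketch, the same header can be attached with Python's standard library. The helper names build_request and key_kind are illustrative assumptions, not part of any Keystone SDK.

```python
import urllib.request

BASE_URL = "https://keystone.polarity.so"  # or your self-hosted deployment


def build_request(path, api_key, method="GET", data=None):
    """Return a urllib Request carrying the Bearer header (hypothetical helper)."""
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {api_key}")
    return req


def key_kind(api_key):
    """Classify a key by the two documented prefixes."""
    if api_key.startswith("ks_live_"):
        return "production"
    if api_key.startswith("ks_sb_"):
        return "sandbox"
    return "unknown"
```

Passing the result to urllib.request.urlopen sends the request. Reading the key from an environment variable rather than hard-coding it keeps it out of source control.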

Errors

Non-2xx responses include a JSON body:

{
  "message": "spec 'fix-failing-test' not found",
  "error": "not_found"
}

Common status codes:

Code   Meaning
400    Bad request (malformed body, validation failure)
401    Missing/invalid API key
403    API key doesn't own this resource
404    Resource doesn't exist
409    Resource in wrong state
429    Rate limited
500    Server error
503    At capacity (retry with backoff)
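A client can turn the error body and status code into a single exception type. The sketch below is one possible shape. KeystoneError and parse_error are hypothetical names, and treating only 429 and 503 as retryable is an assumption based on the descriptions above.

```python
import json

# Assumption: only 429 and 503 are treated as transient here; whether a 500
# is worth retrying is left to the caller.
RETRYABLE = {429, 503}


class KeystoneError(Exception):
    """Hypothetical exception wrapping a non-2xx response."""

    def __init__(self, status, code, message):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message

    @property
    def retryable(self):
        return self.status in RETRYABLE


def parse_error(status, body):
    """Build a KeystoneError from a status code and raw response body."""
    try:
        payload = json.loads(body)
    except (ValueError, TypeError):
        payload = {}  # body was not JSON (for example, a proxy error page)
    if not isinstance(payload, dict):
        payload = {}
    return KeystoneError(
        status,
        payload.get("error", "unknown"),
        payload.get("message", ""),
    )
```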

Sandboxes

POST   /v1/sandboxes                Create from spec
GET    /v1/sandboxes                List active
GET    /v1/sandboxes/:id            Get details
DELETE /v1/sandboxes/:id            Destroy

POST /v1/sandboxes

{
  "spec_id": "fix-failing-test",
  "timeout": "10m",
  "metadata": { "run": "ci-7821" },
  "secrets": {
    "ANTHROPIC_API_KEY": "..."
  }
}

Response: the full Sandbox object, returned once state == "ready". The boot pipeline runs server-side: secrets are resolved, the runtime is started, services come up, fixtures are applied, and a snapshot is taken.

Sandbox interaction

POST   /v1/sandboxes/:id/commands           Run shell command
POST   /v1/sandboxes/:id/files               Write file
GET    /v1/sandboxes/:id/files/*path         Read file
DELETE /v1/sandboxes/:id/files/*path         Delete file
GET    /v1/sandboxes/:id/state               Filesystem snapshot (files + checksums)
GET    /v1/sandboxes/:id/diff                What changed since boot
GET    /v1/sandboxes/:id/trace               Get trace events
POST   /v1/sandboxes/:id/trace               Ingest trace events
GET    /v1/sandboxes/:id/events              Stream lifecycle events (SSE)
POST   /v1/sandboxes/:id/timeout             Extend/reset timeout

POST /v1/sandboxes/:id/commands

{
  "command": "npm test",
  "timeout": "2m",
  "background": false
}

Response:

{
  "command": "npm test",
  "stdout": "...",
  "stderr": "...",
  "exit_code": 0,
  "duration_ms": 12340
}

Each call records an audit.process_spawn event and, if audit.stdout_capture: true, the captured stdout.
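The request and response shapes above map onto two small helpers. This is a sketch only: command_body, stdout_or_raise, and CommandFailed are hypothetical names, and omitting timeout when it is not supplied assumes the field is optional, which the reference does not state.

```python
import json


class CommandFailed(Exception):
    """Hypothetical error raised when exit_code is non-zero."""


def command_body(command, timeout=None, background=False):
    """Encode the JSON body for POST /v1/sandboxes/:id/commands."""
    body = {"command": command, "background": background}
    if timeout is not None:  # assumption: timeout may be omitted
        body["timeout"] = timeout
    return json.dumps(body).encode("utf-8")


def stdout_or_raise(result):
    """Return stdout from a command response, raising on a non-zero exit."""
    if result["exit_code"] != 0:
        raise CommandFailed(
            f'{result["command"]!r} exited {result["exit_code"]}: {result["stderr"]}'
        )
    return result["stdout"]
```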

POST /v1/sandboxes/:id/files / GET /v1/sandboxes/:id/files/*path

Write:

{ "path": "config.json", "content": "..." }

Read (path is in URL):

curl https://.../v1/sandboxes/sb-abc/files/config.json \
  -H "Authorization: Bearer ks_live_..."
# Returns the file content as raw body.

Path normalization: /workspace/foo.txt and foo.txt resolve to the same place. Path traversal (..) is rejected.
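Mirroring the normalization rule on the client side lets a caller fail fast before sending a request. The following is a sketch under assumptions: normalize and file_url are hypothetical helpers, and rejecting absolute paths outside /workspace is a guess, since the reference does not say how the server treats them.

```python
import posixpath
from urllib.parse import quote

WORKSPACE = "/workspace"


def normalize(path):
    """Return the workspace-relative form of a path, rejecting '..' segments."""
    if ".." in path.split("/"):
        raise ValueError(f"path traversal rejected: {path!r}")
    if path.startswith("/"):
        # Assumption: absolute paths are only valid under /workspace.
        if path != WORKSPACE and not path.startswith(WORKSPACE + "/"):
            raise ValueError(f"absolute path outside workspace: {path!r}")
        path = path[len(WORKSPACE):]
    return posixpath.normpath(path.lstrip("/") or ".")


def file_url(sandbox_id, path):
    """Build the URL path for reading or deleting a file."""
    return f"/v1/sandboxes/{sandbox_id}/files/{quote(normalize(path))}"
```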

GET /v1/sandboxes/:id/events

Server-Sent Events. Streams sandbox lifecycle:

data: {"event_type": "status", "data": {"step": "secrets_resolved", "count": "3"}}

data: {"event_type": "status", "data": {"step": "container_started", "runtime": "docker"}}

data: {"event_type": "command", "data": {"command": "npm test", "exit_code": 0, "duration_ms": 12340}}

data: {"event_type": "status", "data": {"state": "destroyed"}}

Stream closes when the sandbox is destroyed.
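For clients without native SSE support, the stream can be decoded line by line. This minimal sketch handles only the data: field shown above and ignores other SSE fields. The function name iter_sse_events is an assumption.

```python
import json


def iter_sse_events(lines):
    """Yield decoded JSON payloads from an iterable of SSE text lines."""
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield json.loads("\n".join(data_parts))
                data_parts = []
        elif line.startswith("data:"):
            value = line[5:]
            if value.startswith(" "):
                value = value[1:]  # SSE strips one leading space
            data_parts.append(value)
        # Other fields (event:, id:, retry:) and ":" comments are skipped.
    if data_parts:
        yield json.loads("\n".join(data_parts))
```

With urllib, the input can be produced by decoding each line of the open response, for example (line.decode("utf-8") for line in response).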

Specs

POST   /v1/specs                    Upload YAML
GET    /v1/specs                    List
GET    /v1/specs/:id                Get
DELETE /v1/specs/:id                Delete

POST /v1/specs

The body is raw YAML with content type application/x-yaml or text/yaml. Versioning is automatic: uploading a spec with an existing id: increments its version.

Experiments

POST   /v1/experiments              Create
GET    /v1/experiments              List
GET    /v1/experiments/:id          Get RunResults
POST   /v1/experiments/:id/run      Trigger async run
POST   /v1/experiments/compare      Compare two experiments
GET    /v1/metrics/experiments/:id  Get aggregate metrics

POST /v1/experiments

{
  "name": "baseline-v1",
  "spec_id": "fix-failing-test",
  "secrets": { "ANTHROPIC_API_KEY": "..." }
}

Returns:

{
  "id": "exp-a1b2c3...",
  "name": "baseline-v1",
  "spec_id": "fix-failing-test",
  "status": "created",
  "created_at": "..."
}

POST /v1/experiments/:id/run

Empty body. Server enqueues scenario jobs (replicas × matrix entries) and returns immediately. Poll GET /v1/experiments/:id for completion.
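A polling loop for this endpoint might look like the sketch below. The reference only documents the status value "created", so the terminal values in TERMINAL are placeholders to replace with the real ones. fetch stands in for whatever function performs GET /v1/experiments/:id and returns the decoded JSON.

```python
import time

# Assumption: placeholder terminal statuses; the reference documents only
# "created", so substitute the values your server actually returns.
TERMINAL = {"completed", "failed"}


def wait_for_experiment(fetch, experiment_id, interval=5.0, timeout=3600.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll fetch(experiment_id) until a terminal status or the timeout."""
    deadline = clock() + timeout
    while True:
        result = fetch(experiment_id)
        if result.get("status") in TERMINAL:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"experiment {experiment_id} still running")
        sleep(interval)
```

The sleep and clock parameters exist so the loop can be exercised in tests without real delays.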

POST /v1/experiments/compare

{ "baseline_id": "exp-baseline", "candidate_id": "exp-new" }

Returns:

{
  "baseline_id": "exp-baseline",
  "candidate_id": "exp-new",
  "regressed": true,
  "regressions": ["pass_rate dropped from 0.95 to 0.78"],
  "metrics": [
    { "name": "pass_rate", "baseline": 0.95, "candidate": 0.78, "delta": -0.17, "direction": "worse" }
  ]
}
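One way to use this response is as a CI gate: print each metric and exit non-zero when regressed is true. The helper name regression_report is hypothetical, and the sketch relies only on the fields shown in the example response.

```python
def regression_report(comparison):
    """Return (exit_code, lines) summarizing a compare response."""
    lines = []
    for m in comparison.get("metrics", []):
        lines.append(
            f'{m["name"]}: {m["baseline"]} -> {m["candidate"]} '
            f'({m["delta"]:+.2f}, {m["direction"]})'
        )
    lines.extend(comparison.get("regressions", []))
    exit_code = 1 if comparison.get("regressed") else 0
    return exit_code, lines
```

A CI script would print the lines and pass the exit code to sys.exit.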

Alerts

POST   /v1/alerts                   Create rule
GET    /v1/alerts                   List
DELETE /v1/alerts/:id               Delete

POST /v1/alerts

{
  "name": "pass-rate-drop",
  "eval_id": "fix-failing-test",
  "condition": "pass_rate < 0.8",
  "notify": "slack",
  "slack_channel": "#agent-alerts"
}

Or webhook:

{
  "name": "cost-spike",
  "condition": "mean_cost_per_run_usd > 2.00",
  "notify": "webhook",
  "webhook_url": "https://hooks.slack.com/services/T00/B00/xxx"
}

Agents (snapshots)

POST   /v1/agents                   Upload (multipart: metadata + bundle)
GET    /v1/agents                   List (paginated)
GET    /v1/agents/:name/latest      Resolve latest version
GET    /v1/agents/:name/tags/:tag   Resolve by tag
GET    /v1/agents/:name/versions/:v Resolve by version number
GET    /v1/agents/:name/versions    List versions of one agent
GET    /v1/agents/:name/traces      Query traces by agent name (with optional ?version=)
GET    /v1/snapshots/:id            Get snapshot by content hash
DELETE /v1/snapshots/:id            Delete snapshot

POST /v1/agents (multipart)

Two form fields:

  • metadata (JSON): { name, entrypoint, runtime?, tag?, auth? }
  • bundle (binary): the .tar.gz file

curl -X POST https://.../v1/agents \
  -H "Authorization: Bearer ks_live_..." \
  -F 'metadata={"name":"email-agent","entrypoint":["python","main.py"],"runtime":"python3.12"}' \
  -F bundle=@dist/email-agent.tar.gz

Response: full AgentSnapshot with auto-assigned version and digest (sha256 content hash).
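Without curl, the two-field multipart body can be assembled by hand. The sketch below uses only the standard library. encode_agent_upload is a hypothetical helper, and the part content type application/octet-stream is an assumption chosen to resemble what curl sends for a binary file.

```python
import json
import uuid


def encode_agent_upload(metadata, bundle, filename="agent.tar.gz"):
    """Return (content_type, body) for a multipart POST /v1/agents request."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="metadata"\r\n'
        "\r\n"
        f"{json.dumps(metadata)}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="bundle"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    )
    tail = f"\r\n--{boundary}--\r\n"
    body = head.encode("utf-8") + bundle + tail.encode("utf-8")
    return f"multipart/form-data; boundary={boundary}", body
```

The returned content type goes in the Content-Type header and the bytes become the request body.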

Datasets

POST   /v1/datasets                          Create
GET    /v1/datasets                          List
GET    /v1/datasets/:id                      Get
DELETE /v1/datasets/:id                      Delete (+ all records)
POST   /v1/datasets/:id/records              Add records (auto-increments version)
GET    /v1/datasets/:id/records              Get records (?version=N&tags=a,b)

Scoring

POST   /v1/score-rules                              Create rule
GET    /v1/score-rules                              List rules
DELETE /v1/score-rules/:id                          Delete rule
POST   /v1/experiments/:id/score                    Trigger offline scoring
GET    /v1/experiments/:id/scores                   Fetch scores

Export

GET    /v1/traces?...                       Stream trace events (paginated)
GET    /v1/traces/:id                       Single trace
GET    /v1/spans?...                        Stream spans
GET    /v1/scenarios?...                    Stream scenarios
GET    /v1/scores?...                       Stream scores
GET    /v1/experiments/:id/export           Full bundle (?format=json|ndjson)

All paginated endpoints return:

{
  "items": [...],
  "next_cursor": "...",
  "count": 100
}

Pass ?cursor=<next_cursor> to fetch the next page. NDJSON variants stream line-delimited JSON for jq-friendly consumption.
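The cursor loop can be wrapped in a generator so callers iterate over items rather than pages. In this sketch, fetch_page is a stand-in for a function that issues the GET with the given query parameters and returns the decoded page. Treating an empty or missing next_cursor as the end of the results is an assumption.

```python
def iter_items(fetch_page, **filters):
    """Yield every item across pages by following next_cursor."""
    cursor = None
    while True:
        params = dict(filters)
        if cursor:
            params["cursor"] = cursor
        page = fetch_page(params)
        yield from page.get("items", [])
        cursor = page.get("next_cursor")
        if not cursor:  # assumption: empty or absent cursor means last page
            return
```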

Common filters

Endpoint        Filters
/v1/traces      experiment_id, sandbox_id, agent, event_type, tool, since
/v1/spans       trace_id, span_id, parent_span_id, root_span_id, tool, event_type
/v1/scenarios   experiment_id (required), status, scenario_id
/v1/scores      experiment_id (required), rule_id

All of these endpoints accept ?limit=<int> (default 100) to set the page size.
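The filter table can be encoded as data so that unsupported or missing filters are caught before a request is sent. This is an illustrative sketch: export_url is a hypothetical helper, and the client-side validation is a convenience rather than documented server behavior.

```python
from urllib.parse import urlencode

FILTERS = {
    "/v1/traces": {"experiment_id", "sandbox_id", "agent", "event_type", "tool", "since"},
    "/v1/spans": {"trace_id", "span_id", "parent_span_id", "root_span_id", "tool", "event_type"},
    "/v1/scenarios": {"experiment_id", "status", "scenario_id"},
    "/v1/scores": {"experiment_id", "rule_id"},
}
REQUIRED = {"/v1/scenarios": {"experiment_id"}, "/v1/scores": {"experiment_id"}}


def export_url(endpoint, limit=100, cursor=None, **filters):
    """Build a query URL for an export endpoint, validating filter names."""
    unknown = set(filters) - FILTERS[endpoint]
    if unknown:
        raise ValueError(f"unsupported filters for {endpoint}: {sorted(unknown)}")
    missing = REQUIRED.get(endpoint, set()) - set(filters)
    if missing:
        raise ValueError(f"{endpoint} requires: {sorted(missing)}")
    params = dict(filters, limit=limit)
    if cursor:
        params["cursor"] = cursor
    # Sorting keeps the URL stable, which helps with caching and logs.
    return f"{endpoint}?{urlencode(sorted(params.items()))}"
```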

Tracing (agent mode)

POST   /v1/traces                   Ingest trace events (no sandbox)

{
  "events": [
    { "ts": "...", "event_type": "llm_call", "tool": "anthropic.create", "phase": "complete", ... }
  ]
}

Events are stamped with the API key's billing owner and stored with sandbox_id = null. The Traces dashboard tab filters by API key.

OTLP ingest

POST   /otel/v1/traces              OpenTelemetry OTLP endpoint

Accepts the standard OTLP protobuf payload. Spans are converted to Keystone trace events at ingest, with gen_ai.* semantic conventions preserved in metadata.

Configure your OTel exporter:

# OTLP/HTTP exporters append /v1/traces to this base URL.
export OTEL_EXPORTER_OTLP_ENDPOINT=https://keystone.polarity.so/otel
export OTEL_EXPORTER_OTLP_HEADERS="authorization=Bearer ks_live_..."

Misc

GET    /health                              Server liveness
GET    /v1/traces?experiment_id=:id         Trace events for an experiment (filterable by sandbox_id, agent, event_type, tool, since)
GET    /v1/traces/:trace_id                 Single trace — every span rooted at the given trace ID
GET    /v1/sandboxes/:id/trace              Trace events for a single sandbox
GET    /v1/spans?root_span_id=:sid          Filter spans by root_span_id / parent_span_id / trace_id / tool / event_type
GET    /v1/metrics/experiments/:id          Aggregate metrics for one experiment
GET    /v1/evals/:id/history                Score history for an eval over time

Rate limits

The server rate-limits per API key:

  • 100 requests/sec for read endpoints
  • 10 requests/sec for write endpoints
  • Concurrent sandbox limit: 5 (Free), 50 (Pro), custom (Enterprise)

429 and 503 responses both include a Retry-After header (in seconds). Implement exponential backoff with jitter when retrying.
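A retry delay that honors Retry-After and otherwise backs off exponentially can be computed as in this sketch. It uses the "full jitter" variant (a uniform random delay between zero and the exponential ceiling). The base and cap defaults are arbitrary choices, not values from the Keystone server.

```python
import random


def retry_delay(attempt, retry_after=None, base=0.5, cap=30.0, rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        return float(retry_after)  # the server's hint wins when present
    ceiling = min(cap, base * 2 ** attempt)
    return rng() * ceiling  # full jitter: uniform in [0, ceiling)
```

Jitter spreads retries from many clients over time so they do not all hit the server again at the same instant.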

Idempotency

POST endpoints accept an optional Idempotency-Key header. When set, the server caches the response for 24h — retrying with the same key returns the cached response without re-running the operation. Useful for POST /v1/sandboxes and POST /v1/experiments where network retries shouldn't create duplicates.

curl -X POST https://.../v1/sandboxes \
  -H "Authorization: Bearer ks_live_..." \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: $(uuidgen)" \
  -d '{"spec_id": "..."}'
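The key only helps if every retry of the same logical operation reuses it. A fresh $(uuidgen) per invocation would defeat the cache. The sketch below generates one key and reuses it across attempts. post_with_retries is a hypothetical helper, and send stands in for a transport function that raises ConnectionError on network failure.

```python
import uuid


def post_with_retries(send, path, body, attempts=3):
    """Call send(path, body, headers) until it succeeds, reusing one key."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    # Generated once, outside the loop, so every retry carries the same key.
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    last_error = None
    for _ in range(attempts):
        try:
            return send(path, body, headers)
        except ConnectionError as exc:
            last_error = exc
    raise last_error
```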

SSE streaming

GET /v1/sandboxes/:id/events is the only Server-Sent Events endpoint today. Other "stream" endpoints (/v1/traces, /v1/spans) use cursor pagination instead, which is easier to consume in clients that don't have native SSE handling.