REST API

Every Keystone HTTP endpoint — what it does, what it accepts, what it returns.

The Keystone server exposes a REST API at https://keystone.polarity.so (or wherever your self-hosted deployment lives). All endpoints are versioned under /v1/. The SDKs are thin wrappers around these endpoints — anything the SDKs do, you can do with curl and jq.

Authentication

Every request requires an Authorization: Bearer <api-key> header:

curl https://keystone.polarity.so/v1/sandboxes \
  -H "Authorization: Bearer ks_live_..."

Key prefix	Meaning
`ks_live_`	Tenant-issued production key
`ks_sb_`	Sandbox-scoped token (auto-injected into agent processes)

Get a ks_live_ key from app.paragon.run/app/keystone/settings → API Keys.
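
A client can tell the two key types apart by prefix before it sends anything. The sketch below is illustrative only: `key_kind` and `auth_headers` are hypothetical helper names, not part of any Keystone SDK.

```python
def key_kind(api_key: str) -> str:
    """Classify a key using the prefixes listed in the table above."""
    if api_key.startswith("ks_live_"):
        return "production"
    if api_key.startswith("ks_sb_"):
        return "sandbox"
    raise ValueError("unrecognized Keystone key prefix")


def auth_headers(api_key: str) -> dict[str, str]:
    """Build the Authorization header that every request carries."""
    key_kind(api_key)  # fail early on a malformed key
    return {"Authorization": f"Bearer {api_key}"}
```

Failing locally on an unknown prefix avoids a round trip that would most likely end in a 401.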

Errors

Non-2xx responses include a JSON body:

{
  "message": "spec 'fix-failing-test' not found",
  "error": "not_found"
}

Common status codes:

Code	Meaning
400	Bad request (malformed body, validation failure)
401	Missing/invalid API key
403	API key doesn't own this resource
404	Resource doesn't exist
409	Resource in wrong state
429	Rate limited
500	Server error
503	At capacity (retry with backoff)
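
One way to surface these errors in client code is to wrap the JSON body in an exception. This is a minimal sketch, not SDK behavior: `KeystoneError` and `parse_error` are hypothetical names, and treating only 429 and 503 as retryable is an assumption drawn from the table above.

```python
import json

# Status codes the table above suggests are worth retrying (assumption).
RETRYABLE = {429, 503}


class KeystoneError(Exception):
    def __init__(self, status: int, code: str, message: str):
        super().__init__(f"{status} {code}: {message}")
        self.status, self.code, self.message = status, code, message

    @property
    def retryable(self) -> bool:
        return self.status in RETRYABLE


def parse_error(status: int, body: bytes) -> KeystoneError:
    """Turn a non-2xx response into an exception; tolerate non-JSON bodies."""
    try:
        data = json.loads(body)
    except ValueError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    return KeystoneError(status, data.get("error", "unknown"), data.get("message", ""))
```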

Sandboxes

POST   /v1/sandboxes                Create from spec
GET    /v1/sandboxes                List active
GET    /v1/sandboxes/:id            Get details
DELETE /v1/sandboxes/:id            Destroy

`POST /v1/sandboxes`

{
  "spec_id": "fix-failing-test",
  "timeout": "10m",
  "metadata": { "run": "ci-7821" },
  "secrets": {
    "ANTHROPIC_API_KEY": "..."
  }
}

Response: full Sandbox object once state == "ready". Boot pipeline runs server-side: secrets resolved, runtime started, services up, fixtures applied, snapshot taken.
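
For readers not using curl, the same request can be built with Python's standard library. This is a sketch under assumptions: `create_sandbox_request` is a hypothetical helper, the JSON content type is assumed, and every field other than `spec_id` is treated as optional, which this reference does not confirm.

```python
import json
import urllib.request

BASE_URL = "https://keystone.polarity.so"


def create_sandbox_request(api_key, spec_id, timeout=None, metadata=None, secrets=None):
    """Build (but do not send) a POST /v1/sandboxes request."""
    body = {"spec_id": spec_id}
    # Only include the fields the caller actually set.
    for key, value in (("timeout", timeout), ("metadata", metadata), ("secrets", secrets)):
        if value is not None:
            body[key] = value
    return urllib.request.Request(
        f"{BASE_URL}/v1/sandboxes",
        data=json.dumps(body).encode(),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumed, not stated above
        },
    )
```

Passing the result to `urllib.request.urlopen` sends it. Because the call returns only once the sandbox is ready, a generous client-side timeout is advisable.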

Sandbox interaction

POST   /v1/sandboxes/:id/commands           Run shell command
POST   /v1/sandboxes/:id/files               Write file
GET    /v1/sandboxes/:id/files/*path         Read file
DELETE /v1/sandboxes/:id/files/*path         Delete file
GET    /v1/sandboxes/:id/state               Filesystem snapshot (files + checksums)
GET    /v1/sandboxes/:id/diff                What changed since boot
GET    /v1/sandboxes/:id/trace               Get trace events
POST   /v1/sandboxes/:id/trace               Ingest trace events
GET    /v1/sandboxes/:id/events              Stream lifecycle events (SSE)
POST   /v1/sandboxes/:id/timeout             Extend/reset timeout

`POST /v1/sandboxes/:id/commands`

{
  "command": "npm test",
  "timeout": "2m",
  "background": false
}

Response:

{
  "command": "npm test",
  "stdout": "...",
  "stderr": "...",
  "exit_code": 0,
  "duration_ms": 12340
}

Each command records an audit.process_spawn event and, if audit.stdout_capture: true is set, the captured stdout.
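
A client will usually want to treat a non-zero exit code as a failure. The sketch below parses the response shown above into a small dataclass; `CommandResult` and its `check` method are hypothetical names, not an SDK type.

```python
from dataclasses import dataclass


@dataclass
class CommandResult:
    command: str
    stdout: str
    stderr: str
    exit_code: int
    duration_ms: int

    @classmethod
    def from_response(cls, data: dict) -> "CommandResult":
        # Pick only the documented fields so extra keys do not break parsing.
        return cls(**{name: data[name] for name in cls.__dataclass_fields__})

    def check(self) -> "CommandResult":
        """Raise if the command exited non-zero, otherwise return self."""
        if self.exit_code != 0:
            raise RuntimeError(
                f"{self.command!r} exited {self.exit_code}: {self.stderr.strip()}"
            )
        return self
```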

`POST /v1/sandboxes/:id/files` / `GET /v1/sandboxes/:id/files/*path`

Write:

{ "path": "config.json", "content": "..." }

Read (path is in URL):

curl https://.../v1/sandboxes/sb-abc/files/config.json \
  -H "Authorization: Bearer ks_live_..."
# Returns the file content as raw body.

Path normalization: /workspace/foo.txt and foo.txt resolve to the same place. Path traversal (..) is rejected.
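
The two rules above can be mirrored client-side to catch bad paths before a request is sent. This is only an approximation of the server's behavior: `normalize_path` is a hypothetical helper, and rejecting absolute paths outside /workspace is an added assumption.

```python
import posixpath

WORKSPACE = "/workspace"


def normalize_path(path: str) -> str:
    """Return a workspace-relative path, rejecting traversal."""
    if ".." in path.split("/"):
        raise ValueError(f"path traversal rejected: {path!r}")
    # Resolve relative paths against the workspace root, then collapse "." and "//".
    absolute = posixpath.normpath(posixpath.join(WORKSPACE, path))
    if absolute != WORKSPACE and not absolute.startswith(WORKSPACE + "/"):
        raise ValueError(f"path outside workspace: {path!r}")
    return posixpath.relpath(absolute, WORKSPACE)
```

With this sketch, `/workspace/foo.txt` and `foo.txt` both normalize to `foo.txt`.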

`GET /v1/sandboxes/:id/events`

Server-Sent Events. Streams sandbox lifecycle:

data: {"event_type": "status", "data": {"step": "secrets_resolved", "count": "3"}}

data: {"event_type": "status", "data": {"step": "container_started", "runtime": "docker"}}

data: {"event_type": "command", "data": {"command": "npm test", "exit_code": 0, "duration_ms": 12340}}

data: {"event_type": "status", "data": {"state": "destroyed"}}

Stream closes when the sandbox is destroyed.
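
A stream like the one above can be consumed without an SSE library by reading it line by line. The sketch assumes each event fits on a single `data:` line, as in the samples; `iter_events` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded events from SSE lines until the sandbox is destroyed."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # blank separators, comments, and other SSE fields
        event = json.loads(line[len("data:"):].strip())
        yield event
        if event.get("data", {}).get("state") == "destroyed":
            return
```

Any iterable of text lines works as input, for example a decoded HTTP response body read line by line.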

Specs

POST   /v1/specs                    Upload YAML
GET    /v1/specs                    List
GET    /v1/specs/:id                Get
DELETE /v1/specs/:id                Delete

`POST /v1/specs`

Body is raw YAML; content type application/x-yaml or text/yaml. Versioning is automatic — uploading the same id: increments the version.

Experiments

POST   /v1/experiments              Create
GET    /v1/experiments              List
GET    /v1/experiments/:id          Get RunResults
POST   /v1/experiments/:id/run      Trigger async run
POST   /v1/experiments/compare      Compare two experiments
GET    /v1/metrics/experiments/:id  Get aggregate metrics

`POST /v1/experiments`

{
  "name": "baseline-v1",
  "spec_id": "fix-failing-test",
  "secrets": { "ANTHROPIC_API_KEY": "..." }
}

Returns:

{
  "id": "exp-a1b2c3...",
  "name": "baseline-v1",
  "spec_id": "fix-failing-test",
  "status": "created",
  "created_at": "..."
}

`POST /v1/experiments/:id/run`

Empty body. Server enqueues scenario jobs (replicas × matrix entries) and returns immediately. Poll GET /v1/experiments/:id for completion.
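
The polling step might look like the sketch below. The terminal status names are assumptions, since this reference only shows "created". `wait_for_experiment` and its `fetch` callback are hypothetical; `fetch` stands in for whatever performs GET /v1/experiments/:id and returns the parsed JSON.

```python
import time
from typing import Callable

# Hypothetical terminal statuses; adjust to what the server actually reports.
TERMINAL = {"completed", "failed"}


def wait_for_experiment(fetch: Callable[[], dict], interval: float = 5.0,
                        max_polls: int = 120, sleep=time.sleep) -> dict:
    """Call fetch() until the experiment reports a terminal status."""
    for attempt in range(max_polls):
        result = fetch()
        if result.get("status") in TERMINAL:
            return result
        if attempt < max_polls - 1:
            sleep(interval)  # no sleep after the final poll
    raise TimeoutError(f"experiment still running after {max_polls} polls")
```

Keeping `interval` well above the per-key read rate limit leaves headroom for other requests on the same key.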

`POST /v1/experiments/compare`

{ "baseline_id": "exp-baseline", "candidate_id": "exp-new" }

Returns:

{
  "baseline_id": "exp-baseline",
  "candidate_id": "exp-new",
  "regressed": true,
  "regressions": ["pass_rate dropped from 0.95 to 0.78"],
  "metrics": [
    { "name": "pass_rate", "baseline": 0.95, "candidate": 0.78, "delta": -0.17, "direction": "worse" }
  ]
}
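
A CI job could turn this response into a pass/fail gate. The following is a sketch that relies only on the fields shown above; `regression_report` is a hypothetical helper name.

```python
def regression_report(comparison: dict) -> tuple[bool, list[str]]:
    """Return (ok, lines) for a /v1/experiments/compare response."""
    lines = [
        f"{m['name']}: {m['baseline']} -> {m['candidate']} "
        f"({m['delta']:+.2f}, {m['direction']})"
        for m in comparison.get("metrics", [])
    ]
    lines += [f"REGRESSION: {r}" for r in comparison.get("regressions", [])]
    return (not comparison.get("regressed", False), lines)
```

A caller would print the lines and exit non-zero when `ok` is false.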

Alerts

POST   /v1/alerts                   Create rule
GET    /v1/alerts                   List
DELETE /v1/alerts/:id               Delete

`POST /v1/alerts`

{
  "name": "pass-rate-drop",
  "eval_id": "fix-failing-test",
  "condition": "pass_rate < 0.8",
  "notify": "slack",
  "slack_channel": "#agent-alerts"
}

Or webhook:

{
  "name": "cost-spike",
  "condition": "mean_cost_per_run_usd > 2.00",
  "notify": "webhook",
  "webhook_url": "https://hooks.slack.com/services/T00/B00/xxx"
}
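
Judging by the two examples above, each notify type appears to need its own destination field. The sketch below checks a rule locally before it is posted. The required-field list is inferred from those examples rather than from a published schema, and `validate_alert` is a hypothetical helper.

```python
# Destination field each notify type appears to need (inferred from the examples).
DESTINATION_FIELD = {"slack": "slack_channel", "webhook": "webhook_url"}


def validate_alert(rule: dict) -> dict:
    """Check an alert rule locally before POSTing it to /v1/alerts."""
    for field in ("name", "condition", "notify"):
        if not rule.get(field):
            raise ValueError(f"missing required field: {field}")
    needed = DESTINATION_FIELD.get(rule["notify"])
    if needed is None:
        raise ValueError(f"unknown notify type: {rule['notify']!r}")
    if not rule.get(needed):
        raise ValueError(f"notify={rule['notify']!r} needs {needed}")
    return rule
```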

Agents (snapshots)

POST   /v1/agents                   Upload (multipart: metadata + bundle)
GET    /v1/agents                   List (paginated)
GET    /v1/agents/:name/latest      Resolve latest version
GET    /v1/agents/:name/tags/:tag   Resolve by tag
GET    /v1/agents/:name/versions/:v Resolve by version number
GET    /v1/agents/:name/versions    List versions of one agent
GET    /v1/agents/:name/traces      Query traces by agent name (with optional ?version=)
GET    /v1/snapshots/:id            Get snapshot by content hash
DELETE /v1/snapshots/:id            Delete snapshot

`POST /v1/agents` (multipart)

Two form fields:

metadata — JSON: { name, entrypoint, runtime?, tag?, auth? }
bundle — binary: the .tar.gz file

curl -X POST https://.../v1/agents \
  -H "Authorization: Bearer ks_live_..." \
  -F 'metadata={"name":"email-agent","entrypoint":["python","main.py"],"runtime":"python3.12"}' \
  -F bundle=@dist/email-agent.tar.gz

Response: full AgentSnapshot with auto-assigned version and digest (sha256 content hash).
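
If the digest is computed over the uploaded .tar.gz bytes, a client can verify an upload by hashing the bundle locally. That is an assumption, as is the digest's exact encoding, so the sketch accepts both a bare hex string and a `sha256:` prefixed form. The helper names are hypothetical.

```python
import hashlib


def bundle_digest(bundle: bytes) -> str:
    """Hex SHA-256 of the raw bundle bytes."""
    return hashlib.sha256(bundle).hexdigest()


def digest_matches(bundle: bytes, reported: str) -> bool:
    """Compare a locally computed hash with the digest in the response.

    Accepts either a bare hex string or a "sha256:<hex>" form, since the
    exact encoding is an assumption.
    """
    return reported.lower().removeprefix("sha256:") == bundle_digest(bundle)
```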

Datasets

POST   /v1/datasets                          Create
GET    /v1/datasets                          List
GET    /v1/datasets/:id                      Get
DELETE /v1/datasets/:id                      Delete (+ all records)
POST   /v1/datasets/:id/records              Add records (auto-increments version)
GET    /v1/datasets/:id/records              Get records (?version=N&tags=a,b)

Scoring

POST   /v1/score-rules                              Create rule
GET    /v1/score-rules                              List rules
DELETE /v1/score-rules/:id                          Delete rule
POST   /v1/experiments/:id/score                    Trigger offline scoring
GET    /v1/experiments/:id/scores                   Fetch scores

Export

GET    /v1/traces?...                       Stream trace events (paginated)
GET    /v1/traces/:id                       Single trace
GET    /v1/spans?...                        Stream spans
GET    /v1/scenarios?...                    Stream scenarios
GET    /v1/scores?...                       Stream scores
GET    /v1/experiments/:id/export           Full bundle (?format=json|ndjson)

All paginated endpoints return:

{
  "items": [...],
  "next_cursor": "...",
  "count": 100
}

Pass ?cursor=<next_cursor> to fetch the next page. NDJSON variants stream line-delimited JSON for jq-friendly consumption.
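
Cursor pagination reduces to a short loop. In this sketch, `fetch_page` is a hypothetical callback that performs the GET with the given cursor and returns the parsed page. Treating a missing, null, or empty `next_cursor` as the end of the results is an assumption.

```python
from typing import Callable, Iterator, Optional


def iter_items(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every item, following next_cursor until it is absent or empty."""
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page.get("items", [])
        cursor = page.get("next_cursor")
        if not cursor:
            return
```

Because it is a generator, callers can stop early without fetching the remaining pages.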

Common filters

Endpoint	Filters
`/v1/traces`	`experiment_id`, `sandbox_id`, `agent`, `event_type`, `tool`, `since`
`/v1/spans`	`trace_id`, `span_id`, `parent_span_id`, `root_span_id`, `tool`, `event_type`
`/v1/scenarios`	`experiment_id` (required), `status`, `scenario_id`
`/v1/scores`	`experiment_id` (required), `rule_id`

All of these endpoints accept ?limit=<int> (default 100) for page size.
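
The filter table can double as a client-side check that catches typos and missing required filters before a request goes out. This is a sketch: `export_url` is a hypothetical helper, and the server may accept parameters not listed here.

```python
from urllib.parse import urlencode

# Allowed filters per endpoint, copied from the table above.
FILTERS = {
    "/v1/traces": {"experiment_id", "sandbox_id", "agent", "event_type", "tool", "since"},
    "/v1/spans": {"trace_id", "span_id", "parent_span_id", "root_span_id", "tool", "event_type"},
    "/v1/scenarios": {"experiment_id", "status", "scenario_id"},
    "/v1/scores": {"experiment_id", "rule_id"},
}
REQUIRED = {"/v1/scenarios": {"experiment_id"}, "/v1/scores": {"experiment_id"}}


def export_url(base: str, endpoint: str, limit: int = 100, **filters: str) -> str:
    """Build an export URL, rejecting unknown or missing filters."""
    unknown = set(filters) - FILTERS[endpoint]
    missing = REQUIRED.get(endpoint, set()) - set(filters)
    if unknown or missing:
        raise ValueError(f"unknown={sorted(unknown)} missing={sorted(missing)}")
    query = urlencode({**filters, "limit": limit})
    return f"{base}{endpoint}?{query}"
```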

Tracing (agent mode)

POST   /v1/traces                   Ingest trace events (no sandbox)

{
  "events": [
    { "ts": "...", "event_type": "llm_call", "tool": "anthropic.create", "phase": "complete", ... }
  ]
}

Events are stamped with the API key's billing owner and stored with sandbox_id = null. The Traces dashboard tab filters by API key.

OTLP ingest

POST   /otel/v1/traces              OpenTelemetry OTLP endpoint

Accepts the standard OTLP protobuf payload. Spans are converted to Keystone trace events at ingest, with gen_ai.* semantic conventions preserved in metadata.

Configure your OTel exporter:

# OTLP/HTTP exporters append /v1/traces to this base URL.
export OTEL_EXPORTER_OTLP_ENDPOINT=https://keystone.polarity.so/otel
export OTEL_EXPORTER_OTLP_HEADERS=authorization=Bearer\ ks_live_...

Misc

GET    /health                              Server liveness
GET    /v1/traces?experiment_id=:id         Trace events for an experiment (filterable by sandbox_id, agent, event_type, tool, since)
GET    /v1/traces/:trace_id                 Single trace — every span rooted at the given trace ID
GET    /v1/sandboxes/:id/trace              Trace events for a single sandbox
GET    /v1/spans?root_span_id=:sid          Filter spans by root_span_id / parent_span_id / trace_id / tool / event_type
GET    /v1/metrics/experiments/:id          Aggregate metrics for one experiment
GET    /v1/evals/:id/history                Score history for an eval over time

Rate limits

The server rate-limits per API key:

100 requests/sec for read endpoints
10 requests/sec for write endpoints
Concurrent sandbox limit: 5 (Free), 50 (Pro), custom (Enterprise)

429 and 503 responses both include a Retry-After header (in seconds). Wait at least that long before retrying, and use exponential backoff with jitter across repeated retries.
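
One common way to combine Retry-After with backoff is the "full jitter" variant sketched below. The base delay and cap are illustrative defaults, not values published by Keystone, and `retry_delay` is a hypothetical helper.

```python
import random
from typing import Optional


def retry_delay(attempt: int, retry_after: Optional[str] = None,
                base: float = 0.5, cap: float = 30.0,
                rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    Draws uniformly from [0, min(cap, base * 2**attempt)], but never waits
    less than the server's Retry-After value.
    """
    source = rng or random
    backoff = source.uniform(0.0, min(cap, base * 2 ** attempt))
    try:
        floor = float(retry_after) if retry_after is not None else 0.0
    except ValueError:
        floor = 0.0  # ignore a malformed header
    return max(backoff, floor)
```

Jitter spreads out clients that were throttled at the same moment, so they do not all retry together.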

Idempotency

POST endpoints accept an optional Idempotency-Key header. When set, the server caches the response for 24h — retrying with the same key returns the cached response without re-running the operation. Useful for POST /v1/sandboxes and POST /v1/experiments where network retries shouldn't create duplicates.

curl -X POST https://.../v1/sandboxes \
  -H "Authorization: Bearer ks_live_..." \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: $(uuidgen)" \
  -d '{"spec_id": "..."}'
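
The key must be generated once and reused on every retry; a fresh key per attempt would defeat the cache. Here is a minimal sketch of that pattern, where `send` is a hypothetical transport callback that performs the POST with the given extra headers and raises ConnectionError on network failure.

```python
import uuid
from typing import Callable


def post_idempotent(send: Callable[[dict], dict], max_attempts: int = 3) -> dict:
    """POST with one Idempotency-Key reused across every retry."""
    headers = {"Idempotency-Key": str(uuid.uuid4())}  # generated once, not per attempt
    last_error = None
    for _ in range(max_attempts):
        try:
            return send(headers)
        except ConnectionError as exc:
            last_error = exc  # safe to retry: the server deduplicates by key
    raise RuntimeError("request failed after retries") from last_error
```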

SSE streaming

GET /v1/sandboxes/:id/events is the only Server-Sent Events endpoint today. Other "stream" endpoints (/v1/traces, /v1/spans) use cursor pagination instead, which is easier to consume in clients that don't have native SSE handling.
