REST API
Every Keystone HTTP endpoint — what it does, what it accepts, what it returns.
The Keystone server exposes a REST API at https://keystone.polarity.so (or wherever your self-hosted deployment lives). All endpoints are versioned under /v1/. The SDKs are thin wrappers around these endpoints — anything the SDKs do, you can do with curl and jq.
Authentication
Every request requires an Authorization: Bearer <api-key> header:
curl https://keystone.polarity.so/v1/sandboxes \
-H "Authorization: Bearer ks_live_..."

| Key prefix | Means |
|---|---|
| ks_live_ | Tenant-issued production key |
| ks_sb_ | Sandbox-scoped token (auto-injected into agent processes) |
Get a ks_live_ key from app.paragon.run/app/keystone/settings → API Keys.
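As a minimal sketch of the header in practice, the snippet below builds (but does not send) an authenticated request with the Python standard library. The helper name build_request is hypothetical and not part of the Keystone SDKs, and sending JSON bodies with Content-Type: application/json is an assumption rather than something this page states.

```python
import json
import urllib.request

BASE_URL = "https://keystone.polarity.so"  # replace for a self-hosted deployment


def build_request(method, path, api_key, body=None, base_url=BASE_URL):
    """Build an authenticated request for a /v1/ endpoint (hypothetical helper)."""
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        # Assumption: JSON request bodies are sent as application/json.
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        base_url.rstrip("/") + path, data=data, headers=headers, method=method
    )


# Building a request performs no network I/O; pass it to urllib.request.urlopen to send.
request = build_request("GET", "/v1/sandboxes", "ks_live_...")
print(request.get_method(), request.full_url)
```

Keeping the key out of source code (for example, reading it from an environment variable of your choosing) is usually preferable to the literal shown here.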
Errors
Non-2xx responses include a JSON body:
{
"message": "spec 'fix-failing-test' not found",
"error": "not_found"
}
Common status codes:
| Code | Meaning |
|---|---|
| 400 | Bad request (malformed body, validation failure) |
| 401 | Missing/invalid API key |
| 403 | API key doesn't own this resource |
| 404 | Resource doesn't exist |
| 409 | Resource in wrong state |
| 429 | Rate limited |
| 500 | Server error |
| 503 | At capacity (retry with backoff) |
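One way a client might consume the error body and status table above is sketched below. KeystoneError and parse_error are hypothetical names, and treating exactly 429 and 503 as retryable is a reading of the table, not a documented contract.

```python
import json

# Codes the table describes as rate limited or at capacity.
RETRYABLE_STATUSES = {429, 503}


class KeystoneError(Exception):
    """Hypothetical exception carrying the two fields of the error body."""

    def __init__(self, status, error, message):
        super().__init__(f"{status} {error}: {message}")
        self.status = status
        self.error = error
        self.message = message

    @property
    def retryable(self):
        return self.status in RETRYABLE_STATUSES


def parse_error(status, raw_body):
    """Turn a non-2xx response into a KeystoneError, tolerating non-JSON bodies."""
    try:
        body = json.loads(raw_body)
    except (ValueError, TypeError):
        body = {}
    if not isinstance(body, dict):
        body = {}
    return KeystoneError(status, body.get("error", "unknown"), body.get("message", ""))
```

Falling back to "unknown" keeps the client usable if a proxy or load balancer returns a plain-text error instead of the JSON shape shown above.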
Sandboxes
POST /v1/sandboxes Create from spec
GET /v1/sandboxes List active
GET /v1/sandboxes/:id Get details
DELETE /v1/sandboxes/:id Destroy
POST /v1/sandboxes
{
"spec_id": "fix-failing-test",
"timeout": "10m",
"metadata": { "run": "ci-7821" },
"secrets": {
"ANTHROPIC_API_KEY": "..."
}
}
Response: the full Sandbox object, returned once state == "ready". The boot pipeline runs server-side: secrets are resolved, the runtime is started, services come up, fixtures are applied, and a snapshot is taken.
Sandbox interaction
POST /v1/sandboxes/:id/commands Run shell command
POST /v1/sandboxes/:id/files Write file
GET /v1/sandboxes/:id/files/*path Read file
DELETE /v1/sandboxes/:id/files/*path Delete file
GET /v1/sandboxes/:id/state Filesystem snapshot (files + checksums)
GET /v1/sandboxes/:id/diff What changed since boot
GET /v1/sandboxes/:id/trace Get trace events
POST /v1/sandboxes/:id/trace Ingest trace events
GET /v1/sandboxes/:id/events Stream lifecycle events (SSE)
POST /v1/sandboxes/:id/timeout Extend/reset timeout
POST /v1/sandboxes/:id/commands
{
"command": "npm test",
"timeout": "2m",
"background": false
}
Response:
{
"command": "npm test",
"stdout": "...",
"stderr": "...",
"exit_code": 0,
"duration_ms": 12340
}
Records audit.process_spawn and (if audit.stdout_capture: true) the captured stdout.
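A small sketch of decoding the command response into a typed value follows. CommandResult and parse_command_result are hypothetical names, and the sketch covers only the foreground response shape shown above; the shape returned for "background": true is not described on this page.

```python
import json
from dataclasses import dataclass


@dataclass
class CommandResult:
    command: str
    stdout: str
    stderr: str
    exit_code: int
    duration_ms: int

    @property
    def ok(self):
        """True when the command exited with status 0."""
        return self.exit_code == 0


def parse_command_result(raw_body):
    """Decode the JSON body returned by POST /v1/sandboxes/:id/commands."""
    body = json.loads(raw_body)
    # Only the five documented fields are read; any extra fields are ignored.
    return CommandResult(
        command=body["command"],
        stdout=body["stdout"],
        stderr=body["stderr"],
        exit_code=body["exit_code"],
        duration_ms=body["duration_ms"],
    )
```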
POST /v1/sandboxes/:id/files / GET /v1/sandboxes/:id/files/*path
Write:
{ "path": "config.json", "content": "..." }
Read (path is in URL):
curl https://.../v1/sandboxes/sb-abc/files/config.json
# Returns the file content as raw body.
Path normalization: /workspace/foo.txt and foo.txt resolve to the same place. Path traversal (..) is rejected.
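The normalization rule can be mirrored client-side to catch bad paths before a request is sent. The sketch below is an approximation, not the server's implementation: normalize_sandbox_path is a hypothetical helper, /workspace is taken from the example above, and rejecting absolute paths outside /workspace is a conservative assumption.

```python
import posixpath

WORKSPACE = "/workspace"  # root taken from the example in the text


def normalize_sandbox_path(path):
    """Return a workspace-relative path, or raise ValueError for unsafe input."""
    if ".." in path.split("/"):
        raise ValueError(f"path traversal rejected: {path!r}")
    cleaned = posixpath.normpath(path)
    if cleaned.startswith("/"):
        inside = cleaned == WORKSPACE or cleaned.startswith(WORKSPACE + "/")
        if not inside:
            # Assumption: absolute paths outside the workspace are refused.
            raise ValueError(f"path outside {WORKSPACE}: {path!r}")
        cleaned = cleaned[len(WORKSPACE):].lstrip("/")
    return cleaned
```

With this helper, normalize_sandbox_path("/workspace/foo.txt") and normalize_sandbox_path("foo.txt") both return "foo.txt", matching the equivalence described above.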
GET /v1/sandboxes/:id/events
Server-Sent Events. Streams sandbox lifecycle:
data: {"event_type": "status", "data": {"step": "secrets_resolved", "count": "3"}}
data: {"event_type": "status", "data": {"step": "container_started", "runtime": "docker"}}
data: {"event_type": "command", "data": {"command": "npm test", "exit_code": 0, "duration_ms": 12340}}
data: {"event_type": "status", "data": {"state": "destroyed"}}
Stream closes when the sandbox is destroyed.
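A minimal parser for this stream might look like the sketch below. It assumes standard SSE framing, in which events are separated by a blank line, and it reads only data: fields; iter_sse_events is a hypothetical helper name.

```python
import json


def iter_sse_events(lines):
    """Yield decoded JSON payloads from an iterable of SSE text lines."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            # Per SSE framing, one optional space follows the colon.
            if value.startswith(" "):
                value = value[1:]
            buffer.append(value)
        elif line == "" and buffer:
            # A blank line ends the event; multi-line data is joined with newlines.
            yield json.loads("\n".join(buffer))
            buffer = []
        # Comment lines and other fields (event:, id:, retry:) are skipped.
    if buffer:
        yield json.loads("\n".join(buffer))


def is_destroyed(event):
    """True for the final lifecycle event shown in the example above."""
    return event.get("event_type") == "status" and event.get("data", {}).get("state") == "destroyed"
```

In practice the lines would come from a streaming HTTP response decoded to text; the parser itself performs no I/O, which keeps it easy to test against recorded streams.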
Specs
POST /v1/specs Upload YAML
GET /v1/specs List
GET /v1/specs/:id Get
DELETE /v1/specs/:id Delete
POST /v1/specs
Body is raw YAML; content type application/x-yaml or text/yaml. Versioning is automatic — uploading the same id: increments the version.
Experiments
POST /v1/experiments Create
GET /v1/experiments List
GET /v1/experiments/:id Get RunResults
POST /v1/experiments/:id/run Trigger async run
POST /v1/experiments/compare Compare two experiments
GET /v1/metrics/experiments/:id Get aggregate metrics
POST /v1/experiments
{
"name": "baseline-v1",
"spec_id": "fix-failing-test",
"secrets": { "ANTHROPIC_API_KEY": "..." }
}
Returns:
{
"id": "exp-a1b2c3...",
"name": "baseline-v1",
"spec_id": "fix-failing-test",
"status": "created",
"created_at": "..."
}
POST /v1/experiments/:id/run
Empty body. Server enqueues scenario jobs (replicas × matrix entries) and returns immediately. Poll GET /v1/experiments/:id for completion.
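The polling step can be sketched as a loop around any function that fetches the experiment. This page shows only the "created" status, so the set of terminal statuses is left for the caller to supply; wait_for_experiment is a hypothetical helper, and the presence of a status field in the GET response is an assumption based on the create response.

```python
import time


def wait_for_experiment(fetch, terminal_statuses, interval_s=5.0, timeout_s=3600.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll fetch() until the experiment's status is in terminal_statuses.

    fetch is a zero-argument callable returning the decoded JSON body of
    GET /v1/experiments/:id. sleep and clock are injectable for testing.
    """
    deadline = clock() + timeout_s
    while True:
        experiment = fetch()
        if experiment.get("status") in terminal_statuses:
            return experiment
        if clock() >= deadline:
            raise TimeoutError(f"experiment still {experiment.get('status')!r}")
        sleep(interval_s)
```

A fixed interval keeps the sketch short; given the read rate limit described under Rate limits, an interval of a few seconds leaves ample headroom.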
POST /v1/experiments/compare
{ "baseline_id": "exp-baseline", "candidate_id": "exp-new" }
Returns:
{
"baseline_id": "exp-baseline",
"candidate_id": "exp-new",
"regressed": true,
"regressions": ["pass_rate dropped from 0.95 to 0.78"],
"metrics": [
{ "name": "pass_rate", "baseline": 0.95, "candidate": 0.78, "delta": -0.17, "direction": "worse" }
]
}
Alerts
POST /v1/alerts Create rule
GET /v1/alerts List
DELETE /v1/alerts/:id Delete
POST /v1/alerts
{
"name": "pass-rate-drop",
"eval_id": "fix-failing-test",
"condition": "pass_rate < 0.8",
"notify": "slack",
"slack_channel": "#agent-alerts"
}
Or webhook:
{
"name": "cost-spike",
"condition": "mean_cost_per_run_usd > 2.00",
"notify": "webhook",
"webhook_url": "https://hooks.slack.com/services/T00/B00/xxx"
}
Agents (snapshots)
POST /v1/agents Upload (multipart: metadata + bundle)
GET /v1/agents List (paginated)
GET /v1/agents/:name/latest Resolve latest version
GET /v1/agents/:name/tags/:tag Resolve by tag
GET /v1/agents/:name/versions/:v Resolve by version number
GET /v1/agents/:name/versions List versions of one agent
GET /v1/agents/:name/traces Query traces by agent name (with optional ?version=)
GET /v1/snapshots/:id Get snapshot by content hash
DELETE /v1/snapshots/:id Delete snapshot
POST /v1/agents (multipart)
Two form fields:
- metadata: JSON { name, entrypoint, runtime?, tag?, auth? }
- bundle: binary, the .tar.gz file
curl -X POST https://.../v1/agents \
-H "Authorization: Bearer ks_live_..." \
-F 'metadata={"name":"email-agent","entrypoint":["python","main.py"],"runtime":"python3.12"}' \
-F bundle=@dist/email-agent.tar.gz
Response: full AgentSnapshot with auto-assigned version and digest (sha256 content hash).
Datasets
POST /v1/datasets Create
GET /v1/datasets List
GET /v1/datasets/:id Get
DELETE /v1/datasets/:id Delete (+ all records)
POST /v1/datasets/:id/records Add records (auto-increments version)
GET /v1/datasets/:id/records Get records (?version=N&tags=a,b)
Scoring
POST /v1/score-rules Create rule
GET /v1/score-rules List rules
DELETE /v1/score-rules/:id Delete rule
POST /v1/experiments/:id/score Trigger offline scoring
GET /v1/experiments/:id/scores Fetch scores
Export
GET /v1/traces?... Stream trace events (paginated)
GET /v1/traces/:id Single trace
GET /v1/spans?... Stream spans
GET /v1/scenarios?... Stream scenarios
GET /v1/scores?... Stream scores
GET /v1/experiments/:id/export Full bundle (?format=json|ndjson)
All paginated endpoints return:
{
"items": [...],
"next_cursor": "...",
"count": 100
}
Pass ?cursor=<next_cursor> to fetch the next page. NDJSON variants stream line-delimited JSON for jq-friendly consumption.
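Cursor pagination reduces to a short generator, sketched below. iter_items is a hypothetical helper that takes any function returning one decoded page; treating a missing, null, or empty next_cursor as the end of the results is an assumption, since this page does not say how the last page is marked.

```python
def iter_items(fetch_page):
    """Yield every item across all pages.

    fetch_page(cursor) returns one decoded page; cursor is None for the first page.
    """
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page.get("items", [])
        cursor = page.get("next_cursor")
        if not cursor:
            # Assumption: no next_cursor means this was the last page.
            break
```

Because the generator is lazy, a caller that stops iterating early never requests the remaining pages.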
Common filters
| Endpoint | Filters |
|---|---|
| /v1/traces | experiment_id, sandbox_id, agent, event_type, tool, since |
| /v1/spans | trace_id, span_id, parent_span_id, root_span_id, tool, event_type |
| /v1/scenarios | experiment_id (required), status, scenario_id |
| /v1/scores | experiment_id (required), rule_id |
All endpoints accept ?limit=<int> (default 100) for page size.
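Building the query string from the filters above might look like the following sketch. export_url is a hypothetical helper; the required-filter check mirrors the table, and the default base URL is the hosted endpoint named at the top of this page.

```python
from urllib.parse import urlencode

# Filters the table above marks as required.
REQUIRED_FILTERS = {
    "/v1/scenarios": ("experiment_id",),
    "/v1/scores": ("experiment_id",),
}


def export_url(endpoint, limit=100, cursor=None,
               base_url="https://keystone.polarity.so", **filters):
    """Return the full URL for one page of an export endpoint."""
    params = {name: value for name, value in filters.items() if value is not None}
    missing = [name for name in REQUIRED_FILTERS.get(endpoint, ()) if not params.get(name)]
    if missing:
        raise ValueError(f"{endpoint} requires: {', '.join(missing)}")
    params["limit"] = limit
    if cursor:
        params["cursor"] = cursor
    # urlencode percent-escapes values, so IDs and cursors are safe to pass as-is.
    return f"{base_url.rstrip('/')}{endpoint}?{urlencode(params)}"
```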
Tracing (agent mode)
POST /v1/traces Ingest trace events (no sandbox)
{
"events": [
{ "ts": "...", "event_type": "llm_call", "tool": "anthropic.create", "phase": "complete", ... }
]
}
Events are stamped with the API key's billing owner and stored with sandbox_id = null. The Traces dashboard tab filters by API key.
OTLP ingest
POST /otel/v1/traces OpenTelemetry OTLP endpoint
Accepts the standard OTLP protobuf payload. Spans are converted to Keystone trace events at ingest, with gen_ai.* semantic conventions preserved in metadata.
Configure your OTel exporter:
export OTEL_EXPORTER_OTLP_ENDPOINT=https://keystone.polarity.so
export OTEL_EXPORTER_OTLP_HEADERS=authorization=Bearer\ ks_live_...
Misc
GET /health Server liveness
GET /v1/traces?experiment_id=:id Trace events for an experiment (filterable by sandbox_id, agent, event_type, tool, since)
GET /v1/traces/:trace_id Single trace — every span rooted at the given trace ID
GET /v1/sandboxes/:id/trace Trace events for a single sandbox
GET /v1/spans?root_span=:sid Filter spans by root_span / parent_span / trace / tool / event_type
GET /v1/metrics/experiments/:id Aggregate metrics for one experiment
GET /v1/evals/:id/history Score history for an eval over time
Rate limits
The server rate-limits per API key:
- 100 requests/sec for read endpoints
- 10 requests/sec for write endpoints
- Concurrent sandbox limit: 5 (Free), 50 (Pro), custom (Enterprise)
429 responses include a Retry-After header (in seconds), and 503 responses include the same header. Implement exponential backoff with jitter when retrying.
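One way to combine Retry-After with exponential backoff and jitter is sketched below. retry_delay is a hypothetical helper; the 0.5 s base and 30 s cap are illustrative values, and preferring Retry-After over the computed backoff is a design assumption rather than documented server guidance.

```python
import random


def retry_delay(attempt, retry_after=None, base_s=0.5, cap_s=30.0, rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Retry-After is documented as a number of seconds.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # not a plain number; fall through to computed backoff
    ceiling = min(cap_s, base_s * (2 ** attempt))
    # Full jitter: scale the ceiling by a random factor so clients do not retry in lockstep.
    return rng() * ceiling
```

A caller would sleep for retry_delay(attempt, response_headers.get("Retry-After")) after each 429 or 503 and give up after a bounded number of attempts.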
Idempotency
POST endpoints accept an optional Idempotency-Key header. When set, the server caches the response for 24h — retrying with the same key returns the cached response without re-running the operation. Useful for POST /v1/sandboxes and POST /v1/experiments where network retries shouldn't create duplicates.
curl -X POST https://.../v1/sandboxes \
-H "Authorization: Bearer ks_live_..." \
-H "Idempotency-Key: $(uuidgen)" \
-d '{"spec_id": "..."}'
SSE streaming
GET /v1/sandboxes/:id/events is the only Server-Sent Events endpoint today. Other "stream" endpoints (/v1/traces, /v1/spans) use cursor pagination instead, which is easier to consume in clients that don't have native SSE handling.