Troubleshooting
The most common errors users hit, what they actually mean under the hood, and the right way to fix them.
SDK / API errors
unauthorized / 401
Your API key is missing, invalid, or rotated.
Check:
- `KEYSTONE_API_KEY` is set in your environment, or you passed `apiKey:` to the constructor.
- The key starts with `ks_live_` and is the value you copied at creation time (keys are shown once — if you misplaced it, generate a new one at app.paragon.run/app/keystone/settings → API Keys).
- The key wasn't rotated. Old keys stop working immediately when you rotate.
Run ks setup doctor for a quick health check.
forbidden / 403
The API key is valid, but it doesn't own the resource you're asking about. Most common cause: trying to access another tenant's sandbox or spec.
Check:
- The resource ID is yours (look at it in your Dashboard).
- You're using the right API key for the right billing owner.
not found / 404
The resource ID doesn't exist on the server.
For spec_id errors during experiments.create(): you need to upload the spec first via ks.specs.create(yaml). The SDK doesn't auto-upload; the spec must already exist on the server before you reference it.
For sandbox_id: the sandbox might've already been destroyed (auto-timeout, manual destroy, or boot failure).
sandbox rejected: at capacity / 503
Your tenant has hit its concurrent-sandbox limit.
Fix:
- Wait for current experiments to complete.
- Set `resources.concurrency_limit:` in your spec to throttle parallelism.
- Contact support to raise the tenant limit (Pro/Enterprise).
Implement retry with exponential backoff:
```js
// Small helper the snippet below relies on
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function createWithRetry(opts, attempt = 1) {
  try {
    return await ks.sandboxes.create(opts);
  } catch (err) {
    if (err.statusCode === 503 && attempt < 5) {
      // Exponential backoff with jitter: 2s, 4s, 8s, 16s, plus up to 1s of randomness
      await sleep(1000 * 2 ** attempt + Math.random() * 1000);
      return createWithRetry(opts, attempt + 1);
    }
    throw err;
  }
}
```

rate limited / 429
You're hitting the API faster than the rate limit allows. The server returns a Retry-After header.
Fix: sleep for Retry-After seconds and retry, or batch your requests.
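Honoring the header takes one small helper. A minimal sketch (the `retryAfterMs` function is illustrative, not part of the SDK; it assumes your HTTP client surfaces the raw header string, which per the HTTP spec is either an integer number of seconds or an HTTP-date):

```typescript
// Turn a Retry-After header value into a millisecond delay.
function retryAfterMs(header: string, now: Date = new Date()): number {
  const seconds = Number(header);
  if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000);
  const at = Date.parse(header); // HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
  return Number.isNaN(at) ? 1000 : Math.max(0, at - now.getTime()); // 1s fallback if unparseable
}
```

Sleep for `retryAfterMs(header)` before the next attempt, or batch requests so you stay under the limit in the first place.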
Sandbox boot failures
Sandbox transitions to error state during creation
Read the audit log or stream /v1/sandboxes/:id/events to see which step failed.
Common causes:
| Cause | Symptom | Fix |
|---|---|---|
| Service wait_for timed out | "service db not ready" | Wrong wait command, or service genuinely failed to start. Check the image's docs for the right wait_for. |
| Image pull failed | "image <name> not found" | Registry doesn't have it (typo?), or server can't reach the registry. |
| Fixture failed | "applying fixtures: ..." | Usually a SQL syntax error in inline sql:. Run the SQL against a local Postgres first. |
| Missing required secret | "resolving secrets: ANTHROPIC_API_KEY not set" | The spec declares secrets: but the value isn't in your env or the Dashboard. |
| Setup command failed | "applying setup: command 'npm ci' exited 1" | Setup commands run before the agent — they're for environment prep. Check .env files, package-lock conflicts. |
"spec not found" right after upload
This looks like a race condition — some clients upload a spec and immediately create an experiment against it — but the upload is synchronous: by the time specs.create() returns, the spec is queryable. If you still see this error, you're probably using two different Keystone instances pointing at different base URLs. Check your config.
Invariant failures
file does not exist but the agent should have created it
The invariant runs in the sandbox workspace directory. Make sure your agent writes to a relative path, not an absolute one.
```yaml
invariants:
  output_created:
    check:
      type: file_exists
      path: output.json   # workspace-relative
```

If your agent writes to /tmp/output.json, the invariant looking for output.json won't find it. Either rewrite the agent or change the invariant path to tmp/output.json.
command_exit always shows the exit code as 0 but the test failed
Your command's wrapper might be swallowing the real exit code. Common gotcha:

```yaml
command: "cd subdir && npm test"   # if the runner doesn't invoke a shell, '&&' is never interpreted and the exit code isn't what you expect
```

Use sh -c directly:

```yaml
check:
  type: command_exit
  command: "sh -c 'cd subdir && npm test'"
```

Or just make sure the last command in the chain is the one whose exit code you care about.
llm_as_judge fails with judge call failed
The judge model failed (rate limit, network error, malformed response). Different from a score: 0 — this is an infrastructure failure.
Fix: retry the experiment. If it persists, check whether your judge model is overloaded; switch to paragon-fast if you were on paragon-max.
http_mock_assertions fails — mock didn't see the request
Your agent is hitting the real API instead of the mock. Check:
- `network.dns_overrides:` — does the real domain redirect to the mock?
- `network.egress.allow:` — is the real domain accidentally in the allowlist?
- The agent's HTTP client — it might be ignoring DNS overrides (some clients cache or use IP literals).

Stream the audit log:

```sh
ks logs traces $EXP_ID --event-type http_call
```

You'll see the actual host the agent hit. If it's api.stripe.com instead of stripe-mock, your DNS overrides aren't working.
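A working mock setup usually pairs the override with a closed egress allowlist. An illustrative sketch — the key names come from the checklist above, but the exact nesting and the domain/mock names are examples, not a verified schema:

```yaml
network:
  dns_overrides:
    api.stripe.com: stripe-mock   # real domain resolves to the mock service
  egress:
    allow: []                     # keep the real domain OUT of the allowlist
```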
Agent runtime issues
Agent times out
Default agent timeout is 5 minutes. For long tasks, raise it:
```yaml
agent:
  timeout: 15m
```

Also raise the sandbox-level timeout to cover setup + agent + scoring:

```yaml
resources:
  timeout: 20m
```

If the timeout is the right size and the agent still hangs, look at the trace events to see where it stalls. Usually it's an LLM call waiting on a slow provider, or a tool waiting on a service that didn't come up.
Agent runs but produces no traces
Check:
- Did you call `ks.wrap()` or `ks.observe()`? Without one, the LLM client emits nothing.
- Is `KEYSTONE_API_KEY` injected? Inside a sandbox, Keystone auto-injects `KEYSTONE_SANDBOX_ID` and a sandbox-scoped token. If your agent runs the wrap before entering the sandbox (which doesn't happen normally), the env var isn't set.
- Did your wrap target the right object? `ks.wrap(new Anthropic())` works because the SDK detects the shape; if you wrapped a custom subclass that doesn't expose `messages.create()`, the wrap silently no-ops.

```ts
// Verify the wrap took effect
const trace = await ks.sandboxes.getTrace(process.env.KEYSTONE_SANDBOX_ID!);
console.log(`${trace.events.length} events captured`);
```

Agent reports cost but the dashboard shows $0
Cost is computed from the model name in the response — if the model isn't in the bundled pricing table, cost defaults to 0.
Verify the model is in the table:

```ts
import { pricingTable } from "@polarityinc/polarity-keystone";
console.log(pricingTable()); // returns Readonly<Record<model, {input, output}>>
```

```python
from polarity_keystone import pricing_table
print(pricing_table())  # returns a read-only Mapping
```

The pricing table is read-only and bundled with the SDK; new models land in the next release. If you need cost tracking for a private/preview model that isn't listed, file an issue at github.com/Polarityinc/ks with the model name and per-million-token pricing.
Secrets
resolving secrets failure
The SDK couldn't resolve a declared secret. Check:
- For `source: env` — is the env var actually set on the caller's machine?
- For `source: env:OTHER_NAME` — is `$OTHER_NAME` set (not `$NAME`)?
- For `source: file:...` — does the file exist? Is `~/` correctly expanded?
- For `source: command:...` — does the command exit 0? Try running it manually first.
- For `source: dashboard` — is the secret added in the Dashboard's Secrets tab?
The SDK silently skips a secret whose `command:` source fails (no value is forwarded), but the server fails hard at sandbox boot if the resulting value is missing — there's no silent fallback.
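The source forms above can be mixed in one spec. An illustrative layout — the `secrets:` source strings come from the checklist, but the exact nesting and the secret names are assumptions, not a verified schema:

```yaml
secrets:
  ANTHROPIC_API_KEY:
    source: env                    # reads $ANTHROPIC_API_KEY on the caller's machine
  STRIPE_KEY:
    source: env:STRIPE_TEST_KEY    # reads $STRIPE_TEST_KEY, forwarded as STRIPE_KEY
  DB_PASSWORD:
    source: dashboard              # added in the Dashboard's Secrets tab
```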
secrets_in_logs fails on every run
Your agent is printing secret values to stdout. Common causes:
- Debug logging that prints `process.env`.
- An LLM that echoes its API key back in a tool argument (yes, this happens).
- A library that logs the request URL with the API key as a query parameter.
Search the audit log for the offending line:
```sh
ks logs traces $EXP_ID --event-type stdout | grep -i 'api_key\|secret\|token'
```

Fix in your agent code; don't disable the rule.
CLI issues
.env is ignored
ks only loads .env and .env.local from the current directory, not parent directories. Run from your project root.
The CLI also doesn't override existing env vars — if KEYSTONE_API_KEY is already exported in your shell, that wins.
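The precedence rule — shell wins, `.env` only fills gaps — can be pictured as a merge that never overwrites. An illustrative sketch, not the CLI's actual loader:

```typescript
// Values already in the environment win; .env entries only fill in missing keys.
function mergeEnv(
  shell: Record<string, string>,
  dotenv: Record<string, string>,
): Record<string, string> {
  return { ...dotenv, ...shell }; // later spread (shell) takes precedence
}

const merged = mergeEnv(
  { KEYSTONE_API_KEY: "ks_live_from_shell" },
  { KEYSTONE_API_KEY: "ks_live_from_dotenv", KEYSTONE_LOG: "debug" },
);
// merged.KEYSTONE_API_KEY is "ks_live_from_shell"; merged.KEYSTONE_LOG is "debug"
```

So if the key in your shell is stale, `unset KEYSTONE_API_KEY` before relying on the `.env` value.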
ks update says "up to date" but you're on an old version
Auto-update is cached for 24h. Force a refresh:
```sh
ks update --force
```

If ks update can't write to the binary's location (read-only filesystem, sudo-installed /usr/local/bin/ks, etc.), re-run the installer:

```sh
curl -fsSL https://ks.polarity.so/install.sh | bash
```

ks setup mcp doesn't show up in my coding agent
Check the agent's MCP config file (.mcp.json, .cursor/mcp.json, etc.). The setup phase appends a server entry — restart the agent so it picks up the new config.
If ks setup mcp already wrote the config but the agent doesn't see it, your agent might be looking at a different path (some look in ~/.config/<agent>/mcp.json). Run ks mcp serve manually to verify the binary works, then point your agent at it.
Dashboard / billing
Dashboard shows "no traces" for an experiment that just completed
Two things to check:
- Are you looking at the right tab? Experiment details show invariants and per-scenario detail; the Traces tab shows agent-mode rows scoped by API key.
- Did the agent actually emit traces? A spec with no `wrap()` calls can pass invariants without producing any trace events.

Run ks logs traces $EXP_ID to see the raw events. If empty, the wrap didn't take effect.
Cost is much higher than expected
A common cause is forgetting to set temperature: 0 in your scenarios. Non-deterministic generation can cost 2-3× more than deterministic, because the agent retries on flaky outputs.
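In spec terms, that looks something like the fragment below — the surrounding scenario fields (`scenarios:` list, `name:`) are illustrative assumptions; only `temperature: 0` is the point:

```yaml
scenarios:
  - name: checkout-flow   # hypothetical scenario
    temperature: 0        # deterministic generation; avoids costly retries on flaky output
```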
Other causes:
- Long context windows (every cache miss is full-priced input tokens).
- A judge model running on every replica when you have `replicas: 50`.
- An infinite tool-call loop (check `mean_tool_calls` — anything over ~30 is suspicious).
The Cost trend in experiments.metrics() shows per-run cost over time — a sudden jump usually means a bug.
Self-hosted issues
The Keystone daemon (keystone) is configured via flags + env vars, not a config file. The full flag set is in keystone --help; the most common ones:
```sh
keystone -port 8012 \
  -host 127.0.0.1 \
  -workspace-dir /var/keystone-workspaces \
  -runtime auto   # auto | local | docker | firecracker | nomad
```
Server won't start
Check that the chosen runtime is reachable. The daemon auto-detects (-runtime auto), but you can pin it: -runtime docker requires docker on $PATH; -runtime nomad requires the NOMAD_ADDR env var (or default http://127.0.0.1:4646); -runtime firecracker requires -fc-binary, -fc-kernel, -fc-rootfs paths.
API-key authentication is on by default. Set KEYSTONE_API_KEY=… (or pass -api-key …) before starting; the server refuses connections without one. Override only for trusted local dev with KEYSTONE_AUTH_REQUIRED=false.
The Supabase-backed metadata store needs SUPABASE_URL and SUPABASE_SERVICE_ROLE_KEY in the environment.
Sandboxes fail to boot
Pick the right runtime for your host:
- Docker is the most common; install Docker Desktop or Docker Engine, then `-runtime docker`.
- Podman is supported as a Docker drop-in (the daemon auto-detects when `docker` isn't on `$PATH` but `podman` is).
- Firecracker is the production runtime — fastest cold start, strongest isolation. Requires the rootfs + kernel images plus the Firecracker binary; see the deploy script for the install flow.
- Nomad dispatches to a cluster; requires the Nomad job to be registered.
If -runtime auto chose the wrong one, pin it explicitly.
Still stuck?
- Read the SSE event stream: `curl https://.../v1/sandboxes/<id>/events` while a sandbox is running. The server publishes every step.
- Run `ks setup doctor` to check your local setup.
- Check the status page at status.polarity.so for known incidents.
- Email support@polarity.so with your sandbox ID, experiment ID, and the relevant error message. We respond within a business day.