Troubleshooting

The most common errors users hit, what they actually mean under the hood, and the right way to fix them.

SDK / API errors

unauthorized / 401

Your API key is missing, invalid, or rotated.

Check:

  • KEYSTONE_API_KEY is set in your environment, or you passed apiKey: to the constructor.
  • The key starts with ks_live_ and is the value you copied at creation time (keys are shown once — if you misplaced it, generate a new one at app.paragon.run/app/keystone/settings → API Keys).
  • The key wasn't rotated. Old keys stop working immediately when you rotate.

Run ks setup doctor for a quick health check.
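
If you pass the key in code rather than the environment, it looks roughly like this (a minimal sketch; the Keystone class name is an assumption, the apiKey option and package name are from this guide):

import { Keystone } from "@polarityinc/polarity-keystone";

const ks = new Keystone({
  apiKey: process.env.KEYSTONE_API_KEY,   // or the ks_live_... value directly
});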

forbidden / 403

The API key is valid, but it doesn't own the resource you're asking about. Most common cause: trying to access another tenant's sandbox or spec.

Check:

  • The resource ID is yours (verify it in your Dashboard).
  • You're using the right API key for the right billing owner.

not found / 404

The resource ID doesn't exist on the server.

For spec_id errors during experiments.create(): you need to upload the spec first via ks.specs.create(yaml). The SDK doesn't auto-upload; the spec must already exist on the server before you reference it.
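
For example, a minimal sketch of the required order (yamlSource holds your spec YAML; spec.id and the experiments.create() argument shape are assumptions, so check your SDK reference):

const spec = await ks.specs.create(yamlSource);   // upload the spec first
const exp = await ks.experiments.create({
  spec_id: spec.id,                               // reference it only after the upload returns
});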

For sandbox_id: the sandbox might've already been destroyed (auto-timeout, manual destroy, or boot failure).

sandbox rejected: at capacity / 503

Your tenant has hit its concurrent-sandbox limit.

Fix:

  • Wait for current experiments to complete.
  • Set resources.concurrency_limit: in your spec to throttle parallelism (snippet below).
  • Contact support to raise the tenant limit (Pro/Enterprise).
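
For example (assuming concurrency_limit takes an integer cap on in-flight sandboxes):

resources:
  concurrency_limit: 4   # at most 4 sandboxes in flight for this spec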

Implement retry with exponential backoff:

// Helper: resolve after ms milliseconds
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function createWithRetry(opts, attempt = 1) {
  try {
    return await ks.sandboxes.create(opts);
  } catch (err) {
    // Back off exponentially (2s, 4s, 8s, 16s) plus up to 1s of jitter
    if (err.statusCode === 503 && attempt < 5) {
      await sleep(1000 * 2 ** attempt + Math.random() * 1000);
      return createWithRetry(opts, attempt + 1);
    }
    throw err;
  }
}

rate limited / 429

You're hammering the API faster than the rate limit allows. The server returns a Retry-After header.

Fix: sleep for Retry-After seconds and retry, or batch your requests.
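
A minimal sketch of honoring Retry-After, assuming the SDK error exposes response headers on err.headers (check your SDK's error shape):

async function callWithRetryAfter<T>(fn: () => Promise<T>): Promise<T> {
  try {
    return await fn();
  } catch (err: any) {
    if (err.statusCode === 429) {
      // Fall back to 1s if the header is missing or unparsable
      const seconds = Number(err.headers?.["retry-after"]) || 1;
      await new Promise((r) => setTimeout(r, seconds * 1000));
      return fn();   // one retry; loop if you need more
    }
    throw err;
  }
}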

Sandbox boot failures

Sandbox transitions to error state during creation

Read the audit log or stream /v1/sandboxes/:id/events to see which step failed.

Common causes:

| Cause | Symptom | Fix |
|---|---|---|
| Service wait_for timed out | "service db not ready" | Wrong wait command, or the service genuinely failed to start. Check the image's docs for the right wait_for. |
| Image pull failed | "image <name> not found" | The registry doesn't have it (typo?), or the server can't reach the registry. |
| Fixture failed | "applying fixtures: ..." | Usually a SQL syntax error in inline sql:. Run the SQL against a local Postgres first. |
| Missing required secret | "resolving secrets: ANTHROPIC_API_KEY not set" | The spec declares secrets: but the value isn't in your env or the Dashboard. |
| Setup command failed | "applying setup: command 'npm ci' exited 1" | Setup commands run before the agent and are for environment prep. Check .env files and package-lock conflicts. |

"spec not found" right after upload

Race condition: some clients upload a spec and immediately try to create an experiment against it. The upload is sync — by the time specs.create() returns, the spec is queryable. If you see this, you're probably using two different Keystone instances pointing at different base URLs. Check your config.

Invariant failures

file does not exist but the agent should have created it

The invariant runs in the sandbox workspace directory. Make sure your agent writes to a relative path, not an absolute one.

invariants:
  output_created:
    check:
      type: file_exists
      path: output.json    # workspace-relative

If your agent writes to /tmp/output.json, the invariant looking for output.json won't find it. Either rewrite the agent or change the invariant path to tmp/output.json.

command_exit always shows the exit code as 0 but the test failed

Your command's wrapper might be swallowing the real exit code. Common gotcha: a cleanup or logging step chained after the test with ;, which makes the chain report the last command's exit code:

"command": "cd subdir && npm test; echo done"   # exit code comes from echo, not npm test

Drop the trailing command and run the chain through sh -c:

check:
  type: command_exit
  command: "sh -c 'cd subdir && npm test'"

Or just make sure the last command in the chain is the one whose exit code you care about.

llm_as_judge fails with judge call failed

The judge model failed (rate limit, network error, malformed response). Different from a score: 0 — this is an infrastructure failure.

Fix: retry the experiment. If it persists, check whether your judge model is overloaded; switch to paragon-fast if you were on paragon-max.
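
If the judge model is set on the invariant, the switch might look like this (the model field's name and placement are assumptions; the invariants layout follows the file_exists example above):

invariants:
  answer_quality:
    check:
      type: llm_as_judge
      model: paragon-fast   # assumed field; swap from paragon-max here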

http_mock_assertions fails — mock didn't see the request

Your agent is hitting the real API instead of the mock. Check:

  • network.dns_overrides: — does the real domain redirect to the mock?
  • network.egress.allow: — is the real domain accidentally in the allowlist?
  • The agent's HTTP client — it might be ignoring DNS overrides (some clients cache or use IP literals).

Stream the audit log:

ks logs traces $EXP_ID --event-type http_call

You'll see the actual host the agent hit. If it's api.stripe.com instead of stripe-mock, your DNS overrides aren't working.
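
A working setup typically pairs a dns_overrides entry with an egress allowlist that does not re-admit the real host (the mapping syntax is an assumption built from the keys above):

network:
  dns_overrides:
    api.stripe.com: stripe-mock   # real host resolves to the mock service
  egress:
    allow:
      - stripe-mock               # do NOT also allow api.stripe.com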

Agent runtime issues

Agent times out

Default agent timeout is 5 minutes. For long tasks, raise it:

agent:
  timeout: 15m

Also raise the sandbox-level timeout to cover setup + agent + scoring:

resources:
  timeout: 20m

If the timeout is the right size and the agent still hangs, look at the trace events to see where it stalls. Usually it's an LLM call waiting on a slow provider, or a tool waiting on a service that didn't come up.
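
Stream the audit log for the experiment to see the timeline:

ks logs traces $EXP_ID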

Agent runs but produces no traces

Check:

  1. Did you call ks.wrap() or ks.observe()? Without one, the LLM client emits nothing.
  2. Is KEYSTONE_API_KEY (or the sandbox token) injected? Inside a sandbox, Keystone auto-injects KEYSTONE_SANDBOX_ID and a sandbox-scoped token. If your agent runs the wrap before entering the sandbox (which doesn't happen normally), those env vars aren't set.
  3. Did your wrap target the right object? ks.wrap(new Anthropic()) works because the SDK detects the shape; if you wrapped a custom subclass that doesn't expose messages.create(), the wrap silently no-ops.

// Verify the wrap took effect
const trace = await ks.sandboxes.getTrace(process.env.KEYSTONE_SANDBOX_ID!);
console.log(`${trace.events.length} events captured`);

Agent reports cost but the dashboard shows $0

Cost is computed from the model name in the response — if the model isn't in the bundled pricing table, cost defaults to 0.

Verify the model is in the table:

import { pricingTable } from "@polarityinc/polarity-keystone";
console.log(pricingTable());   // returns Readonly<Record<model, {input, output}>>

from polarity_keystone import pricing_table
print(pricing_table())   # returns a read-only Mapping

The pricing table is read-only and bundled with the SDK; new models land in the next release. If you need cost tracking for a private/preview model that isn't listed, file an issue at github.com/Polarityinc/ks with the model name and per-million-token pricing.

Secrets

resolving secrets failure

The SDK couldn't resolve a declared secret. Check:

  • For source: env — is the env var actually set on the caller's machine?
  • For source: env:OTHER_NAME — is $OTHER_NAME set (not $NAME)?
  • For source: file:... — does the file exist? Is ~/ correctly expanded?
  • For source: command:... — does the command exit 0? Try running it manually first.
  • For source: dashboard — is the secret added in the Dashboard's Secrets tab?

The SDK skips silently when a command: source fails (you'll see no value forwarded), but the server fails hard at sandbox boot if a declared secret's value ends up missing; there's no silent fallback on the server side.
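
A sketch of the source forms in one place (the exact nesting under secrets: is an assumption; the source: forms are the ones listed above):

secrets:
  ANTHROPIC_API_KEY:
    source: env                    # reads $ANTHROPIC_API_KEY on the caller's machine
  DB_PASSWORD:
    source: env:PGPASSWORD         # reads $PGPASSWORD, not $DB_PASSWORD
  TLS_CERT:
    source: file:~/certs/dev.pem   # file must exist; ~/ must expand
  VAULT_TOKEN:
    source: command:vault print token   # must exit 0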

secrets_in_logs fails on every run

Your agent is printing secret values to stdout. Common causes:

  • Debug logging that prints process.env.
  • An LLM that echoes its API key back in a tool argument (yes, this happens).
  • A library that logs the request URL with the API key as a query parameter.

Search the audit log for the offending line:

ks logs traces $EXP_ID --event-type stdout | grep -i 'api_key\|secret\|token'

Fix in your agent code; don't disable the rule.

CLI issues

.env is ignored

ks only loads .env and .env.local from the current directory, not parent directories. Run from your project root.

The CLI also doesn't override existing env vars — if KEYSTONE_API_KEY is already exported in your shell, that wins.
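
If you want the .env value to apply, clear the shell export first:

unset KEYSTONE_API_KEY   # the shell export otherwise wins over .env
ks setup doctor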

ks update says "up to date" but you're on an old version

Auto-update is cached for 24h. Force a refresh:

ks update --force

If ks update can't write to the binary's location (read-only filesystem, sudo-installed /usr/local/bin/ks, etc.), re-run the installer:

curl -fsSL https://ks.polarity.so/install.sh | bash

ks setup mcp doesn't show up in my coding agent

Check the agent's MCP config file (.mcp.json, .cursor/mcp.json, etc.). The setup phase appends a server entry — restart the agent so it picks up the new config.

If ks setup mcp already wrote the config but the agent doesn't see it, your agent might be looking at a different path (some look in ~/.config/<agent>/mcp.json). Run ks mcp serve manually to verify the binary works, then point your agent at it.

Dashboard / billing

Dashboard shows "no traces" for an experiment that just completed

Two things to check:

  1. Are you looking at the right tab? Experiment details show invariants and per-scenario detail; the Traces tab shows agent-mode rows scoped by API key.
  2. Did the agent actually emit traces? A spec with no wrap() calls can pass invariants without producing any trace events.

Run ks logs traces $EXP_ID to see the raw events. If empty, the wrap didn't take effect.

Cost is much higher than expected

A common cause is forgetting to set temperature: 0 in your scenarios. Non-deterministic generation can cost 2-3× more than deterministic, because the agent retries on flaky outputs.

Other causes:

  • Long context windows (every cache miss is full-priced input tokens).
  • A judge model running on every replica when you have replicas: 50.
  • An infinite tool-call loop (check mean_tool_calls — anything over ~30 is suspicious).

The Cost trend in experiments.metrics() shows per-run cost over time — a sudden jump usually means a bug.
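
For example (only experiments.metrics() itself is documented above; the result fields here are assumptions):

const metrics = await ks.experiments.metrics(expId);
if (metrics.mean_tool_calls > 30) {
  console.warn("possible tool-call loop");   // see the list above
}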

Self-hosted issues

The Keystone daemon (keystone) is configured via flags + env vars, not a config file. The full flag set is in keystone --help; the most common ones:

keystone -port 8012 \
  -host 127.0.0.1 \
  -workspace-dir /var/keystone-workspaces \
  -runtime auto                  # auto | local | docker | firecracker | nomad

Server won't start

Check that the chosen runtime is reachable. The daemon auto-detects (-runtime auto), but you can pin it: -runtime docker requires docker on $PATH; -runtime nomad requires the NOMAD_ADDR env var (or default http://127.0.0.1:4646); -runtime firecracker requires -fc-binary, -fc-kernel, -fc-rootfs paths.
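
Pinning Firecracker, for example (the flags are the ones above; paths are illustrative):

keystone -runtime firecracker \
  -fc-binary /usr/local/bin/firecracker \
  -fc-kernel /var/keystone/vmlinux \
  -fc-rootfs /var/keystone/rootfs.ext4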

API-key authentication is on by default. Set KEYSTONE_API_KEY=… (or pass -api-key …) before starting; the server refuses connections without one. Override only for trusted local dev with KEYSTONE_AUTH_REQUIRED=false.

The Supabase-backed metadata store needs SUPABASE_URL and SUPABASE_SERVICE_ROLE_KEY in the environment.
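
Putting it together, a minimal start script for a self-hosted daemon (values are placeholders):

export KEYSTONE_API_KEY=ks_live_xxx
export SUPABASE_URL=https://yourproject.supabase.co
export SUPABASE_SERVICE_ROLE_KEY=xxx
keystone -port 8012 -runtime docker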

Sandboxes fail to boot

Pick the right runtime for your host:

  • Docker is the most common; install Docker Desktop or Docker Engine, then -runtime docker.
  • Podman is supported as a Docker drop-in (the daemon auto-detects when docker isn't on $PATH but podman is).
  • Firecracker is the production runtime — fastest cold start, strongest isolation. Requires the rootfs + kernel images plus the Firecracker binary; see the deploy script for the install flow.
  • Nomad dispatches to a cluster; requires the Nomad job to be registered.

If -runtime auto chose the wrong one, pin it explicitly.

Still stuck?

  • Read the SSE event stream: curl https://.../v1/sandboxes/<id>/events while a sandbox is running. The server publishes every step.
  • Run ks setup doctor to check your local setup.
  • Check the status page at status.polarity.so for known incidents.
  • Email support@polarity.so with your sandbox ID, experiment ID, and the relevant error message. We respond within a business day.