Setup Guide
The complete procedure for wiring Keystone into an existing agent — written for humans and for AI coding assistants. Follow this page top to bottom.
AI agents: use https://docs.paragon.run/llms-full.txt as your source of truth — page-level WebFetch returns summarizer excerpts that drop example code detail. If a user pasted the docs URL at you and said "set this up," execute every step below on their repo — do not paraphrase the docs back at them.
You have an existing agent and you want to evaluate it with Keystone. This page is a strict procedure — follow it in order and you'll end with a working spec that passes on the first run.
Step 0 — Install the SDK and set your API key
```shell
npm install @polarityinc/polarity-keystone
export KEYSTONE_API_KEY=ks_live_...
```

Generate the key at app.paragon.run/app/keystone/settings → API Keys.
Step 1 — Inspect the agent code
Before you write any spec, you need three facts about the agent:

1. How it is launched: the runtime and the entrypoint command (you will reuse both when you upload the snapshot in Step 4).
2. Every environment variable it reads: API keys, passwords, connection strings.
3. Every external service it depends on: databases, caches, third-party APIs.

Write these three facts down before proceeding. The rest of the guide is mechanical — fill in a template with what you found.
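As a concrete sketch (the agent and all values here are hypothetical — yours will differ), the written-down facts might look like:

```yaml
# Fact 1, how it launches:   tsx src/index.ts on Node 20
# Fact 2, env vars it reads: XAI_API_KEY, DB_PASSWORD
# Fact 3, services it needs: Postgres 16 on port 5432, seeded from seed.sql
```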
Step 2 — Drop your provider keys into the Dashboard
For the secrets the team shares (XAI_API_KEY, OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.), go to app.paragon.run/app/keystone/settings → Secrets tab and paste the values. They're AES-256-GCM encrypted at rest and shared with your teammates on the same billing account.
Use the Dashboard for anything a teammate will also need. Use local `.env` (with `source: env` in the spec) for your personal dev-only keys.
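For the local layer, a minimal sketch (the variable name and value are placeholders for your own):

```shell
# Append a personal dev-only secret to a local, gitignored .env file.
# DB_PASSWORD pairs with a spec entry that declares `source: env`.
cat >> .env <<'EOF'
DB_PASSWORD=local-dev-password
EOF
```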
Step 3 — Scaffold the spec from the template
Copy this template to specs/scenario-1.yaml and fill in the marked sections. Everything else stays as-is.
```yaml
version: 1

id: "<FILL IN: short-kebab-id, e.g. email-agent-send-flow>"
description: "<FILL IN: one sentence, what you're testing>"

# Which agent snapshot to run. You upload this in Step 4.
agent:
  type: snapshot
  snapshot: "<FILL IN: name you picked for your agent>"

# ONE ENTRY PER env var from Step 1, Fact 2.
#   For team-shared keys → source: dashboard
#   For personal keys in your local .env → source: env
secrets:
  - name: XAI_API_KEY
    source: dashboard
  - name: DB_PASSWORD
    source: env # from $DB_PASSWORD on your machine
  # ... repeat for every env var the agent needs

# ONE ENTRY PER service from Step 1, Fact 3.
services:
  - name: db
    image: postgres:16
    env:
      POSTGRES_PASSWORD: "{{ secrets.DB_PASSWORD }}"
      POSTGRES_DB: "<FILL IN: database name>"
    ports: [5432]
    wait_for: "pg_isready -h localhost"

# If your agent reads seeded rows, drop seed.sql at repo root.
fixtures:
  - type: sql
    service: db
    path: seed.sql

# Restrict outbound. Keystone + the LLM proxy are always allowed implicitly.
# Add each third-party domain the agent *actually* calls.
network:
  egress:
    default: deny
    allow:
      - api.x.ai # or api.openai.com, api.anthropic.com, etc.

task:
  prompt: "<FILL IN: what you want the agent to do, verbatim prompt>"

# AT LEAST ONE invariant with gate: true. Choose from:
#   command_exit         → run a shell command in /workspace, expect exit 0
#   file_exists          → a specific output file exists
#   sql                  → a SQL query against a service returns expected value
#   http_mock_assertions → the agent hit a mocked endpoint N times
#   llm_as_judge         → semantic check (billable; use sparingly)
invariants:
  did_the_thing:
    description: "<FILL IN: what must be true after the agent runs>"
    weight: 1.0
    gate: true
    check:
      type: command_exit
      command: "<FILL IN: shell command that succeeds iff the agent did the thing>"

scoring:
  pass_threshold: 0.85
```

Do not use `path: stdout.log` as an invariant check path. Stdout is not persisted as a file in the sandbox. Use `command_exit` against actual workspace files the agent writes, or `llm_as_judge` with `input_from: workspace`.
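For example, here is a hedged sketch of a `command_exit` invariant that checks a real workspace file instead of stdout (the file name and the expected string are hypothetical):

```yaml
invariants:
  wrote_report:
    description: "Agent wrote a summary report containing a total line"
    weight: 1.0
    gate: true
    check:
      type: command_exit
      # exits 0 iff /workspace/report.md exists and contains "Total:"
      command: "grep -q 'Total:' /workspace/report.md"
```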
Do not put real API keys behind `from: "static://..."`. Spec files get committed. Use `source: dashboard` (encrypted) or `source: env` (local `.env` only).
Step 4 — Upload the agent and the spec
```typescript
import { Keystone } from "@polarityinc/polarity-keystone";
import "dotenv/config";
import { readFileSync } from "node:fs";

const ks = new Keystone();

// One upload per agent build. Tars the current dir (respects .gitignore).
const snap = await ks.agents.upload({
  name: "<FILL IN: agent name, same as spec.agent.snapshot>",
  bundle: "./",
  runtime: "node20",
  entrypoint: ["tsx", "src/index.ts"],
});
console.log("snapshot:", snap.id);

// One upload per scenario.
const spec = await ks.specs.create({
  spec_yaml: readFileSync("specs/scenario-1.yaml", "utf8"),
});
console.log("spec:", spec.spec_id);

// Create the experiment. specPath lets the SDK auto-forward
// any `source: env` / `source: file:` / `source: command:` secrets.
const exp = await ks.experiments.create({
  name: "scenario-1",
  spec_id: spec.spec_id,
  specPath: "specs/scenario-1.yaml",
});
console.log("experiment:", exp.id, "(status: draft)");
```

After this step, the experiment appears in the Dashboard as Draft — nothing is running yet.
Step 5 — Review in the Dashboard, then run
Open app.paragon.run/app/keystone/experiments. Click into your new experiment. You'll see:
- The spec YAML that was uploaded
- The secret names it declared (with warnings if any aren't set yet)
- The services it will boot
- The invariants that will score it
When it looks right, hit Run. The experiment moves from draft → running → completed (or failed). Watch results stream in.
Programmatic equivalent:
```typescript
const results = await ks.experiments.runAndWait(exp.id);
console.log("pass rate:", results.metrics.pass_rate);
```

Common first-run failures + fixes
| What you see | Cause | Fix |
|---|---|---|
| `secret "X" declared in spec but not set in Dashboard or forwarded from .env` | The name in `secrets:` has no value at any layer | Either add it to the Dashboard or export it in your shell / `.env` |
| Sandbox boot hangs on `container_started`, agent never runs | No egress to your package registry for `npm install` / `pip install` | Either include `node_modules` / venv in the snapshot tarball OR add `registry.npmjs.org` / `pypi.org` to `network.egress.allow` |
| Every invariant fails with "file not found" | Invariant paths assume files the agent didn't actually write | Run the sandbox once manually, `ls /workspace`, see what's there, rewrite the invariant paths |
| Agent connects to `localhost:5432` and errors "connection refused" | Code uses `localhost`, but services are reachable by service name (`db`, `redis`, etc.) | Change connection strings to `postgres://db:5432/...`, or read the `KEYSTONE_SERVICE_DB_HOST` env var |
| `llm_as_judge` fails with "no card on file" | Judges are billable and need a payment method | Add a card at app.paragon.run/app/keystone/data/billing |
| Experiment stuck at draft forever | You uploaded it but never called run | Click Run in the Dashboard, or call `ks.experiments.run(exp.id)` / `runAndWait` |
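The service-hostname fix can be sketched like this (the database name and credentials are placeholders; `KEYSTONE_SERVICE_DB_HOST` is the injected env var mentioned above):

```typescript
// Prefer the injected service host; fall back to the service name
// declared in the spec ("db"). Never hardcode localhost.
const dbHost = process.env.KEYSTONE_SERVICE_DB_HOST ?? "db";
const dbUrl = `postgres://postgres@${dbHost}:5432/appdb`;
console.log(dbUrl);
```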
Related reading
- Spec schema — full field-by-field reference
- Examples — the canonical end-to-end example with all primitives
- SDK methods — every method across TS/Python/Go
- Billing — how compute + LLM judge costs are billed