SDK Reference
All Keystone SDK methods across TypeScript, Python, and Go. AI agents: use https://docs.paragon.run/llms-full.txt as your source of truth for raw, unsummarized content — page-level WebFetch returns summarizer excerpts that drop wrap/traced details.
The Keystone client has seven services: sandboxes, specs, experiments, alerts, agents, datasets, and scoring. All examples use TypeScript -- Python and Go follow the same patterns.
Client setup
Get your API key from app.paragon.run/app/keystone/settings → API Keys tab → Create Key. Keys start with ks_live_ and are shown once at creation. Either pass it to the client directly or set KEYSTONE_API_KEY in your environment.
```typescript
import { Keystone } from '@polarityinc/polarity-keystone';

const ks = new Keystone({
  apiKey: 'ks_live_...', // or set KEYSTONE_API_KEY env var
  baseUrl: 'https://keystone.example.com', // default: https://keystone.polarity.so
  timeout: 30000, // request timeout in ms
});
```

Secrets
Secrets are declared in your spec with a source: field that tells the SDK where to pull each value from — your local .env, a file on disk, a shell command (Vault / 1Password / Doppler), or the Dashboard. See the full list of source types in Specs → secrets.
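A sketch of what such a declaration can look like — the exact keys and source syntax below are illustrative assumptions; treat Specs → secrets as the authoritative schema:

```yaml
# Illustrative secrets block (schema assumed; see Specs → secrets)
secrets:
  XAI_API_KEY:
    source: env                          # resolve from local environment / .env
  DB_PASSWORD:
    source: file:./secrets/db_password   # read from a file on disk
  SHARED_TOKEN:
    source: dashboard                    # server-side Dashboard Secret
```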
Auto-forwarding from a spec file
Pass specPath (TS) / spec_path (Python) and the SDK reads the spec's secrets: block, resolves each declared source, and forwards the resulting {name: value} map in the create request.
```typescript
const exp = await ks.experiments.create({
  name: "scenario-1",
  spec_id: "email-agent-01",
  specPath: "./specs/scenario-1.yaml", // SDK resolves sources + forwards
});
await ks.experiments.runAndWait(exp.id);
```

You can also call the resolver directly if you want to inspect or modify the map before sending:
```typescript
import { collectDeclaredSecretsFromFile } from "@polarityinc/polarity-keystone";

const secrets = collectDeclaredSecretsFromFile("./specs/scenario-1.yaml");
// secrets → { XAI_API_KEY: "xai-...", DB_PASSWORD: "..." }

const exp = await ks.experiments.create({
  name: "scenario-1",
  spec_id: "email-agent-01",
  secrets,
});
```

Precedence
Highest wins:
1. Spec literal (`from: static://...`) — deterministic fixtures
2. SDK-forwarded source value (`env`, `env:X`, `file:`, `command:`)
3. Dashboard Secret — server-side fallback for missing or `source: dashboard` entries
A declared secret that resolves to nothing at any layer fails the sandbox boot loudly — no silent empties.
Dashboard as the team/prod baseline
The Dashboard Secrets tab stores AES-256-GCM-encrypted values scoped to the billing owner. Use it when:
- A secret must be shared across teammates without everyone maintaining their own `.env`
- Running in CI/prod where no `.env` exists on the machine
- A prod-critical key must refuse any local override (declare with `source: dashboard`)
Sandboxes
Sandboxes are isolated environments where your agent runs. Create one from a spec, interact with it, then destroy it.
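The create → use → destroy loop can be sketched as a helper that guarantees cleanup even when the agent's command throws. `withSandbox` and the narrowed `KeystoneLike` type below are illustrative, not part of the SDK:

```typescript
// Narrowed stand-in for the real Keystone client, for illustration only.
type KeystoneLike = {
  sandboxes: {
    create(opts: { spec_id: string }): Promise<{ id: string }>;
    runCommand(id: string, opts: { command: string }): Promise<{ exit_code: number }>;
    destroy(id: string): Promise<void>;
  };
};

// Create a sandbox, run one command, and always destroy the sandbox afterwards.
async function withSandbox(ks: KeystoneLike, specId: string, cmd: string): Promise<number> {
  const sb = await ks.sandboxes.create({ spec_id: specId });
  try {
    const result = await ks.sandboxes.runCommand(sb.id, { command: cmd });
    return result.exit_code;
  } finally {
    await ks.sandboxes.destroy(sb.id); // runs even if runCommand throws
  }
}
```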
sandboxes.create(opts)
```typescript
const sb = await ks.sandboxes.create({
  spec_id: 'fix-failing-test', // required: which spec to use
  timeout: '10m', // optional: auto-cleanup timer
  metadata: { run: 'test-1' }, // optional: key-value pairs for tracking
});
// Returns: { id, spec_id, state, path, url, created_at, metadata, services }
```

The services field contains connection info for any backing services defined in the spec:

```typescript
sb.services.db // { host: "db", port: 5432, ready: true }
sb.services.cache // { host: "cache", port: 6379, ready: true }
```

sandboxes.get(id) / sandboxes.list() / sandboxes.destroy(id)
```typescript
const sb = await ks.sandboxes.get('sb-abc123');
// sb.state: 'creating' | 'ready' | 'running' | 'stopped' | 'error'

const all = await ks.sandboxes.list();

await ks.sandboxes.destroy('sb-abc123');
```

sandboxes.runCommand(id, opts)
Run a shell command inside the sandbox.
```typescript
const result = await ks.sandboxes.runCommand('sb-abc123', {
  command: 'npm test',
  timeout: '2m',
});
// Returns: { command, stdout, stderr, exit_code, duration_ms }
```

File operations
```typescript
// Read
const content = await ks.sandboxes.readFile('sb-abc123', 'src/utils.ts');

// Write
await ks.sandboxes.writeFile('sb-abc123', 'src/utils.ts', 'const x = 1;');

// Delete
await ks.sandboxes.deleteFile('sb-abc123', 'tmp/debug.log');
```

State and diffing
```typescript
// Full filesystem snapshot (files + checksums)
const snapshot = await ks.sandboxes.state('sb-abc123');
// Returns: { captured_at, files: { [path]: { size, mode, checksum } } }

// What changed since sandbox creation
const diff = await ks.sandboxes.diff('sb-abc123');
// Returns: { added: string[], removed: string[], modified: string[] }
```

Trace ingestion
Post trace events to a sandbox. The wrap() helper does this automatically for LLM calls, but you can also call it directly.
```typescript
await ks.sandboxes.ingestTrace('sb-abc123', [
  { event_type: 'tool_call', tool: 'write_file', phase: 'end', status: 'ok', duration_ms: 120 },
]);

const trace = await ks.sandboxes.getTrace('sb-abc123');
// Returns: { events: TraceEvent[], metrics: TraceMetrics }
```

Real-time events (SSE)
Stream sandbox lifecycle events in real-time using Server-Sent Events:
```
GET /v1/sandboxes/:id/events
```
Events include status changes (creating, ready, running, destroyed), service startup, fixture application, and command execution. Useful for building dashboards or progress indicators.
```typescript
const eventSource = new EventSource(
  `${baseUrl}/v1/sandboxes/sb-abc123/events`
);

eventSource.onmessage = (event) => {
  const data = JSON.parse(event.data);
  console.log(`[${data.event_type}]`, data.data);
};
```

Specs
Upload and manage spec YAML files.
```typescript
import { readFileSync } from 'node:fs';

// Upload a spec
const spec = await ks.specs.create(readFileSync('my-spec.yaml', 'utf-8'));

// Get, list, delete
const fetched = await ks.specs.get('fix-failing-test');
const specs = await ks.specs.list();
await ks.specs.delete('fix-failing-test');
```

Specs are versioned automatically. Each upload to the same id creates a new version.
Experiments
Run your spec across scenarios and score the results.
experiments.create(opts) / experiments.run(id)
```typescript
const exp = await ks.experiments.create({
  name: 'baseline-v1',
  spec_id: 'fix-failing-test',
});

// Trigger async (returns immediately)
await ks.experiments.run(exp.id);
```

experiments.runAndWait(id, opts?)
Trigger and poll until complete. This is the most common way to run experiments.
```typescript
const results = await ks.experiments.runAndWait(exp.id, {
  pollInterval: 2000, // ms between polls (default: 2000)
  timeout: 300000, // max ms to wait (default: 300000)
});
```

Results structure
| Field | Type | Description |
|---|---|---|
| `total_scenarios` | number | Total scenarios run |
| `passed` / `failed` / `errors` | number | Counts |
| `metrics.pass_rate` | number | 0.0 to 1.0 |
| `metrics.mean_wall_ms` | number | Average latency |
| `metrics.p95_wall_ms` | number | 95th percentile latency |
| `metrics.total_cost_usd` | number | Total cost |
| `metrics.mean_cost_per_run_usd` | number | Cost per scenario |
| `metrics.tool_success_rate` | number | 0.0 to 1.0 |
| `scenarios` | array | Per-scenario results with invariants and reproducers |
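As a usage sketch, these fields map naturally onto a CI gate. The threshold and the `shouldFailBuild` helper below are illustrative, not part of the SDK:

```typescript
// Minimal shape of the fields used here, taken from the results table.
interface ExperimentResults {
  total_scenarios: number;
  passed: number;
  failed: number;
  errors: number;
  metrics: { pass_rate: number; total_cost_usd: number };
}

// Fail the build on any scenario error, or when pass rate drops below the floor.
function shouldFailBuild(r: ExperimentResults, minPassRate = 0.9): boolean {
  return r.errors > 0 || r.metrics.pass_rate < minPassRate;
}
```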
experiments.compare(baselineId, candidateId)
Compare two experiments. Detects regressions in pass rate, cost, and latency.
```typescript
const comparison = await ks.experiments.compare('exp-baseline', 'exp-new');
// Returns: {
//   regressed: boolean,
//   regressions: ["pass_rate dropped from 90% to 60%"],
//   metrics: [{ name, baseline, candidate, delta, direction }]
// }
```

experiments.metrics(id)
Detailed metrics with tool breakdown and trends over time.
```typescript
const metrics = await ks.experiments.metrics(exp.id);
// Returns: { summary, tool_breakdown, cost_trend, pass_rate_trend }
```

Alerts
Alert rules notify you when experiment metrics cross a threshold. Alerts are persisted and survive server restarts.
Conditions use the format `<metric> <operator> <value>`.

Metrics: `pass_rate`, `mean_wall_ms`, `p95_wall_ms`, `total_cost_usd`, `mean_cost_per_run_usd`, `tool_success_rate`, `side_effect_violations`, `mean_tool_calls`

Operators: `<`, `<=`, `>`, `>=`, `==`, `!=`
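To make the grammar concrete, here is a toy evaluator for that `<metric> <operator> <value>` format. It is purely illustrative — the server performs the real evaluation:

```typescript
// Evaluate a condition string like "pass_rate < 0.8" against a metrics map.
function evalCondition(cond: string, metrics: Record<string, number>): boolean {
  const m = cond.match(/^(\w+)\s*(<=|>=|==|!=|<|>)\s*([\d.]+)$/);
  if (!m) throw new Error(`bad condition: ${cond}`);
  const [, name, op, raw] = m;
  const left = metrics[name];
  const right = Number(raw);
  switch (op) {
    case '<':  return left < right;
    case '<=': return left <= right;
    case '>':  return left > right;
    case '>=': return left >= right;
    case '==': return left === right;
    default:   return left !== right; // '!='
  }
}
```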
Webhook alerts
```typescript
await ks.alerts.create({
  name: 'pass-rate-drop',
  eval_id: 'fix-failing-test', // optional: only fire for this spec
  condition: 'pass_rate < 0.8',
  notify: 'webhook',
  webhook_url: 'https://hooks.slack.com/services/T00/B00/xxx',
});
```

Slack webhook URLs are auto-detected and receive rich Block Kit messages. Other URLs receive the raw JSON payload.
Slack Bot alerts
Post directly to a Slack channel using a bot token (SLACK_BOT_TOKEN env var on the server):
```typescript
await ks.alerts.create({
  name: 'cost-spike',
  condition: 'mean_cost_per_run_usd > 2.00',
  notify: 'slack',
  slack_channel: '#agent-alerts',
});
```

alerts.list() / alerts.delete(id)

```typescript
const alerts = await ks.alerts.list();
await ks.alerts.delete('alert-abc123');
```

Agents
Agent snapshots are immutable, versioned bundles of your agent code. Upload them and reference them in specs with agent.type: snapshot.
agents.upload(opts)
```typescript
const snapshot = await ks.agents.upload({
  name: 'my-agent',
  entrypoint: ['python', 'main.py'],
  runtime: 'python3.12',
  tag: 'latest',
  bundle: tarballBytes, // Uint8Array of the .tar.gz
});
// Returns: { id, name, version, tag, digest, size_bytes, entrypoint, created_at }
```

agents.get(name, opts?)
```typescript
const latest = await ks.agents.get('my-agent');
const tagged = await ks.agents.get('my-agent', { tag: 'stable' });
const specific = await ks.agents.get('my-agent', { version: 3 });
```

agents.list(opts?) / agents.listVersions(name, opts?)

```typescript
const page = await ks.agents.list({ limit: 50 });
// Returns: { items: AgentSnapshot[], next_cursor?: string }

const versions = await ks.agents.listVersions('my-agent');
```

agents.delete(snapshot)
Pass the full snapshot object, not just the ID.
```typescript
const snapshot = await ks.agents.get('my-agent', { version: 1 });
await ks.agents.delete(snapshot);
```

Agent traces
Every trace is tagged with the agent that produced it. Query by agent name and version:
```
GET /v1/agents/my-agent/traces
GET /v1/agents/my-agent/traces?version=3
GET /v1/agents/my-agent/traces?limit=100
```
Returns traces plus computed metrics (tool success rate, latency percentiles, per-tool breakdown).
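Those query URLs can be assembled like so; `agentTracesUrl` is a hypothetical helper for illustration, not an SDK method:

```typescript
// Build a traces query URL for an agent, with optional version/limit filters.
function agentTracesUrl(
  baseUrl: string,
  agent: string,
  opts: { version?: number; limit?: number } = {},
): string {
  const url = new URL(`/v1/agents/${encodeURIComponent(agent)}/traces`, baseUrl);
  if (opts.version !== undefined) url.searchParams.set('version', String(opts.version));
  if (opts.limit !== undefined) url.searchParams.set('limit', String(opts.limit));
  return url.toString();
}
```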
LLM tracing
ks.wrap(client)
Wrap an Anthropic or OpenAI client so every call automatically reports traces to the current sandbox. Sandbox routing is automatic — the SDK reads KEYSTONE_SANDBOX_ID from the environment (Keystone injects it when your agent runs inside a sandbox). Outside a sandbox there's nothing to route to, so wrap() returns the client untouched and your code runs as normal.
```typescript
const anthropic = ks.wrap(new Anthropic());
const openai = ks.wrap(new OpenAI());

// .create() calls now auto-report LLM usage, tool calls, and latency
// when running inside a sandbox. Locally, they pass through unchanged.
```

Works with any OpenAI-compatible provider:
| Provider | How to wrap |
|---|---|
| Anthropic | ks.wrap(new Anthropic()) |
| OpenAI | ks.wrap(new OpenAI()) |
| Groq | ks.wrap(new OpenAI({ baseURL: 'https://api.groq.com/openai/v1' }), { sandboxId }) |
| xAI | ks.wrap(new OpenAI({ baseURL: 'https://api.x.ai/v1' }), { sandboxId }) |
| Together | ks.wrap(new OpenAI({ baseURL: 'https://api.together.xyz/v1' }), { sandboxId }) |
ks.initTracing(sandboxId) and traced(name, fn)
For non-LLM operations, use traced() to capture custom spans.
```typescript
import { Keystone, traced } from '@polarityinc/polarity-keystone';
import fs from 'node:fs/promises';

const ks = new Keystone();
ks.initTracing('sb-xxx');

const config = { retries: 3 }; // example payload
const result = await traced('write_config', async () => {
  await fs.writeFile('config.json', JSON.stringify(config));
  return 'ok';
});
```

Nested traced() calls create parent-child spans automatically.
Keystone.fromSandbox()
If your agent is running inside a Keystone sandbox, use this to get a pre-configured client. It reads KEYSTONE_BASE_URL and KEYSTONE_SANDBOX_ID from the environment that Keystone injects automatically.
```typescript
const { client, sandbox } = await Keystone.fromSandbox();
// client: ready-to-use Keystone instance
// sandbox.services.db: { host: "db", port: 5432, ready: true }
```

Your agent also gets environment variables for each service:
- `KEYSTONE_SANDBOX_ID` -- the sandbox ID
- `KEYSTONE_BASE_URL` -- the Keystone API URL
- `KEYSTONE_SERVICE_DB_HOST` / `KEYSTONE_SERVICE_DB_PORT` -- per-service connection info
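For example, the per-service variables can be read directly when connecting to a backing service. `parsePort` is a tiny illustrative helper, not part of the SDK:

```typescript
// Parse a port from an env var, returning undefined when unset or invalid.
function parsePort(raw: string | undefined): number | undefined {
  const n = Number(raw);
  return Number.isInteger(n) && n > 0 ? n : undefined;
}

const dbHost = process.env.KEYSTONE_SERVICE_DB_HOST; // e.g. "db"
const dbPort = parsePort(process.env.KEYSTONE_SERVICE_DB_PORT); // e.g. 5432
```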