Quick Start

The fastest path from zero to a passing eval with Keystone, in under 5 minutes. Three commands; the wizard does the rest.

1. Install the CLI

curl -fsSL https://ks.polarity.so/install.sh | bash

Verify:

ks --version

2. Add your API key

Get a key at app.paragon.run/app/keystone/settings → API Keys → Create Key. Keys start with ks_live_ and are shown only once. Drop yours in your project's .env:

cd ~/your-project
echo 'KEYSTONE_API_KEY=ks_live_...' >> .env
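
If you want to confirm the key landed in .env before running the wizard, a small check like the following may help. This is a minimal sketch, not part of Keystone: the helper names read_env_value and looks_like_keystone_key are hypothetical, the parser handles only simple KEY=value lines, and the only thing it validates is the documented ks_live_ prefix.

```python
from pathlib import Path
from typing import Optional


def read_env_value(env_text: str, name: str) -> Optional[str]:
    """Return the last value assigned to `name` in .env-style text, or None."""
    value = None
    for raw in env_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == name:
            # Later assignments win, matching the effect of appending with >>.
            value = val.strip().strip("'\"")
    return value


def looks_like_keystone_key(value: Optional[str]) -> bool:
    """Check only the documented prefix; length and charset are not validated."""
    prefix = "ks_live_"
    return bool(value) and value.startswith(prefix) and len(value) > len(prefix)


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    key = read_env_value(text, "KEYSTONE_API_KEY")
    if looks_like_keystone_key(key):
        print("KEYSTONE_API_KEY looks OK")
    else:
        print("KEYSTONE_API_KEY is missing or does not start with ks_live_")
```

Run it from the project root. It never prints the key itself, so it is safe to use in shared terminals or CI logs.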

3. Run the wizard

ks setup

ks setup runs seven phases end-to-end — each idempotent, each independently runnable:

Phase       What it does
skills      Writes coding-agent skill files (.claude/skills/keystone/SKILL.md, .cursor/rules/keystone.mdc, etc.)
mcp         Registers ks mcp serve in your project's MCP configs so your agent can call Keystone as tools
spec        Drops a starter spec at keystone/example.yaml
instrument  Scans your code for ~50 LLM-SDK construction sites and prints them grouped by family
install     Installs the Keystone SDK for each language detected (Go / TS / Python)
snapshot    Detects agent code in your repo and explains how to package it as a snapshot
doctor      Verifies API key, server reachability, auth, ks on PATH, and that .env parses cleanly
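
Because each phase can be run on its own as ks setup followed by the phase name, scripting a partial re-run is straightforward. The sketch below only builds the command lines and prints them. PHASES and phase_command are hypothetical helper names, and the list order simply mirrors the table; it is an assumption that this is also the order the wizard uses.

```python
import shlex
from typing import List

# Phase names as listed in the table, in the order shown there.
PHASES = ["skills", "mcp", "spec", "instrument", "install", "snapshot", "doctor"]


def phase_command(phase: str) -> List[str]:
    """Build the argv for re-running a single setup phase."""
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase!r}")
    return ["ks", "setup", phase]


if __name__ == "__main__":
    # Print the commands rather than executing them.
    for phase in PHASES:
        print(shlex.join(phase_command(phase)))
```

To actually execute a phase, pass the list to subprocess.run(phase_command("mcp"), check=True). That treats a non-zero exit status as failure, which is an assumption about how ks reports errors.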

When it finishes, you'll have a starter spec, your coding agent wired up, the SDK installed, and a green doctor check.

4. Run your first eval

ks eval run keystone/example.yaml

Expected: a passing scenario in 10–30 seconds, with a RunResults JSON printed to stdout.

{
  "experiment_id": "exp-a1b2c3",
  "passed": 1,
  "failed": 0,
  "metrics": { "pass_rate": 1.0, "mean_wall_ms": 12000 },
  "scenarios": [{ "status": "pass", "composite_score": 1.0 }]
}
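
Since the result is plain JSON, it is easy to turn into a one-line summary or a CI gate. The following is a minimal sketch that relies only on the fields visible in the sample output; the full RunResults schema may contain more. The function names summarize and gate are hypothetical.

```python
import json

# The sample output from this page, used here as test input.
SAMPLE = """
{
  "experiment_id": "exp-a1b2c3",
  "passed": 1,
  "failed": 0,
  "metrics": { "pass_rate": 1.0, "mean_wall_ms": 12000 },
  "scenarios": [{ "status": "pass", "composite_score": 1.0 }]
}
"""


def summarize(run_results: dict) -> str:
    """Render a one-line, human-readable summary of a run."""
    metrics = run_results["metrics"]
    return (
        f"{run_results['experiment_id']}: "
        f"{run_results['passed']} passed, {run_results['failed']} failed, "
        f"pass rate {metrics['pass_rate']:.0%}, "
        f"mean wall time {metrics['mean_wall_ms'] / 1000:.1f}s"
    )


def gate(run_results: dict, min_pass_rate: float = 1.0) -> bool:
    """Return True when nothing failed and the pass rate meets the threshold."""
    return (
        run_results["failed"] == 0
        and run_results["metrics"]["pass_rate"] >= min_pass_rate
    )


if __name__ == "__main__":
    results = json.loads(SAMPLE)
    print(summarize(results))
    print("gate:", "ok" if gate(results) else "blocked")
```

In practice you would replace SAMPLE with the captured output of ks eval run, assuming stdout contains only the JSON document.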

You're done. That's the whole quick start.

What's next

The setup wizard handled almost everything. The one thing it couldn't do is the actual code change — wrapping your existing LLM clients with ks.wrap() so your agent's calls are traced. That's a five-minute job, walked through in Setup Guide → Step 4.

After that:

Want to                        Read
Understand the mental model    Concepts
See real-world spec examples   Examples
Write your own spec            Spec Reference
Master the CLI                 CLI Reference
Use the SDK                    SDK Reference
Debug something broken         Troubleshooting

If something's wrong

ks setup doctor

This runs the five health checks (API key, server reachable, auth works, ks on PATH, .env parses cleanly) and tells you what's broken, with actionable hints. You can re-run any single phase by name (ks setup mcp, ks setup spec, etc.); they're all idempotent.

Stuck? See Troubleshooting.