Alerts

Slack and webhook notifications when experiment metrics cross a threshold.

An alert fires when a metric crosses a threshold, for a single spec or across all of them. Notifications go to a Slack channel (posted by a Slack bot) or to any webhook URL; Keystone evaluates every rule as each new experiment completes.

Anatomy

{
  name: "pass-rate-drop",
  eval_id: "fix-failing-test",         # optional — scope to one spec
  condition: "pass_rate < 0.8",
  window: "last_5_runs",               # optional — only the last N runs
  notify: "slack",
  slack_channel: "#agent-alerts",
}

Every alert has:

Field                          Meaning
name                           Display name
eval_id                        Optional spec ID to scope the alert to
condition                      <metric> <operator> <value>
window                         Optional window (last_5_runs, last_24h, etc.)
notify                         slack, webhook, or email
slack_channel / webhook_url    Destination, depending on notify
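
For example, the rule from the anatomy above can be created through the SDK. This is a sketch that assumes ks is an already-configured Keystone client (the same client used in the examples further down):

// Create the "pass-rate-drop" rule from the anatomy above.
// Assumes `ks` is an already-configured Keystone client.
await ks.alerts.create({
  name: "pass-rate-drop",
  eval_id: "fix-failing-test",    // optional: scope to one spec
  condition: "pass_rate < 0.8",
  window: "last_5_runs",          // optional: only the last N runs
  notify: "slack",
  slack_channel: "#agent-alerts",
});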

Conditions

Conditions take the form <metric> <operator> <value>; a sketch of how one is evaluated follows the examples below.

Metrics:

Metric                    Description
pass_rate                 Fraction of scenarios that passed (0–1)
mean_wall_ms              Average wall-clock time per scenario
p95_wall_ms               95th-percentile wall time
total_cost_usd            Total experiment cost
mean_cost_per_run_usd     Average per-scenario cost
tool_success_rate         Tool call success rate (0–1)
side_effect_violations    Count of forbidden-rule violations
mean_tool_calls           Average tool calls per scenario

Operators: <, <=, >, >=, ==, !=

Examples:

pass_rate < 0.8
mean_wall_ms > 30000
total_cost_usd > 5.00
side_effect_violations > 0
tool_success_rate < 0.9
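
As a mental model, evaluating a condition is just parsing those three tokens and comparing against the experiment's metrics. The sketch below is illustrative only (not Keystone's actual parser), and the metrics object shape is an assumption:

// Illustrative sketch only; not Keystone's actual parser.
type Metrics = Record<string, number>;

function conditionHolds(condition: string, metrics: Metrics): boolean {
  // "<metric> <operator> <value>", e.g. "pass_rate < 0.8"
  const [metric, op, raw] = condition.trim().split(/\s+/);
  const value = parseFloat(raw);
  const actual = metrics[metric];
  if (actual === undefined || Number.isNaN(value)) return false;
  switch (op) {
    case "<":  return actual < value;
    case "<=": return actual <= value;
    case ">":  return actual > value;
    case ">=": return actual >= value;
    case "==": return actual === value;
    case "!=": return actual !== value;
    default:   return false;
  }
}

// conditionHolds("pass_rate < 0.8", { pass_rate: 0.65 })  // → true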

Slack channel notifications

Post a Block Kit message to a Slack channel via the Slack bot:

await ks.alerts.create({
  name: "cost-spike",
  condition: "mean_cost_per_run_usd > 2.00",
  notify: "slack",
  slack_channel: "#agent-alerts",
});

Setup required: the Keystone server needs SLACK_BOT_TOKEN set in its environment (typically configured by Polarity for hosted users; self-hosted deployments set it themselves). The bot needs the chat:write scope and must be a member of the channel; invite it manually if posting fails.

Webhook notifications

POST a JSON body to any URL:

await ks.alerts.create({
  name: "pass-rate-drop",
  eval_id: "fix-failing-test",
  condition: "pass_rate < 0.8",
  notify: "webhook",
  webhook_url: "https://hooks.slack.com/services/T00/B00/xxx",
});

The body shape:

{
  "rule_id": "alert_abc",
  "rule_name": "pass-rate-drop",
  "reason": "pass_rate dropped to 0.65 (threshold 0.8)",
  "experiment_id": "exp-xyz",
  "experiment_name": "ci-2026-04-28",
  "fired_at": "2026-04-28T22:00:00Z",
  "metric_value": 0.65,
  "threshold": 0.8
}
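
If webhook_url points at your own service, a minimal receiver only needs to parse that JSON. The sketch below uses Node's built-in http module; the port and route are assumptions:

// Minimal receiver sketch for the payload above (port and route are assumptions).
import { createServer } from "node:http";

createServer((req, res) => {
  if (req.method !== "POST" || req.url !== "/keystone-events") {
    res.writeHead(404).end();
    return;
  }
  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", () => {
    const event = JSON.parse(body);
    console.log(`[${event.fired_at}] ${event.rule_name}: ${event.reason}`);
    res.writeHead(200).end("ok");
  });
}).listen(8080);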

Slack incoming-webhook URLs

Keystone auto-detects Slack's incoming-webhook URLs (https://hooks.slack.com/services/...) and renders a richer Block Kit card instead of the raw payload; other URLs receive the raw JSON above.

// Slack incoming-webhook URL → rich Block Kit message
await ks.alerts.create({
  name: "...",
  condition: "...",
  notify: "webhook",
  webhook_url: "https://hooks.slack.com/services/T00/B00/xxx",
});
 
// Any other URL → raw JSON POST
await ks.alerts.create({
  name: "...",
  condition: "...",
  notify: "webhook",
  webhook_url: "https://my-monitoring-system.example.com/keystone-events",
});

Listing and deleting

// List
const all = await ks.alerts.list();
// AlertRule[]
 
// Delete
await ks.alerts.delete("alert_abc");

These map to GET /v1/alerts and DELETE /v1/alerts/<id>.
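
Without the SDK, the same operations are plain HTTP calls. In this sketch the base URL and auth header are placeholders:

// Raw REST equivalents; base URL and auth header are placeholders.
const base = "https://keystone.example.com";
const headers = { Authorization: "Bearer <api-key>" };

// List all alert rules
const rules = await (await fetch(`${base}/v1/alerts`, { headers })).json();

// Delete one by ID
await fetch(`${base}/v1/alerts/alert_abc`, { method: "DELETE", headers });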

Evaluation

Alerts are evaluated server-side after every experiment completes:

  1. Find every alert rule whose eval_id matches (or is null — global rules).
  2. For each, fetch the relevant experiment's metrics.
  3. If the condition is true, fire the notification.
  4. Webhook calls retry up to 3 times with exponential backoff on 5xx / network errors.
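
The retry behavior in step 4 amounts to something like the sketch below; the exact backoff delays (1s, 2s, 4s) are an assumption, not documented values:

// Sketch of step 4: POST with up to 3 retries and exponential backoff on
// 5xx / network errors. Backoff delays are assumptions.
async function postWithRetry(url: string, payload: unknown, retries = 3): Promise<boolean> {
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      const res = await fetch(url, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(payload),
      });
      if (res.ok) return true;             // delivered
      if (res.status < 500) return false;  // 4xx: don't retry
    } catch {
      // network error: fall through and retry
    }
    if (attempt < retries) {
      await new Promise((r) => setTimeout(r, 1000 * 2 ** attempt));
    }
  }
  return false;
}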

There's no rate limiting per rule — if 100 experiments fail in an hour, you get 100 notifications. Use window: to debounce:

{
  name: "pass-rate-drop",
  condition: "pass_rate < 0.8",
  window: "last_5_runs",     # only fire if the LAST 5 runs all crossed the threshold
  notify: "slack",
  slack_channel: "#agent-alerts",
}

Patterns

CI-noise dampening

Don't alert on every flaky run — only on sustained drops:

{
  name: "ci-pass-rate-degradation",
  eval_id: "ci-regression",
  condition: "pass_rate < 0.85",
  window: "last_5_runs",
  notify: "slack",
  slack_channel: "#eng",
}

Cost alarm

{
  name: "cost-alarm",
  condition: "mean_cost_per_run_usd > 1.00",
  notify: "slack",
  slack_channel: "#oncall",
}

A run that suddenly costs 5× normal is usually a bug — an infinite loop, a spec change, a tool that's calling itself. Alarm fast.

Latency regression

{
  name: "latency-regression",
  condition: "p95_wall_ms > 60000",
  notify: "webhook",
  webhook_url: "https://datadog.com/webhook/...",
}

Pipe to Datadog/PagerDuty/whatever your team uses for ops.

Forbidden-rule tripwire

{
  name: "secrets-leak-tripwire",
  condition: "side_effect_violations > 0",
  notify: "slack",
  slack_channel: "#security",
}

Forbidden-rule violations are always worth a human's attention — fire on the first one.

Limits

Limit                        Default
Alerts per tenant            50
Webhook retries              3
Webhook timeout              10s
Notification dedup window    5 minutes (suppresses repeat firings of the same rule for the same experiment)
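
The dedup window behaves like a cache keyed on rule and experiment with a 5-minute TTL. The sketch below is illustrative only, not the server's actual code:

// Illustrative sketch of the 5-minute dedup window.
const DEDUP_WINDOW_MS = 5 * 60 * 1000;
const lastFired = new Map<string, number>(); // "rule_id:experiment_id" → last fired timestamp

function shouldNotify(ruleId: string, experimentId: string, now = Date.now()): boolean {
  const key = `${ruleId}:${experimentId}`;
  const prev = lastFired.get(key);
  if (prev !== undefined && now - prev < DEDUP_WINDOW_MS) return false; // suppressed
  lastFired.set(key, now);
  return true;
}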