Alerts

Slack and webhook notifications when experiment metrics cross a threshold.

An alert fires when a metric crosses a threshold, for a single spec or across all of them. Notifications go to a Slack channel (posted by a Slack bot) or to any webhook URL; Keystone evaluates every rule as each new experiment completes.

Anatomy

{
  name: "pass-rate-drop",
  eval_id: "fix-failing-test",         # optional — scope to one spec
  condition: "pass_rate < 0.8",
  window: "last_5_runs",               # optional — only the last N runs
  notify: "slack",
  slack_channel: "#agent-alerts",
}

Every alert has:

Field                          Meaning
name                           Display name
eval_id                        Optional spec ID to scope the alert to
condition                      <metric> <operator> <value>
window                         Optional window (last_5_runs, last_24h, etc.)
notify                         slack, webhook, or email
slack_channel / webhook_url    Destination, depending on notify
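
For example, the rule from the anatomy above can be created through the SDK. This is a sketch that assumes ks is an already-configured Keystone client (the same client used in the examples further down):

// Create the "pass-rate-drop" rule from the anatomy above.
// Assumes `ks` is an already-configured Keystone client.
await ks.alerts.create({
  name: "pass-rate-drop",
  eval_id: "fix-failing-test",    // optional: scope to one spec
  condition: "pass_rate < 0.8",
  window: "last_5_runs",          // optional: only the last N runs
  notify: "slack",
  slack_channel: "#agent-alerts",
});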

Conditions

Conditions take the form <metric> <operator> <value>; a sketch of how one is evaluated follows the examples below.

Metrics:

Metric                    Description
pass_rate                 Fraction of scenarios that passed (0–1)
mean_wall_ms              Average wall-clock time per scenario
p95_wall_ms               95th-percentile wall time
total_cost_usd            Total experiment cost
mean_cost_per_run_usd     Average per-scenario cost
tool_success_rate         Tool call success rate (0–1)
side_effect_violations    Count of forbidden-rule violations
mean_tool_calls           Average tool calls per scenario

Operators: <, <=, >, >=, ==, !=

Examples:

pass_rate < 0.8
mean_wall_ms > 30000
total_cost_usd > 5.00
side_effect_violations > 0
tool_success_rate < 0.9
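
As a mental model, evaluating a condition is just parsing those three tokens and comparing against the experiment's metrics. The sketch below is illustrative only (not Keystone's actual parser), and the metrics object shape is an assumption:

// Illustrative sketch only; not Keystone's actual parser.
type Metrics = Record<string, number>;

function conditionHolds(condition: string, metrics: Metrics): boolean {
  // "<metric> <operator> <value>", e.g. "pass_rate < 0.8"
  const [metric, op, raw] = condition.trim().split(/\s+/);
  const value = parseFloat(raw);
  const actual = metrics[metric];
  if (actual === undefined || Number.isNaN(value)) return false;
  switch (op) {
    case "<":  return actual < value;
    case "<=": return actual <= value;
    case ">":  return actual > value;
    case ">=": return actual >= value;
    case "==": return actual === value;
    case "!=": return actual !== value;
    default:   return false;
  }
}

// conditionHolds("pass_rate < 0.8", { pass_rate: 0.65 })  // → true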

Slack channel notifications

Post a Block Kit message to a Slack channel via the Slack bot:

await ks.alerts.create({
  name: "cost-spike",
  condition: "mean_cost_per_run_usd > 2.00",
  notify: "slack",
  slack_channel: "#agent-alerts",
});

Setup required: the Keystone server needs SLACK_BOT_TOKEN set in its environment (typically configured by Polarity for hosted users; self-hosted deployments set it themselves). The bot needs the chat:write scope and must be a member of the channel; invite it manually if posting fails.

Webhook notifications

POST a JSON body to any URL:

await ks.alerts.create({
  name: "pass-rate-drop",
  eval_id: "fix-failing-test",
  condition: "pass_rate < 0.8",
  notify: "webhook",
  webhook_url: "https://hooks.slack.com/services/T00/B00/xxx",
});

The body shape:

{
  "rule_id": "alert_abc",
  "rule_name": "pass-rate-drop",
  "reason": "pass_rate dropped to 0.65 (threshold 0.8)",
  "experiment_id": "exp-xyz",
  "experiment_name": "ci-2026-04-28",
  "fired_at": "2026-04-28T22:00:00Z",
  "metric_value": 0.65,
  "threshold": 0.8
}
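
If webhook_url points at your own service, a minimal receiver only needs to parse that JSON. The sketch below uses Node's built-in http module; the port and route are assumptions:

// Minimal receiver sketch for the payload above (port and route are assumptions).
import { createServer } from "node:http";

createServer((req, res) => {
  if (req.method !== "POST" || req.url !== "/keystone-events") {
    res.writeHead(404).end();
    return;
  }
  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", () => {
    const event = JSON.parse(body);
    console.log(`[${event.fired_at}] ${event.rule_name}: ${event.reason}`);
    res.writeHead(200).end("ok");
  });
}).listen(8080);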

Slack incoming-webhook URLs

Keystone auto-detects Slack's incoming-webhook URLs (https://hooks.slack.com/services/...) and renders a richer Block Kit card instead of the raw payload; other URLs receive the raw JSON above.

// Slack incoming-webhook URL → rich Block Kit message
await ks.alerts.create({
  name: "...",
  condition: "...",
  notify: "webhook",
  webhook_url: "https://hooks.slack.com/services/T00/B00/xxx",
});
 
// Any other URL → raw JSON POST
await ks.alerts.create({
  name: "...",
  condition: "...",
  notify: "webhook",
  webhook_url: "https://my-monitoring-system.example.com/keystone-events",
});

Listing and deleting

// List
const all = await ks.alerts.list();
// AlertRule[]
 
// Delete
await ks.alerts.delete("alert_abc");

These map to GET /v1/alerts and DELETE /v1/alerts/<id>.
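
Without the SDK, the same operations are plain HTTP calls. In this sketch the base URL and auth header are placeholders:

// Raw REST equivalents; base URL and auth header are placeholders.
const base = "https://keystone.example.com";
const headers = { Authorization: "Bearer <api-key>" };

// List all alert rules
const rules = await (await fetch(`${base}/v1/alerts`, { headers })).json();

// Delete one by ID
await fetch(`${base}/v1/alerts/alert_abc`, { method: "DELETE", headers });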

Evaluation

Alerts are evaluated server-side after every experiment completes:

  1. Find every alert rule whose eval_id matches (or is null — global rules).
  2. For each, fetch the relevant experiment's metrics.
  3. If the condition is true, fire the notification.
  4. Webhook calls retry up to 3 times with exponential backoff on 5xx / network errors.
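
The retry behavior in step 4 amounts to something like the sketch below; the exact backoff delays (1s, 2s, 4s) are an assumption, not documented values:

// Sketch of step 4: POST with up to 3 retries and exponential backoff on
// 5xx / network errors. Backoff delays are assumptions.
async function postWithRetry(url: string, payload: unknown, retries = 3): Promise<boolean> {
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      const res = await fetch(url, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(payload),
      });
      if (res.ok) return true;             // delivered
      if (res.status < 500) return false;  // 4xx: don't retry
    } catch {
      // network error: fall through and retry
    }
    if (attempt < retries) {
      await new Promise((r) => setTimeout(r, 1000 * 2 ** attempt));
    }
  }
  return false;
}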

There's no rate limiting per rule — if 100 experiments fail in an hour, you get 100 notifications. Use window: to debounce:

{
  name: "pass-rate-drop",
  condition: "pass_rate < 0.8",
  window: "last_5_runs",     # only fire if the LAST 5 runs all crossed the threshold
  notify: "slack",
  slack_channel: "#agent-alerts",
}

Patterns

CI-noise dampening

Don't alert on every flaky run — only on sustained drops:

{
  name: "ci-pass-rate-degradation",
  eval_id: "ci-regression",
  condition: "pass_rate < 0.85",
  window: "last_5_runs",
  notify: "slack",
  slack_channel: "#eng",
}

Cost alarm

{
  name: "cost-alarm",
  condition: "mean_cost_per_run_usd > 1.00",
  notify: "slack",
  slack_channel: "#oncall",
}

A run that suddenly costs 5× normal is usually a bug — an infinite loop, a spec change, a tool that's calling itself. Alarm fast.

Latency regression

{
  name: "latency-regression",
  condition: "p95_wall_ms > 60000",
  notify: "webhook",
  webhook_url: "https://datadog.com/webhook/...",
}

Pipe to Datadog/PagerDuty/whatever your team uses for ops.

Forbidden-rule tripwire

{
  name: "secrets-leak-tripwire",
  condition: "side_effect_violations > 0",
  notify: "slack",
  slack_channel: "#security",
}

Forbidden-rule violations are always worth a human's attention — fire on the first one.

Limits

Limit                        Default
Alerts per tenant            50
Webhook retries              3
Webhook timeout              10s
Notification dedup window    5 minutes (suppresses repeat firings of the same rule for the same experiment)
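
The dedup window behaves like a cache keyed on rule and experiment with a 5-minute TTL. The sketch below is illustrative only, not the server's actual code:

// Illustrative sketch of the 5-minute dedup window.
const DEDUP_WINDOW_MS = 5 * 60 * 1000;
const lastFired = new Map<string, number>(); // "rule_id:experiment_id" → last fired timestamp

function shouldNotify(ruleId: string, experimentId: string, now = Date.now()): boolean {
  const key = `${ruleId}:${experimentId}`;
  const prev = lastFired.get(key);
  if (prev !== undefined && now - prev < DEDUP_WINDOW_MS) return false; // suppressed
  lastFired.set(key, now);
  return true;
}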