Alerts
Slack and webhook notifications when experiment metrics cross a threshold.
An alert fires when a metric crosses a threshold across one or more experiments. Notify a Slack channel, hit a webhook, or post via a Slack Bot — Keystone evaluates rules continuously as new experiments complete.
Anatomy
```
{
  name: "pass-rate-drop",
  eval_id: "fix-failing-test",   // optional — scope to one spec
  condition: "pass_rate < 0.8",
  window: "last_5_runs",         // optional — only the last N runs
  notify: "slack",
  slack_channel: "#agent-alerts",
}
```

Every alert has:
| Field | Meaning |
|---|---|
| name | Display name |
| eval_id | Optional spec ID to scope the alert to |
| condition | <metric> <operator> <value> |
| window | Optional window (last_5_runs, last_24h, etc.) |
| notify | slack, webhook, or email |
| slack_channel / webhook_url | Destination, depending on notify |
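
As a rough sketch, the same fields expressed as a TypeScript shape (illustrative only; the SDK's actual AlertRule types may differ):

```
// Illustrative shape only — check the SDK for the real types.
interface AlertRuleInput {
  name: string;                   // display name
  eval_id?: string;               // scope to one spec; omit for a global rule
  condition: string;              // "<metric> <operator> <value>"
  window?: string;                // e.g. "last_5_runs", "last_24h"
  notify: "slack" | "webhook" | "email";
  slack_channel?: string;         // required when notify is "slack"
  webhook_url?: string;           // required when notify is "webhook"
}
```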
Conditions
Conditions are <metric> <operator> <value>:
Metrics:
| Metric | Description |
|---|---|
| pass_rate | Fraction of scenarios that passed (0–1) |
| mean_wall_ms | Average wall-clock time per scenario |
| p95_wall_ms | 95th-percentile wall time |
| total_cost_usd | Total experiment cost |
| mean_cost_per_run_usd | Average per-scenario cost |
| tool_success_rate | Tool call success rate (0–1) |
| side_effect_violations | Count of forbidden-rule violations |
| mean_tool_calls | Average tool calls per scenario |
Operators: <, <=, >, >=, ==, !=
Examples:
```
pass_rate < 0.8
mean_wall_ms > 30000
total_cost_usd > 5.00
side_effect_violations > 0
tool_success_rate < 0.9
```
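
Any of these drops straight into a rule. For example (the rule name here is illustrative):

```
await ks.alerts.create({
  name: "tool-failure-spike",            // illustrative name
  condition: "tool_success_rate < 0.9",
  notify: "slack",
  slack_channel: "#agent-alerts",
});
```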
Slack channel notifications
Post a Block Kit message to a Slack channel via a Slack bot:

```
await ks.alerts.create({
name: "cost-spike",
condition: "mean_cost_per_run_usd > 2.00",
notify: "slack",
slack_channel: "#agent-alerts",
});
```

Setup required: the Keystone server needs SLACK_BOT_TOKEN set in its environment (Polarity configures this for hosted users; self-hosted deployments set it themselves). The bot needs the chat:write scope; if posting fails, invite the bot to the channel manually.
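
If posting fails, one way to isolate the problem is to exercise the token and channel directly with Slack's official @slack/web-api client. This is a standalone sanity check, not part of Keystone:

```
// Standalone sanity check using Slack's @slack/web-api package.
import { WebClient } from "@slack/web-api";

const slack = new WebClient(process.env.SLACK_BOT_TOKEN);
await slack.auth.test();                // throws if the token is invalid
await slack.chat.postMessage({
  channel: "#agent-alerts",
  text: "Keystone alert channel test",  // fails if the bot can't post here
});
```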
Webhook notifications
POST a JSON body to any URL:

```
await ks.alerts.create({
name: "pass-rate-drop",
eval_id: "fix-failing-test",
condition: "pass_rate < 0.8",
notify: "webhook",
webhook_url: "https://hooks.slack.com/services/T00/B00/xxx",
});
```

The body shape:

```
{
"rule_id": "alert_abc",
"rule_name": "pass-rate-drop",
"reason": "pass_rate dropped to 0.65 (threshold 0.8)",
"experiment_id": "exp-xyz",
"experiment_name": "ci-2026-04-28",
"fired_at": "2026-04-28T22:00:00Z",
"metric_value": 0.65,
"threshold": 0.8
}
```

Slack incoming-webhook URLs
Slack's incoming-webhook URLs (https://hooks.slack.com/services/...) are auto-detected and translated into Slack Block Kit messages, so they get a richer rendered card; any other URL receives the raw JSON payload above.
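
For those non-Slack URLs, the raw JSON is all you need to handle. A minimal receiver sketch, assuming a Node environment with the built-in http module (the port and log format are arbitrary):

```
// Minimal webhook receiver — a sketch, not a Keystone component.
import { createServer } from "node:http";

createServer((req, res) => {
  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", () => {
    const event = JSON.parse(body);
    // Field names match the payload shown above.
    console.log(`[${event.rule_name}] ${event.reason} (experiment ${event.experiment_id})`);
    res.writeHead(204).end();
  });
}).listen(8080);
```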
```
// Slack incoming-webhook URL → rich Block Kit message
await ks.alerts.create({
name: "...",
condition: "...",
notify: "webhook",
webhook_url: "https://hooks.slack.com/services/T00/B00/xxx",
});
// Any other URL → raw JSON POST
await ks.alerts.create({
name: "...",
condition: "...",
notify: "webhook",
webhook_url: "https://my-monitoring-system.example.com/keystone-events",
});
```

Listing and deleting
```
// List
const all = await ks.alerts.list();
// AlertRule[]
// Delete
await ks.alerts.delete("alert_abc");
```

These map to GET /v1/alerts and DELETE /v1/alerts/<id>.
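
A common use is a cleanup pass, for example deleting every rule scoped to a spec you have retired. This sketch assumes the returned AlertRule objects expose id and eval_id fields; check the SDK types for the exact names:

```
// Illustrative cleanup — field names on AlertRule are assumed here.
const rules = await ks.alerts.list();
for (const rule of rules) {
  if (rule.eval_id === "fix-failing-test") {
    await ks.alerts.delete(rule.id);
  }
}
```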
Evaluation
Alerts are evaluated server-side after every experiment completes:
- Find every alert rule whose eval_id matches (or is null — global rules).
- For each, fetch the relevant experiment's metrics.
- If condition is true, fire the notification.
- Webhook calls retry up to 3 times with exponential backoff on 5xx / network errors.
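
Conceptually, the per-experiment check boils down to parsing the condition string and comparing one metric against the threshold. A sketch of that logic (illustrative only, not Keystone's server code; Metrics and conditionHolds are made-up names):

```
// Illustrative only — not Keystone's actual implementation.
type Metrics = Record<string, number>;

function conditionHolds(condition: string, metrics: Metrics): boolean {
  // condition is "<metric> <operator> <value>", e.g. "pass_rate < 0.8"
  const [metric, op, raw] = condition.trim().split(/\s+/);
  const value = metrics[metric];
  const threshold = Number(raw);
  if (value === undefined || Number.isNaN(threshold)) return false;
  switch (op) {
    case "<":  return value < threshold;
    case "<=": return value <= threshold;
    case ">":  return value > threshold;
    case ">=": return value >= threshold;
    case "==": return value === threshold;
    case "!=": return value !== threshold;
    default:   return false;
  }
}

// conditionHolds("pass_rate < 0.8", { pass_rate: 0.65 })  // true → fire
```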
There's no rate limiting per rule — if 100 experiments fail in an hour, you get 100 notifications. Use window: to debounce:
```
{
  name: "pass-rate-drop",
  condition: "pass_rate < 0.8",
  window: "last_5_runs",   // only fire if the LAST 5 runs all crossed the threshold
  notify: "slack",
  slack_channel: "#agent-alerts",
}
```

Patterns
CI-noise dampening
Don't alert on every flaky run — only on sustained drops:
```
{
  name: "ci-pass-rate-degradation",
  eval_id: "ci-regression",
  condition: "pass_rate < 0.85",
  window: "last_5_runs",
  notify: "slack",
  slack_channel: "#eng",
}
```

Cost alarm
```
{
  name: "cost-alarm",
  condition: "mean_cost_per_run_usd > 1.00",
  notify: "slack",
  slack_channel: "#oncall",
}
```

A run that suddenly costs 5× normal is usually a bug — an infinite loop, a spec change, a tool that's calling itself. Alarm fast.
Latency regression
```
{
  name: "latency-regression",
  condition: "p95_wall_ms > 60000",
  notify: "webhook",
  webhook_url: "https://datadog.com/webhook/...",
}
```

Pipe to Datadog/PagerDuty/whatever your team uses for ops.
Forbidden-rule tripwire
```
{
  name: "secrets-leak-tripwire",
  condition: "side_effect_violations > 0",
  notify: "slack",
  slack_channel: "#security",
}
```

Forbidden-rule violations are always worth a human's attention — fire on the first one.
Limits
| Limit | Default |
|---|---|
| Alerts per tenant | 50 |
| Webhook retries | 3 |
| Webhook timeout | 10s |
| Notification dedup window | 5 minutes (suppresses repeat firings of the same rule for the same experiment) |