Tracing & Observability

LLM Tracing

Wrap your LLM client with ks.wrap() and every call auto-reports prompt, tokens, cost, and tool calls to Keystone.

ks.wrap() is the one-line path to LLM observability. Pass your Anthropic, OpenAI, or any OpenAI-compatible client through it, and every .create() call automatically reports an llm_call event plus one tool_use event per tool the model invoked — complete with token counts, cost, latency, and the tool arguments.

The wrap is non-invasive: response shapes are unchanged, streams iterate verbatim, errors surface to your code. Trace posting is fire-and-forget — failures are swallowed so tracing never breaks the agent.

The two trace destinations

wrap() picks where to send events automatically:

  • Sandbox mode — KEYSTONE_SANDBOX_ID is set in the env (Keystone injects it when your agent runs inside a sandbox). Events go to POST /v1/sandboxes/:id/trace and nest under the sandbox run in experiment views.
  • Agent mode — no sandbox id, but the client has an apiKey (from KEYSTONE_API_KEY or explicit constructor arg). Events go to POST /v1/traces and are scoped to the API key server-side. This is the production path — any agent in prod with a ks_live_ key gets full LLM + tool traces tied to the billing owner.
  • Neither — wrap() returns the client untouched. Local dev / CI without setup.

Same code, both paths. The SDK introspects the env at call time.
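
For intuition, the selection boils down to an env check at call time. A minimal sketch of the decision (illustrative names, not the SDK's internals):

// Illustrative only: wrap() makes an equivalent choice internally.
function pickTraceDestination(explicitApiKey?: string): "sandbox" | "agent" | "none" {
  if (process.env.KEYSTONE_SANDBOX_ID) return "sandbox";              // POST /v1/sandboxes/:id/trace
  if (explicitApiKey ?? process.env.KEYSTONE_API_KEY) return "agent"; // POST /v1/traces
  return "none";                                                      // client returned untouched
}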

Wrapping a client

import { Keystone } from "@polarityinc/polarity-keystone";
import Anthropic from "@anthropic-ai/sdk";
 
const ks = new Keystone();
ks.initTracing();                                  // no-op without sandbox or API key
 
const anthropic = ks.wrap(new Anthropic());
// Every anthropic.messages.create() now auto-reports.
const resp = await anthropic.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 1024,
  messages: [{ role: "user", content: "..." }],
});
// resp is unchanged — same Anthropic types, same fields.
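
An OpenAI client wraps the same way; a sketch of the equivalent setup:

import OpenAI from "openai";

const openai = ks.wrap(new OpenAI());
// Every openai.chat.completions.create() now auto-reports too.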

wrap() does three things

  1. Patches .create() — the LLM call method now intercepts request + response, computes cost, and posts events.
  2. Initializes traced() — tracing context is set up so any subsequent traced() calls in the same process emit spans. Pass tracing: false to skip this.
  3. Auto-instruments imported frameworks — when aiSdk (Vercel AI SDK) or langchainCallbackManager is passed, those frameworks are also wrapped:

import * as aiSdk from "ai";
ks.wrap(new OpenAI(), { aiSdk });   // wraps the OpenAI client AND ai-sdk's generateText/streamText
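
LangChain takes the same option-passing shape; a sketch, assuming your LangChain setup exposes a callback manager as in the observe() example near the end of this page:

ks.wrap(new OpenAI(), { langchainCallbackManager: lc.callbackManager });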

What gets reported

For every .create() call:

{
  "ts": "2026-04-28T22:00:00Z",
  "event_type": "llm_call",
  "tool": "anthropic.create",
  "phase": "complete",
  "duration_ms": 2340,
  "status": "ok",
  "span_id": "span_xyz",
  "input": "<truncated request body>",
  "output": "<truncated response text + tool calls>",
  "cost": {
    "input_tokens": 4200,
    "output_tokens": 1800,
    "cache_read_tokens": 1500,
    "reasoning_tokens": 0,
    "model": "claude-sonnet-4-5",
    "estimated_usd": 0.043
  },
  "metadata": {
    "gen_ai.system": "anthropic",
    "gen_ai.request.model": "claude-sonnet-4-5",
    "gen_ai.usage.input_tokens": 4200,
    "gen_ai.usage.output_tokens": 1800,
    "gen_ai.operation.name": "chat"
  }
}

Plus, for every tool the model invoked:

{
  "ts": "2026-04-28T22:00:00Z",
  "event_type": "tool_use",
  "tool": "write_file",
  "phase": "invoked",
  "span_id": "span_yyy",
  "parent_span_id": "span_xyz",       // links to the llm_call event
  "input": "{\"path\": \"src/main.ts\", \"content\": \"...\"}"
}

Inputs and outputs are truncated to ~4KB to bound the payload size. Cost is computed locally from the model name and token counts using the bundled pricing table — no separate API call.

The metadata block uses OpenTelemetry GenAI semantic conventions so traces exported via OTLP round-trip cleanly into OTel backends (Honeycomb, Tempo, Jaeger).

Streams

Both Anthropic and OpenAI streams are wrapped. The wrapper accumulates chunks as you iterate and emits the trace events in one batch once the stream completes:

const stream = await anthropic.messages.create({
  model: "claude-sonnet-4-5",
  stream: true,
  // ...
});
 
for await (const chunk of stream) {
  // your stream consumption — unchanged
  console.log(chunk);
}
// Trace events fire here, after the stream completes.

Stream wrappers are async-iterable and proxy other methods (Anthropic streams have helper methods like finalMessage()); those work too.
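
A sketch of that in practice, assuming messages.stream() (which routes through the patched .create() internally) and the SDK's own finalMessage() helper:

const ms = anthropic.messages.stream({
  model: "claude-sonnet-4-5",
  max_tokens: 1024,
  messages: [{ role: "user", content: "..." }],
});
const final = await ms.finalMessage(); // proxied helper; the trace fires once the stream ends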

Supported providers

Anthropic and OpenAI directly. Any OpenAI-compatible provider works by passing a custom baseURL:

Provider     Pattern
Anthropic    ks.wrap(new Anthropic())
OpenAI       ks.wrap(new OpenAI())
Groq         ks.wrap(new OpenAI({ baseURL: "https://api.groq.com/openai/v1", apiKey: "..." }))
xAI          ks.wrap(new OpenAI({ baseURL: "https://api.x.ai/v1", apiKey: "..." }))
Together     ks.wrap(new OpenAI({ baseURL: "https://api.together.xyz/v1", apiKey: "..." }))
OpenRouter   ks.wrap(new OpenAI({ baseURL: "https://openrouter.ai/api/v1", apiKey: "..." }))
Fireworks    ks.wrap(new OpenAI({ baseURL: "https://api.fireworks.ai/inference/v1", apiKey: "..." }))

The wrapper detects the client by shape:

  • Has .messages.create() → Anthropic
  • Has .chat.completions.create() → OpenAI-compatible
  • Neither → returned untouched (no-op)
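
Roughly, as a sketch (not the SDK's actual source):

function detectClientShape(client: any): "anthropic" | "openai-compatible" | "unknown" {
  if (typeof client?.messages?.create === "function") return "anthropic";
  if (typeof client?.chat?.completions?.create === "function") return "openai-compatible";
  return "unknown"; // wrap() hands the object back unchanged
}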

Per-call pricing

The bundled pricing table covers Anthropic, OpenAI, Google, Mistral, Cohere, Llama, Qwen, DeepSeek, xAI, AWS Bedrock, and Together (~50 models). Look up cost for any model:

import { estimateCost } from "@polarityinc/polarity-keystone";
 
const cost = estimateCost("claude-sonnet-4-5", 4200, 1800, 1500);
// 0.043 (USD), matching the llm_call example above

For unknown models, returns 0. Add custom pricing via pricingTable.set(...) (TS) / pricing_table.set(...) (Python).
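
A sketch of registering a custom entry; the field names below are assumptions, so check the SDK reference for the exact entry shape:

import { pricingTable, estimateCost } from "@polarityinc/polarity-keystone";

// Hypothetical per-million-token USD rates for a private fine-tune.
pricingTable.set("my-finetune-v1", { inputPerMTok: 1.0, outputPerMTok: 3.0 });

estimateCost("my-finetune-v1", 1000, 500); // non-zero once registered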

Wrap-only mode (skip global tracing)

ks.wrap(new Anthropic(), { tracing: false });

Wraps the client without touching the global traced() context. Useful when you have multiple Keystone clients in the same process (different sandboxes, different API keys) and don't want them stomping each other's tracing state.
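
For example, a sketch assuming the constructor accepts an explicit apiKey as noted earlier:

const ksA = new Keystone({ apiKey: process.env.TEAM_A_KEY });
const ksB = new Keystone({ apiKey: process.env.TEAM_B_KEY });

const claudeA = ksA.wrap(new Anthropic(), { tracing: false });
const openaiB = ksB.wrap(new OpenAI(), { tracing: false });
// Each wrapped client posts under its own API key; neither touches global traced() state.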

Forced sandbox ID

ks.wrap(new Anthropic(), { sandboxId: "sb-explicit-id" });

Override the env-based sandbox detection. Useful when you're managing multiple sandboxes from one process and want each LLM client tied to a specific sandbox.
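
Combined with tracing: false, each client stays pinned to its own run (sandbox ids here are illustrative):

const llmA = ks.wrap(new Anthropic(), { sandboxId: "sb-run-a", tracing: false });
const llmB = ks.wrap(new Anthropic(), { sandboxId: "sb-run-b", tracing: false });
// Events from llmA nest under sb-run-a; events from llmB under sb-run-b.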

What it doesn't do

  • Doesn't change response shapes. Wrapped responses match the provider's types exactly.
  • Doesn't change error behavior. Provider errors propagate.
  • Doesn't slow down your call path. Trace posting is fire-and-forget on a detached promise.
  • Doesn't capture full prompts by default. Inputs/outputs are truncated to ~4KB. For full payloads, use experiments.export() post-run.

Patterns

One-line full observability

const ks = new Keystone();
const anthropic = ks.wrap(new Anthropic());
// Every call traced. Done.

Wrap multiple providers in one process

const anthropic = ks.wrap(new Anthropic());
const openai = ks.wrap(new OpenAI());
const groq = ks.wrap(new OpenAI({ baseURL: "https://api.groq.com/openai/v1" }));

Each is independently wrapped; events from each go to the same destination.

Use observe() for everything at once

import * as aiSdk from "ai";
 
ks.observe({
  clients: [new Anthropic(), new OpenAI()],
  aiSdk,
  langchainCallbackManager: lc.callbackManager,
});
// Wraps every named client, AI SDK, and LangChain in one call.
// Returns the labels of what was instrumented.

The "I just want everything traced" path. See Keystone.observe() in the SDK reference.