LLM Tracing
Wrap your LLM client with ks.wrap() and every call auto-reports prompt, tokens, cost, and tool calls to Keystone.
`ks.wrap()` is the one-line path to LLM observability. Pass your Anthropic, OpenAI, or any OpenAI-compatible client through it, and every `.create()` call automatically reports an `llm_call` event plus one `tool_use` event per tool the model invoked — complete with token counts, cost, latency, and the tool arguments.
The wrap is non-invasive: response shapes are unchanged, streams iterate verbatim, errors surface to your code. Trace posting is fire-and-forget — failures are swallowed so tracing never breaks the agent.
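The pattern this describes can be sketched in a few lines. `patchCreate` and `post` below are illustrative names, not SDK internals; the point is that the response passes through untouched, provider errors re-throw, and trace posting never blocks or fails the call:

```ts
// Illustrative sketch of wrapping a create() method: time the call,
// post a trace event fire-and-forget, and return the response unchanged.
type CreateFn = (req: unknown) => Promise<unknown>;

function patchCreate(
  create: CreateFn,
  post: (ev: object) => Promise<void>,
): CreateFn {
  return async (req) => {
    const start = Date.now();
    try {
      const resp = await create(req);
      // Fire-and-forget: posting errors are swallowed so tracing
      // never breaks the agent's call path.
      post({ event_type: "llm_call", status: "ok", duration_ms: Date.now() - start })
        .catch(() => {});
      return resp; // same shape the provider returned
    } catch (err) {
      post({ event_type: "llm_call", status: "error", duration_ms: Date.now() - start })
        .catch(() => {});
      throw err; // provider errors propagate unchanged
    }
  };
}
```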
The two trace destinations
wrap() picks where to send events automatically:
- Sandbox mode — `KEYSTONE_SANDBOX_ID` is set in the env (Keystone injects it when your agent runs inside a sandbox). Events go to `POST /v1/sandboxes/:id/trace` and nest under the sandbox run in experiment views.
- Agent mode — no sandbox id, but the client has an `apiKey` (from `KEYSTONE_API_KEY` or an explicit constructor arg). Events go to `POST /v1/traces` and are scoped to the API key server-side. This is the production path — any agent in prod with a `ks_live_` key gets full LLM + tool traces tied to the billing owner.
- Neither — `wrap()` returns the client untouched. Local dev / CI without setup.
Same code, both paths. The SDK introspects the env at call time.
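That call-time env check amounts to a short decision function. A sketch, with `resolveDestination` as a hypothetical helper rather than a real SDK export:

```ts
// Hypothetical sketch of how wrap() might pick a trace destination.
// resolveDestination is illustrative; it is not part of the SDK's API.
type Destination =
  | { mode: "sandbox"; url: string }
  | { mode: "agent"; url: string }
  | { mode: "off" };

function resolveDestination(apiKey?: string): Destination {
  const sandboxId = process.env.KEYSTONE_SANDBOX_ID;
  if (sandboxId) {
    // Sandbox mode: events nest under the sandbox run.
    return { mode: "sandbox", url: `/v1/sandboxes/${sandboxId}/trace` };
  }
  const key = apiKey ?? process.env.KEYSTONE_API_KEY;
  if (key) {
    // Agent mode: events are scoped to the API key server-side.
    return { mode: "agent", url: "/v1/traces" };
  }
  // Neither: wrap() returns the client untouched.
  return { mode: "off" };
}
```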
Wrapping a client
```ts
import { Keystone } from "@polarityinc/polarity-keystone";
import Anthropic from "@anthropic-ai/sdk";

const ks = new Keystone();
ks.initTracing(); // no-op without sandbox or API key

const anthropic = ks.wrap(new Anthropic());

// Every anthropic.messages.create() now auto-reports.
const resp = await anthropic.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 1024,
  messages: [{ role: "user", content: "..." }],
});
// resp is unchanged — same Anthropic types, same fields.
```
wrap() does three things
- Patches `.create()` — the LLM call method now intercepts request + response, computes cost, and posts events.
- Initializes `traced()` — tracing context is set up so any subsequent `traced()` calls in the same process emit spans. Pass `tracing: false` to skip this.
- Auto-instruments imported frameworks — when `aiSdk` (Vercel AI SDK) or `langchainCallbackManager` is passed, those frameworks are also wrapped.
```ts
import * as aiSdk from "ai";

ks.wrap(new OpenAI(), { aiSdk }); // wraps OpenAI client AND ai-sdk's generateText/streamText
```
What gets reported
For every `.create()` call:
```json
{
  "ts": "2026-04-28T22:00:00Z",
  "event_type": "llm_call",
  "tool": "anthropic.create",
  "phase": "complete",
  "duration_ms": 2340,
  "status": "ok",
  "span_id": "span_xyz",
  "input": "<truncated request body>",
  "output": "<truncated response text + tool calls>",
  "cost": {
    "input_tokens": 4200,
    "output_tokens": 1800,
    "cache_read_tokens": 1500,
    "reasoning_tokens": 0,
    "model": "claude-sonnet-4-5",
    "estimated_usd": 0.043
  },
  "metadata": {
    "gen_ai.system": "anthropic",
    "gen_ai.request.model": "claude-sonnet-4-5",
    "gen_ai.usage.input_tokens": 4200,
    "gen_ai.usage.output_tokens": 1800,
    "gen_ai.operation.name": "chat"
  }
}
```
Plus, for every tool the model invoked:
```json
{
  "ts": "2026-04-28T22:00:00Z",
  "event_type": "tool_use",
  "tool": "write_file",
  "phase": "invoked",
  "span_id": "span_yyy",
  "parent_span_id": "span_xyz", // links to the llm_call event
  "input": "{\"path\": \"src/main.ts\", \"content\": \"...\"}"
}
```
Inputs and outputs are truncated to ~4KB to bound the payload size. Cost is computed locally from the model name and token counts using the bundled pricing table — no separate API call.
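The ~4KB bound can be pictured as a small helper applied to each payload before it is attached to an event. `truncate` here is illustrative, not the SDK's actual implementation:

```ts
// Illustrative sketch: bound a payload string to ~4KB before attaching
// it to a trace event. truncate() is not a real SDK export.
const MAX_BYTES = 4096;

function truncate(payload: string): string {
  if (Buffer.byteLength(payload, "utf8") <= MAX_BYTES) return payload;
  // Cut on the byte budget, leaving room for the marker. A multi-byte
  // character split at the boundary may decode as a replacement char;
  // acceptable for a trace preview.
  const cut = Buffer.from(payload, "utf8")
    .subarray(0, MAX_BYTES - 16)
    .toString("utf8");
  return cut + "…[truncated]";
}
```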
The metadata block uses OpenTelemetry GenAI semantic conventions so traces exported via OTLP round-trip cleanly into OTel backends (Honeycomb, Tempo, Jaeger).
Streams
Both Anthropic and OpenAI streams are wrapped. The wrapper accumulates chunks during iteration and emits one batch on completion:
```ts
const stream = await anthropic.messages.create({
  model: "claude-sonnet-4-5",
  stream: true,
  // ...
});

for await (const chunk of stream) {
  // your stream consumption — unchanged
  console.log(chunk);
}
// Trace events fire here, after the stream completes.
```
Stream wrappers are async-iterable and proxy other methods (Anthropic streams have helper functions like `final_message`); those work too.
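The accumulate-then-report behavior can be sketched as a pass-through async generator (illustrative, not the SDK's code): chunks are yielded verbatim, and a completion callback fires once after the consumer finishes iterating.

```ts
// Illustrative sketch: wrap an async iterable so chunks pass through
// unchanged, and a callback fires once after the stream is consumed.
async function* traceStream<T>(
  stream: AsyncIterable<T>,
  onComplete: (chunks: T[]) => void,
) {
  const chunks: T[] = [];
  for await (const chunk of stream) {
    chunks.push(chunk); // accumulate for the trace event
    yield chunk;        // consumer sees each chunk verbatim
  }
  onComplete(chunks); // trace event fires here, after completion
}
```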
Supported providers
Anthropic and OpenAI directly. Any OpenAI-compatible provider works by passing a custom baseURL:
| Provider | Pattern |
|---|---|
| Anthropic | ks.wrap(new Anthropic()) |
| OpenAI | ks.wrap(new OpenAI()) |
| Groq | ks.wrap(new OpenAI({ baseURL: "https://api.groq.com/openai/v1", apiKey: "..." })) |
| xAI | ks.wrap(new OpenAI({ baseURL: "https://api.x.ai/v1", apiKey: "..." })) |
| Together | ks.wrap(new OpenAI({ baseURL: "https://api.together.xyz/v1", apiKey: "..." })) |
| OpenRouter | ks.wrap(new OpenAI({ baseURL: "https://openrouter.ai/api/v1", apiKey: "..." })) |
| Fireworks | ks.wrap(new OpenAI({ baseURL: "https://api.fireworks.ai/inference/v1", apiKey: "..." })) |
The wrapper detects the client by shape:
- Has `.messages.create()` → Anthropic
- Has `.chat.completions.create()` → OpenAI-compatible
- Neither → returned untouched (no-op)
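That duck-typing check amounts to a few property probes; `detectProvider` is a hypothetical name for illustration:

```ts
// Illustrative duck-typing: detect the client family by its method
// surface, mirroring the detection rules described above.
function detectProvider(client: any): "anthropic" | "openai-compatible" | "unknown" {
  if (typeof client?.messages?.create === "function") return "anthropic";
  if (typeof client?.chat?.completions?.create === "function") return "openai-compatible";
  return "unknown"; // wrap() returns such a client untouched
}
```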
Per-call pricing
The bundled pricing table covers Anthropic, OpenAI, Google, Mistral, Cohere, Llama, Qwen, DeepSeek, xAI, AWS Bedrock, and Together (~50 models). Look up cost for any model:
```ts
import { estimateCost } from "@polarityinc/polarity-keystone";

const cost = estimateCost("claude-sonnet-4-5", 4200, 1800, 1500);
// 0.0405 (USD)
```
For unknown models, it returns 0. Add custom pricing via `pricingTable.set(...)` (TS) or `pricing_table.set(...)` (Python).
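The cost math itself is just per-million-token rates applied to the counts. A sketch with made-up rates (the real table and its rates ship with the SDK):

```ts
// Illustrative cost estimation from a per-million-token pricing table.
// The model name and rates below are invented for the example.
interface Pricing { inputPerM: number; outputPerM: number; cacheReadPerM: number }

const table = new Map<string, Pricing>([
  ["example-model", { inputPerM: 3, outputPerM: 15, cacheReadPerM: 0.3 }], // hypothetical rates
]);

function estimate(model: string, input: number, output: number, cacheRead = 0): number {
  const p = table.get(model);
  if (!p) return 0; // unknown model: report zero rather than guess
  return (input * p.inputPerM + output * p.outputPerM + cacheRead * p.cacheReadPerM) / 1e6;
}
```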
Wrap-only mode (skip global tracing)
```ts
ks.wrap(new Anthropic(), { tracing: false });
```
Wraps the client without touching the global `traced()` context. Useful when you have multiple Keystone clients in the same process (different sandboxes, different API keys) and don't want them stomping on each other's tracing state.
Forced sandbox ID
```ts
ks.wrap(new Anthropic(), { sandboxId: "sb-explicit-id" });
```
Override the env-based sandbox detection. Useful when you're managing multiple sandboxes from one process and want each LLM client tied to a specific sandbox.
What it doesn't do
- Doesn't change response shapes. Wrapped responses match the provider's types exactly.
- Doesn't change error behavior. Provider errors propagate.
- Doesn't slow down your call path. Trace posting is fire-and-forget in the background.
- Doesn't capture full prompts by default. Inputs/outputs are truncated to 4KB. For full payloads, use `experiments.export()` post-run.
Patterns
One-line full observability
```ts
const ks = new Keystone();
const anthropic = ks.wrap(new Anthropic());
// Every call traced. Done.
```
Wrap multiple providers in one process
```ts
const anthropic = ks.wrap(new Anthropic());
const openai = ks.wrap(new OpenAI());
const groq = ks.wrap(new OpenAI({ baseURL: "https://api.groq.com/openai/v1" }));
```
Each client is independently wrapped; events from all of them go to the same destination.
Use observe() for everything at once
```ts
import * as aiSdk from "ai";

ks.observe({
  clients: [new Anthropic(), new OpenAI()],
  aiSdk,
  langchainCallbackManager: lc.callbackManager,
});
// Wraps every named client, AI SDK, and LangChain in one call.
// Returns the labels of what was instrumented.
```
The "I just want everything traced" path. See `Keystone.observe()` in the SDK reference.