LLM observability, extended to the whole agent.
Token traces tell you what the model said. SafeRun records what the agent did — every tool call, argument, return value, and policy decision — and lets you replay any failure step by step.
LLM tracing ends where the agent begins.
Classic LLM observability — LangSmith, Langfuse, Helicone, Arize Phoenix — does one thing well: it captures prompts, completions, tokens, and latency. That's enough when your product is a chat completion.
But agents are loops. They call tools, pass arguments, get results back, retry, branch, and sometimes do real damage. To debug them you need the action trace, not just the model trace — and you need to replay it.
What agent-grade observability looks like.
Full-trace capture
Every prompt, model response, tool call, argument, return value, latency, and cost — recorded at the action layer, not just the token layer.
Time-travel replay
Re-run any failed agent step with the exact context. Reproduce production bugs locally without copy-pasting JSON between Slack threads.
Live action stream
Watch agent runs as they happen. Filter by agent, tool, status, or user — find the bad run before the customer files a ticket.
Diffs across model versions
Compare runs across model upgrades. Catch the silent regression where GPT-5 picks the wrong tool 3% more often than GPT-4o did.
Inline policy decisions
Every block, allow, and approval recorded next to the action it touched. Observability and control in the same trace.
Anomaly + cost alerts
Page when block rate, retry rate, or per-run cost drifts off baseline. Stop runaway loops before they touch your AWS bill.
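The drift check behind that last alert is simple math: compare today's block rate and per-run cost against a recorded baseline and page when either drifts past a tolerance. The sketch below is illustrative only — `RunStats`, `shouldAlert`, and the 50% tolerance are assumptions for this page, not SafeRun's actual config surface.

```typescript
// Illustrative drift check -- field names and thresholds are
// assumptions, not SafeRun's real alerting API.
interface RunStats {
  blocked: number; // runs with at least one blocked action
  total: number;   // total runs in the window
  costUsd: number; // total spend in the window
}

// Trip when block rate or per-run cost drifts more than `tolerance`
// (a fraction, e.g. 0.5 = 50%) above the recorded baseline.
function shouldAlert(
  current: RunStats,
  baseline: RunStats,
  tolerance = 0.5
): boolean {
  const rate = (s: RunStats) => (s.total === 0 ? 0 : s.blocked / s.total);
  const cost = (s: RunStats) => (s.total === 0 ? 0 : s.costUsd / s.total);
  return (
    rate(current) > rate(baseline) * (1 + tolerance) ||
    cost(current) > cost(baseline) * (1 + tolerance)
  );
}

// Baseline: 2% block rate, $0.04/run. Current window: 9% block rate.
const baseline: RunStats = { blocked: 20, total: 1000, costUsd: 40 };
const current: RunStats = { blocked: 90, total: 1000, costUsd: 41 };
console.log(shouldAlert(current, baseline)); // true: block rate drifted off baseline
```

The same comparison works for retry rate or any per-run metric — the point is that the baseline is recorded, not guessed.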
LLM observability vs agent observability.
| Layer | LLM tracing | SafeRun (agent layer) |
|---|---|---|
| Prompts & completions | Yes | Yes |
| Token + cost per call | Yes | Yes |
| Tool calls + arguments | Partial | Yes |
| Tool return values | Rare | Yes |
| Step-by-step replay | No | Yes |
| Policy decisions inline | No | Yes |
| Human approvals | No | Yes |
| Loop + cost breakers | No | Yes |
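To make the table concrete: an action-layer trace records the tool name, exact arguments, return value, and status for every step, which is what makes step-by-step replay possible. The types and the in-memory `replayStep` below are a sketch of that idea under assumed names — the source only documents `observe`, so none of this is the published SDK surface.

```typescript
// Illustrative action-layer trace shape -- names are assumptions,
// not the real SafeRun SDK types.
type StepStatus = "ok" | "error" | "blocked";

interface RecordedStep {
  tool: string;                      // which tool the agent called
  args: Record<string, unknown>;     // exact arguments, as captured
  result?: unknown;                  // what the tool returned
  status: StepStatus;                // includes inline policy decisions
  latencyMs: number;
}

interface RecordedRun {
  runId: string;
  model: string;
  steps: RecordedStep[];
}

// "Time-travel" replay: re-run one recorded step against a local
// tool implementation using the exact captured arguments, then
// diff the replayed result against what production recorded.
function replayStep(
  step: RecordedStep,
  tools: Record<string, (args: Record<string, unknown>) => unknown>
): { recorded: unknown; replayed: unknown; matches: boolean } {
  const replayed = tools[step.tool](step.args);
  return {
    recorded: step.result,
    replayed,
    matches: JSON.stringify(replayed) === JSON.stringify(step.result),
  };
}

// Example: replay a captured production step locally.
const run: RecordedRun = {
  runId: "run_42",
  model: "gpt-5",
  steps: [
    {
      tool: "lookupOrder",
      args: { orderId: "A-1" },
      result: { status: "shipped" },
      status: "ok",
      latencyMs: 120,
    },
  ],
};

const localTools = {
  lookupOrder: (_args: Record<string, unknown>) => ({ status: "shipped" }),
};

const outcome = replayStep(run.steps[0], localTools);
console.log(outcome.matches); // true: local replay reproduces the recorded result
```

Because arguments and return values are captured verbatim, the replay needs no copy-pasted JSON — the recorded step is the reproduction case.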
Wire observability in once.
```typescript
import { observe } from "@saferun/sdk";

observe(agent, { service: "support-agent-v2", env: "prod" });
// Step-by-step traces, replay, and policy decisions in one timeline.
```

Common questions about LLM observability.
Ship agents your on-call won't dread.
Add SafeRun in three lines. Validate, block, and replay every risky tool call — before it touches production.
