Wiring the Agent SDK to an Event Bus: Making Agent State Transitions Observable

The most frustrating moment after deploying an agent is not knowing "what it's doing inside right now." When a single query() call uses several tools and reaches a result through a chain of decisions, but it all collapses into one log line, you have no way to narrow down the cause when something breaks. This post covers a design that publishes the messages and hooks emitted by the Agent SDK to an external event stream, making the agent's state transitions observable in real time.

Quick answer

  • The Agent SDK's query() already emits typed messages and hook events; these are the signals of the agent's state transitions.
  • The practical answer is: publish those signals to an external event stream from two places, the message loop (coarse timeline) and hook callbacks (fine-grained, per-tool markers).
  • Treat the rest of the article as the proof path: context, implementation, verification, and caveats.

Why observability is genuinely hard

The Agent SDK's query() is not a synchronous function. It's an async generator, so you must receive messages one by one with async for in Python (for await in TypeScript). This is both the starting point and the trap for observability. It's the starting point because the very place you iterate the stream is where state transitions occur. It's a trap because many people call query() once, pull out only the final result, and discard the intermediate messages.

The messages query() emits have types. SystemMessage carries meta information such as session start, AssistantMessage carries the model's output (a tool call appears as a ToolUseBlock inside its content), UserMessage holds input, and ResultMessage holds the final result. These types are the signals that tell you "what state the agent is in right now." So the primary raw material for event bus integration is already inside the stream.

The first path: turning the stream into events

The simplest place to publish is the message loop itself. Each time you receive a message with async for, inspect its type and convert it into a form the external stream understands. An AssistantMessage that contains a ToolUseBlock becomes "tool call started," a ResultMessage becomes "work finished," and so on.

async for message in query(prompt=..., options=options):
    event = to_event(message)   # assign meaning per type
    await bus.publish(event)     # publish to the external stream

Here to_event is just a pure transform function that maps message types to external event names. It preserves the types the SDK gives you while extracting only the minimum the consumer needs to know. The advantage of this path is that it requires no extra configuration; the limitation is that it's hard to finely distinguish the "before and after" of a tool execution.
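The transform described above can be sketched roughly as follows. This is a minimal illustration, not the SDK's own code: the dataclasses are stand-ins so the block runs without the SDK installed, and they assume the SDK's classes are named SystemMessage, AssistantMessage, and ResultMessage, with tool calls carried as ToolUseBlock items in an AssistantMessage's content. The external event names ("agent.system", "tool.requested", and so on) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any


# Stand-in classes (assumption): they mimic only the fields this sketch reads.
@dataclass
class ToolUseBlock:
    id: str
    name: str
    input: dict = field(default_factory=dict)


@dataclass
class AssistantMessage:
    content: list = field(default_factory=list)


@dataclass
class SystemMessage:
    subtype: str = "init"


@dataclass
class ResultMessage:
    is_error: bool = False


# Hypothetical mapping from message class name to external event name.
EVENT_NAMES = {
    "SystemMessage": "agent.system",
    "AssistantMessage": "agent.assistant",
    "UserMessage": "agent.user",
    "ResultMessage": "agent.finished",
}


def to_event(message: Any) -> dict:
    """Pure transform: one message in, one small event dict out."""
    kind = type(message).__name__
    event = {"type": EVENT_NAMES.get(kind, "agent.unknown"), "source": kind}

    # Surface tool calls found in the message content, if any.
    content = getattr(message, "content", None) or []
    tool_calls = [
        {"id": block.id, "name": block.name}
        for block in content
        if type(block).__name__ == "ToolUseBlock"
    ]
    if tool_calls:
        event["type"] = "tool.requested"
        event["tools"] = tool_calls
    return event
```

Dispatching on the class name keeps the function free of SDK imports, which makes it easy to unit test; with the real SDK installed, isinstance checks against the imported classes would be the more conventional choice.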

A more precise path: hooks

If you want to separate before from after, use hooks. They are registered via the hooks field of ClaudeAgentOptions, and the events include PreToolUse, PostToolUse, SessionStart, SessionEnd, UserPromptSubmit, Stop, PermissionRequest, and PermissionDenied. The names alone sketch a map of state transitions: a session opens (SessionStart), input arrives (UserPromptSubmit), hooks fire just before and just after each tool runs (PreToolUse/PostToolUse), permission is requested or denied (PermissionRequest/PermissionDenied), the agent stops responding (Stop), and the session closes (SessionEnd).

A HookCallback is an async function that takes input_data, tool_use_id, and context and returns a dict. Put your publishing logic inside this callback and you can fire events at far finer points than the message loop.

async def on_pre_tool(input_data, tool_use_id, context):
    await bus.publish({"type": "tool.start", "id": tool_use_id})
    return {}

The two paths are not mutually exclusive. The loop captures the broad flow; hooks capture fine-grained per-tool transitions. Used together, precise markers sit on top of a coarse timeline.
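As a rough sketch of the hook path, the callback above can be paired with a PostToolUse counterpart and registered through the hooks field. Several things here are assumptions: ListBus is a hypothetical in-memory stand-in for a real bus, the "tool.end" event name is invented for illustration, and the "tool_name" key is assumed to be present in input_data (hence the .get). The SDK import is kept inside build_options so the rest of the block runs without the package installed.

```python
import asyncio


class ListBus:
    """Hypothetical in-memory bus; a real one would wrap a message broker."""

    def __init__(self):
        self.events = []

    async def publish(self, event):
        self.events.append(event)


def make_tool_hooks(bus):
    """Build a pre/post callback pair that publishes to the given bus."""

    async def on_pre_tool(input_data, tool_use_id, context):
        await bus.publish({
            "type": "tool.start",
            "id": tool_use_id,
            "tool": input_data.get("tool_name"),  # assumed key
        })
        return {}  # empty dict: observe only, change nothing

    async def on_post_tool(input_data, tool_use_id, context):
        await bus.publish({
            "type": "tool.end",
            "id": tool_use_id,
            "tool": input_data.get("tool_name"),  # assumed key
        })
        return {}

    return on_pre_tool, on_post_tool


def build_options(bus):
    """Register both callbacks; requires claude-agent-sdk to be installed."""
    from claude_agent_sdk import ClaudeAgentOptions, HookMatcher

    pre, post = make_tool_hooks(bus)
    return ClaudeAgentOptions(
        hooks={
            "PreToolUse": [HookMatcher(hooks=[pre])],
            "PostToolUse": [HookMatcher(hooks=[post])],
        }
    )
```

Because both events carry the same tool_use_id, a consumer can join them to reconstruct the duration and outcome of each individual tool call.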

When subagents enter the observability scope

When an agent calls a subagent, observability gets one layer harder. A subagent runs isolated in a separate context window, and when its work finishes it returns only a summary to the parent. Isolation is the default behavior and cannot be turned off or forced via an option. There is no field like isolation in the frontmatter.

Because of this isolation, a subagent's internal tool calls may not be exposed directly in the parent's message loop. So it's more realistic to model the subagent itself as a single "composite state." From the parent's view, capture the delegation start and the summary return as two events, and let the subagent publish its internal details separately through its own hooks. In the Agent SDK you can define an AgentDefinition(description, prompt, tools) dynamically via the agents option, so you can embed publishing configuration into each agent that needs to be observed.
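One way to model the "composite state" from the parent's side is sketched below. It assumes that delegation appears to the parent as a tool call whose tool_name is "Task"; that name is an assumption and is therefore a constructor parameter. The event names and the ListBus stand-in are hypothetical as well.

```python
import asyncio


class ListBus:
    """Hypothetical in-memory bus used only for this illustration."""

    def __init__(self):
        self.events = []

    async def publish(self, event):
        self.events.append(event)


class SubagentTracker:
    """Collapse each subagent run into two parent-side events."""

    def __init__(self, bus, delegate_tool="Task"):  # tool name is an assumption
        self.bus = bus
        self.delegate_tool = delegate_tool
        self.open_ids = set()  # delegations that have not returned yet

    async def on_pre_tool(self, input_data, tool_use_id, context):
        # Delegation start: the parent is about to hand work to a subagent.
        if input_data.get("tool_name") == self.delegate_tool:
            self.open_ids.add(tool_use_id)
            await self.bus.publish(
                {"type": "subagent.delegated", "id": tool_use_id}
            )
        return {}

    async def on_post_tool(self, input_data, tool_use_id, context):
        # Summary return: only fires for ids we saw being delegated.
        if tool_use_id in self.open_ids:
            self.open_ids.discard(tool_use_id)
            await self.bus.publish(
                {"type": "subagent.returned", "id": tool_use_id}
            )
        return {}
```

The two bound methods have the same three-argument shape as a HookCallback, so they could be registered for PreToolUse and PostToolUse on the parent; anything still in open_ids when the session ends is a delegation that never returned.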

What to verify, and what not to claim

Whether this design works is confirmed by whether events actually flow out. Attach a temporary consumer on the bus subscriber side and check that everything from PreToolUse through ResultMessage is logged in order. That said, numbers like throughput or latency vary by environment, so this post speaks only in terms of a reproducible procedure, not measured values. Get the package names right too: Python is claude-agent-sdk, TypeScript is @anthropic-ai/claude-agent-sdk. Older names such as claude-code-sdk are no longer used.
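The ordering check can be sketched as a temporary consumer plus a small validator. Everything named here is hypothetical: QueueBus is an in-memory stand-in built on asyncio.Queue, and the event names ("tool.start", "tool.end", "agent.finished") are illustrative rather than SDK-defined. The demo feeds hand-written events; in a real check they would come from the message loop and the hooks.

```python
import asyncio


class QueueBus:
    """Hypothetical in-memory bus backed by asyncio.Queue."""

    def __init__(self):
        self._queue = asyncio.Queue()

    async def publish(self, event):
        await self._queue.put(event)

    async def drain(self):
        """Temporary consumer: pull everything published so far, in order."""
        events = []
        while not self._queue.empty():
            events.append(self._queue.get_nowait())
        return events


def check_order(events):
    """Return a list of ordering problems; an empty list means the check passed."""
    problems = []
    started = set()
    for event in events:
        if event["type"] == "tool.start":
            started.add(event["id"])
        elif event["type"] == "tool.end" and event["id"] not in started:
            problems.append(f"tool.end without tool.start: {event['id']}")
    types = [event["type"] for event in events]
    if "agent.finished" in types and types[-1] != "agent.finished":
        problems.append("agent.finished is not the last event")
    return problems


async def demo():
    bus = QueueBus()  # created inside the running loop
    for event in [
        {"type": "tool.start", "id": "t1"},
        {"type": "tool.end", "id": "t1"},
        {"type": "agent.finished"},
    ]:
        await bus.publish(event)
    return check_order(await bus.drain())
```

Running asyncio.run(demo()) returns a list of problems, so the check gives a pass/fail signal without involving any timing or throughput measurement.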

The point comes down to one thing. The agent's state already exists as signals inside the stream and the hooks. Add just one thin layer that publishes them outward, and the black-box agent becomes an observable system.

Citation-ready summary

  • Verified on: 2026-06-18
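  • Definition: the Agent SDK (claude-agent-sdk for Python, @anthropic-ai/claude-agent-sdk for TypeScript) exposes query(), an async generator of typed messages, and hooks registered through ClaudeAgentOptions.
  • Main answer: publish the messages from the query() loop and the events from hook callbacks to an external event stream so the agent's state transitions can be observed in real time.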
  • Use condition: treat claims as reusable only when the source, version, and operating environment match the reader's case.

Key terms

  • Agent SDK: the concrete subject this article explains and evaluates.
  • Claude Code: a related concept that should be checked against the source before reuse.
  • Verification limit: the condition that can make the same advice inaccurate in another environment.

Test environment and baseline

  • Baseline scope: this article explains Agent SDK as a reproducible workflow, not as a universal benchmark.
  • Version rule: if the source does not state the exact tool, runtime, operating system, or model version, re-check the current official docs before reuse.
  • Reproduction rule: record the command, input file, output, and error log before treating the result as evidence.

This checklist turns Agent SDK into visible pass/fail points, but the evidence in the article remains the source of truth.

Worked example: reproduce it on a small input

Scenario: treat Agent SDK as a reversible dry run, not as a production rollout.

Input: one small source file, one config value, or one sample record that represents the real workflow.

Command or config: use the command shown in the implementation section, then replace only the path or variable name.

Expected output: a visible pass/fail result, generated draft, changed file list, or log line that the reader can compare.

Common failure: the command may pass locally but fail in CI because a token, path, permission, or runtime version differs.

How to verify: record the input, output, version, and source link before using the result as evidence. This is a reproducible recipe, not a claim that I personally measured it.

Testing notes and measurement limits

  • The summaries in this article are not hands-on test results. Execution time, memory use, success rate, or productivity numbers belong here only when they have actually been measured.
  • This article contains no measured numbers. It explains the workflow, and any benchmark figures should be read as not measured.
  • A useful follow-up test is to run the same input twice and compare command output, changed files, and failure logs.

Failure notes and caveats

  • The common failure is not the first generated answer. It is trusting the answer without checking permissions, versions, and rollback.
  • This article includes no real error log, so the risks it describes are caveats, not reports of failures that actually happened.
  • Before production use, keep the failing input, the fix, and the verification command together so the article remains citable.

Sources and checks


Claim | Evidence | How to verify | Limit
Agent SDK should be checked against the original source before reuse. | code.claude.com | Check the source page, version, date, and setup notes. | Source content can change after this article is published.
Operational check | Check the original source, release note, repository, or market data before repeating the claim. | Reproduce on a small input and record input, output, and environment. | A local test does not prove every production path.
Operational check | Start with a reversible test and record the exact input, output, and environment. | Reproduce on a small input and record input, output, and environment. | A local test does not prove every production path.
Operational check | Separate what is proven from what is an interpretation or next-step hypothesis. | Reproduce on a small input and record input, output, and environment. | A local test does not prove every production path.

FAQ

When should I use Agent SDK?

Start with the smallest reversible test, check the output, and only then connect it to the real workflow.

What should I check before applying Agent SDK in production?

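Confirm the package name (claude-agent-sdk for Python, @anthropic-ai/claude-agent-sdk for TypeScript), and remember that a subagent's internal tool calls may not appear in the parent's message loop, so each subagent you want to observe needs its own publishing hooks.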

What is the easiest way to verify the result?

Attach a temporary consumer on the subscriber side of the bus and check that events from PreToolUse through ResultMessage are logged in order.

