Streaming Structured Output — Incremental JSON Rendering Without a Parser

04-structured-output solved the reliability problem: use forced tool use to guarantee valid JSON back from the model. But it waited for the complete response before doing anything with it. For a backend pipeline or eval scorer, that's fine — a machine is consuming the output and doesn't care about latency. For a user-facing UI, it means a blank screen until the entire object arrives.

The question this module explores: can you render structured fields progressively as the model generates them, the same way streaming text renders word by word?

The short answer is yes, with a caveat worth understanding before you reach for it.

The Core Problem: Partial JSON Is Invalid JSON

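When the model streams a structured output, each input_json_delta event carries only a fragment of the JSON text in its partial_json field. Concatenate the fragments as they arrive and the accumulated string grows like this: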

{"title": "Forw
{"title": "Forward Deployed Eng
{"title": "Forward Deployed Engineer", "company": "Anthr
{"title": "Forward Deployed Engineer", "company": "Anthropic", "requ

Every intermediate state throws on JSON.parse. The accumulated string becomes valid JSON only once the final closing } has arrived, and content_block_stop is the signal that it has.

This is the fundamental tension: streaming gives you the data early, but JSON requires completeness to parse.
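A quick way to see the problem is to run JSON.parse over a few accumulated snapshots. The sketch below is illustrative only: the snapshots are made-up values in the shape shown above, and tryParse is a hypothetical helper, not something from the module's code.

```javascript
// Accumulated states as they might look after successive deltas (illustrative values).
const snapshots = [
  '{"title": "Forw',
  '{"title": "Forward Deployed Engineer", "company": "Anthr',
  '{"title": "Forward Deployed Engineer", "company": "Anthropic"}',
];

// Hypothetical helper: returns the parsed value, or null while the text is still incomplete.
function tryParse(text) {
  try {
    return JSON.parse(text);
  } catch {
    return null;
  }
}

// Only the last snapshot is a complete object; the earlier ones yield null.
const results = snapshots.map(tryParse);
console.log(results.map((r) => (r === null ? "incomplete" : "parsed")));
```

Swallowing the exception like this is fine for a demonstration, but it gives you nothing to render until the very end, which is exactly the gap the rest of this module is about.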

Two Approaches

Option 1: Buffer and parse on content_block_stop. Concatenate the partial_json fragment from every input_json_delta, wait for the block to close, then parse once. This is what 02-tool-use already does for tool inputs. It is safe, simple, and correct, but nothing renders until the complete object is ready.

Option 2: Incremental regex extraction. After each delta, scan the accumulated string for completed key-value pairs and render them as they finish. Fields pop in one by one. The full parse at content_block_stop then replaces the partial state with the guaranteed-correct final version.

This module implements the second approach for a job description parser — a use case where title and company are genuinely useful to show before requirements finish streaming.
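For comparison, Option 1 can be sketched in a few lines. This is a minimal sketch, not the code from 02-tool-use: bufferAndParse and mockEvents are hypothetical names, the events are plain objects standing in for the real stream, and it assumes the stream contains a single tool-use content block.

```javascript
// Option 1 sketch: buffer every partial_json fragment, parse once when the block closes.
// Assumption: `events` is an iterable of stream events containing one tool-use block.
function bufferAndParse(events) {
  let accumulatedJson = "";
  for (const event of events) {
    if (
      event.type === "content_block_delta" &&
      event.delta.type === "input_json_delta"
    ) {
      accumulatedJson += event.delta.partial_json;
    } else if (event.type === "content_block_stop") {
      // The block is closed, so the accumulated text should now be a complete object.
      return JSON.parse(accumulatedJson);
    }
  }
  throw new Error("stream ended before content_block_stop");
}

// Mock events standing in for a real stream (illustrative values).
const mockEvents = [
  { type: "content_block_delta", delta: { type: "input_json_delta", partial_json: '{"title": "Forw' } },
  { type: "content_block_delta", delta: { type: "input_json_delta", partial_json: 'ard Deployed Engineer", "company": "Anthr' } },
  { type: "content_block_delta", delta: { type: "input_json_delta", partial_json: 'opic"}' } },
  { type: "content_block_stop" },
];

const parsed = bufferAndParse(mockEvents);
console.log(parsed.title, "at", parsed.company);
```

The whole approach is one string and one JSON.parse call, which is why it remains the default choice whenever nobody is watching the output arrive.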

How the Incremental Extraction Works

After each input_json_delta, the accumulated JSON string is scanned with two patterns:

// Completed string fields: "key": "value", or "key": "value"}
const stringPattern = /"(\w+)"\s*:\s*"([^"\\]*(?:\\.[^"\\]*)*)"\s*[,}]/g;

// Completed array fields: "key": ["item1", "item2"]
const arrayPattern = /"(\w+)"\s*:\s*\[([^\]]*)\]/g;

The key signal is the closing delimiter — a quote followed by a comma or closing brace for strings, a closing ] for arrays. If those aren't present in the accumulated string yet, the field isn't complete and gets skipped until the next delta arrives.

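This is a heuristic, not a real parser. It works reliably for flat objects whose values are strings or arrays of simple strings. It skips numbers, booleans, and null values, it misses keys containing characters outside \w (hyphens, for example), and it breaks on nested objects, arrays of objects, and array items that contain a ] character. The captured values are also raw JSON text, so escape sequences such as \" or \n still need decoding before display. For anything more complex, a proper streaming JSON parser library is the right tool.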
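One way to wire the two patterns into a single function is sketched below. The function name extractCompletedFields is an assumption, as is the choice to decode each capture with JSON.parse; the module's actual extraction code may differ.

```javascript
// Sketch: scan the accumulated JSON text and return only the fields that look complete.
// The name and the JSON.parse-based decoding are assumptions, not the module's code.
function extractCompletedFields(accumulated) {
  const stringPattern = /"(\w+)"\s*:\s*"([^"\\]*(?:\\.[^"\\]*)*)"\s*[,}]/g;
  const arrayPattern = /"(\w+)"\s*:\s*\[([^\]]*)\]/g;
  const fields = {};

  for (const [, key, raw] of accumulated.matchAll(stringPattern)) {
    try {
      // Re-wrap the capture in quotes so JSON.parse decodes escapes such as \" and \n.
      fields[key] = JSON.parse(`"${raw}"`);
    } catch {
      // Undecodable capture: leave the field for the final parse.
    }
  }

  for (const [, key, body] of accumulated.matchAll(arrayPattern)) {
    try {
      fields[key] = JSON.parse(`[${body}]`);
    } catch {
      // Not a clean flat array (nested objects, a "]" inside an item): skip it.
    }
  }

  return fields;
}

// A mid-stream snapshot: title is complete, company is still being written.
const midStream = extractCompletedFields(
  '{"title": "Forward Deployed Engineer", "company": "Anthr'
);
console.log(midStream);

// A later snapshot: the requirements array has closed, location has not.
const later = extractCompletedFields(
  '{"title": "Forward Deployed Engineer", "requirements": ["Python", "5+ years"], "loc'
);
console.log(later);
```

Catching the decode errors keeps a bad match from breaking the render loop: a field that fails here is simply left out of the partial state and shows up when the final parse lands.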

The Two-Phase Rendering Pattern

The stream sends two distinct event types to the frontend:

// During stream — partial state for progressive rendering
send({ type: "delta", partial_json: chunk.delta.partial_json, accumulated: accumulatedJson });

// On content_block_stop — full parse guaranteed to succeed
send({ type: "complete", data: parsed });

The frontend maintains both states and switches when the complete event arrives:

const display = completeData ?? partialData;

During streaming, partialData drives the render, so fields appear as they complete. When complete fires, completeData takes over and the streaming indicators disappear. Fields that the regex didn't extract incrementally appear all at once in the final swap. No field is ever lost; at worst, one shows up later than it could have.
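The frontend side of this pattern can be sketched as a small reducer. Everything here is hypothetical scaffolding: reduce, selectDisplay, and the state shape are assumed names, and extract is a simplified strings-only stand-in for the regex extractor described earlier.

```javascript
// Hypothetical stand-in for the regex extractor: completed string fields only, no escapes.
const extract = (json) =>
  Object.fromEntries(
    [...json.matchAll(/"(\w+)"\s*:\s*"([^"\\]*)"\s*[,}]/g)].map(([, key, value]) => [key, value])
  );

// Reducer over the two event types the stream sends to the frontend.
function reduce(state, event) {
  switch (event.type) {
    case "delta":
      // Recompute partial state from the full accumulated string on every delta.
      return { ...state, partialData: extract(event.accumulated) };
    case "complete":
      return { ...state, completeData: event.data };
    default:
      return state;
  }
}

// The final parse wins as soon as it exists; until then the partial state is shown.
const selectDisplay = (state) => state.completeData ?? state.partialData;

let state = { partialData: {}, completeData: null };

state = reduce(state, {
  type: "delta",
  accumulated: '{"title": "Forward Deployed Engineer", "company": "Anthr',
});
const duringStream = selectDisplay(state);

state = reduce(state, {
  type: "complete",
  data: { title: "Forward Deployed Engineer", company: "Anthropic", requirements: ["Python"] },
});
const afterComplete = selectDisplay(state);

console.log(duringStream, afterComplete);
```

Recomputing partialData from the whole accumulated string on each delta, rather than merging increments, keeps the reducer stateless with respect to the regex and means a missed delta cannot leave stale fields behind.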

When This Is Actually Worth It

Not always. The incremental approach adds complexity — two event types, two state objects, a regex that needs to be maintained, a fallback for fields the regex misses. For backend pipelines, evals, or anything where a machine consumes the output, buffering until content_block_stop is simpler and equally correct.

The incremental approach earns its complexity when three conditions are true:

The object has fields that are useful before the whole thing is done. A job description with title, company, location, and requirements — title and company render almost immediately. By the time requirements finish streaming, the user has already read the header. Compare this to a single summary field: there's no useful partial state to show.

Generation is slow enough that the user would notice the wait. Short objects complete quickly. Long objects with multi-sentence fields are where the latency gap between streaming and buffering becomes perceptible.

The fields are independent enough that partial state makes sense to display. Rendering a partial requirements list while title is already visible is fine. Rendering half a structured error message is not.

For the JD parser in this module, all three conditions hold. That's the right use case. For most structured output use cases — grading rubrics, eval scores, tool inputs — content_block_stop is the right place to parse, and reaching for incremental extraction would be premature.

The Broader Pattern

This module sits at the intersection of two earlier ones: the streaming event model from 01-streaming and the forced tool use schema from 04-structured-output. The combination surfaces a real product engineering question — not "does this technically work" but "when is the complexity justified" — which is a different kind of problem than getting the API call right.

In production, the answer to that question depends on the specific object shape, the typical generation time, and what partial state actually means for your UI. Getting that judgment right matters more than being able to implement the pattern.
