Recovering structured JSON from malformed LLM responses — obfus.link

1. Insight

Insight

The problem this article addresses and why it matters.

The "respond in JSON" tax

Every team using an LLM in production has hit the same wall: you prompt the model to "respond in JSON only, no prose," and it does — until the day it doesn't. Sometimes the model wraps the JSON in a ```json fence. Sometimes it adds a single explanatory sentence above. Sometimes — and this is the one that pages you at 2am — it returns a structurally valid object with a trailing comma that breaks JSON.parse. OpenAI's Structured Outputs and Anthropic's tool-use schema cover the happy path. They don't cover legacy models, fine-tuned variants, fallback paths, or the dozen smaller providers that don't expose schema-enforced decoding.

The naïve fix is try { JSON.parse(raw) } catch { ... retry the LLM call ... }. That works most of the time. It also doubles your token cost on a non-trivial fraction of requests and pushes a one-shot parse error into a multi-second retry loop that compounds when chained agents are sharing the same upstream provider's rate limit.

Why surface-level regex stripping isn't enough

The common quick-fix is raw.replace(/^```json\n/, '').replace(/```$/, ''). It catches the markdown fence case. It does nothing for trailing commas, single-quoted strings, unquoted property names, smart quotes (" instead of "), comments (// like this), or truncated responses (model hit max_tokens mid-object). Each of those failure modes is rare. The point is that they're not rare together: across a production volume of 100k calls/day, you'll see every variant before the end of the week.

What this article delivers

The llm_to_json_cleaner tool handles the full set: markdown fence stripping, trailing-comma repair, comment removal, smart-quote normalisation, unquoted-key promotion, and (in aggressive mode) structural completion of truncated responses. We'll walk through the repair pipeline, show how to wire the optional Zod schema validation pass to make the cleaner double as a contract enforcement step, and explain the repairConfidence score that tells you how much the tool had to guess.

2. Intent

Intent

What you will be able to do after reading.

By the end of this article you will be able to:

Parse LLM output that fails JSON.parse without a round-trip retry to the model
Choose between conservative and aggressive repair strategies based on confidence in the upstream response
Layer Zod schema validation onto the cleaner so a single call gives you both parsed JSON and structured field errors
Use the repairConfidence score to gate auto-acceptance vs human review in agent pipelines
Identify the failure modes that the cleaner cannot recover from and what to do when you hit them

The Examples section walks through each repair category with the before/after that triggers it.

3. Examples

Examples

Annotated code and worked scenarios.

Before / after: the markdown-fence wrapper

The most common output from a chat-style model:

Before:

Sure, here's the JSON you asked for:

```json
{
  "user_id": 4291,
  "email": "alice@example.com",
  "verified": true
}
```

Let me know if you need anything else!

JSON.parse throws Unexpected token S in JSON at position 0.

After:

const result = llmToJsonCleaner({
  raw:             llmResponse,
  strict:          false,
  repairStrategy:  'conservative',
});

result.json
// { user_id: 4291, email: 'alice@example.com', verified: true }

result.transformations
// [
//   { type: 'strip_prose_preamble', position: 0 },
//   { type: 'strip_markdown_fence', position: 27 },
//   { type: 'strip_prose_trailer',  position: 145 },
// ]

result.repairConfidence  // 1.0 — only wrapping removed, content untouched

The transformations array names every change so you can audit what was stripped. The repairConfidence: 1.0 signals that the inner content was preserved exactly.

Before / after: trailing commas + comments + smart quotes

Sometimes a model emits JSON that looks fine to a human and breaks on the parser:

Before:

{
  // Q4 results
  "revenue": 1_240_000,
  "growth":  "0.23",
  "regions": ["NA", "EU", "APAC",],
}

Three issues: the // comment, the underscore numeric separator (valid JS, not JSON), the trailing comma after "APAC". The model produced something a Node REPL would accept.

After:

llmToJsonCleaner({
  raw:             malformed,
  strict:          false,
  repairStrategy:  'conservative',
});

// json: {
//   revenue: 1240000,
//   growth:  "0.23",
//   regions: ["NA", "EU", "APAC"],
// }
// transformations: [
//   { type: 'strip_line_comment',     position: 6 },
//   { type: 'normalise_numeric_sep',  position: 27 },
//   { type: 'strip_trailing_comma',   position: 88 },
// ]
// repairConfidence: 0.94

Confidence is below 1.0 because the cleaner changed characters (normalising the numeric separator) rather than only stripping wrapping.

Before / after: truncated response

When the model hits max_tokens mid-output, you get a response that ends abruptly:

Before:

{
  "users": [
    { "id": 1, "name": "Alice" },
    { "id": 2, "name": "Bob" },
    { "id": 3, "name": "Charlie

repairStrategy: 'conservative' returns json: null with transformations: [] and repairConfidence: 0.0 — the truncation is too risky to guess at.

repairStrategy: 'aggressive' attempts structural completion:

llmToJsonCleaner({
  raw:             truncated,
  strict:          false,
  repairStrategy:  'aggressive',
});

// json: {
//   users: [
//     { id: 1, name: 'Alice' },
//     { id: 2, name: 'Bob' },
//   ]  // <-- Charlie dropped, the truncated entry is removed
// }
// transformations: [
//   { type: 'drop_truncated_element', position: 79 },
//   { type: 'close_array',            position: 109 },
//   { type: 'close_object',           position: 110 },
// ]
// repairConfidence: 0.41

repairConfidence: 0.41 is the signal that the result is significantly synthesised. Agents typically gate on < 0.6 to route to a re-query or a human-in-the-loop step.

Before / after: validation pass alongside repair

The targetSchema parameter accepts a Zod-compatible JSON Schema. The cleaner repairs the input, then validates the cleaned object against your schema in the same call:

const result = llmToJsonCleaner({
  raw:             llmResponse,
  strict:          true,
  repairStrategy:  'conservative',
  targetSchema: {
    type: 'object',
    properties: {
      user_id:  { type: 'number' },
      email:    { type: 'string', format: 'email' },
      verified: { type: 'boolean' },
    },
    required: ['user_id', 'email', 'verified'],
  },
});

result.json           // parsed + cleaned
result.schemaErrors   // [{ path: 'email', message: 'invalid email format' }, ...]

A single round-trip replaces what would otherwise be clean → JSON.parse → Zod.parse as three separate steps in a pipeline, each with its own error path.

When humans use this

A developer iterating on a prompt sees the cleaner's transformations log and uses it to spot which output patterns the model produces most often. That informs prompt tweaks (e.g. "do not wrap your response in markdown") that reduce the rate of repair-needed responses over time. The repairConfidence score also surfaces in the web UI as a coloured badge — green for ≥ 0.95, amber for 0.6-0.94, red for < 0.6 — giving an at-a-glance sense of how trustworthy a given response was.

When agents use this

This is the highest-value tool in the grid for agentic workflows because LLM output failures are the most expensive failure mode in an agent pipeline. A single re-query to a frontier model can cost 10-100× a llm_to_json_cleaner call. The pattern in production:

Agent receives raw LLM output it expects to be JSON.
Agent calls llm_to_json_cleaner with strict: false, repairStrategy: 'conservative', and the expected schema.
If repairConfidence ≥ 0.95 AND schemaErrors is empty — accept and continue.
If repairConfidence ≥ 0.6 AND schemaErrors is empty — log and continue (the cleaner had to repair, but the schema check passed, so the result is trustworthy).
If repairConfidence < 0.6 OR schemaErrors non-empty — retry the LLM call once (only once), now with the schema errors injected into the prompt.

This converts a 5-10% LLM-output retry rate into a 0.5-1% retry rate, with the savings going straight to the bottom-line token budget.

Edge cases

JSON inside markdown inside JSON

Nested wrapping (markdown fence inside a JSON string value) is unsupported. The cleaner unwraps the outer fence. The inner content is preserved verbatim, including any inner fence the model emitted as data. If you need recursive unwrapping, pipe the inner value back through the cleaner with the same parameters.

Mixed-case keys

If the model returns {"User_ID": 1} and your schema expects user_id, the cleaner does not rename keys. That's a schema transformation, not a JSON repair. Use the schema validation pass to surface the mismatch as a schemaErrors entry, then handle it explicitly in your agent logic.

Empty input

llmToJsonCleaner({ raw: '' }) returns INPUT_EMPTY. Empty input is not "missing JSON" — the cleaner won't synthesise structure from nothing.

4. Documentation

Documentation

Reference signatures, edge cases, and lookup tables.

Input parameters

Field	Type	Required	Default	Description
`raw`	`string`	✓	—	The LLM's response, exactly as received
`strict`	`boolean`	✗	`false`	When true, return error on any unrepairable input. When false, do best-effort and surface `repairConfidence`
`targetSchema`	`object`	✗	—	Zod-compatible JSON Schema. When provided, the cleaner runs a validation pass and surfaces structured `schemaErrors`
`repairStrategy`	`'conservative' \| 'aggressive'`	✗	`'conservative'`	Conservative: strip wrapping, fix commas/comments only. Aggressive: also infer closing brackets and complete truncated responses

Output shape

{
  json:             object | null;        // parsed result, null only when strict + unrepairable
  cleaned:          string;               // the post-repair JSON string
  transformations:  Array<{
    type:     string;     // 'strip_markdown_fence' | 'strip_trailing_comma' | ...
    description: string;
    position?:   number;  // character offset in raw input
  }>;
  schemaErrors?:    Array<{               // only when targetSchema provided
    path:    string;      // e.g. 'data.users[0].email'
    message: string;
  }>;
  repairConfidence: number;               // 0.0 to 1.0 — see below
}

Repair confidence scale

Score	Meaning
1.0	Only wrapping (markdown fence, prose) stripped; inner content untouched
0.95 – 0.99	Minor non-destructive fixes (trailing commas, comments)
0.8 – 0.94	Character-level changes (smart quotes, numeric separators)
0.6 – 0.79	Structural changes (unquoted key promotion, single-quoted string conversion)
< 0.6	Aggressive structural completion (truncation repair, inferred closures) — gate before auto-accepting

Error codes

Code	When it fires	Recovery
`INPUT_EMPTY`	`raw` is empty or whitespace	Provide a non-empty input
`INPUT_TOO_LARGE`	`raw` exceeds 500KB	Pre-truncate; LLM responses rarely need to be this large
`PARSE_FAILED`	Strict mode + unrepairable input	Either switch to non-strict mode or retry the LLM call with a clearer prompt
`SECURITY_VIOLATION`	Prompt-injection pattern detected	Sanitise the LLM response source upstream; do not retry with the same input

When NOT to use this tool

If your model supports schema-enforced decoding (OpenAI Structured Outputs, Anthropic tool-use with input_schema), use that first. Native schema enforcement is structurally better than repair-after-the-fact. The cleaner is for: legacy models without schema support, fallback paths when the schema-aware call fails, and provider switching where you can't guarantee the new provider has the same enforcement.

If the input might contain a JSON payload mixed with non-JSON content the user actually wants to read (e.g. an LLM-generated email with a JSON block at the bottom), the cleaner's prose-stripping behaviour is too aggressive. Extract the JSON range manually before passing to the cleaner.

Performance notes

Typical execution: under 5ms for inputs below 50KB. Strict-mode validation pass adds 2-10ms depending on schema depth. Memory usage is bounded by the input size (no streaming yet — single-pass parser). The repair is deterministic: same input + same parameters always produces byte-identical output, which means the result is eligible for Edge Cache when called via the REST endpoint.

Insight

The "respond in JSON" tax

Why surface-level regex stripping isn't enough

What this article delivers

Intent

Examples

Before / after: the markdown-fence wrapper

Before / after: trailing commas + comments + smart quotes

Before / after: truncated response

Before / after: validation pass alongside repair

When humans use this

When agents use this

Edge cases

JSON inside markdown inside JSON

Mixed-case keys

Empty input

Documentation

Input parameters

Output shape

Repair confidence scale

Error codes

When NOT to use this tool

Performance notes

Frequently asked questions