Structural diff: comparing JSON, YAML, and code without formatting noise — obfus.link

1. Insight

The problem this article addresses and why it matters.

Line-based diff lies to you about JSON

The default diff tool (git diff, diff -u, the GitHub PR view) compares two files line-by-line. That works perfectly for prose, makes sense for most code, and produces lies for structured data.

Compare these two JSON blobs:

{ "a": 1, "b": 2, "c": 3 }

{
  "b": 2,
  "a": 1,
  "c": 3
}

They're semantically identical. A git diff between them shows every line as a change. Now imagine that's a 500-line API response you're trying to audit pre-deploy — the line-based diff is dominated by formatting drift, the actual changes are buried, and the reviewer gives up and approves.

The same problem hits YAML configs (key reordering after a key sort), source code (Prettier or Black reformatting a file), and any structured payload where the order of keys or fields is semantically irrelevant but textually different.

Why structural diff is the right tool

A structural diff parses each input in its own format (JSON, YAML, or an AST for code) and compares the results at the structure level. Reordered keys produce no diff. Whitespace changes produce no diff. Code reformatting (Prettier passes, Black passes, gofmt runs) produces no diff. The only changes that surface are the ones that matter: added keys, removed keys, changed values, and nodes moved within an array where order is meaningful.
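
The key-order claim can be illustrated with a minimal sketch. This is not the tool's implementation; `canonicalise` and `structurallyEqual` are hypothetical helpers that sort object keys recursively and compare the serialised results, using the two blobs from the introduction.

```typescript
// Hypothetical helpers for illustration only, not the tool's internals.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Rebuild the value with object keys in sorted order at every level.
function canonicalise(value: Json): Json {
  if (Array.isArray(value)) {
    // Array order is preserved: it is usually meaningful.
    return value.map(canonicalise);
  }
  if (value !== null && typeof value === "object") {
    const sorted: { [key: string]: Json } = {};
    for (const key of Object.keys(value).sort()) {
      sorted[key] = canonicalise(value[key] as Json);
    }
    return sorted;
  }
  return value;
}

// Two JSON texts are structurally equal when their canonical forms match.
function structurallyEqual(textA: string, textB: string): boolean {
  const a = canonicalise(JSON.parse(textA) as Json);
  const b = canonicalise(JSON.parse(textB) as Json);
  return JSON.stringify(a) === JSON.stringify(b);
}

const compactBlob = '{ "a": 1, "b": 2, "c": 3 }';
const expandedBlob = '{\n  "b": 2,\n  "a": 1,\n  "c": 3\n}';
console.log(structurallyEqual(compactBlob, expandedBlob)); // true
```

Parsing discards whitespace and sorting discards key order, so only differences in keys and values remain.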

The tool in this article supports four modes: standard line-based diff (the default — sometimes you want it), JSON structural diff, YAML structural diff (with comment-stripping), and code-AST diff for JavaScript, TypeScript, and Python.

What this article delivers

End-to-end walkthroughs of the three structural modes against real-world inputs: a JSON API response before and after a migration, a YAML config after a sort-keys pass, and a TypeScript file before and after Prettier ran. We cover the unified, side-by-side, and JSON output formats, the structuralChanges array that machine-readable consumers parse, and the cases where none of the modes is right and a domain-specific diff (Markdown, SQL schema, Protobuf) is what you actually need.

2. Intent

What you will be able to do after reading.

By the end of this article you will be able to:

Run a line-based diff (the default) when the inputs are prose or unstructured text
Switch to JSON structural diff to surface only semantic changes — reordered keys produce no diff
Run YAML structural diff that strips comments and ignores key order, ideal for config files
Run code-AST diff for JavaScript / TypeScript / Python where reformatting produces zero diff
Read the structuralChanges output — a machine-readable change list with per-change path, type (added / removed / changed / moved), and old / new values

The Examples section walks through each mode against an input pair that line-based diff handles poorly.

3. Examples

Annotated code and worked scenarios.

Before / after: JSON structural diff

A (pre-migration API response):

{
  "user_id": 42,
  "email":   "alice@example.com",
  "tier":    "pro",
  "settings": {
    "theme":    "dark",
    "language": "en"
  }
}

B (post-migration; keys re-sorted, one field added):

{
  "email":    "alice@example.com",
  "settings": {
    "language": "en",
    "theme":    "dark"
  },
  "tier":      "pro",
  "user_id":   42,
  "verified":  true
}

Line-based diff would flag nearly every line (only the braces and the "settings" opener match). The structural diff:

diffChecker({
  textA:     A,
  textB:     B,
  format:    'json',
  structuralMode: 'json',
});

// identical: false
// stats: { additions: 1, deletions: 0, unchanged: 5 }
// structuralChanges: [
//   { path: 'verified', type: 'added', newValue: 'true' },
// ]
// diff: "[{path: 'verified', type: 'added', newValue: 'true'}]"

One change. That's the truth — the only semantic difference between the two payloads is the new verified field. The order changes produced zero diff because they have zero semantic effect.
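
To make the change list less of a black box, here is a minimal sketch of how such a list could be computed for parsed JSON. It is an illustration under assumptions, not the tool's algorithm: `diffJson` and its helpers are hypothetical names, values are reported as `JSON.stringify` output, and `moved` detection is left out.

```typescript
// Illustrative sketch only; names and value formatting are assumptions.
type JsonValue = null | boolean | number | string | JsonValue[] | { [key: string]: JsonValue };

interface StructuralChange {
  path: string;
  type: "added" | "removed" | "changed";
  oldValue?: string;
  newValue?: string;
}

function isObject(value: JsonValue): value is { [key: string]: JsonValue } {
  return value !== null && typeof value === "object" && !Array.isArray(value);
}

function hasKey(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

function diffJson(a: JsonValue, b: JsonValue, path: string = ""): StructuralChange[] {
  const changes: StructuralChange[] = [];
  if (isObject(a) && isObject(b)) {
    // Objects are compared key by key, so key order never matters.
    for (const key of Object.keys(a)) {
      const childPath = path === "" ? key : `${path}.${key}`;
      if (hasKey(b, key)) {
        changes.push(...diffJson(a[key] as JsonValue, b[key] as JsonValue, childPath));
      } else {
        changes.push({ path: childPath, type: "removed", oldValue: JSON.stringify(a[key]) });
      }
    }
    for (const key of Object.keys(b)) {
      if (!hasKey(a, key)) {
        const childPath = path === "" ? key : `${path}.${key}`;
        changes.push({ path: childPath, type: "added", newValue: JSON.stringify(b[key]) });
      }
    }
    return changes;
  }
  if (Array.isArray(a) && Array.isArray(b)) {
    // Arrays are compared index by index, so element order is significant.
    const longest = Math.max(a.length, b.length);
    for (let i = 0; i < longest; i++) {
      const childPath = `${path}[${i}]`;
      if (i >= b.length) {
        changes.push({ path: childPath, type: "removed", oldValue: JSON.stringify(a[i]) });
      } else if (i >= a.length) {
        changes.push({ path: childPath, type: "added", newValue: JSON.stringify(b[i]) });
      } else {
        changes.push(...diffJson(a[i] as JsonValue, b[i] as JsonValue, childPath));
      }
    }
    return changes;
  }
  // Scalars, or values whose kinds differ: compare serialised forms.
  const oldValue = JSON.stringify(a);
  const newValue = JSON.stringify(b);
  if (oldValue !== newValue) {
    changes.push({ path, type: "changed", oldValue, newValue });
  }
  return changes;
}

const payloadA = JSON.parse(
  '{"user_id":42,"email":"alice@example.com","tier":"pro","settings":{"theme":"dark","language":"en"}}'
) as JsonValue;
const payloadB = JSON.parse(
  '{"email":"alice@example.com","settings":{"language":"en","theme":"dark"},"tier":"pro","user_id":42,"verified":true}'
) as JsonValue;
console.log(diffJson(payloadA, payloadB));
// [ { path: 'verified', type: 'added', newValue: 'true' } ]
```

Because objects are walked by key rather than by position, the re-sorted keys in B never reach the output.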

Before / after: YAML config after a key sort

You committed a YAML config, then a teammate's IDE ran a yaml-sort pass before they pushed. A (as committed):

service:
  name: orders-api
  port: 8080
database:
  host: orders-db
  port: 5432
  ssl: true
features:
  rate_limiting: true
  audit_log:     false

B (sorted):

database:
  host: orders-db
  port: 5432
  ssl: true
features:
  audit_log:     false
  rate_limiting: true
service:
  name: orders-api
  port: 8080

A standard line diff reports the reordered blocks as deletions plus additions. With structuralMode: 'yaml':

diffChecker({
  textA:           A,
  textB:           B,
  format:          'json',
  structuralMode:  'yaml',
  ignoreWhitespace: true,
});

// identical: true
// stats:     { additions: 0, deletions: 0, unchanged: 7 }
// structuralChanges: []

identical: true. The two YAML documents parse to the same object, so there is no semantic change. CI can pass automatically without a human reviewer wading through a diff that consists entirely of reorderings.

Before / after: code-AST diff after Prettier

A TypeScript file before and after prettier --write. A (before):

import {a,b,c} from './foo'
export const bar=(x:number,y:string)=>{return x+y.length;}

B (after prettier --write):

import { a, b, c } from './foo';

export const bar = (x: number, y: string) => {
  return x + y.length;
};

Line diff: every line changed. AST diff:

diffChecker({
  textA:          A,
  textB:          B,
  format:         'json',
  structuralMode: 'code-ast',
  language:       'typescript',
});

// identical: true
// structuralChanges: []

Zero changes. The Prettier pass produced different text but identical meaning. The AST diff is the right test for "did this commit actually change behaviour?" — useful in CI pre-merge checks where formatting-only commits should auto-approve and behaviour changes should require review.

When the reformatted file also changes one expression (x becomes x.toString()):

// B':
import { a, b, c } from './foo';

export const bar = (x: number, y: string) => {
  return x.toString() + y.length;
};

// structuralChanges: [
//   {
//     path:       'function:bar.body[0].argument.left',
//     type:       'changed',
//     oldValue:   'Identifier(x)',
//     newValue:   'CallExpression(x.toString())',
//   },
// ]

One change. The path string is the AST-node coordinate; the new and old values are the relevant node summaries.

Before / after: side-by-side vs unified output

The format parameter controls the diff rendering:

// format: 'unified'
'@@ -1,3 +1,4 @@\n  "user_id": 42,\n+ "verified": true,\n  "email": "alice@example.com",'

// format: 'side-by-side'
'A                          | B\nuser_id: 42                | user_id: 42\n(nothing)                  | verified: true\nemail: alice@example.com   | email: alice@example.com'

// format: 'json' (machine-readable)
[{path: 'verified', type: 'added', newValue: 'true'}]

Use unified for CLI / PR review, side-by-side for human inspection of long diffs, and JSON for machine consumers (CI, auto-summarisers, agent pipelines).

When humans use this

Three scenarios dominate:

Code review of formatting-heavy PRs. A PR with 800 lines changed turns out to be one logic change wrapped in a Prettier pass. AST diff identifies the one logic change in 200ms.
API response auditing. A consumer team validating that a planned API change doesn't break their parser runs the old and new sample payloads through JSON structural diff. The one renamed key surfaces; the unrelated formatting churn vanishes.
Config drift detection. A pre-deploy gate compares the staging YAML config against production. Drift in actual settings surfaces; drift in formatting doesn't generate noise.

When agents use this

Three patterns:

Pre-merge formatting filter. An agent reviewing PRs runs an AST diff first. Zero-change PRs auto-approve with a "formatting-only" label. Non-zero changes route to a human reviewer with the AST diff attached as context.
API breaking-change detector. A scheduled agent compares the current API response shape against a stored baseline. JSON structural diff surfaces every key addition, removal, or value-type change. Breaking changes (key removal, type change) open an alert.
Migration verification. An agent running a service migration compares pre- and post-migration outputs of a representative request. Any non-zero structural diff blocks the cutover until reviewed.
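
The routing logic behind the first two patterns can be sketched as a small triage function. Everything here is an assumption layered on the documented output shape: the `Verdict` labels, the `triage` and `isBreaking` helpers, and the idea that `oldValue` / `newValue` hold JSON-serialised values are illustrative, not part of the tool.

```typescript
// Hypothetical triage helper; verdict names and rules are assumptions.
interface ChangeRecord {
  path: string;
  type: "added" | "removed" | "changed" | "moved";
  oldValue?: string;
  newValue?: string;
}

interface TriageInput {
  identical: boolean;
  structuralChanges?: ChangeRecord[];
}

type Verdict = "formatting-only" | "needs-review" | "breaking";

// Best-effort JSON type of a serialised value ("number", "string", "array", ...).
function jsonTypeOf(serialised: string | undefined): string {
  if (serialised === undefined) return "undefined";
  try {
    const parsed: unknown = JSON.parse(serialised);
    if (parsed === null) return "null";
    return Array.isArray(parsed) ? "array" : typeof parsed;
  } catch {
    return "string"; // not valid JSON: treat it as a raw string
  }
}

// Breaking = a key was removed, or a value changed its JSON type.
function isBreaking(change: ChangeRecord): boolean {
  if (change.type === "removed") return true;
  if (change.type === "changed") {
    return jsonTypeOf(change.oldValue) !== jsonTypeOf(change.newValue);
  }
  return false;
}

function triage(result: TriageInput): Verdict {
  const changes = result.structuralChanges ?? [];
  if (result.identical && changes.length === 0) return "formatting-only";
  return changes.some(isBreaking) ? "breaking" : "needs-review";
}

console.log(triage({ identical: true, structuralChanges: [] })); // "formatting-only"
console.log(
  triage({ identical: false, structuralChanges: [{ path: "verified", type: "added", newValue: "true" }] })
); // "needs-review"
```

An agent would attach the change list to the review request for anything other than "formatting-only".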

Edge cases

Array-order significance

JSON / YAML structural diff treats arrays as ordered by default. Reordering array elements produces a moved change. Pass arrayOrderInsensitive: true to treat arrays as sets — useful for tag lists, permission arrays, etc. where order is semantically irrelevant.
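
One way to picture the difference between the two array settings is a canonical form that optionally sorts array elements. The sketch below is an assumption about how such a comparison could work, not the tool's behaviour: `canonicalKey` and `equalWithArrayMode` are hypothetical names, and sorting treats arrays as multisets, so duplicate elements still count.

```typescript
// Illustrative only: arrays become order-insensitive by sorting their
// canonicalised elements. Duplicates are kept (multiset semantics).
type JsonNode = null | boolean | number | string | JsonNode[] | { [key: string]: JsonNode };

function canonicalKey(value: JsonNode, arraysAsSets: boolean): string {
  if (Array.isArray(value)) {
    const items = value.map((item) => canonicalKey(item, arraysAsSets));
    if (arraysAsSets) items.sort();
    return `[${items.join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj: { [key: string]: JsonNode } = value;
    const entries = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalKey(obj[key] as JsonNode, arraysAsSets)}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

function equalWithArrayMode(textA: string, textB: string, arraysAsSets: boolean): boolean {
  const a = JSON.parse(textA) as JsonNode;
  const b = JSON.parse(textB) as JsonNode;
  return canonicalKey(a, arraysAsSets) === canonicalKey(b, arraysAsSets);
}

const tagsA = '{"tags":["admin","billing"]}';
const tagsB = '{"tags":["billing","admin"]}';
console.log(equalWithArrayMode(tagsA, tagsB, false)); // false: order is significant
console.log(equalWithArrayMode(tagsA, tagsB, true));  // true: order is ignored
```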

Comments in YAML

The YAML diff strips comments before comparing. If you need to track comment changes, fall back to line-based diff with ignoreWhitespace: true.

Type-annotation changes in code-AST

A field that changes from string to string | undefined (TypeScript) is a meaningful change even when the runtime code is unchanged. The AST diff surfaces type-annotation changes as type: 'changed'. To ignore type-only changes, pass ignoreTypeAnnotations: true.

Very large inputs

Each input is parsed before comparison. The AST mode is the most memory-intensive — a 50K-line TypeScript file consumes about 100MB during parsing. Files over 1MB return INPUT_TOO_LARGE; fall back to line-based diff for whole-file comparisons or chunk the input by export.

4. Documentation

Reference signatures, edge cases, and lookup tables.

Input parameters

Field	Type	Required	Default	Description
`textA`	`string`	✓	—	First input
`textB`	`string`	✓	—	Second input
`format`	`'unified' | 'side-by-side' | 'json'`	✓	—	Output rendering
`ignoreWhitespace`	`boolean`	✗	`false`	Treat whitespace-only changes as no diff (applies to line-based `text` mode)
`structuralMode`	`'text' | 'json' | 'yaml' | 'code-ast'`	✗	`'text'`	Comparison strategy
`language`	`'javascript' | 'typescript' | 'python'`	for `code-ast`	—	AST parser language
`arrayOrderInsensitive`	`boolean`	✗	`false`	JSON / YAML — treat arrays as sets
`ignoreTypeAnnotations`	`boolean`	✗	`false`	code-ast — suppress changes to TypeScript type annotations

Output shape

{
  identical: boolean;
  diff:      string;              // formatted diff in the chosen `format`
  stats: {
    additions:  number;
    deletions:  number;
    unchanged:  number;
  };
  structuralChanges?: Array<{     // when structuralMode is json/yaml/code-ast
    path:     string;             // e.g. 'data.users[0].email' or 'function:handleClick'
    type:     'added' | 'removed' | 'changed' | 'moved';
    oldValue?: string;
    newValue?: string;
  }>;
}
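
A consumer of this shape might start by tallying changes per type. The sketch below assumes only the `structuralChanges` fields listed above; `summariseChanges` is a hypothetical helper, not part of the tool.

```typescript
// Hypothetical consumer of the structuralChanges array.
type ChangeType = "added" | "removed" | "changed" | "moved";

interface ChangeEntry {
  path: string;
  type: ChangeType;
  oldValue?: string;
  newValue?: string;
}

// Count changes per type and render a one-line summary.
function summariseChanges(changes: ChangeEntry[]): string {
  const counts: Record<ChangeType, number> = { added: 0, removed: 0, changed: 0, moved: 0 };
  for (const change of changes) {
    counts[change.type] += 1;
  }
  return `${counts.added} added, ${counts.removed} removed, ${counts.changed} changed, ${counts.moved} moved`;
}

const exampleChanges: ChangeEntry[] = [{ path: "verified", type: "added", newValue: "true" }];
console.log(summariseChanges(exampleChanges)); // "1 added, 0 removed, 0 changed, 0 moved"
```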

Change-path notation

Mode	Example path	Meaning
JSON	`users[2].email`	Index 2 of `users` array, `email` field
JSON	`data.config.feature_flags["new_ui"]`	Nested key with string-indexed map access
YAML	Same as JSON	YAML diff parses to a JSON-equivalent structure
code-ast	`function:handleClick.body[1]`	Statement at index 1 in handleClick function body
code-ast	`class:User.method:save.parameters[0]`	First parameter of User.save method

Error codes

Code	When it fires	Recovery
`INPUT_EMPTY`	Both `textA` and `textB` empty	Provide non-empty inputs
`INPUT_TOO_LARGE`	Either input exceeds 1MB	Fall back to line-based diff or chunk the input
`INPUT_MALFORMED`	`structuralMode: 'json'` and one input is not valid JSON	Fix the input or fall back to line mode
`UNSUPPORTED_LANGUAGE`	`structuralMode: 'code-ast'` with language outside supported set	Use line or JSON mode; or pre-translate the source
`PARSE_FAILED`	AST parser hit a syntax error in `code-ast` mode	Fix the source; AST mode requires valid code
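
The recovery column suggests a wrapper that retries in line mode for the codes where that is the documented fallback. The sketch below makes two assumptions that the article does not confirm: that failures are thrown as objects carrying a string `code` property, and that the diff function can be passed in as a parameter (`run`). `diffWithFallback` and `errorCode` are hypothetical names.

```typescript
// Sketch under assumptions: errors are thrown with a string `code` property.
type Mode = "text" | "json" | "yaml" | "code-ast";

interface DiffRequest {
  textA: string;
  textB: string;
  format: "unified" | "side-by-side" | "json";
  structuralMode?: Mode;
}

interface DiffResponse {
  identical: boolean;
  diff: string;
}

// Codes whose documented recovery includes falling back to line-based diff.
const LINE_MODE_RECOVERABLE: string[] = ["INPUT_TOO_LARGE", "INPUT_MALFORMED", "UNSUPPORTED_LANGUAGE"];

function errorCode(err: unknown): string | undefined {
  if (typeof err === "object" && err !== null && "code" in err) {
    const code = (err as { code: unknown }).code;
    return typeof code === "string" ? code : undefined;
  }
  return undefined;
}

function diffWithFallback(
  run: (request: DiffRequest) => DiffResponse,
  request: DiffRequest
): DiffResponse & { modeUsed: Mode } {
  const requested: Mode = request.structuralMode ?? "text";
  try {
    return { ...run(request), modeUsed: requested };
  } catch (err) {
    const code = errorCode(err);
    const recoverable = code !== undefined && LINE_MODE_RECOVERABLE.indexOf(code) !== -1;
    if (requested !== "text" && recoverable) {
      // Retry once in line mode; a second failure propagates to the caller.
      return { ...run({ ...request, structuralMode: "text" }), modeUsed: "text" };
    }
    throw err; // INPUT_EMPTY, PARSE_FAILED, and unknown errors need a human fix
  }
}
```

Returning `modeUsed` lets the caller record that the result came from line mode, so a noisy diff is not mistaken for a structural one.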

When NOT to use this tool

For binary file diffs (images, compiled artefacts), this tool is the wrong layer. Use a binary diff tool (xxd output piped into diff, radare2's radiff2, or a dedicated image-diff tool such as dssim). The structural modes are for parsable text.

For database-schema diffs (comparing two CREATE TABLE statements), use a dedicated schema-diff tool (atlas, sqitch, migra). Text-level diff of SQL produces too many false positives, and a full schema diff requires understanding constraint precedence, index dependencies, and migration ordering, none of which this tool models.

For very large diffs (whole-repository changes, multi-MB exports), the in-memory full-content comparison is the wrong shape. Use git diff with directory-level operations, or a streaming-diff library.

Performance notes

Line-based mode: under 5ms for inputs under 100KB. JSON / YAML structural diff: 5-20ms depending on depth. AST diff: 50-300ms depending on file size; the parse dominates. The tool is deterministic — same inputs always produce the same diff. REST responses are Edge-Cache eligible (cache keys include textA + textB so cache hit rate is low in practice unless both inputs are reused).

The JSON structural diff handles nested objects up to 1000 levels deep before returning INPUT_MALFORMED. Real-world inputs rarely exceed 20-30 levels.
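
If you want to know how close an input is to the depth limit before submitting it, nesting depth can be estimated from the raw text without parsing. This is a sketch with a hypothetical helper, `maxNestingDepth`; it assumes both objects and arrays count as levels, which may not match how the tool counts.

```typescript
// Hypothetical pre-check: deepest nesting of { } and [ ] in a JSON text.
// Brackets inside string literals are ignored.
function maxNestingDepth(jsonText: string): number {
  let depth = 0;
  let deepest = 0;
  let inString = false;
  let escaped = false;
  for (let i = 0; i < jsonText.length; i++) {
    const ch = jsonText.charAt(i);
    if (inString) {
      if (escaped) {
        escaped = false; // the character after a backslash is literal
      } else if (ch === "\\") {
        escaped = true;
      } else if (ch === '"') {
        inString = false;
      }
      continue;
    }
    if (ch === '"') {
      inString = true;
    } else if (ch === "{" || ch === "[") {
      depth += 1;
      if (depth > deepest) deepest = depth;
    } else if (ch === "}" || ch === "]") {
      depth -= 1;
    }
  }
  return deepest;
}

console.log(maxNestingDepth('{"a":{"b":[1,{"c":"}"}]}}')); // 4
```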