obfus.link
Analyzers

Structural diff: comparing JSON, YAML, and code without formatting noise

Compare two text inputs as JSON structure (ignore key order), YAML structure (ignore comments and reordering), or AST (ignore formatting in JavaScript, TypeScript, Python). Surface only the changes that matter.

The Diff Checker compares two text inputs in line, JSON, YAML, or code-AST mode. Structural modes ignore semantically irrelevant differences (key order, formatting, comments) and produce a machine-readable structuralChanges array with the path, change type, and old / new values for each meaningful difference.

1. Insight

Line-based diff lies to you about JSON

The default diff tool (git diff, diff -u, the GitHub PR view) compares two files line-by-line. That works perfectly for prose, makes sense for most code, and produces lies for structured data.

Compare these two JSON blobs:

{ "a": 1, "b": 2, "c": 3 }
{
  "b": 2,
  "a": 1,
  "c": 3
}

They're semantically identical. A git diff between them shows every line as a change. Now imagine that's a 500-line API response you're trying to audit pre-deploy — the line-based diff is dominated by formatting drift, the actual changes are buried, and the reviewer gives up and approves.
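One quick way to confirm that the two blobs above carry the same data is to serialise each with its keys sorted at every level and compare the results. The TypeScript below is a minimal sketch of that idea. canonicalize is a hypothetical helper, not part of the Diff Checker, and it deliberately leaves array order alone because position in an array can be meaningful.

```typescript
// Hypothetical helper for illustration; not the Diff Checker's API.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serialise a JSON value with object keys sorted at every level, so payloads
// that differ only in key order or whitespace produce identical strings.
function canonicalize(value: Json): string {
  if (Array.isArray(value)) {
    // Arrays keep their order: element position can carry meaning.
    return "[" + value.map(canonicalize).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const obj = value;
    const keys = Object.keys(obj).sort();
    return "{" + keys.map((k) => JSON.stringify(k) + ":" + canonicalize(obj[k])).join(",") + "}";
  }
  return JSON.stringify(value);
}

const compactBlob = '{ "a": 1, "b": 2, "c": 3 }';
const reorderedBlob = `{
  "b": 2,
  "a": 1,
  "c": 3
}`;

console.log(canonicalize(JSON.parse(compactBlob))); // {"a":1,"b":2,"c":3}
console.log(
  canonicalize(JSON.parse(compactBlob)) === canonicalize(JSON.parse(reorderedBlob)),
); // true
```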

The same problem hits YAML configs (key reordering after a key sort), source code (Prettier or Black reformatting a file), and any structured payload where the order of keys or fields is semantically irrelevant but textually different.

Why structural diff is the right tool

A structural diff parses each input in the appropriate format (JSON, YAML, or an AST for code) and compares them at the structure level. Reordered keys produce no diff. Whitespace changes produce no diff. Code reformatting (a Prettier or Black pass) produces no diff. The only changes that surface are the ones that matter: added keys, removed keys, changed values, and moved elements within an array where order is meaningful.
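The structure-level comparison can be sketched as a short recursive walk. The TypeScript below is an illustration under simple assumptions, not the Diff Checker's implementation: objects are compared key by key (so key order never matters), arrays are compared index by index with no move detection, and values are reported as JSON-encoded strings in the spirit of the oldValue / newValue fields. structuralDiff and its helpers are hypothetical names.

```typescript
// Illustrative sketch of a structural JSON diff (hypothetical helpers, not the tool's code).
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

interface Change {
  path: string;
  type: "added" | "removed" | "changed";
  oldValue?: string;
  newValue?: string;
}

const isObject = (v: Json): v is JsonObject =>
  v !== null && typeof v === "object" && !Array.isArray(v);

// Own-property check, so inherited names like "toString" are not mistaken for keys.
const hasKey = (o: JsonObject, k: string): boolean =>
  Object.prototype.hasOwnProperty.call(o, k);

// Build paths such as settings.theme or users[2].email.
const childPath = (base: string, key: string | number): string =>
  typeof key === "number" ? `${base}[${key}]` : base === "" ? key : `${base}.${key}`;

function structuralDiff(a: Json, b: Json, path = "", out: Change[] = []): Change[] {
  if (isObject(a) && isObject(b)) {
    // Walk the union of keys in sorted order; textual key order is irrelevant.
    const keys = Array.from(new Set([...Object.keys(a), ...Object.keys(b)])).sort();
    for (const key of keys) {
      const p = childPath(path, key);
      if (!hasKey(b, key)) out.push({ path: p, type: "removed", oldValue: JSON.stringify(a[key]) });
      else if (!hasKey(a, key)) out.push({ path: p, type: "added", newValue: JSON.stringify(b[key]) });
      else structuralDiff(a[key], b[key], p, out);
    }
  } else if (Array.isArray(a) && Array.isArray(b)) {
    // Index-by-index: no move detection, so a reordered array shows as changed elements.
    for (let i = 0; i < Math.max(a.length, b.length); i++) {
      const p = childPath(path, i);
      if (i >= b.length) out.push({ path: p, type: "removed", oldValue: JSON.stringify(a[i]) });
      else if (i >= a.length) out.push({ path: p, type: "added", newValue: JSON.stringify(b[i]) });
      else structuralDiff(a[i], b[i], p, out);
    }
  } else if (JSON.stringify(a) !== JSON.stringify(b)) {
    // Different leaf values, or a type change such as object -> string.
    out.push({ path, type: "changed", oldValue: JSON.stringify(a), newValue: JSON.stringify(b) });
  }
  return out;
}

const before: Json = { user_id: 42, tier: "pro", settings: { theme: "dark" } };
const after: Json = { settings: { theme: "dark" }, tier: "pro", user_id: 42, verified: true };
console.log(structuralDiff(before, after));
// [ { path: 'verified', type: 'added', newValue: 'true' } ]
```

Because keys are visited in sorted order, the output does not depend on how either input was laid out. Reporting moved elements for ordered arrays would need an alignment step (for example, a longest-common-subsequence pass) in place of the index-by-index loop.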

The tool in this article supports four modes: standard line-based diff (the default — sometimes you want it), JSON structural diff, YAML structural diff (with comment-stripping), and code-AST diff for JavaScript, TypeScript, and Python.

What this article delivers

Worked examples of the three structural modes against realistic inputs: a JSON API response before and after a migration, a YAML config after a sort-keys pass, and a TypeScript file before and after Prettier ran. We also cover the output formats (unified, side-by-side, JSON), the structuralChanges array that machine-readable consumers parse, and the cases where this tool is the wrong layer and a dedicated tool (binary diff, SQL schema diff, streaming diff) is what you actually need.

2. Intent

By the end of this article you will be able to:

  • Run a line-based diff (the default) when the inputs are prose or unstructured text
  • Switch to JSON structural diff to surface only semantic changes — reordered keys produce no diff
  • Run YAML structural diff that normalises comments and key order, ideal for config files
  • Run code-AST diff for JavaScript / TypeScript / Python where reformatting produces zero diff
  • Read the structuralChanges output — a machine-readable change list with per-change path, type (added / removed / changed / moved), and old / new values

The Examples section walks through each structural mode against an input pair that line-based diff handles poorly.

3. Examples

Before / after: JSON structural diff

API response pre-migration:

A:

{
  "user_id": 42,
  "email":   "alice@example.com",
  "tier":    "pro",
  "settings": {
    "theme":    "dark",
    "language": "en"
  }
}

B (post-migration; keys re-sorted, one field added):

{
  "email":    "alice@example.com",
  "settings": {
    "language": "en",
    "theme":    "dark"
  },
  "tier":      "pro",
  "user_id":   42,
  "verified":  true
}

Line-based diff would flag every line. The structural diff:

diffChecker({
  textA:     A,
  textB:     B,
  format:    'unified',
  structuralMode: 'json',
});

// identical: false
// stats: { additions: 1, deletions: 0, unchanged: 5 }
// structuralChanges: [
//   { path: 'verified', type: 'added', newValue: 'true' },
// ]
// diff: '+ verified: true'

One change. That's the truth — the only semantic difference between the two payloads is the new verified field. The order changes produced zero diff because they have zero semantic effect.
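The stats line in that output is consistent with counting leaf values: A has five leaves (user_id, email, tier, settings.theme, settings.language), all present and equal in B, and B adds one more. The TypeScript sketch below reproduces those numbers by flattening each payload into path/value pairs. Leaf-level counting is an assumption (the article does not spell out the rule), flatten and leafStats are hypothetical helpers, and changed leaves go in a separate field because the documented stats object has no slot for them.

```typescript
// Hypothetical helpers; leaf-level counting is an assumption about how stats are derived.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Flatten a JSON value into a map of leaf path -> JSON-encoded value.
// Empty objects and arrays contribute no leaves in this sketch.
function flatten(value: Json, path = "", out = new Map<string, string>()): Map<string, string> {
  if (Array.isArray(value)) {
    value.forEach((v, i) => flatten(v, `${path}[${i}]`, out));
  } else if (value !== null && typeof value === "object") {
    for (const k of Object.keys(value)) flatten(value[k], path === "" ? k : `${path}.${k}`, out);
  } else {
    out.set(path, JSON.stringify(value));
  }
  return out;
}

function leafStats(a: Json, b: Json) {
  const fa = flatten(a);
  const fb = flatten(b);
  let additions = 0, deletions = 0, unchanged = 0, changed = 0;
  fa.forEach((v, p) => {
    if (!fb.has(p)) deletions++;
    else if (fb.get(p) === v) unchanged++;
    else changed++;
  });
  fb.forEach((_, p) => {
    if (!fa.has(p)) additions++;
  });
  return { additions, deletions, unchanged, changed };
}

const payloadA: Json = JSON.parse(`{"user_id": 42, "email": "alice@example.com", "tier": "pro",
  "settings": {"theme": "dark", "language": "en"}}`);
const payloadB: Json = JSON.parse(`{"email": "alice@example.com", "settings": {"language": "en", "theme": "dark"},
  "tier": "pro", "user_id": 42, "verified": true}`);
console.log(leafStats(payloadA, payloadB));
// { additions: 1, deletions: 0, unchanged: 5, changed: 0 }
```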

Before / after: YAML config after a key sort

You committed a YAML config, then a teammate's IDE ran a yaml-sort pass before they pushed:

A:

service:
  name: orders-api
  port: 8080
database:
  host: orders-db
  port: 5432
  ssl: true
features:
  rate_limiting: true
  audit_log:     false

B (sorted):

database:
  host: orders-db
  port: 5432
  ssl: true
features:
  audit_log:     false
  rate_limiting: true
service:
  name: orders-api
  port: 8080

A standard line diff reports the service block and one of the features entries as deleted and re-added, even though nothing changed. With structuralMode: 'yaml':

diffChecker({
  textA:           A,
  textB:           B,
  format:          'json',
  structuralMode:  'yaml',
});

// identical: true
// stats:     { additions: 0, deletions: 0, unchanged: 7 }
// structuralChanges: []

True. The two YAMLs parse to the same object, so there is no semantic change. CI can pass automatically without a human reviewer wading through deleted-and-re-added blocks of pure reordering.
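For contrast, here is what a plain line diff does with the same two files. The TypeScript below is a minimal sketch of a textbook longest-common-subsequence line diff (a simpler cousin of the Myers algorithm that git diff uses by default); lineDiff is a hypothetical helper. On this pair it keeps the database block and one features entry and reports four lines removed and four added, all of them reordering.

```typescript
// Hypothetical helper: textbook LCS line diff. Emits "  " for kept lines,
// "- " for lines only in a, and "+ " for lines only in b.
function lineDiff(a: string[], b: string[]): string[] {
  const m = a.length;
  const n = b.length;
  // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..].
  const lcs: number[][] = [];
  for (let i = 0; i <= m; i++) lcs.push(new Array<number>(n + 1).fill(0));
  for (let i = m - 1; i >= 0; i--) {
    for (let j = n - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j] ? lcs[i + 1][j + 1] + 1 : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  // Walk the table, preferring matches, to emit one minimal edit script.
  const out: string[] = [];
  let i = 0;
  let j = 0;
  while (i < m && j < n) {
    if (a[i] === b[j]) {
      out.push("  " + a[i]); i++; j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      out.push("- " + a[i]); i++;
    } else {
      out.push("+ " + b[j]); j++;
    }
  }
  while (i < m) out.push("- " + a[i++]);
  while (j < n) out.push("+ " + b[j++]);
  return out;
}

const yamlA = `service:
  name: orders-api
  port: 8080
database:
  host: orders-db
  port: 5432
  ssl: true
features:
  rate_limiting: true
  audit_log:     false`;

const yamlB = `database:
  host: orders-db
  port: 5432
  ssl: true
features:
  audit_log:     false
  rate_limiting: true
service:
  name: orders-api
  port: 8080`;

const diffLines = lineDiff(yamlA.split("\n"), yamlB.split("\n"));
console.log(diffLines.join("\n"));
const removedCount = diffLines.filter((l) => l.startsWith("- ")).length;
const addedCount = diffLines.filter((l) => l.startsWith("+ ")).length;
console.log(removedCount, addedCount); // 4 4
```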

Before / after: code-AST diff after Prettier

A TypeScript file before and after prettier --write:

A:

import {a,b,c} from './foo'
export const bar=(x:number,y:string)=>{return x+y.length;}

B:

import { a, b, c } from './foo';

export const bar = (x: number, y: string) => {
  return x + y.length;
};

Line diff: every line changed. AST diff:

diffChecker({
  textA:          A,
  textB:          B,
  format:         'json',
  structuralMode: 'code-ast',
  language:       'typescript',
});

// identical: true
// structuralChanges: []

Zero changes. The Prettier pass produced different text but identical meaning. The AST diff is the right test for "did this commit actually change behaviour?" — useful in CI pre-merge checks where formatting-only commits should auto-approve and behaviour changes should require review.

When the same file instead changes one operand of the return expression:

// B':
import { a, b, c } from './foo';

export const bar = (x: number, y: string) => {
  return x.toString() + y.length;
};
// structuralChanges: [
//   {
//     path:       'function:bar.body[0].argument.left',
//     type:       'changed',
//     oldValue:   'Identifier(x)',
//     newValue:   'CallExpression(x.toString())',
//   },
// ]

One change. The path string is the AST-node coordinate; the new and old values are the relevant node summaries.

Before / after: side-by-side vs unified output

The format parameter controls the diff rendering:

// format: 'unified'
'@@ -1,3 +1,4 @@\n  "user_id": 42,\n+ "verified": true,\n  "email": "alice@example.com",'

// format: 'side-by-side'
'A                          | B\nuser_id: 42                | user_id: 42\n(nothing)                  | verified: true\nemail: alice@example.com   | email: alice@example.com'

// format: 'json' (machine-readable)
[{"path": "verified", "type": "added", "newValue": "true"}]

Use unified for CLI / PR review. Side-by-side for human inspection of long diffs. JSON for machine consumers (CI, auto-summarisers, agent pipelines).
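Machine consumers often turn the JSON output back into something a person can scan. The TypeScript below sketches that step: it renders a structuralChanges-style array as compact +/- lines in the style of the '+ verified: true' diff line in the JSON example. The StructuralChange type mirrors the documented output shape; renderChanges and its conventions (including the ~ marker for moves) are hypothetical choices, not the tool's own rendering.

```typescript
// The type mirrors the documented structuralChanges entries;
// renderChanges and its output conventions are hypothetical.
interface StructuralChange {
  path: string;
  type: "added" | "removed" | "changed" | "moved";
  oldValue?: string;
  newValue?: string;
}

function renderChanges(changes: StructuralChange[]): string {
  const lines: string[] = [];
  for (const c of changes) {
    switch (c.type) {
      case "added":
        lines.push(`+ ${c.path}: ${c.newValue ?? ""}`);
        break;
      case "removed":
        lines.push(`- ${c.path}: ${c.oldValue ?? ""}`);
        break;
      case "changed":
        // A value change reads as the old line removed and the new line added.
        lines.push(`- ${c.path}: ${c.oldValue ?? ""}`, `+ ${c.path}: ${c.newValue ?? ""}`);
        break;
      case "moved":
        lines.push(`~ ${c.path}`); // "~" is an arbitrary marker chosen for moves
        break;
    }
  }
  return lines.join("\n");
}

// Parse the machine-readable output, then render it for a PR comment.
const parsed: StructuralChange[] = JSON.parse(
  '[{"path": "verified", "type": "added", "newValue": "true"}]',
);
console.log(renderChanges(parsed)); // + verified: true
```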

When humans use this

Three scenarios dominate:

  • Code review of formatting-heavy PRs. A PR with 800 lines changed turns out to be one logic change wrapped in a Prettier pass. AST diff identifies the one logic change in 200ms.
  • API response auditing. A consumer team validating that a planned API change doesn't break their parser runs the old and new sample payloads through JSON structural diff. The renamed key surfaces as one removal plus one addition; the unrelated formatting churn vanishes.
  • Config drift detection. A pre-deploy gate compares the staging YAML config against production. Drift in actual settings surfaces; drift in formatting doesn't generate noise.

When agents use this

Three patterns:

  • Pre-merge formatting filter. An agent reviewing PRs runs an AST diff first. Zero-change PRs auto-approve with a "formatting-only" label. Non-zero changes route to a human reviewer with the AST diff attached as context.
  • API breaking-change detector. A scheduled agent compares the current API response shape against a stored baseline. JSON structural diff surfaces every key addition, removal, or value-type change. Breaking changes (key removal, type change) open an alert.
  • Migration verification. An agent running a service migration compares pre- and post-migration outputs of a representative request. Any non-zero structural diff blocks the cutover until reviewed.
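The three agent patterns reduce to a small routing decision over the documented output shape. The TypeScript below is a hypothetical helper, not part of the tool: it treats an identical result as formatting-only, treats a removal or a change of value type as breaking, and routes everything else to a human. Detecting a type change assumes oldValue / newValue hold JSON-encoded values, with unparseable strings treated as plain strings; adjust that to the encoding your results actually use.

```typescript
// Types mirror the documented output shape; triage and valueType are hypothetical.
interface StructuralChange {
  path: string;
  type: "added" | "removed" | "changed" | "moved";
  oldValue?: string;
  newValue?: string;
}

interface DiffResult {
  identical: boolean;
  diff: string;
  stats: { additions: number; deletions: number; unchanged: number };
  structuralChanges?: StructuralChange[];
}

type Verdict = "formatting-only" | "breaking" | "needs-review";

// Best-effort JSON type of an encoded value (assumption: values are JSON text).
function valueType(encoded: string | undefined): string {
  if (encoded === undefined) return "missing";
  try {
    const v: unknown = JSON.parse(encoded);
    return v === null ? "null" : Array.isArray(v) ? "array" : typeof v;
  } catch {
    return "string"; // not valid JSON: treat as a bare string
  }
}

function triage(result: DiffResult): Verdict {
  if (result.identical) return "formatting-only"; // auto-approve with a label
  const changes = result.structuralChanges ?? [];
  // Assumed policy: removals and value-type changes break consumers.
  const breaking = changes.some(
    (c) =>
      c.type === "removed" ||
      (c.type === "changed" && valueType(c.oldValue) !== valueType(c.newValue)),
  );
  return breaking ? "breaking" : "needs-review";
}

console.log(
  triage({
    identical: false,
    diff: "+ verified: true",
    stats: { additions: 1, deletions: 0, unchanged: 5 },
    structuralChanges: [{ path: "verified", type: "added", newValue: "true" }],
  }),
); // needs-review
```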

Edge cases

Array-order significance

JSON / YAML structural diff treats arrays as ordered by default. Reordering array elements produces a moved change. Pass arrayOrderInsensitive: true to treat arrays as sets, which is useful for tag lists, permission arrays, and similar fields where order is semantically irrelevant.
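One way to compare arrays without regard to order is to canonicalise each element (sorting object keys) and compare the two arrays as multisets. The TypeScript below is a sketch of that technique under our own assumptions, not the tool's code; canonical and sameElements are hypothetical helpers. It keeps duplicate counts, so ["a", "a", "b"] and ["a", "b", "b"] differ; whether arrayOrderInsensitive treats duplicates the same way is not documented here.

```typescript
// Hypothetical helpers illustrating order-insensitive array comparison.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Stable string form: object keys sorted recursively. Nested arrays keep their
// order, so only the top-level comparison below is order-insensitive.
function canonical(value: Json): string {
  if (Array.isArray(value)) return "[" + value.map(canonical).join(",") + "]";
  if (value !== null && typeof value === "object") {
    const obj = value;
    return "{" + Object.keys(obj).sort()
      .map((k) => JSON.stringify(k) + ":" + canonical(obj[k])).join(",") + "}";
  }
  return JSON.stringify(value);
}

// True when both arrays hold the same elements with the same multiplicities.
function sameElements(a: Json[], b: Json[]): boolean {
  if (a.length !== b.length) return false;
  const counts = new Map<string, number>();
  for (const x of a) {
    const key = canonical(x);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  for (const y of b) {
    const key = canonical(y);
    const remaining = counts.get(key) ?? 0;
    if (remaining === 0) return false;
    counts.set(key, remaining - 1);
  }
  return true;
}

console.log(sameElements(["admin", "read", "write"], ["write", "admin", "read"])); // true
console.log(sameElements(["a", "a", "b"], ["a", "b", "b"])); // false
```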

Comments in YAML

The YAML diff strips comments before comparing. If you need to track comment changes, fall back to line-based diff with ignoreWhitespace: true.

Type-annotation changes in code-AST

A field that changes from string to string | undefined (TypeScript) is a meaningful change even when the runtime code is unchanged. The AST diff surfaces type-annotation changes as type: 'changed'. To ignore type-only changes, pass ignoreTypeAnnotations: true.

Very large inputs

In the structural modes, each input is parsed before comparison. The AST mode is the most memory-intensive: a 50K-line TypeScript file consumes about 100MB during parsing. Files over 1MB return INPUT_TOO_LARGE; fall back to line-based diff for whole-file comparisons or chunk the input by export.
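Chunking by export can be as simple as splitting the source at top-level export declarations and diffing chunks that share a name. The TypeScript below is a rough sketch; chunkByExport and its regular expression are hypothetical, and because a regex is not a parser it assumes exports start at column 0 and will misfire on export-like lines inside template strings or on unusual formatting.

```typescript
// Hypothetical helper: regex-based, so only a rough approximation of real parsing.
const EXPORT_DECL =
  /^export\s+(?:default\s+)?(?:declare\s+)?(?:async\s+)?(?:abstract\s+)?(?:function\*?|class|const|let|var|interface|type|enum)\s+([A-Za-z_$][\w$]*)/;

// Split source into chunks keyed by exported name. Lines before the first
// recognised export go under "(preamble)". Repeated names (overloads) are merged.
function chunkByExport(source: string): Map<string, string> {
  const chunks = new Map<string, string>();
  let current = "(preamble)";
  let buffer: string[] = [];
  const flush = () => {
    if (buffer.length === 0) return;
    const text = buffer.join("\n");
    const existing = chunks.get(current);
    chunks.set(current, existing === undefined ? text : existing + "\n" + text);
  };
  for (const line of source.split("\n")) {
    const match = EXPORT_DECL.exec(line);
    if (match) {
      flush();
      current = match[1];
      buffer = [];
    }
    buffer.push(line);
  }
  flush();
  return chunks;
}

const sample = "import { a } from './a';\n\nexport const x = 1;\nexport function f() {\n  return x;\n}\n";
const exportChunks = chunkByExport(sample);
console.log(Array.from(exportChunks.keys())); // [ '(preamble)', 'x', 'f' ]
```

Names present on only one side are added or removed exports; chunks that share a name can each be sent through code-AST mode separately, keeping every request well under the size limit.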

4. Documentation

Input parameters

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| textA | string | yes | | First input |
| textB | string | yes | | Second input |
| format | 'unified', 'side-by-side', or 'json' | | | Output rendering |
| ignoreWhitespace | boolean | | false | Treat whitespace-only changes as no diff (line mode) |
| structuralMode | 'text', 'json', 'yaml', or 'code-ast' | | 'text' | Comparison strategy ('text' is line-based diff) |
| language | 'javascript', 'typescript', or 'python' | for code-ast | | AST parser language |
| arrayOrderInsensitive | boolean | | false | JSON / YAML: treat arrays as sets |
| ignoreTypeAnnotations | boolean | | false | code-ast: suppress changes to TypeScript type annotations |

Output shape

{
  identical: boolean;
  diff:      string;              // formatted diff in the chosen `format`
  stats: {
    additions:  number;
    deletions:  number;
    unchanged:  number;
  };
  structuralChanges?: Array<{     // when structuralMode is json/yaml/code-ast
    path:     string;             // e.g. 'data.users[0].email' or 'function:handleClick'
    type:     'added' | 'removed' | 'changed' | 'moved';
    oldValue?: string;
    newValue?: string;
  }>;
}

Change-path notation

| Mode | Example path | Meaning |
|---|---|---|
| JSON | users[2].email | Index 2 of users array, email field |
| JSON | data.config.feature_flags["new_ui"] | Nested key with string-indexed map access |
| YAML | Same as JSON | YAML diff parses to a JSON-equivalent structure |
| code-ast | function:handleClick.body[1] | Statement at index 1 in handleClick function body |
| code-ast | class:User.method:save.parameters[0] | First parameter of User.save method |
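Consumers that group or route changes by location need to split these paths into segments. The TypeScript below sketches a parser for the notation in the table: dotted keys, [n] indices, ["quoted"] keys, and kind:name node labels. The grammar is inferred from these examples and is an assumption, not a published specification; parsePath and Segment are hypothetical names.

```typescript
// Hypothetical parser; the grammar is inferred from the example paths.
type Segment =
  | { kind: "key"; name: string }
  | { kind: "index"; index: number }
  | { kind: "node"; nodeType: string; name: string };

const INDEX_SEG = /^\[(\d+)\]/;                                  // [2]
const QUOTED_SEG = /^\[("(?:[^"\\]|\\.)*")\]/;                   // ["new_ui"]
const NAME_SEG = /^([A-Za-z_$][\w$]*)(?::([A-Za-z_$][\w$]*))?/;  // email, function:handleClick

function parsePath(path: string): Segment[] {
  const segments: Segment[] = [];
  let rest = path;
  while (rest.length > 0) {
    let m: RegExpExecArray | null;
    if (rest[0] === ".") {
      rest = rest.slice(1); // separator between segments
    } else if ((m = INDEX_SEG.exec(rest)) !== null) {
      segments.push({ kind: "index", index: Number(m[1]) });
      rest = rest.slice(m[0].length);
    } else if ((m = QUOTED_SEG.exec(rest)) !== null) {
      // The quoted key is a JSON string literal, so JSON.parse handles escapes.
      segments.push({ kind: "key", name: JSON.parse(m[1]) as string });
      rest = rest.slice(m[0].length);
    } else if ((m = NAME_SEG.exec(rest)) !== null) {
      segments.push(
        m[2] !== undefined
          ? { kind: "node", nodeType: m[1], name: m[2] }
          : { kind: "key", name: m[1] },
      );
      rest = rest.slice(m[0].length);
    } else {
      throw new Error(`Unrecognised path syntax at "${rest}"`);
    }
  }
  return segments;
}

console.log(parsePath("class:User.method:save.parameters[0]"));
```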

Error codes

| Code | When it fires | Recovery |
|---|---|---|
| INPUT_EMPTY | Both textA and textB empty | Provide non-empty inputs |
| INPUT_TOO_LARGE | Either input exceeds 1MB | Fall back to line-based diff or chunk the input |
| INPUT_MALFORMED | structuralMode: 'json' and one input is not valid JSON, or JSON nesting exceeds 1000 levels | Fix the input or fall back to line mode |
| UNSUPPORTED_LANGUAGE | structuralMode: 'code-ast' with language outside the supported set | Use line mode, or pre-translate the source to a supported language |
| PARSE_FAILED | AST parser hit a syntax error in code-ast mode | Fix the source; AST mode requires valid code |

When NOT to use this tool

For binary file diffs (images, compiled artefacts), this tool is the wrong layer. Use a binary diff tool (xxd | diff, radare2, dedicated image-diff like dssim). The structural mode is for parsable text.

For database-schema diffs (comparing two CREATE TABLE statements), use a dedicated schema-diff tool such as atlas or migra. Text-level diff of SQL produces too many false positives, and a faithful schema diff requires modelling constraint precedence, index dependencies, and migration ordering, none of which this tool does.

For very large diffs (whole-repository changes, multi-MB exports), the in-memory full-content comparison is the wrong shape. Use git diff with directory-level operations, or a streaming-diff library.

Performance notes

Line-based mode: under 5ms for inputs under 100KB. JSON / YAML structural diff: 5-20ms depending on depth. AST diff: 50-300ms depending on file size; the parse dominates. The tool is deterministic: the same inputs always produce the same diff. REST responses are Edge-Cache eligible (cache keys include textA and textB, so the hit rate is low in practice unless the same pair of inputs is resubmitted).
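Because the diff is deterministic, a client can also memoise results on the complete request. The TypeScript below is a hypothetical wrapper (diffFn stands in for whatever function calls the tool, and the DiffRequest type simply mirrors the documented parameters). The key JSON-encodes every parameter as a tuple, which keeps the textA / textB boundary unambiguous and stops requests with different options from sharing an entry. It also makes the low hit rate concrete: a one-character edit to either input produces a new key.

```typescript
// Hypothetical client-side memoisation; DiffRequest mirrors the documented parameters.
interface DiffRequest {
  textA: string;
  textB: string;
  format: "unified" | "side-by-side" | "json";
  ignoreWhitespace?: boolean;
  structuralMode?: "text" | "json" | "yaml" | "code-ast";
  language?: "javascript" | "typescript" | "python";
  arrayOrderInsensitive?: boolean;
  ignoreTypeAnnotations?: boolean;
}

// Key covers every field that can change the output. Documented defaults are
// filled in so an omitted option and its explicit default share one entry.
function requestKey(r: DiffRequest): string {
  return JSON.stringify([
    r.textA, r.textB, r.format,
    r.ignoreWhitespace ?? false,
    r.structuralMode ?? "text",
    r.language ?? null,
    r.arrayOrderInsensitive ?? false,
    r.ignoreTypeAnnotations ?? false,
  ]);
}

function memoizeDiff<R>(diffFn: (r: DiffRequest) => R): (r: DiffRequest) => R {
  const cache = new Map<string, R>();
  return (r) => {
    const key = requestKey(r);
    if (cache.has(key)) return cache.get(key) as R;
    const value = diffFn(r);
    cache.set(key, value);
    return value;
  };
}

// Usage with a stand-in for the real call that counts invocations.
let calls = 0;
const cachedDiff = memoizeDiff((r: DiffRequest) => {
  calls++;
  return r.textA === r.textB;
});
cachedDiff({ textA: "a", textB: "a", format: "json" });
cachedDiff({ textA: "a", textB: "a", format: "json", structuralMode: "text" }); // cache hit
cachedDiff({ textA: "a ", textB: "a", format: "json" }); // new key
console.log(calls); // 2
```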

The JSON structural diff handles nested objects up to 1000 levels deep before returning INPUT_MALFORMED. Real-world inputs rarely exceed 20-30 levels.

FAQ

When should I use line-based diff vs structural diff?

Line-based for prose, unstructured text, log files, or anything where exact bytes matter. Structural for any input where order or formatting is semantically irrelevant — JSON API responses, YAML configs, code under Prettier/Black, etc. Structural diff is the right answer for code review of formatting-heavy PRs.

Can it diff TypeScript with type annotations?

Yes. Type changes surface as structuralChanges entries with type: 'changed'. Pass ignoreTypeAnnotations: true to suppress type-only changes if you only care about runtime behaviour.

How does it handle arrays in JSON?

By default arrays are order-sensitive: [1,2,3] vs [3,2,1] reports the reordered elements as moved entries. Pass arrayOrderInsensitive: true to treat arrays as sets, which is useful for tag lists and permission arrays. Don't set this when the array order is semantically meaningful (ordered steps, breadcrumbs).

Why does my code-AST diff fail on valid TypeScript?

The AST parser handles standard syntax; very recent TypeScript syntax that the bundled parser version does not yet support may produce PARSE_FAILED. Pre-compile the source to plain JavaScript and diff with language: 'javascript', or fall back to line-based diff (structuralMode: 'text') with ignoreWhitespace: true.