Deep URL parsing: recursive query decode for OAuth + webhook callbacks — obfus.link

1. Insight

Insight

The problem this article addresses and why it matters.

URLs are JSON in disguise

A URL like https://example.com/oauth/callback?state=eyJrIjoidXNlcl81NDIxIiwiZXhwIjoxNzQ3OTI5ODkyfQ&redirect_uri=https%3A%2F%2Fapp.example.com%2Fhome%3Ftab%3Ddashboard&code=auth_xyz789 is a serialised structure containing a base64-encoded JSON state token, a URL-encoded redirect URI that itself has a query parameter, and an opaque authorisation code. To understand what just happened in an OAuth flow, you need to decode every layer.

Standard URL parsing (new URL(href)) handles the surface: protocol, host, path, query parameters as a flat key-value map. It doesn't decode the contents of query parameter values. A debugging session that starts with "why isn't the redirect working?" ends with 20 minutes of manually base64-decoding the state parameter, URL-decoding the redirect_uri, then realising the inner redirect_uri has its own query parameter that needed a separate decode.

Why a deep-parsing tool

The tool in this article does the deep work automatically. Pass deepQueryParse: true and the tool recursively decodes every query parameter value, detecting whether each value is plain text, base64, URL-encoded, JWT, or a nested JSON / URL. The output gives you the surface-level parse (protocol, host, path) plus a deepQuery map where each parameter is annotated with its detected type and parsed contents.

It also runs security flags across the whole URL: credentials in the query string (a known anti-pattern that exposes secrets in log files), open-redirect patterns (redirect_uri pointing at a different host), excessively long parameter values (a DoS vector for parsers that don't bound their input).

What this article delivers

End-to-end walks of parsing an OAuth callback URL, a webhook delivery URL with embedded tracking parameters, and a URL containing credentials in the query string. We cover the security-flag output, the cases where deep parsing surfaces structure the developer didn't realise was there, and the cases where the deep parser is the wrong tool (a query parameter value that's structurally similar to JSON but is actually a serialised proprietary format).

2. Intent

Intent

What you will be able to do after reading.

By the end of this article you will be able to:

Parse any URL into protocol, host, path segments, flat query parameters, and hash fragment
Use deep-query mode to recursively decode base64-encoded, URL-encoded, JWT, JSON-shaped, and nested-URL query parameter values
Read the security flags output that surfaces credentials in query strings, open-redirect patterns, and excessively long parameter values
Trace OAuth callbacks, webhook deliveries, and analytics-instrumented URLs back to their original semantic structure
Recognise the cases where the deep parser misinterprets a value (a base64-shaped opaque ID that isn't actually base64-encoded JSON)

The Examples section walks through an OAuth callback, a webhook URL with credentials, and a deeply-nested redirect chain.

3. Examples

Examples

Annotated code and worked scenarios.

Before / after: parsing an OAuth callback

urlParser({
  url:            'https://example.com/oauth/callback?state=eyJrIjoidXNlcl81NDIxIiwiZXhwIjoxNzQ3OTI5ODkyfQ&redirect_uri=https%3A%2F%2Fapp.example.com%2Fhome%3Ftab%3Ddashboard&code=auth_xyz789',
  decode:         true,
  deepQueryParse: true,
});

// valid:    true
// protocol: 'https:'
// host:     'example.com'
// port:     null
// path:     '/oauth/callback'
// pathSegments: ['oauth', 'callback']
// query: {
//   state:        'eyJrIjoidXNlcl81NDIxIiwiZXhwIjoxNzQ3OTI5ODkyfQ',
//   redirect_uri: 'https://app.example.com/home?tab=dashboard',
//   code:         'auth_xyz789',
// }
// hash: ''
// deepQuery: {
//   state: {
//     rawValue:     'eyJrIjoidXNlcl81NDIxIiwiZXhwIjoxNzQ3OTI5ODkyfQ',
//     decodedValue: '{"k":"user_5421","exp":1747929892}',
//     type:         'base64',
//     parsed:       { k: 'user_5421', exp: 1747929892 },
//   },
//   redirect_uri: {
//     rawValue:     'https%3A%2F%2Fapp.example.com%2Fhome%3Ftab%3Ddashboard',
//     decodedValue: 'https://app.example.com/home?tab=dashboard',
//     type:         'url',
//     parsed:       { protocol: 'https:', host: 'app.example.com', path: '/home', query: { tab: 'dashboard' } },
//   },
//   code: {
//     rawValue:     'auth_xyz789',
//     decodedValue: 'auth_xyz789',
//     type:         'plain',
//   },
// }
// securityFlags: []

The flat parse shows the surface URL. The deepQuery field is where the value is — the state parameter is base64-decoded and shown to be a JSON object with a user ID and expiry; the redirect_uri is URL-decoded and recursively parsed (it has its own query.tab parameter).

Debugging "why isn't the redirect working?" now takes 30 seconds: read the deep parse output and check if the inner redirect URL's host is what you expect.

Before / after: credentials in query string

urlParser({
  url: 'https://api.example.com/v1/users?api_key=sk_live_abc123def456&limit=100',
  decode: true,
});

// query: { api_key: 'sk_live_abc123def456', limit: '100' }
// securityFlags: [
//   {
//     issue:    'API key detected in query string. Query parameters appear in server access logs, browser history, and Referer headers — credentials should be in Authorization header instead.',
//     severity: 'critical',
//     param:    'api_key',
//   },
// ]

The security flag catches the anti-pattern: API keys in query strings get logged by every HTTP middleware on the path. The fix is Authorization: Bearer ... headers. The flag surfaces immediately rather than waiting for the credential to leak into a log aggregator.

Before / after: open redirect detection

urlParser({
  url: 'https://example.com/redirect?to=https%3A%2F%2Fevil.com%2Fphishing',
  decode: true,
  deepQueryParse: true,
});

// deepQuery: {
//   to: {
//     rawValue:     'https%3A%2F%2Fevil.com%2Fphishing',
//     decodedValue: 'https://evil.com/phishing',
//     type:         'url',
//     parsed:       { protocol: 'https:', host: 'evil.com', ... },
//   },
// }
// securityFlags: [
//   {
//     issue:    'Redirect URI points at a different host than the URL origin (example.com → evil.com). Verify this is intentional; open-redirect patterns enable phishing.',
//     severity: 'warning',
//     param:    'to',
//   },
// ]

The cross-host redirect surfaces as a warning. Not every cross-host redirect is malicious — OAuth callbacks legitimately redirect to a different host — but it's a flag worth surfacing for review.

Before / after: webhook URL with embedded JWT

urlParser({
  url: 'https://webhook.example.com/event?token=eyJhbGciOiJIUzI1NiJ9.eyJ1c2VyIjoxLCJleHAiOjE3NDc5Mjk4OTJ9.signature_redacted',
  deepQueryParse: true,
});

// deepQuery: {
//   token: {
//     rawValue:     'eyJhbGciOiJIUzI1NiJ9.eyJ1c2VyIjoxLCJleHAiOjE3NDc5Mjk4OTJ9.signature_redacted',
//     decodedValue: '<JWT token>',
//     type:         'jwt',
//     parsed: {
//       header:  { alg: 'HS256' },
//       payload: { user: 1, exp: 1747929892 },
//       signaturePresent: true,
//     },
//   },
// }
// securityFlags: [
//   {
//     issue:    'JWT detected in query string. Token expiry: 2026-05-22T15:24:52Z (4 days remaining). Like other credentials, JWTs in query parameters leak via logs and Referer headers.',
//     severity: 'high',
//     param:    'token',
//   },
// ]

The parser decodes the JWT — alg, payload claims, signature presence — and the security flag surfaces the leakage risk. Useful for incident response when investigating "did this JWT actually fire?" without manually base64-decoding the segments.

When humans use this

The dominant workflow is debugging an unexpected URL. An OAuth callback that's redirecting wrong, a webhook URL with an unfamiliar state parameter, a tracking URL that turned out to encode an entire JSON payload. Paste the URL, scan the deepQuery output for the layer that holds the actual answer.

When agents use this

Three patterns:

OAuth-flow debugging agent. An agent diagnosing why an OAuth integration fails calls the parser on the callback URL, reads the state and redirect_uri from the deepQuery output, and verifies them against the expected values. Most OAuth issues are visible at this layer.
Security-audit agent. A scheduled agent scans log files for URLs and runs each through the parser with deepQueryParse: true. URLs with securityFlags are aggregated into a daily report. Credentials in query strings, open-redirect patterns, and unusually long parameter values all surface.
Webhook-test harness. An agent generating test webhook deliveries constructs URLs with intentionally-structured state parameters and verifies the receiving handler parses them correctly via this tool.

Edge cases

Internationalised domain names (IDN)

URLs with non-ASCII characters in the host (https://日本.example) are decoded to Unicode in the host field. The Punycode-encoded form (https://xn--wgv71a.example) is preserved separately. Both are valid representations; the tool's output gives both so the consumer picks the right one.

Fragment-encoded auth tokens

OAuth 2.0 implicit flow places the access token in the URL fragment (#access_token=...). The tool parses the fragment as if it were a query string when decodeFragment: true (default false). Useful for capturing tokens delivered via the implicit flow when you control the client-side parsing.

Percent-encoding in the path

Path segments can contain percent-encoded characters (/products/screws%2Fbolts). The tool decodes them by default; pass decode: false to preserve the raw form. Some path schemes use percent-encoding as a meaningful separator (the %2F was meant to be a literal slash, not a path separator).

Maximum length

URLs over 8192 characters are rejected by most servers and parsers. The tool surfaces a warning at 4096 (the practical limit) and returns INPUT_TOO_LARGE at 65536 (the spec ceiling for some implementations).

4. Documentation

Documentation

Reference signatures, edge cases, and lookup tables.

Input parameters

Field	Type	Required	Default	Description
`url`	`string`	✓	—	The URL to parse
`decode`	`boolean`	✗	`true`	Percent-decode path segments and query values
`deepQueryParse`	`boolean`	✗	`false`	Recursively decode query parameter values
`decodeFragment`	`boolean`	✗	`false`	Parse the URL fragment as a query string (OAuth implicit flow)

Output shape

{
  valid:         boolean;
  protocol:      string;            // 'https:'
  host:          string;
  port:          string | null;
  path:          string;
  pathSegments:  string[];
  query:         Record<string, string>;  // flat decoded params
  hash:          string;
  deepQuery?:    Record<string, {          // when deepQueryParse: true
    rawValue:     string;
    decodedValue: string;
    type:         'plain' | 'base64' | 'jwt' | 'url' | 'json';
    parsed?:      object;  // structured parse when type is jwt / url / json / base64-of-json
  }>;
  securityFlags: Array<{
    issue:    string;
    severity: 'critical' | 'high' | 'warning' | 'info';
    param?:   string;
  }>;
}

Deep-parse type detection heuristics

Type	Trigger
`jwt`	Three base64url segments separated by dots, decodes to JSON object in segment 1
`base64`	All base64url chars, decodes to readable text or JSON
`url`	Decoded value parses as a URL (starts with `http://` or `https://`)
`json`	Decoded value parses as JSON (starts with `{` or `[`)
`plain`	None of the above

Common security flags

Severity	Trigger
`critical`	Known credential patterns in query (`sk_live_`, `AKIA*`, `ghp_`, etc.)
`critical`	URL contains basic auth (`https://user:pass@host/...`)
`high`	JWT in query parameter (token leakage via logs / Referer)
`high`	API key or Bearer token in query parameter
`warning`	Open-redirect pattern (cross-host redirect_uri)
`warning`	Parameter value over 2KB (potential DoS or abuse)
`info`	Hash fragment contains tokens (implicit-flow tokens leak via Referer)

Error codes

Code	When it fires	Recovery
`INPUT_EMPTY`	`url` empty	Provide a URL
`INPUT_MALFORMED`	URL parse failed (no scheme, invalid host)	Verify the URL is well-formed (try in a browser)
`INPUT_TOO_LARGE`	URL exceeds 65536 chars	Truncate or use a URL shortener (e.g. `link_obfuscator`)
`UNSUPPORTED_FORMAT`	URL uses a scheme this tool doesn't parse (`mailto:`, `tel:`, `magnet:`)	Use a scheme-specific parser

When NOT to use this tool

For URL building (taking parameters and producing a URL), use the runtime's URL constructor (new URL() in Node / browsers). The parser is for the reverse direction.

For URL canonicalisation (resolving relative URLs, removing duplicate slashes, sorting query parameters), use a canonicalisation library. The parser preserves the input form; it doesn't normalise.

For very high-throughput URL parsing (millions per second, e.g. log processing), inline the URL parsing in your processor with a streaming-friendly library. This tool's HTTP overhead dominates for short URLs.

Performance notes

Typical execution: under 3ms for URLs under 1KB. Deep query parsing adds 5-20ms depending on parameter count and nesting depth. The tool is deterministic — same input + same parameters always produce byte-identical output — so REST responses are Edge-Cache eligible.

The deep-query heuristics are tuned for common patterns (OAuth state, webhook tokens, redirect URIs). Domain-specific encoded values (a proprietary serialisation that looks base64-shaped but isn't) may be misclassified; the type field surfaces the heuristic decision so the consumer can override.