obfus.link
Analyzers

Deep URL parsing: recursive query decode for OAuth + webhook callbacks

Parse any URL with optional recursive decoding of query parameter values — base64, URL-encoded, JWT, nested JSON, and inner URLs all unpacked. Security flags surface credentials in query strings, open-redirect patterns, and other leakage risks.

The URL Parser parses any URL into protocol, host, path, query, and hash with optional recursive decoding of query parameter values. Detects base64, URL-encoded, JWT, JSON, and nested URL types in query params and decodes each automatically. Security flags surface credentials in query strings and open-redirect patterns.

1. Insight

Insight

The problem this article addresses and why it matters.

URLs are JSON in disguise

A URL like https://example.com/oauth/callback?state=eyJrIjoidXNlcl81NDIxIiwiZXhwIjoxNzQ3OTI5ODkyfQ&redirect_uri=https%3A%2F%2Fapp.example.com%2Fhome%3Ftab%3Ddashboard&code=auth_xyz789 is a serialised structure containing a base64-encoded JSON state token, a URL-encoded redirect URI that itself has a query parameter, and an opaque authorisation code. To understand what just happened in an OAuth flow, you need to decode every layer.

Standard URL parsing (new URL(href)) handles the surface: protocol, host, path, query parameters as a flat key-value map. It doesn't decode the contents of query parameter values. A debugging session that starts with "why isn't the redirect working?" ends with 20 minutes of manually base64-decoding the state parameter, URL-decoding the redirect_uri, then realising the inner redirect_uri has its own query parameter that needed a separate decode.

Why a deep-parsing tool

The tool in this article does the deep work automatically. Pass deepQueryParse: true and the tool recursively decodes every query parameter value, detecting whether each value is plain text, base64, URL-encoded, JWT, or a nested JSON / URL. The output gives you the surface-level parse (protocol, host, path) plus a deepQuery map where each parameter is annotated with its detected type and parsed contents.

It also runs security flags across the whole URL: credentials in the query string (a known anti-pattern that exposes secrets in log files), open-redirect patterns (redirect_uri pointing at a different host), excessively long parameter values (a DoS vector for parsers that don't bound their input).

What this article delivers

End-to-end walks of parsing an OAuth callback URL, a webhook delivery URL with embedded tracking parameters, and a URL containing credentials in the query string. We cover the security-flag output, the cases where deep parsing surfaces structure the developer didn't realise was there, and the cases where the deep parser is the wrong tool (a query parameter value that's structurally similar to JSON but is actually a serialised proprietary format).

2. Intent

Intent

What you will be able to do after reading.

By the end of this article you will be able to:

  • Parse any URL into protocol, host, path segments, flat query parameters, and hash fragment
  • Use deep-query mode to recursively decode base64-encoded, URL-encoded, JWT, JSON-shaped, and nested-URL query parameter values
  • Read the security flags output that surfaces credentials in query strings, open-redirect patterns, and excessively long parameter values
  • Trace OAuth callbacks, webhook deliveries, and analytics-instrumented URLs back to their original semantic structure
  • Recognise the cases where the deep parser misinterprets a value (a base64-shaped opaque ID that isn't actually base64-encoded JSON)

The Examples section walks through an OAuth callback, a webhook URL with credentials, and a deeply-nested redirect chain.

3. Examples

Examples

Annotated code and worked scenarios.

Before / after: parsing an OAuth callback

urlParser({
  url:            'https://example.com/oauth/callback?state=eyJrIjoidXNlcl81NDIxIiwiZXhwIjoxNzQ3OTI5ODkyfQ&redirect_uri=https%3A%2F%2Fapp.example.com%2Fhome%3Ftab%3Ddashboard&code=auth_xyz789',
  decode:         true,
  deepQueryParse: true,
});

// valid:    true
// protocol: 'https:'
// host:     'example.com'
// port:     null
// path:     '/oauth/callback'
// pathSegments: ['oauth', 'callback']
// query: {
//   state:        'eyJrIjoidXNlcl81NDIxIiwiZXhwIjoxNzQ3OTI5ODkyfQ',
//   redirect_uri: 'https://app.example.com/home?tab=dashboard',
//   code:         'auth_xyz789',
// }
// hash: ''
// deepQuery: {
//   state: {
//     rawValue:     'eyJrIjoidXNlcl81NDIxIiwiZXhwIjoxNzQ3OTI5ODkyfQ',
//     decodedValue: '{"k":"user_5421","exp":1747929892}',
//     type:         'base64',
//     parsed:       { k: 'user_5421', exp: 1747929892 },
//   },
//   redirect_uri: {
//     rawValue:     'https%3A%2F%2Fapp.example.com%2Fhome%3Ftab%3Ddashboard',
//     decodedValue: 'https://app.example.com/home?tab=dashboard',
//     type:         'url',
//     parsed:       { protocol: 'https:', host: 'app.example.com', path: '/home', query: { tab: 'dashboard' } },
//   },
//   code: {
//     rawValue:     'auth_xyz789',
//     decodedValue: 'auth_xyz789',
//     type:         'plain',
//   },
// }
// securityFlags: []

The flat parse shows the surface URL. The deepQuery field is where the value is — the state parameter is base64-decoded and shown to be a JSON object with a user ID and expiry; the redirect_uri is URL-decoded and recursively parsed (it has its own query.tab parameter).

Debugging "why isn't the redirect working?" now takes 30 seconds: read the deep parse output and check if the inner redirect URL's host is what you expect.

Before / after: credentials in query string

urlParser({
  url: 'https://api.example.com/v1/users?api_key=sk_live_abc123def456&limit=100',
  decode: true,
});

// query: { api_key: 'sk_live_abc123def456', limit: '100' }
// securityFlags: [
//   {
//     issue:    'API key detected in query string. Query parameters appear in server access logs, browser history, and Referer headers — credentials should be in Authorization header instead.',
//     severity: 'critical',
//     param:    'api_key',
//   },
// ]

The security flag catches the anti-pattern: API keys in query strings get logged by every HTTP middleware on the path. The fix is Authorization: Bearer ... headers. The flag surfaces immediately rather than waiting for the credential to leak into a log aggregator.

Before / after: open redirect detection

urlParser({
  url: 'https://example.com/redirect?to=https%3A%2F%2Fevil.com%2Fphishing',
  decode: true,
  deepQueryParse: true,
});

// deepQuery: {
//   to: {
//     rawValue:     'https%3A%2F%2Fevil.com%2Fphishing',
//     decodedValue: 'https://evil.com/phishing',
//     type:         'url',
//     parsed:       { protocol: 'https:', host: 'evil.com', ... },
//   },
// }
// securityFlags: [
//   {
//     issue:    'Redirect URI points at a different host than the URL origin (example.com → evil.com). Verify this is intentional; open-redirect patterns enable phishing.',
//     severity: 'warning',
//     param:    'to',
//   },
// ]

The cross-host redirect surfaces as a warning. Not every cross-host redirect is malicious — OAuth callbacks legitimately redirect to a different host — but it's a flag worth surfacing for review.

Before / after: webhook URL with embedded JWT

urlParser({
  url: 'https://webhook.example.com/event?token=eyJhbGciOiJIUzI1NiJ9.eyJ1c2VyIjoxLCJleHAiOjE3NDc5Mjk4OTJ9.signature_redacted',
  deepQueryParse: true,
});

// deepQuery: {
//   token: {
//     rawValue:     'eyJhbGciOiJIUzI1NiJ9.eyJ1c2VyIjoxLCJleHAiOjE3NDc5Mjk4OTJ9.signature_redacted',
//     decodedValue: '<JWT token>',
//     type:         'jwt',
//     parsed: {
//       header:  { alg: 'HS256' },
//       payload: { user: 1, exp: 1747929892 },
//       signaturePresent: true,
//     },
//   },
// }
// securityFlags: [
//   {
//     issue:    'JWT detected in query string. Token expiry: 2026-05-22T15:24:52Z (4 days remaining). Like other credentials, JWTs in query parameters leak via logs and Referer headers.',
//     severity: 'high',
//     param:    'token',
//   },
// ]

The parser decodes the JWT — alg, payload claims, signature presence — and the security flag surfaces the leakage risk. Useful for incident response when investigating "did this JWT actually fire?" without manually base64-decoding the segments.

When humans use this

The dominant workflow is debugging an unexpected URL. An OAuth callback that's redirecting wrong, a webhook URL with an unfamiliar state parameter, a tracking URL that turned out to encode an entire JSON payload. Paste the URL, scan the deepQuery output for the layer that holds the actual answer.

When agents use this

Three patterns:

  • OAuth-flow debugging agent. An agent diagnosing why an OAuth integration fails calls the parser on the callback URL, reads the state and redirect_uri from the deepQuery output, and verifies them against the expected values. Most OAuth issues are visible at this layer.
  • Security-audit agent. A scheduled agent scans log files for URLs and runs each through the parser with deepQueryParse: true. URLs with securityFlags are aggregated into a daily report. Credentials in query strings, open-redirect patterns, and unusually long parameter values all surface.
  • Webhook-test harness. An agent generating test webhook deliveries constructs URLs with intentionally-structured state parameters and verifies the receiving handler parses them correctly via this tool.

Edge cases

Internationalised domain names (IDN)

URLs with non-ASCII characters in the host (https://日本.example) are decoded to Unicode in the host field. The Punycode-encoded form (https://xn--wgv71a.example) is preserved separately. Both are valid representations; the tool's output gives both so the consumer picks the right one.

Fragment-encoded auth tokens

OAuth 2.0 implicit flow places the access token in the URL fragment (#access_token=...). The tool parses the fragment as if it were a query string when decodeFragment: true (default false). Useful for capturing tokens delivered via the implicit flow when you control the client-side parsing.

Percent-encoding in the path

Path segments can contain percent-encoded characters (/products/screws%2Fbolts). The tool decodes them by default; pass decode: false to preserve the raw form. Some path schemes use percent-encoding as a meaningful separator (the %2F was meant to be a literal slash, not a path separator).

Maximum length

URLs over 8192 characters are rejected by most servers and parsers. The tool surfaces a warning at 4096 (the practical limit) and returns INPUT_TOO_LARGE at 65536 (the spec ceiling for some implementations).

4. Documentation

Documentation

Reference signatures, edge cases, and lookup tables.

Input parameters

Field

Type

Required

Default

Description

url

string

The URL to parse

decode

boolean

true

Percent-decode path segments and query values

deepQueryParse

boolean

false

Recursively decode query parameter values

decodeFragment

boolean

false

Parse the URL fragment as a query string (OAuth implicit flow)

Output shape

{
  valid:         boolean;
  protocol:      string;            // 'https:'
  host:          string;
  port:          string | null;
  path:          string;
  pathSegments:  string[];
  query:         Record<string, string>;  // flat decoded params
  hash:          string;
  deepQuery?:    Record<string, {          // when deepQueryParse: true
    rawValue:     string;
    decodedValue: string;
    type:         'plain' | 'base64' | 'jwt' | 'url' | 'json';
    parsed?:      object;  // structured parse when type is jwt / url / json / base64-of-json
  }>;
  securityFlags: Array<{
    issue:    string;
    severity: 'critical' | 'high' | 'warning' | 'info';
    param?:   string;
  }>;
}

Deep-parse type detection heuristics

Type

Trigger

jwt

Three base64url segments separated by dots, decodes to JSON object in segment 1

base64

All base64url chars, decodes to readable text or JSON

url

Decoded value parses as a URL (starts with http:// or https://)

json

Decoded value parses as JSON (starts with { or [)

plain

None of the above

Common security flags

Severity

Trigger

critical

Known credential patterns in query (sk_live_, AKIA*, ghp_, etc.)

critical

URL contains basic auth (https://user:pass@host/...)

high

JWT in query parameter (token leakage via logs / Referer)

high

API key or Bearer token in query parameter

warning

Open-redirect pattern (cross-host redirect_uri)

warning

Parameter value over 2KB (potential DoS or abuse)

info

Hash fragment contains tokens (implicit-flow tokens leak via Referer)

Error codes

Code

When it fires

Recovery

INPUT_EMPTY

url empty

Provide a URL

INPUT_MALFORMED

URL parse failed (no scheme, invalid host)

Verify the URL is well-formed (try in a browser)

INPUT_TOO_LARGE

URL exceeds 65536 chars

Truncate or use a URL shortener (e.g. link_obfuscator)

UNSUPPORTED_FORMAT

URL uses a scheme this tool doesn't parse (mailto:, tel:, magnet:)

Use a scheme-specific parser

When NOT to use this tool

For URL building (taking parameters and producing a URL), use the runtime's URL constructor (new URL() in Node / browsers). The parser is for the reverse direction.

For URL canonicalisation (resolving relative URLs, removing duplicate slashes, sorting query parameters), use a canonicalisation library. The parser preserves the input form; it doesn't normalise.

For very high-throughput URL parsing (millions per second, e.g. log processing), inline the URL parsing in your processor with a streaming-friendly library. This tool's HTTP overhead dominates for short URLs.

Performance notes

Typical execution: under 3ms for URLs under 1KB. Deep query parsing adds 5-20ms depending on parameter count and nesting depth. The tool is deterministic — same input + same parameters always produce byte-identical output — so REST responses are Edge-Cache eligible.

The deep-query heuristics are tuned for common patterns (OAuth state, webhook tokens, redirect URIs). Domain-specific encoded values (a proprietary serialisation that looks base64-shaped but isn't) may be misclassified; the type field surfaces the heuristic decision so the consumer can override.

Try it now

URL Parser

Parse any URL into components with deep query decoding and security analysis

FAQ

Frequently asked questions

What does the deep-query mode actually decode?

Each query parameter value is checked against type detectors: JWT (three base64url segments), base64 (printable encoding to readable JSON or text), URL (starts with http:// or https://), JSON (starts with { or [), plain (none of the above). The detected type drives recursive decoding — a URL-encoded redirect_uri gets its own query parameters parsed too.

Why does the tool flag JWTs in query strings?

Query parameters appear in server access logs, browser history, and Referer headers. A JWT in a query parameter is a credential being leaked through all three. The fix is to move the token to the Authorization header or a body parameter. The flag surfaces the leakage risk; the actual fix happens in the producing service.

How does it detect open-redirect patterns?

When deepQueryParse decodes a redirect_uri value to a URL, the tool checks if the target host differs from the URL's origin host. Cross-host redirects get flagged as warnings — not all are malicious (OAuth callbacks legitimately cross hosts) but they're worth surfacing for review.

Can it parse non-HTTP schemes like mailto: or magnet:?

No — the tool focuses on http:// and https:// URLs. Other schemes (mailto:, tel:, magnet:, ftp:) return UNSUPPORTED_FORMAT. Use a scheme-specific parser for those cases; the deep-query semantics don't apply.