Base64 with auto-detect: identifying JSON, JWTs, PEM certs, and image data in opaque payloads — obfus.link

1. Insight

The problem this article addresses and why it matters.

Base64 strings are opaque until they aren't

You receive a base64-encoded string from an API response, a webhook payload, or a log entry. What's inside? Standard base64 decoders just give you bytes. The bytes might be JSON, might be a JWT, might be a PEM-encoded certificate, might be the contents of an image file. Without knowing what's inside, you can't know what to do next.

The tool in this article adds an identify mode on top of standard encode/decode. Paste any base64 string, get back: which variant it uses (standard, URL-safe, MIME), what the decoded content actually is (JSON, JWT, image, PEM certificate, protobuf, plain text), and a preview of the decoded content. If it's a JWT, the header and payload come back parsed.

This solves the daily "what is this opaque blob?" question that comes up in API debugging, log analysis, and forensic work.

What this article delivers

Three workflows, walked end-to-end: standard encode/decode of UTF-8 text, identify mode against representative payloads (JSON, JWT, PEM, image), and automatic variant detection across standard, URL-safe, and MIME base64.

2. Intent

What you will be able to do after reading.

By the end of this article you will be able to:

Encode and decode UTF-8 strings to and from base64 in standard, URL-safe, or MIME variants
Use identify mode to determine which variant an opaque base64 string uses, what the decoded content is, and a preview of the content
Recognise JWT-shaped base64 inputs and parse the header / payload into structured objects
Distinguish JSON, JWT, image, PEM certificate, protobuf, and plain-text content automatically from decoded bytes
Handle the URL-safe variant (`-` and `_` instead of `+` and `/`) without translating characters by hand on every call

The Examples section walks through encode, decode, and identify against representative payloads.

3. Examples

Annotated code and worked scenarios.

Before / after: identify an opaque blob

You found this in a log file:

eyJhbGciOiJIUzI1NiJ9.eyJ1c2VyIjoxLCJleHAiOjE3NDc5Mjk4OTJ9.signature_redacted

base64Codec({
  input: 'eyJhbGciOiJIUzI1NiJ9.eyJ1c2VyIjoxLCJleHAiOjE3NDc5Mjk4OTJ9.signature_redacted',
  mode:  'identify',
});

// output:          '<JWT — decoded below>'
// variant:         'url-safe'
// contentType:     'jwt'
// contentPreview:  '{"alg":"HS256"}{"user":1,"exp":1747929892}<signature>'
// jwtParts: {
//   header:  { alg: 'HS256' },
//   payload: { user: 1, exp: 1747929892 },
//   signaturePresent: true,
// }

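Three pieces of information come back from one call: the variant (URL-safe), the content type (JWT), and the structured parse. Doing this by hand means splitting on `.`, base64url-decoding each segment, and JSON-parsing the header and payload; the tool collapses those steps into one. Note that `signaturePresent` only reports that a third segment exists: identify mode parses the token, and with no key among its inputs it cannot verify the signature.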
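For comparison, the manual route looks roughly like the sketch below. It is an illustration, not the tool's implementation: `decodeJwtParts` and `base64UrlToString` are hypothetical helpers, and, like identify mode, the code only parses the token and does not verify the signature.

```typescript
// Sketch: split a JWT and decode its header and payload by hand.
// decodeJwtParts and base64UrlToString are hypothetical helpers, not part of base64Codec.
// Parsing only: the signature is NOT verified.

function base64UrlToString(segment: string): string {
  // Map the URL-safe alphabet back to standard base64 and restore padding.
  let b64 = segment.replace(/-/g, "+").replace(/_/g, "/");
  while (b64.length % 4 !== 0) b64 += "=";
  const binary = atob(b64);
  // atob yields one char per byte; reassemble the bytes and decode them as UTF-8.
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return new TextDecoder().decode(bytes);
}

interface JwtParts {
  header: Record<string, unknown>;
  payload: Record<string, unknown>;
  signaturePresent: boolean;
}

function decodeJwtParts(token: string): JwtParts {
  const segments = token.split(".");
  if (segments.length !== 3) throw new Error("not a three-segment JWT");
  const [h, p, s] = segments;
  return {
    header: JSON.parse(base64UrlToString(h)),
    payload: JSON.parse(base64UrlToString(p)),
    signaturePresent: s.length > 0,
  };
}

const parts = decodeJwtParts(
  "eyJhbGciOiJIUzI1NiJ9.eyJ1c2VyIjoxLCJleHAiOjE3NDc5Mjk4OTJ9.signature_redacted",
);
console.log(parts.header);  // { alg: 'HS256' }
console.log(parts.payload); // { user: 1, exp: 1747929892 }
```

A decoded `exp` tells you when the token claims to expire; only verifying the signature against the issuer's key tells you whether to trust that claim.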

Before / after: standard encode/decode

base64Codec({
  input:    'Hello, world!',
  mode:     'encode',
  variant:  'standard',
});

// output:      'SGVsbG8sIHdvcmxkIQ=='
// variant:     'standard'
// inputBytes:  13
// outputBytes: 20

Same content, URL-safe variant:

base64Codec({
  input:    'Hello, world!',
  mode:     'encode',
  variant:  'url-safe',
});

// output: 'SGVsbG8sIHdvcmxkIQ'  // no padding, no '+' or '/'

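URL-safe output omits the `=` padding and uses `-` and `_` in place of `+` and `/`, so the string can sit in a URL path or query parameter without percent-encoding. It is the encoding used for all three JWT segments, for OAuth state tokens, and anywhere else a base64 string lands in a URL.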
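A minimal sketch of how the two variants relate, using the `btoa` and `TextEncoder` globals available in browsers and modern Node. The function names are illustrative, not part of the tool. URL-safe output is the standard output with two characters swapped and the padding removed; the `~~~` input is included because its encoding happens to contain a `+`.

```typescript
// Sketch of the two encodings for UTF-8 text (illustrative names, not the tool's code).

function bytesToBinaryString(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i++) out += String.fromCharCode(bytes[i]);
  return out;
}

function encodeStandard(text: string): string {
  // UTF-8 encode first: btoa only accepts code points 0-255.
  return btoa(bytesToBinaryString(new TextEncoder().encode(text)));
}

function encodeUrlSafe(text: string): string {
  // Same bits, different alphabet: '+' -> '-', '/' -> '_', then drop trailing '=' padding.
  return encodeStandard(text).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

console.log(encodeStandard("Hello, world!")); // SGVsbG8sIHdvcmxkIQ==
console.log(encodeUrlSafe("Hello, world!"));  // SGVsbG8sIHdvcmxkIQ
console.log(encodeStandard("~~~"));           // fn5+
console.log(encodeUrlSafe("~~~"));            // fn5-
```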

Before / after: decoding without knowing the variant

base64Codec({
  input: 'SGVsbG8sIHdvcmxkIQ',  // could be either variant
  mode:  'decode',
});

// output:  'Hello, world!'
// variant: 'url-safe'  (auto-detected from the absence of '+', '/', '=')

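The decoder auto-detects the variant from the input characters, which is useful when consuming base64 from an unknown source. This string contains no '+', '/', or '=', so it is reported as URL-safe. Because it uses only characters the two alphabets share, it decodes to the same bytes under either variant; the detected variant changes the decoded output only for strings that contain alphabet-specific characters.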
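One plausible set of detection rules, written as a hypothetical `guessVariant` function. The tool's actual rules are not documented beyond the behaviour described above, so treat the ordering, and especially the assumption that embedded line breaks mean MIME, as a sketch.

```typescript
// Hypothetical variant-detection heuristic; the tool's real rules may differ.
type Variant = "standard" | "url-safe" | "mime";

function guessVariant(input: string): Variant {
  // Assumption: line breaks inside the string indicate MIME-wrapped output.
  if (/\n/.test(input.trim())) return "mime";
  // '-' and '_' exist only in the URL-safe alphabet.
  if (/[-_]/.test(input)) return "url-safe";
  // '+', '/', or padding point to standard base64.
  if (/[+\/]/.test(input) || input.endsWith("=")) return "standard";
  // Only shared characters and no padding: report URL-safe, as in the example above.
  return "url-safe";
}

console.log(guessVariant("SGVsbG8sIHdvcmxkIQ"));   // url-safe
console.log(guessVariant("SGVsbG8sIHdvcmxkIQ==")); // standard
```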

Before / after: identifying decoded content types

Identify mode classifies the decoded bytes:

Input / decoded content	`contentType`
Decoded bytes start with `{` or `[` and parse as JSON	`'json'`
Input is three base64url segments separated by `.`	`'jwt'`
Decoded text starts with `-----BEGIN` (e.g. `-----BEGIN CERTIFICATE-----`)	`'pem-certificate'`
PNG, JPEG, GIF magic bytes	`'image/png'`, `'image/jpeg'`, `'image/gif'`
Looks like protobuf wire format	`'protobuf'`
Printable ASCII / UTF-8	`'plain-text'`
Everything else	`'binary'`

When the content is PEM, the preview includes the PEM header line (for example `-----BEGIN CERTIFICATE-----`), which names the kind of object encoded, so the consumer can tell what it is holding. When the content is an image, the preview shows the image dimensions and format.
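Image dimensions are cheap to read because they sit at fixed offsets in the file header. A minimal sketch for PNG, using a hypothetical `pngDimensions` helper: the 8-byte signature is followed by the IHDR chunk, whose width and height are big-endian 32-bit integers at byte offsets 16 and 20. JPEG stores its dimensions in a start-of-frame segment that has to be searched for, so it needs more code than shown here.

```typescript
// Sketch: read width/height from a PNG's IHDR chunk.
// pngDimensions is a hypothetical helper, not the tool's API. Returns null for non-PNG input.
function pngDimensions(bytes: Uint8Array): { width: number; height: number } | null {
  const signature = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
  if (bytes.length < 24) return null;
  for (let i = 0; i < signature.length; i++) {
    if (bytes[i] !== signature[i]) return null;
  }
  // Bytes 12-15 hold the first chunk's type, which must be "IHDR".
  const chunkType = String.fromCharCode(bytes[12], bytes[13], bytes[14], bytes[15]);
  if (chunkType !== "IHDR") return null;
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // Width and height: big-endian uint32 at offsets 16 and 20.
  return { width: view.getUint32(16, false), height: view.getUint32(20, false) };
}

// First 24 bytes of a 640x480 PNG: signature, IHDR length (13), "IHDR", width, height.
const pngHeader = new Uint8Array([
  0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a,
  0x00, 0x00, 0x00, 0x0d, 0x49, 0x48, 0x44, 0x52,
  0x00, 0x00, 0x02, 0x80, 0x00, 0x00, 0x01, 0xe0,
]);
console.log(pngDimensions(pngHeader)); // { width: 640, height: 480 }
```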

When humans use this

The dominant use is the "what is this?" workflow on opaque payloads: pasting from a log, an API response, or a captured webhook. JWTs are the most frequent hit, since many systems use them as bearer tokens for authentication.

When agents use this

Two patterns:

Payload classification. An agent receiving an opaque base64 blob from an upstream system runs identify mode first to determine what to do with the content. Branch on contentType: JSON gets parsed, JWT gets validated, image gets passed to a vision tool.
Log analysis. An agent processing log entries with embedded base64 strings classifies each and extracts the meaningful structure (JWT payload, JSON object). The same pipeline handles heterogeneous logs.
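A sketch of the classification pattern in TypeScript. `IdentifyResult` mirrors the output shape in the Documentation section; `routePayload` and its route labels are hypothetical names standing in for whatever the agent does next.

```typescript
// IdentifyResult mirrors the documented output shape; routePayload and the labels are hypothetical.
interface IdentifyResult {
  output: string;
  variant: "standard" | "url-safe" | "mime";
  inputBytes: number;
  outputBytes: number;
  contentType?: string;
  contentPreview?: string;
  jwtParts?: { header: object; payload: object; signaturePresent: boolean };
}

type Route = "parse-json" | "verify-jwt" | "vision" | "inspect";

function routePayload(result: IdentifyResult): Route {
  const type = result.contentType ?? "binary";
  if (type === "json") return "parse-json";
  // Decoded JWT claims are untrusted until the signature is verified downstream.
  if (type === "jwt" && result.jwtParts) return "verify-jwt";
  if (type.startsWith("image/")) return "vision";
  // PEM, protobuf, plain text, and binary fall through to generic inspection.
  return "inspect";
}
```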

Edge cases

Padding mismatch

Standard base64 pads its output with `=` so the length is a multiple of 4 characters; URL-safe omits the padding. Decoders should accept both. The tool's decoder is lenient: it accepts padded and unpadded input in either variant. Encoded output is always canonical for the chosen variant.

Truncated input

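Truncated base64 (the producer cut the string partway through an encoded byte) returns INPUT_MALFORMED. The decoder doesn't attempt partial-byte recovery. Not every truncation is detectable, though: a length that leaves a remainder of 1 when divided by 4 can never be valid, because one leftover character carries only 6 bits, but a string cut at a 4-character boundary is still valid base64 and decodes cleanly to fewer bytes. No codec can catch that case.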
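A minimal sketch of the padding and truncation rules, assuming a hypothetical `toPaddedStandard` helper that normalises either alphabet, padded or not, to padded standard base64 before handing it to `atob`. It is not the tool's code. It rejects lengths that leave a remainder of 1 when divided by 4, since a single leftover character (6 bits) cannot complete a byte.

```typescript
// Sketch of lenient padding handling (assumed behaviour, not the tool's code).
// toPaddedStandard is hypothetical: accepts padded or unpadded input in either alphabet.
function toPaddedStandard(input: string): string {
  const unpadded = input.replace(/=+$/, "").replace(/-/g, "+").replace(/_/g, "/");
  const remainder = unpadded.length % 4;
  // One leftover character holds only 6 bits: the input must have been cut short.
  if (remainder === 1) throw new Error("INPUT_MALFORMED");
  // Remainder 2 needs "==", remainder 3 needs "=".
  return remainder === 0 ? unpadded : unpadded + "=".repeat(4 - remainder);
}

console.log(toPaddedStandard("SGVsbG8sIHdvcmxkIQ"));       // SGVsbG8sIHdvcmxkIQ==
console.log(atob(toPaddedStandard("SGVsbG8sIHdvcmxkIQ"))); // Hello, world!
```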

Non-base64 characters

Whitespace inside the string, including the line breaks of MIME-wrapped input, is stripped before decoding, since the MIME variant explicitly allows wrapped lines. Any other character outside the base64 alphabet triggers INPUT_MALFORMED.
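A sketch of the cleanup step described above, under the assumptions that both alphabets are accepted and that `=` may appear only at the end. `cleanBase64Input` is a hypothetical helper, not the tool's API.

```typescript
// Sketch of pre-decode cleanup (assumed behaviour mirroring the rules above).
function cleanBase64Input(input: string): string {
  // Strip all whitespace: spaces, tabs, and MIME CRLF line breaks.
  const stripped = input.replace(/\s+/g, "");
  // Standard and URL-safe alphabets combined, with at most two '=' allowed only at the end.
  if (!/^[A-Za-z0-9+\/_-]*={0,2}$/.test(stripped)) {
    throw new Error("INPUT_MALFORMED: character outside the base64 alphabet");
  }
  return stripped;
}

console.log(cleanBase64Input("SGVsbG8s\r\nIHdvcmxk\r\nIQ==")); // SGVsbG8sIHdvcmxkIQ==
```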

4. Documentation

Reference signatures, edge cases, and lookup tables.

Input parameters

Field	Type	Required	Default	Description
`input`	`string`	✓	—	The string to encode, decode, or identify
`mode`	`'encode' | 'decode' | 'identify'`	✓	—	Workflow selector
`variant`	`'standard' | 'url-safe' | 'mime'`	✗	auto-detect for decode	Base64 variant
`inputEncoding`	`'utf-8' | 'ascii' | 'binary'`	✗	`'utf-8'`	Encoding of the source string for encode mode

Output shape

{
  output:          string;
  variant:         'standard' | 'url-safe' | 'mime';
  inputBytes:      number;
  outputBytes:     number;
  contentType?:    string;     // identify mode
  contentPreview?: string;     // identify mode — first 200 chars
  jwtParts?: {                 // identify mode + JWT input
    header:  object;
    payload: object;
    signaturePresent: boolean;
  };
}

Variant differences

Variant	Padding	Special chars
standard	`=` to a multiple of 4 characters	`+` / `/`
url-safe	omitted	`-` / `_`
mime	`=` + 76-char line wrap	`+` / `/`
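The MIME row differs from standard only in layout. A minimal sketch of the wrapping, following RFC 2045's limit of 76 characters per line with CRLF line breaks; `wrapMime` is a hypothetical helper.

```typescript
// Sketch: lay out standard base64 as MIME (RFC 2045: at most 76 chars per line, CRLF breaks).
function wrapMime(standardBase64: string, lineLength = 76): string {
  const lines: string[] = [];
  for (let i = 0; i < standardBase64.length; i += lineLength) {
    lines.push(standardBase64.slice(i, i + lineLength));
  }
  return lines.join("\r\n");
}

const wrapped = wrapMime("A".repeat(100));
console.log(wrapped.split("\r\n").map((line) => line.length)); // [ 76, 24 ]
```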

Identify mode content classification

Trigger	contentType
First byte is `{` or `[`, parses as JSON	`json`
Three base64 segments separated by `.` (header + payload + signature)	`jwt`
Starts with `-----BEGIN`	`pem-certificate`
Magic bytes `89 50 4E 47`	`image/png`
Magic bytes `FF D8 FF`	`image/jpeg`
Magic bytes `47 49 46 38`	`image/gif`
Looks like protobuf wire format	`protobuf`
Printable UTF-8 / ASCII	`plain-text`
None of the above	`binary`
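A simplified sketch of the byte-level checks in the table. The rule order and the `classifyBytes` name are assumptions. Two rows are deliberately left out: JWT detection works on the raw input string (three dot-separated segments) rather than on decoded bytes, and protobuf has no magic bytes, so recognising it needs a heuristic parser of the wire format.

```typescript
// Simplified sketch of identify-mode classification over decoded bytes (assumed rule order).
// JWT and protobuf detection are intentionally omitted.

function hasPrefix(bytes: Uint8Array, prefix: number[]): boolean {
  if (bytes.length < prefix.length) return false;
  return prefix.every((b, i) => bytes[i] === b);
}

function classifyBytes(bytes: Uint8Array): string {
  // Magic bytes first: they are unambiguous and cheap to check.
  if (hasPrefix(bytes, [0x89, 0x50, 0x4e, 0x47])) return "image/png";
  if (hasPrefix(bytes, [0xff, 0xd8, 0xff])) return "image/jpeg";
  if (hasPrefix(bytes, [0x47, 0x49, 0x46, 0x38])) return "image/gif";

  let text: string;
  try {
    // fatal: true makes invalid UTF-8 throw instead of inserting U+FFFD.
    text = new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch (e) {
    return "binary";
  }
  if (text.startsWith("{") || text.startsWith("[")) {
    try {
      JSON.parse(text);
      return "json";
    } catch (e) {
      // Looked like JSON but did not parse: keep checking.
    }
  }
  if (text.startsWith("-----BEGIN")) return "pem-certificate";
  // Printable: no control characters other than tab, LF, and CR.
  if (!/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/.test(text)) return "plain-text";
  return "binary";
}
```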

Base64 vs hex vs base32: which encoding to use

The three common binary-to-text encodings differ in compactness, alphabet, and case sensitivity. The choice rarely matters for tiny payloads but compounds at scale and affects which channels the encoded data can safely travel through.

Property	Base64	Hex	Base32
Bits per character	6	4	5
Overhead vs raw bytes	+33%	+100%	+60%
Alphabet size	64	16	32
Case-sensitive	Yes (`A` ≠ `a`)	No (most parsers)	No
URL-safe by default	No (`+`, `/`)	Yes	Yes
Manual transcription	Hard (case + similar chars)	Easy	Easy
Typical use	API payloads, JWT, embeds	Hashes, color codes, MAC addresses	TOTP secrets, ULIDs

Use base64 when you want the most compact of the three encodings and the payload travels through systems that preserve case (HTTP header values, JSON bodies, XML). Use the URL-safe variant (`-` and `_` instead of `+` and `/`) when the payload sits in a URL path or query parameter.

Use hex when the data is transcribed or read by humans (cryptographic hashes, debugging output, error codes) or when the parsing layer is case-insensitive (CSS colors, MAC addresses, IPv6 literals). The 2× size cost is irrelevant for short fingerprints and the human-readability is worth it.

Use base32 when you need URL safety, case-insensitivity, and better compactness than hex. TOTP shared secrets are the canonical example: the TOTP algorithm itself (RFC 6238) works on raw bytes, but secrets are conventionally exchanged as base32 because they are embedded in QR codes and, when scanning isn't possible, typed by hand into authenticator apps. ULIDs use Crockford's base32 for similar reasons.

For payloads larger than a few KB the size difference becomes meaningful: a 1MB blob is 1.33MB in base64 vs 2MB in hex. For payloads under 100 bytes the overhead is irrelevant — pick the alphabet that fits the channel.
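The overhead figures follow from the bits-per-character row: base64 emits 4 characters per 3 bytes, hex 2 per byte, and base32 8 per 5 bytes, with padded base64 and base32 rounding up to a whole group. A quick check of the 1MB figures, taking 1MB as 1,048,576 bytes:

```typescript
// Encoded length in characters for n raw bytes (padded forms; unpadded base64url can be 1-2 chars shorter).
function base64Length(n: number): number { return 4 * Math.ceil(n / 3); }
function hexLength(n: number): number { return 2 * n; }
function base32Length(n: number): number { return 8 * Math.ceil(n / 5); }

const oneMiB = 1024 * 1024;
console.log(base64Length(oneMiB)); // 1398104 (about 1.33x)
console.log(hexLength(oneMiB));    // 2097152 (2x)
console.log(base32Length(oneMiB)); // 1677728 (about 1.6x)
console.log(base64Length(13));     // 20, matching 'Hello, world!' in the encode example
```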

Error codes

Code	When it fires	Recovery
`INPUT_EMPTY`	`input` empty	Provide a non-empty string
`INPUT_MALFORMED`	Decode mode and input is not valid base64	Verify the input is a base64 string
`INPUT_INVALID_TYPE`	`variant` value outside the supported set	Use one of the three documented variants

When NOT to use this tool

For binary file encoding (image → base64 for embedding), use image_to_base64 — it handles format conversion, resizing, and LLM-vision message blocks alongside the encoding. This tool is the generic base64 codec; the image-specific tool is the right surface for image workflows.

For non-base64 binary-to-text encodings (base32, base58, base85, hex), use dedicated encoders. Base64 is the most common but not the only option.

Performance notes

Typical execution: under 2ms for inputs under 100KB. Identify mode adds 1-3ms for content-type detection. Output is deterministic: the same input and variant always produce byte-identical output, so REST responses are Edge-Cache eligible.