1. Insight
The problem this article addresses and why it matters.
Encoding the right thing for the right context
Most web developers have met the "double-encoded HTML" bug: a browser renders the literal text &lt; instead of <, because the same content was HTML-encoded twice as it moved through a pipeline. The root cause is almost always context confusion: a value that needed JSON escaping was HTML-encoded instead, or content that should have been URL-encoded got HTML entities applied. Each output context (HTML body, HTML attribute, JSON string, XML attribute, URL parameter) has different rules for which characters are unsafe and how to escape them.
OWASP's Cross-Site Scripting Prevention Cheat Sheet lists six distinct escaping contexts. Most teams pick one — HTML entity encoding via a library — and apply it everywhere. That breaks when the destination isn't actually HTML.
Why context-aware encoding
The tool in this article takes a contextAware: true flag and auto-detects the embedding context from the input shape, or accepts a contextOverride parameter when the destination is known but does not match the input shape. The four contexts are html-raw (standard HTML entities), html-in-json (JSON-string escaping), html-in-xml (XML attribute escaping), and html-in-url (percent-encoding).
The output uses the right escape sequences for each context — a single call replaces the manual "which encoder do I need?" decision that produces double-encoding bugs at scale.
What this article delivers
End-to-end walkthroughs of encoding the same content for four different output contexts, the auto-detection behaviour against ambiguous inputs, and the cases where the tool can't decide on its own and the consumer needs contextOverride.
2. Intent
What you will be able to do after reading.
By the end of this article you will be able to:
- Encode HTML content for safe embedding in HTML bodies, JSON strings, XML attributes, or URL parameters
- Use auto-detection mode (contextAware: true) when the destination context is implied by the input shape
- Override the detected context with contextOverride when the destination differs from the input
- Reverse the direction (decode) any of the four context-specific encodings back to the original string
- Recognise the cases that produce double-encoding bugs and structure the pipeline to prevent them
The Examples section walks through each context against the same input.
3. Examples
Annotated code and worked scenarios.
Before / after: encoding for an HTML body
Plain content with characters HTML parses as markup:
Before:
<script>alert("xss")</script>
After (html-raw):
htmlEncoder({
  html: '<script>alert("xss")</script>',
  mode: 'encode',
  entities: 'named',
  contextAware: false,
  contextOverride: 'html-raw',
});
// encoded: '&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;'
// detectedContext: 'html-raw'
// entityCount: 6
The <, >, and " characters become HTML entities. The result is safe to insert as text content in an HTML document.
Before / after: encoding for embedding in JSON
The same input destined for a JSON string value:
htmlEncoder({
  html: '<script>alert("xss")</script>',
  mode: 'encode',
  contextAware: false,
  contextOverride: 'html-in-json',
});
// encoded: '\\u003cscript\\u003ealert(\\"xss\\")\\u003c/script\\u003e'
JSON requires " to be escaped as \", and < / > are commonly escaped to \u003c / \u003e so that the result cannot break out of a <script> tag if the JSON is rendered inline in HTML. The encoder picks the JSON-safe form rather than the HTML-entity form.
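The same escaping can be approximated with the standard library. The sketch below is an assumption about how such an encoder could work, not the tool's implementation: `escapeForInlineJson` is a hypothetical helper that lets `JSON.stringify` handle quotes, backslashes, and control characters, then rewrites `<`, `>`, and `&` as `\uXXXX` escapes. Unlike the tool output above, it returns the complete JSON string literal, surrounding quotes included.

```typescript
// Hypothetical helper: JSON string escaping that also survives an inline <script> block.
function escapeForInlineJson(value: string): string {
  // JSON.stringify escapes quotes, backslashes and control characters.
  return JSON.stringify(value).replace(/[<>&]/g, (ch) => {
    // Rewrite markup-significant characters as four-digit \u escapes.
    return "\\u" + ("0000" + ch.charCodeAt(0).toString(16)).slice(-4);
  });
}

const jsonLiteral = escapeForInlineJson('<script>alert("xss")</script>');
console.log(jsonLiteral);
// A JSON parser turns the escapes back into the original characters.
console.log(JSON.parse(jsonLiteral) === '<script>alert("xss")</script>');
```

Because `\u003c` is an ordinary JSON escape, consumers need no special handling: any conforming parser restores the original string.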
Before / after: encoding for a URL parameter
htmlEncoder({
  html: 'message=<b>hello</b>&user=admin',
  mode: 'encode',
  contextOverride: 'html-in-url',
});
// encoded: 'message%3D%3Cb%3Ehello%3C%2Fb%3E%26user%3Dadmin'
The =, <, >, &, and / characters are percent-encoded for safe inclusion in a URL query parameter. Note that this is full percent-encoding, not the partial form that leaves = and & intact; the partial form would let the value be parsed as additional parameters.
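The difference between full and partial percent-encoding can be seen with JavaScript's two built-in functions. This is only an illustration of the distinction; whether the tool uses either function internally is not stated in this article.

```typescript
const rawValue = "message=<b>hello</b>&user=admin";

// encodeURIComponent is meant for a single parameter value:
// it escapes reserved characters such as = & / along with < and >.
const asParameterValue = encodeURIComponent(rawValue);

// encodeURI is meant for a whole URL, so it leaves = & / untouched.
const asWholeUrl = encodeURI(rawValue);

console.log(asParameterValue); // message%3D%3Cb%3Ehello%3C%2Fb%3E%26user%3Dadmin
console.log(asWholeUrl);       // message=%3Cb%3Ehello%3C/b%3E&user=admin
```

With the second form, a parser splitting on & would see a separate user=admin parameter, which is exactly the breakage full encoding avoids.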
Before / after: auto-detection
When contextAware: true is set, the tool detects the context from input markers:
htmlEncoder({
  html: '"users": [<b>Alice</b>]', // looks like a JSON fragment
  mode: 'encode',
  contextAware: true,
});
// detectedContext: 'html-in-json' (detected from the JSON-property-shape preamble)
// encoded: ...
Detection is best-effort: when the input is ambiguous (a fragment that could be HTML or JSON), the tool falls back to html-raw and surfaces a warning. For ambiguous cases, set contextOverride explicitly.
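To make the idea of shape-based detection concrete, here is a minimal sketch. The heuristics, the `detectContext` name, and the warning text are all assumptions made for illustration; the tool's actual detection rules are not documented in this article.

```typescript
type EncodingContext = "html-raw" | "html-in-json" | "html-in-xml" | "html-in-url";

interface Detection {
  context: EncodingContext;
  warnings: string[];
}

// Hypothetical heuristics, checked from most to least specific.
function detectContext(input: string): Detection {
  const trimmed = input.trim();
  // A quoted key followed by a colon looks like a JSON property.
  if (/^[{\[]?\s*"[^"]+"\s*:/.test(trimmed)) {
    return { context: "html-in-json", warnings: [] };
  }
  // An XML declaration marks an XML destination.
  if (/^<\?xml\b/.test(trimmed)) {
    return { context: "html-in-xml", warnings: [] };
  }
  // key=value pairs joined by & look like a query string.
  if (/^[\w.-]+=[^&]*(?:&[\w.-]+=[^&]*)*$/.test(trimmed)) {
    return { context: "html-in-url", warnings: [] };
  }
  // Starts like JSON but has no property shape: ambiguous, so fall back with a warning.
  if (/^[{\["]/.test(trimmed)) {
    return {
      context: "html-raw",
      warnings: ["input may be a JSON fragment; set contextOverride to be explicit"],
    };
  }
  return { context: "html-raw", warnings: [] };
}

console.log(detectContext('"users": [<b>Alice</b>]').context);
console.log(detectContext("[<b>Alice</b>]").warnings);
```

Any heuristic of this kind will misclassify some inputs, which is why an explicit override is the safer choice whenever the destination is known.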
Before / after: decoding (reverse)
htmlEncoder({
  html: '&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;',
  mode: 'decode',
});
// decoded: '<script>alert("xss")</script>'
Decoding handles named entities (&lt;, &amp;, &quot;), numeric entities (&#60;), and hex entities (&#x3C;) interchangeably. Useful for reading content stored in HTML-encoded form and processing it as plain text.
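A decoder that accepts all three entity forms can be sketched in a single regular-expression pass. `decodeEntities` and its small named-entity table are hypothetical and cover only a handful of names; a full decoder would need the complete HTML entity list.

```typescript
// Small illustrative subset of named entities.
const NAMED: Record<string, string> = { lt: "<", gt: ">", amp: "&", quot: '"', apos: "'" };

// Build a string from a code point, using a surrogate pair above U+FFFF.
function fromCodePoint(cp: number): string {
  if (cp <= 0xffff) return String.fromCharCode(cp);
  const offset = cp - 0x10000;
  return String.fromCharCode(0xd800 + (offset >> 10), 0xdc00 + (offset & 0x3ff));
}

function decodeEntities(input: string): string {
  return input.replace(/&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, (match, body: string) => {
    if (body.charAt(0) === "#") {
      const isHex = body.charAt(1) === "x" || body.charAt(1) === "X";
      const cp = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range references untouched.
      return cp <= 0x10ffff ? fromCodePoint(cp) : match;
    }
    // Unknown names are left as written.
    return Object.prototype.hasOwnProperty.call(NAMED, body) ? NAMED[body] : match;
  });
}

console.log(decodeEntities("&lt;&#60;&#x3C;")); // <<<
console.log(decodeEntities("&amp;lt;"));        // &lt;
```

The single pass matters: &amp;lt; decodes to &lt; and stops there, so one decode call undoes exactly one layer of encoding.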
When humans use this
A developer integrating user-generated content into a templated email runs the content through html-in-json encoding before embedding it in the JSON template payload. A team using the inline-JSON-in-HTML pattern (<script type="application/json"> blocks) uses the JSON context so that the output is both a valid JSON string and unable to close the surrounding <script> element. JSON placed in a data attribute is a different case: the \" escape still contains a literal ", so the value also needs HTML attribute encoding.
When agents use this
Two patterns:
- Template-rendering agent. An agent generating templated content (email, HTML page, JSON config) calls the encoder with the right context for each template's embedding rule. Eliminates the "which escaper does this template need?" guess by the LLM.
- XSS-defence pipeline. A pipeline ingesting user content into a multi-context destination runs the encoder per context: once for HTML body display, once for JSON API output, once for URL parameter inclusion. The same input becomes three different safe representations.
Edge cases
Already-encoded content
Passing already-encoded content (&lt;) through encode mode encodes it again (&amp;lt;). Either decode first, or use the tool's detectAlreadyEncoded: true parameter to skip re-encoding when the input is already in the chosen context's form.
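The double-encoding effect, and one possible guard against it, can be sketched as follows. `looksHtmlEncoded` is a hypothetical heuristic written for this example; it is not the logic behind detectAlreadyEncoded, which this article does not describe.

```typescript
function escapeHtml(input: string): string {
  return input
    .replace(/&/g, "&amp;") // must run first so later entities are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Heuristic: the input contains entities and no raw markup characters.
function looksHtmlEncoded(input: string): boolean {
  const hasEntity = /&(?:lt|gt|amp|quot|#[0-9]+|#[xX][0-9a-fA-F]+);/.test(input);
  const hasRawMarkup = /[<>"]/.test(input);
  return hasEntity && !hasRawMarkup;
}

function encodeOnce(input: string): string {
  return looksHtmlEncoded(input) ? input : escapeHtml(input);
}

const once = escapeHtml("<b>");   // "&lt;b&gt;"
const twice = escapeHtml(once);   // "&amp;lt;b&amp;gt;"
const guarded = encodeOnce(once); // "&lt;b&gt;", left unchanged
console.log(once, twice, guarded);
```

A guard like this misfires on text that legitimately talks about entities (a tutorial that wants to display &lt; literally), so tracking whether a value has been encoded is more reliable than guessing from its shape.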
Unicode and surrogates
The encoder handles the Basic Multilingual Plane (BMP) correctly. Characters outside the BMP (emoji, some CJK extensions) are stored as surrogate pairs in JavaScript strings; the tool encodes them as surrogate-pair escapes in JSON and as a single full code-point reference in HTML (one reference for 😀, U+1F600, rather than one per surrogate).
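The two representations can be illustrated with a short sketch. The helper names are hypothetical, and the hexadecimal reference form is chosen arbitrarily here; the article does not say whether the tool emits decimal or hexadecimal references.

```typescript
const grin = "\uD83D\uDE00"; // U+1F600, stored as two UTF-16 code units

// Combine a high and low surrogate into one code point.
function pairToCodePoint(high: number, low: number): number {
  return (high - 0xd800) * 0x400 + (low - 0xdc00) + 0x10000;
}

// HTML: one reference per code point, not one per surrogate.
function toHtmlReferences(input: string): string {
  return input.replace(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g, (pair) => {
    const cp = pairToCodePoint(pair.charCodeAt(0), pair.charCodeAt(1));
    return "&#x" + cp.toString(16).toUpperCase() + ";";
  });
}

// JSON: one \uXXXX escape per code unit, so the pair stays a pair.
function toJsonEscapes(input: string): string {
  return input.replace(/[\uD800-\uDFFF]/g, (unit) => {
    return "\\u" + ("0000" + unit.charCodeAt(0).toString(16)).slice(-4);
  });
}

console.log(grin.length);            // 2
console.log(toHtmlReferences(grin)); // &#x1F600;
console.log(toJsonEscapes(grin));    // \ud83d\ude00
```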
Round-trip stability
Encode then decode produces the original string for all four contexts. Decode then encode is stable for HTML-raw; for the other contexts, the encoded output is canonical (the encoder picks one representation among the valid forms).
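Both properties can be checked with standard-library encoders standing in for the tool (an assumption made so the example runs without it). Encode-then-decode returns the input, while decode-then-encode collapses equivalent spellings into one canonical form.

```typescript
const samples = ['<script>alert("xss")</script>', "message=<b>hello</b>&user=admin", "plain text"];

// Encode then decode returns the original for every sample.
const urlRoundTrips = samples.every((s) => decodeURIComponent(encodeURIComponent(s)) === s);
const jsonRoundTrips = samples.every((s) => JSON.parse(JSON.stringify(s)) === s);

// Decode then encode normalises: %3c and %3C both decode to "<",
// but the encoder only ever emits the uppercase form.
const canonical = encodeURIComponent(decodeURIComponent("%3c"));

console.log(urlRoundTrips, jsonRoundTrips, canonical); // true true %3C
```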
4. Documentation
Reference signatures, edge cases, and lookup tables.
Input parameters
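| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| html | string | ✓ | — | The string to encode or decode |
| mode | 'encode' \| 'decode' | ✓ | — | Direction |
| entities | | ✗ | | Entity style for HTML output |
| contextAware | boolean | ✗ | | Auto-detect the embedding context |
| contextOverride | 'html-raw' \| 'html-in-json' \| 'html-in-xml' \| 'html-in-url' | ✗ | — | Force a specific context |
| detectAlreadyEncoded | boolean | ✗ | | Skip re-encoding when input is already in the target context |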
Output shape
{
  encoded?: string; // present when mode: 'encode'
  decoded?: string; // present when mode: 'decode'
  detectedContext: 'html-raw' | 'html-in-json' | 'html-in-xml' | 'html-in-url';
  entityCount: number; // count of replacements made
  warnings: string[];
}
Context-specific escape tables
| Char | html-raw | html-in-json | html-in-xml | html-in-url |
|---|---|---|---|---|
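As an illustration of how a per-context escape table can be expressed in code, the sketch below maps four characters to their conventional escapes in each context. The values are standard forms (HTML and XML entities, JSON escapes, percent-encoding) and agree with the worked examples earlier in this article, but `ESCAPES` and `encodeWithTable` are assumptions for illustration, not the tool's published table. The sketch covers only these four characters; a real encoder would also handle backslashes and control characters in JSON and every reserved character in URLs.

```typescript
type EncodingContext = "html-raw" | "html-in-json" | "html-in-xml" | "html-in-url";

// Illustrative values only: conventional escapes for four characters.
const ESCAPES: Record<EncodingContext, Record<string, string>> = {
  "html-raw":     { "<": "&lt;",    ">": "&gt;",    "&": "&amp;",   '"': "&quot;" },
  "html-in-json": { "<": "\\u003c", ">": "\\u003e", "&": "\\u0026", '"': '\\"' },
  "html-in-xml":  { "<": "&lt;",    ">": "&gt;",    "&": "&amp;",   '"': "&quot;" },
  "html-in-url":  { "<": "%3C",     ">": "%3E",     "&": "%26",     '"': "%22" },
};

// Replace each of the four characters using the row for the chosen context.
function encodeWithTable(context: EncodingContext, input: string): string {
  const table = ESCAPES[context];
  return input.replace(/[<>&"]/g, (ch) => table[ch]);
}

console.log(encodeWithTable("html-raw", '<"&>'));    // &lt;&quot;&amp;&gt;
console.log(encodeWithTable("html-in-url", '<"&>')); // %3C%22%26%3E
```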
Error codes
| Code | When it fires | Recovery |
|---|---|---|
| | The input string is empty | Provide a non-empty input |
| | contextOverride is not a documented context | Use one of the four documented contexts |
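A caller can check for both conditions before invoking the tool. In the sketch below, the error code names `EMPTY_INPUT` and `UNKNOWN_CONTEXT` are placeholders invented for the example, as are the `EncoderInput` type and the `validateInput` helper; only the recovery messages come from the table above.

```typescript
const CONTEXTS: string[] = ["html-raw", "html-in-json", "html-in-xml", "html-in-url"];

// Hypothetical subset of the input shape, enough for validation.
interface EncoderInput {
  html: string;
  mode: "encode" | "decode";
  contextOverride?: string;
}

// Placeholder code names; the tool's real codes are not shown here.
interface ValidationError {
  code: "EMPTY_INPUT" | "UNKNOWN_CONTEXT";
  recovery: string;
}

function validateInput(input: EncoderInput): ValidationError[] {
  const errors: ValidationError[] = [];
  if (input.html.length === 0) {
    errors.push({ code: "EMPTY_INPUT", recovery: "Provide a non-empty input" });
  }
  if (input.contextOverride !== undefined && CONTEXTS.indexOf(input.contextOverride) === -1) {
    errors.push({ code: "UNKNOWN_CONTEXT", recovery: "Use one of the four documented contexts" });
  }
  return errors;
}

console.log(validateInput({ html: "", mode: "encode", contextOverride: "css" }));
```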
When NOT to use this tool
For HTML sanitisation (removing potentially-dangerous tags while preserving safe markup), use a dedicated sanitiser (DOMPurify, sanitize-html). The encoder escapes everything to text; the sanitiser preserves whitelisted markup while removing dangerous constructs.
For binary content embedded in text contexts, use base64 (base64_codec tool) rather than HTML encoding. HTML entities are inefficient for binary; base64 is the right primitive.
Performance notes
Typical execution: under 2ms for inputs under 50KB. The encoder is single-pass; performance scales linearly with input size. Deterministic — same input + same context produce byte-identical output, so REST responses are Edge-Cache eligible.
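Determinism is what makes caching safe at any layer. As an in-process analogue of the edge cache mentioned above, the sketch below wraps an arbitrary deterministic encoder in a cache keyed by context and input. `memoizeEncoder` is a hypothetical helper, and `encodeURIComponent` stands in for the tool.

```typescript
// Wrap a deterministic encoder in a cache keyed by (context, input).
function memoizeEncoder(encode: (context: string, input: string) => string) {
  const cache: Record<string, string> = {};
  let misses = 0;

  function cached(context: string, input: string): string {
    // Serialising the pair as JSON gives an unambiguous key.
    const key = JSON.stringify([context, input]);
    if (!Object.prototype.hasOwnProperty.call(cache, key)) {
      cache[key] = encode(context, input);
      misses += 1;
    }
    return cache[key];
  }

  return { cached, missCount: () => misses };
}

// Stand-in encoder for the demonstration.
const encoder = memoizeEncoder((_context, input) => encodeURIComponent(input));
encoder.cached("html-in-url", "a&b");
encoder.cached("html-in-url", "a&b"); // served from the cache
console.log(encoder.missCount());
```

The context must be part of the key: the same input legitimately produces different output for each context.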