XML to JSON with attribute, namespace, and CDATA preservation

Convert XML to JSON while preserving attributes, namespaces, and CDATA sections as structured metadata — or use simple mode for lossy one-way consumption. Reverse direction (JSON to XML) for round-trip workflows. XXE-safe by default.

The XML to JSON converter preserves attributes (as @attr keys), namespaces (as xmlns: declarations), and CDATA sections (as #cdata-marked values) by default. Pass preserveAttributes: false for lossy simple-mode output. Reverse mode converts JSON to XML with a configurable root element. XXE-safe — rejects DTDs and external entities by default.

1. Insight

The problem this article addresses and why it matters.

XML is still everywhere it always was

The "XML for everything" era peaked around 2008, and the web had largely moved on to JSON by the early 2010s. Many JavaScript developers entering the industry now have never written XML deliberately, but they have almost certainly consumed it, because XML never left the systems it landed in. SOAP APIs at financial institutions, RSS feeds from media systems, SAML assertions from enterprise SSO, configuration files in the JVM and .NET ecosystems, and XML-encoded EDI, HL7, and FpML payloads from healthcare and finance integrations: all XML, all still in production.

The translation problem is an asymmetry: XML can express structures that JSON has no native slot for. XML has attributes, namespaces, mixed content (text and child elements interleaved), comments, processing instructions, and CDATA sections. JSON has none of those. Most XML-to-JSON converters paper over the asymmetry: they read the elements, drop the attributes and namespaces, and emit a JSON-flavoured copy that can't round-trip back to the original XML.

Why a preservation-first converter

The tool in this article preserves attributes, namespaces, and CDATA sections by default. Attributes become @-prefixed keys (@attr). Namespace declarations are kept as @xmlns:prefix keys and element names keep their prefixes, so the output JSON carries them forward. CDATA sections are marked explicitly with a #cdata key so a downstream consumer knows the content was wrapped. The output is faithful enough to round-trip back to equivalent XML via the json-to-xml reverse mode.

For consumers that want the lossy simple conversion (XML → JSON with attributes dropped because the consumer doesn't care), pass preserveAttributes: false. The tool's default is the conservative choice; the opt-in lossy mode is for cases where the consumer has positive reason to ignore the metadata.

What this article delivers

This article walks end to end through converting a SOAP envelope and a CDATA-bearing document. It also covers the reverse direction (JSON to XML with a configurable root element), the namespace-preservation behaviour, and the cases where neither direction is right because the XML uses features without a clean JSON equivalent (recursive schemas, document-type declarations, external entities).

2. Intent

What you will be able to do after reading.

By the end of this article you will be able to:

  • Convert XML to JSON with attribute, namespace, and CDATA preservation by default
  • Convert JSON back to XML in reverse mode with a configurable root element
  • Choose between compact and verbose JSON representations of attribute-heavy XML
  • Recognise the XML features (DTDs, external entities, processing instructions) that the converter handles vs reports as warnings
  • Choose preserveAttributes: false when the consumer doesn't care about XML metadata and a simpler JSON output is preferred

The Examples section walks through a SOAP envelope, CDATA-bearing XML, and the reverse JSON-to-XML direction.

3. Examples

Annotated code and worked scenarios.

Before / after: a SOAP envelope

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Header>
    <auth:Credentials xmlns:auth="http://example.com/auth">
      <auth:Token>abc123</auth:Token>
    </auth:Credentials>
  </soap:Header>
  <soap:Body>
    <ns:GetOrder xmlns:ns="http://example.com/orders">
      <ns:OrderId>4521</ns:OrderId>
    </ns:GetOrder>
  </soap:Body>
</soap:Envelope>

xmlToJson({
  input:               soapXml,
  mode:                'xml-to-json',
  preserveAttributes:  true,
  preserveNamespaces:  true,
  preserveCDATA:       false,
});

// output: {
//   "soap:Envelope": {
//     "@xmlns:soap":   "http://schemas.xmlsoap.org/soap/envelope/",
//     "soap:Header": {
//       "auth:Credentials": {
//         "@xmlns:auth":  "http://example.com/auth",
//         "auth:Token":   "abc123"
//       }
//     },
//     "soap:Body": {
//       "ns:GetOrder": {
//         "@xmlns:ns":   "http://example.com/orders",
//         "ns:OrderId":  "4521"
//       }
//     }
//   }
// }
// stats: { elements: 7, attributes: 3, namespaces: ['soap', 'auth', 'ns'], cdataSections: 0 }

The @ prefix on attribute keys and the preserved namespace prefixes keep the JSON faithful to the original. A downstream consumer that needs to know which namespace Token belongs to can read the auth: prefix on auth:Token and look up its URI in the enclosing @xmlns:auth declaration; a consumer that doesn't care can ignore the prefix.
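
As a minimal sketch of how a consumer might resolve a prefix to its namespace URI from the preserved output, the helper below walks a path of keys and remembers the nearest @xmlns: declaration for that prefix. The name resolvePrefix is hypothetical and not part of the tool; the sketch assumes the JSON shape shown in the output above.

```javascript
// Hypothetical helper (not part of the converter): find the namespace URI
// bound to a prefix at the end of a path through the preserved JSON.
function resolvePrefix(doc, path, prefix) {
  let node = doc;
  let uri; // nearest declaration seen so far; inner declarations override outer ones
  for (const key of path) {
    node = node[key];
    if (node === null || typeof node !== "object") break;
    const declared = node["@xmlns:" + prefix];
    if (typeof declared === "string") uri = declared;
  }
  return uri;
}

const soapJson = {
  "soap:Envelope": {
    "@xmlns:soap": "http://schemas.xmlsoap.org/soap/envelope/",
    "soap:Header": {
      "auth:Credentials": {
        "@xmlns:auth": "http://example.com/auth",
        "auth:Token": "abc123",
      },
    },
  },
};

const authUri = resolvePrefix(
  soapJson,
  ["soap:Envelope", "soap:Header", "auth:Credentials"],
  "auth"
);
console.log(authUri); // http://example.com/auth
```

The helper returns undefined when no declaration for the prefix is in scope along the path.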

Before / after: simple mode (lossy)

Same input, simpler JSON when you don't need the metadata:

xmlToJson({
  input: soapXml,
  preserveAttributes: false,
  preserveNamespaces: false,
});

// output: {
//   Envelope: {
//     Header: { Credentials: { Token: 'abc123' } },
//     Body:   { GetOrder:    { OrderId: '4521' } }
//   }
// }

Cleaner. Round-tripping back to valid SOAP would lose the namespace prefixes; for one-way consumption (e.g. extracting OrderId for downstream processing), this is the simpler shape.
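
To make the difference between the two shapes concrete, here is a rough sketch that derives the simple shape from preserved output by dropping @-prefixed keys and namespace prefixes. The helper toSimpleShape is hypothetical; the converter's own simple mode may differ in details such as how it handles name collisions.

```javascript
// Hypothetical helper: drop @-prefixed keys and namespace prefixes from
// preserved output to get the simple shape. This is lossy by design.
function toSimpleShape(node) {
  if (Array.isArray(node)) return node.map(toSimpleShape);
  if (node === null || typeof node !== "object") return node;
  const out = {};
  for (const [key, value] of Object.entries(node)) {
    if (key.startsWith("@")) continue; // attributes and xmlns declarations
    const local = key.includes(":") ? key.slice(key.indexOf(":") + 1) : key;
    out[local] = toSimpleShape(value); // a later duplicate local name overwrites an earlier one
  }
  return out;
}

const preserved = {
  "soap:Body": {
    "ns:GetOrder": { "@xmlns:ns": "http://example.com/orders", "ns:OrderId": "4521" },
  },
};
console.log(JSON.stringify(toSimpleShape(preserved)));
// {"Body":{"GetOrder":{"OrderId":"4521"}}}
```

Two sibling elements that share a local name under different prefixes collapse into one key here, which is one reason the simple shape cannot round-trip.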

Before / after: reverse direction (JSON to XML)

xmlToJson({
  input:        JSON.stringify({ user: { id: 42, name: 'Alice', email: 'alice@example.com' } }),
  mode:         'json-to-xml',
  rootElement:  'response',
});

// output: '<?xml version="1.0" encoding="UTF-8"?>\n<response><user><id>42</id><name>Alice</name><email>alice@example.com</email></user></response>'

The rootElement parameter is the top-level wrapper for the emitted XML. Useful for systems that expect a specific document root.
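
The reverse direction can be sketched in a few lines. The code below is an illustration under stated assumptions, not the converter's implementation: jsonToXmlSketch, toXmlSketch, and escapeXmlText are hypothetical names, and the sketch ignores @attr and #cdata keys and does not validate element names.

```javascript
// Minimal sketch of a JSON-to-XML serializer; not the converter's code.
function escapeXmlText(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

function toXmlSketch(name, value) {
  if (Array.isArray(value)) {
    // Repeated elements: one element per array item, all with the same name.
    return value.map((item) => toXmlSketch(name, item)).join("");
  }
  if (value !== null && typeof value === "object") {
    const children = Object.entries(value)
      .map(([key, child]) => toXmlSketch(key, child))
      .join("");
    return `<${name}>${children}</${name}>`;
  }
  return `<${name}>${escapeXmlText(value ?? "")}</${name}>`;
}

function jsonToXmlSketch(json, rootElement = "root") {
  const declaration = '<?xml version="1.0" encoding="UTF-8"?>\n';
  return declaration + toXmlSketch(rootElement, JSON.parse(json));
}

const xmlOut = jsonToXmlSketch(
  JSON.stringify({ user: { id: 42, name: "Alice", email: "alice@example.com" } }),
  "response"
);
console.log(xmlOut);
```

For the input above, the sketch produces the same string as the documented output, with the whole document wrapped in the response element.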

Before / after: CDATA preservation

<article>
  <title>The HTML5 spec</title>
  <body><![CDATA[<p>Some <strong>HTML</strong> content with <code>&lt;markup&gt;</code></p>]]></body>
</article>

xmlToJson({
  input:         xml,
  preserveCDATA: true,
});

// output: {
//   article: {
//     title: 'The HTML5 spec',
//     body:  { '#cdata': '<p>Some <strong>HTML</strong> content with <code>&lt;markup&gt;</code></p>' }
//   }
// }
// stats: { ..., cdataSections: 1 }

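The #cdata key tells the consumer the value was wrapped in CDATA. That matters because the parser takes CDATA content literally and does not decode entities inside it: the &lt; in the example stays as the characters &lt; rather than becoming <. Without preservation the value would be a plain string, and the consumer could not tell that it should be written back as a CDATA section rather than as escaped text on round-trip.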
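
A consumer typically needs two small operations around #cdata values: reading the text whether or not it was wrapped, and writing it back as a CDATA section. The sketch below shows both; textOf and toCdataSection are hypothetical helpers, not functions the tool exposes.

```javascript
// Hypothetical helpers for consuming and re-emitting '#cdata' values.
function textOf(value) {
  // A plain string and a { '#cdata': ... } wrapper carry the same character data.
  if (value !== null && typeof value === "object" && "#cdata" in value) {
    return value["#cdata"];
  }
  return String(value);
}

function toCdataSection(text) {
  // "]]>" cannot appear inside a CDATA section, so split it across two sections.
  return "<![CDATA[" + text.split("]]>").join("]]]]><![CDATA[>") + "]]>";
}

const body = { "#cdata": "<p>Some <strong>HTML</strong></p>" };
console.log(textOf(body));            // <p>Some <strong>HTML</strong></p>
console.log(toCdataSection("a]]>b")); // <![CDATA[a]]]]><![CDATA[>b]]>
```

Splitting on the terminator is the standard way to carry a literal ]]> through CDATA; an XML parser concatenates the two sections back into the original text.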

When humans use this

A developer integrating with a SOAP API runs sample requests through the converter to get a JSON-shaped view they can reason about. A team migrating from XML configuration to JSON configuration runs the existing XML files through the converter to bootstrap the new JSON equivalents (then iterates). The reverse direction (JSON to XML) is less common but shows up when integrating with a system that only accepts XML.

When agents use this

Two patterns:

  • Legacy API ingestion. An agent integrating with a SOAP or RSS feed converts the response to JSON via the tool, then operates on the JSON downstream. The agent doesn't have to understand XML semantics; the converter handles the impedance mismatch.
  • Document-format normalisation. A pipeline that ingests heterogeneous documents (some XML, some JSON, some YAML) routes XML through this tool, YAML through yaml_to_env or a similar converter, and ends up with a single JSON representation downstream consumers can process uniformly.

Edge cases

DTDs and external entities

Document Type Declarations and external entity references (<!ENTITY xxx SYSTEM "...">) are a security concern (XML External Entity attacks). The tool rejects DTDs and external entities by default with SECURITY_VIOLATION. Pass allowDtd: true to opt in for trusted inputs — useful when the XML source is a known-safe internal system.
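
For callers that want to refuse suspicious input before it reaches any parser, a cheap pre-check can look like the sketch below. This is an illustration only: the article does not describe how the converter detects DTDs, the function names are hypothetical, and a regex check can produce false positives on text inside comments or CDATA.

```javascript
// Illustrative pre-check only. The converter's own detection logic is not
// described in this article, and a regex can flag text inside comments or CDATA.
function containsDtdOrEntity(xml) {
  return /<!DOCTYPE\b/i.test(xml) || /<!ENTITY\b/i.test(xml);
}

// Hypothetical guard mirroring the documented default (allowDtd: false).
function guardXml(xml, { allowDtd = false } = {}) {
  if (!allowDtd && containsDtdOrEntity(xml)) {
    const err = new Error("DTD or entity declaration found in input");
    err.code = "SECURITY_VIOLATION";
    throw err;
  }
  return xml;
}

const xxePayload =
  '<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]><foo>&xxe;</foo>';

console.log(containsDtdOrEntity("<foo>bar</foo>")); // false
console.log(containsDtdOrEntity(xxePayload));       // true
```

A pre-check like this complements a safely configured parser; it does not replace one.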

Mixed content

XML allows mixed content: <p>Hello <b>world</b>!</p> has both text ("Hello ", "!") and child elements (<b>world</b>). JSON has no idiomatic representation for this. The converter emits {"#text": "Hello ", "b": "world", "#text-after": "!"} and relies on key order to preserve the sequence. No information is lost, but the result is ugly, and consumers must read the keys in document order (the JSON specification itself treats objects as unordered). For text-dominant XML (DocBook, DITA), where nearly every element is mixed, this is where structured converters break down.
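
As a small sketch of consuming that shape, the helper below recovers the plain text of a mixed-content node by reading keys in order. flattenMixed is a hypothetical name; the sketch depends on JavaScript objects iterating non-numeric string keys in insertion order, which holds for objects produced by JSON.parse.

```javascript
// Hypothetical helper: recover the text of a mixed-content node.
// Relies on JavaScript keeping insertion order for non-numeric string keys.
function flattenMixed(node) {
  if (node === null || typeof node !== "object") return String(node);
  return Object.entries(node)
    .filter(([key]) => !key.startsWith("@")) // skip attributes
    .map(([, value]) => flattenMixed(value))
    .join("");
}

const mixed = JSON.parse('{"#text": "Hello ", "b": "world", "#text-after": "!"}');
console.log(flattenMixed(mixed)); // Hello world!
```

XML element names cannot start with a digit, so the numeric-key reordering that JavaScript applies to integer-like keys does not arise for element keys.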

Numeric coercion

<count>42</count> becomes count: "42" (string) by default. Pass coerceNumbers: true to emit count: 42 (number). XML has no type information; numeric coercion is a heuristic that gets it right for unambiguous cases (42, -3.14) and wrong for ambiguous cases (007 is sometimes a number, sometimes a string like an employee ID). The default is "no coercion" because the false-positive rate of coercion at scale is non-trivial.
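
One way to keep coercion conservative is to accept only strings that survive a round-trip through Number unchanged. The sketch below illustrates that idea; it is an assumption about what "unambiguous" could mean, not the converter's documented rule, and coerceIfUnambiguous is a hypothetical name.

```javascript
// Sketch of a conservative heuristic; the converter's exact rules are not documented here.
const CANONICAL_NUMBER = /^-?(0|[1-9]\d*)(\.\d+)?$/;

function coerceIfUnambiguous(text) {
  if (!CANONICAL_NUMBER.test(text)) return text; // "007", "1e5", " 42" stay strings
  const n = Number(text);
  // Keep the string when the number cannot reproduce it exactly.
  return String(n) === text ? n : text;
}

console.log(coerceIfUnambiguous("42"));    // 42
console.log(coerceIfUnambiguous("-3.14")); // -3.14
console.log(coerceIfUnambiguous("007"));   // 007 (still a string)
console.log(coerceIfUnambiguous("1.50"));  // 1.50 (still a string)
```

The round-trip check also keeps very long digit strings, such as account numbers beyond double precision, as strings.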

Comments and processing instructions

XML comments (<!-- ... -->) and processing instructions (<?xml-stylesheet ... ?>) are dropped by default. Pass preserveComments: true to keep comments as #comment keys. Most consumers don't need either, so the default is to drop them.

4. Documentation

Reference signatures, edge cases, and lookup tables.

Input parameters

Field               Type                           Required         Default  Description
------------------  -----------------------------  ---------------  -------  -----------
input               string                                                   XML or JSON to convert
mode                'xml-to-json' | 'json-to-xml'                            Direction
compact             boolean                                         false    Compact JSON representation for attribute-heavy XML
preserveAttributes  boolean                                         true     Keep XML attributes as @attribute keys
preserveNamespaces  boolean                                         true     Keep xmlns:* prefixes
preserveCDATA       boolean                                         true     Keep CDATA sections marked as #cdata
preserveComments    boolean                                         false    Keep XML comments as #comment keys
coerceNumbers       boolean                                         false    Coerce unambiguous numeric strings to numbers
allowDtd            boolean                                         false    Accept Document Type Declarations and external entities
rootElement         string                         for json-to-xml  'root'   XML root element name

Output shape

{
  output: string;            // converted JSON or XML
  stats: {
    elements:      number;
    attributes:    number;
    namespaces:    string[]; // list of namespace prefixes found
    cdataSections: number;
  };
  warnings: string[];        // e.g. 'Namespace soap mapped to default'
}
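
Because output is a string in both directions, an xml-to-json consumer has to parse it before use. The sketch below shows one way to unwrap a result and optionally treat warnings as failures; unwrapResult and its failOnWarnings option are hypothetical, and the sample result is hand-written rather than produced by the tool.

```javascript
// Hypothetical consumer of the result shape above (xml-to-json direction).
function unwrapResult(result, { failOnWarnings = false } = {}) {
  if (failOnWarnings && result.warnings.length > 0) {
    throw new Error("Conversion warnings: " + result.warnings.join("; "));
  }
  return { data: JSON.parse(result.output), stats: result.stats };
}

// A hand-written result object standing in for a real call.
const sampleResult = {
  output: '{"article":{"title":"The HTML5 spec"}}',
  stats: { elements: 2, attributes: 0, namespaces: [], cdataSections: 0 },
  warnings: [],
};

const unwrapped = unwrapResult(sampleResult);
console.log(unwrapped.data.article.title); // The HTML5 spec
```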

Attribute encoding conventions

XML feature                         JSON representation
----------------------------------  -----------------------------------------------
Element with text content only      'key': 'value'
Element with attributes + text      'key': { '@attr': 'value', '#text': 'content' }
Element with attributes + children  'key': { '@attr': 'value', 'child': {...} }
CDATA section                       'key': { '#cdata': 'content' }
Comment                             '#comment': 'text' (when preserveComments)
Repeated element                    'key': [...] (array)
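
One practical consequence of these conventions: if a single child is emitted as a bare value and repeated children as an array, as the table suggests, consumer code has to handle both. The sketch below normalises the two cases; asArray is a hypothetical helper and the sample documents are hand-written.

```javascript
// Hypothetical helper: treat "one child" and "many children" uniformly.
function asArray(value) {
  if (value === undefined) return [];
  return Array.isArray(value) ? value : [value];
}

const oneItem = { order: { item: { "@sku": "A1", "#text": "Widget" } } };
const twoItems = {
  order: {
    item: [
      { "@sku": "A1", "#text": "Widget" },
      { "@sku": "B2", "#text": "Gadget" },
    ],
  },
};

for (const doc of [oneItem, twoItems]) {
  console.log(asArray(doc.order.item).map((item) => item["@sku"]));
}
// [ 'A1' ]
// [ 'A1', 'B2' ]
```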

Error codes

Code                When it fires                                                          Recovery
------------------  ---------------------------------------------------------------------  --------
INPUT_EMPTY         input empty                                                            Provide a non-empty input
INPUT_MALFORMED     XML or JSON parse failed                                               Verify the input is well-formed
SECURITY_VIOLATION  DTD or external entity reference detected and allowDtd: false          Pass allowDtd: true for trusted inputs; refuse the input for untrusted sources (XXE attack risk)
INPUT_TOO_LARGE     Input exceeds 5MB                                                      Streaming-XML parsers are the right tool for large inputs
UNSUPPORTED_FORMAT  XML feature without JSON equivalent (e.g. specific schema constructs)  Use a dedicated XML library for the feature in question
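
A caller can turn these codes into recovery hints with a simple lookup. The sketch below assumes a failed conversion surfaces the code as error.code, which this article does not specify; describeFailure and RECOVERY_BY_CODE are hypothetical names.

```javascript
// Assumption: a failed conversion surfaces one of the documented codes as
// `error.code`. The article does not specify how errors are delivered.
const RECOVERY_BY_CODE = new Map([
  ["INPUT_EMPTY", "provide a non-empty input"],
  ["INPUT_MALFORMED", "check that the input is well-formed"],
  ["SECURITY_VIOLATION", "reject, or pass allowDtd: true only for trusted sources"],
  ["INPUT_TOO_LARGE", "switch to a streaming parser"],
  ["UNSUPPORTED_FORMAT", "use a dedicated XML library"],
]);

function describeFailure(error) {
  const hint = RECOVERY_BY_CODE.get(error.code);
  return hint ? `${error.code}: ${hint}` : `Unexpected error: ${error.message}`;
}

const sampleError = Object.assign(new Error("input exceeds limit"), {
  code: "INPUT_TOO_LARGE",
});
console.log(describeFailure(sampleError)); // INPUT_TOO_LARGE: switch to a streaming parser
```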

When NOT to use this tool

For XML schema validation, use a dedicated validator (xmllint, libxml2's validation mode). The converter handles well-formed XML; it doesn't validate against XSD schemas.

For XML feeds above the 5MB limit (tens of megabytes up to multi-GB), use a streaming parser (sax-js for Node, lxml.etree.iterparse for Python). This tool loads the full document into memory.
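
A caller can check the size up front and route oversized input to a streaming parser instead of waiting for INPUT_TOO_LARGE. The sketch below assumes "5MB" means 5 * 1024 * 1024 bytes of UTF-8, which the article does not state; fitsInConverter is a hypothetical helper.

```javascript
// Assumption: "5MB" means 5 * 1024 * 1024 bytes of UTF-8; the article does not say.
const MAX_BYTES = 5 * 1024 * 1024;

function fitsInConverter(input) {
  // TextEncoder counts encoded bytes, which can exceed input.length for non-ASCII text.
  return new TextEncoder().encode(input).length <= MAX_BYTES;
}

console.log(fitsInConverter("<a>é</a>"));                // true
console.log(fitsInConverter("x".repeat(MAX_BYTES + 1))); // false
```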

For XML-to-XML transformations (XSLT use cases), use an XSLT processor. JSON is the wrong intermediate format for that workflow.

Performance notes

Typical execution: under 10ms for inputs under 50KB. Attribute preservation adds 5-15% overhead vs simple mode. The tool is deterministic — same input + same parameters always produce byte-identical output — so REST responses are Edge-Cache eligible.
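
Determinism also makes client-side memoisation safe. The sketch below caches results by input and parameters; memoizeConverter is a hypothetical wrapper, and fakeConvert stands in for a real call so the example runs on its own.

```javascript
// Sketch: memoize conversions by input and parameters. `convert` is any
// deterministic function with the same call shape as the examples above.
function memoizeConverter(convert) {
  const cache = new Map();
  return function cached(params) {
    // Sort keys so { a, b } and { b, a } share one cache entry.
    const key = JSON.stringify(
      Object.keys(params).sort().map((k) => [k, params[k]])
    );
    if (!cache.has(key)) cache.set(key, convert(params));
    return cache.get(key);
  };
}

let calls = 0;
const fakeConvert = (params) => {
  calls += 1;
  return { output: params.input.toUpperCase() };
};
const cachedConvert = memoizeConverter(fakeConvert);

cachedConvert({ input: "<a/>", mode: "xml-to-json" });
cachedConvert({ mode: "xml-to-json", input: "<a/>" });
console.log(calls); // 1
```

The same key derivation could serve as a cache key for an HTTP cache in front of the REST endpoint.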

The XML parser is XXE-safe by default (allowDtd: false). The opt-in path expects the caller to have verified the source. For high-throughput conversion (thousands of small documents per second), the per-call overhead of the tool's HTTP layer is meaningful; convert in-process with a library like fast-xml-parser instead.

FAQ

Why is XXE protection enabled by default?

XML External Entity attacks let an attacker exfiltrate local files or trigger SSRF via crafted XML. The default behaviour rejects DTDs and external entities; you opt in via allowDtd: true for trusted inputs. The default is the secure choice; opt-in is for cases where you've confirmed the input source.

Can I round-trip XML through JSON without losing data?

Yes for elements, attributes, namespaces, and CDATA, provided you keep preserveAttributes, preserveNamespaces, and preserveCDATA all true. The @attr keys, namespace declarations, and #cdata markers are sufficient for json-to-xml mode to reconstruct equivalent XML. Set any of these to false and the round-trip becomes lossy. Comments are also dropped unless you pass preserveComments: true.

How does it handle mixed content?

XML mixed content (<p>Hello <b>world</b>!</p>) becomes {"#text": "Hello ", "b": "world", "#text-after": "!"} to preserve order. It's ugly JSON but lossless. For text-dominant XML (DocBook, DITA), a structured converter is the wrong tool; use an XSLT processor that understands the document structure.

What's the largest XML I can convert?

5MB. Above that, streaming parsers are the right tool (sax-js for Node, lxml.etree.iterparse for Python). The converter loads the full document into memory, which is fine for SOAP responses and config files but not for multi-GB feeds.