obfus.link

Sitemap Validator parses XML sitemaps against the sitemaps.org schema and returns structured errors and warnings with line numbers. The Crawl Budget Analyzer estimates crawl frequency per URL, flags priority dilution from too many 1.0 priorities, detects stale changefreq values that contradict recent lastmod dates, and warns before deployment when a sitemap approaches Google's 50,000-URL hard limit.

Tier 2 · validators · ✓ TDD Verified

Sitemap Validator

Validate XML sitemaps with a crawl budget analyzer for SEO health

Up to 50,000 URLs per Google's spec · validates structure, <loc>, <priority>, <changefreq>, <lastmod>

How to use

  1. Paste your sitemap.xml: Drop the XML content into the textarea. The validator accepts the full document, including the <?xml?> declaration.
  2. Toggle Crawl Budget: Leave the Crawl Budget Analyzer ON to get priority/changefreq distributions and recommendations. Turn it OFF for raw validation only.
  3. Run and review errors: Click Validate. Errors block the sitemap from being valid; warnings are non-blocking quality issues. Each entry includes a line number pointing back to the source.
  4. Read recommendations: The Crawl Budget Analyzer recommendations call out priority dilution (too many 1.0s), stale changefreq vs recent lastmod, and approaching Google's 50,000 URL limit.
  5. Fix and re-validate: Apply the suggested fixes to your sitemap generator, paste the new XML, and run again. The deterministic hash in the footer lets you verify that the input changed.
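
The role of the deterministic hash in step 5 can be illustrated with a small sketch. The page does not say which hash algorithm the footer uses, so the djb2-style hash and the `fingerprint` name below are assumptions chosen only to show the idea: identical input always gives the same value, and an edited sitemap gives a different one.

```typescript
// Hypothetical helper: the actual algorithm behind the footer hash is not documented.
// A djb2-style (XOR variant) hash over UTF-16 code units is used here for illustration.
function fingerprint(xml: string): string {
  let hash = 5381;
  for (let i = 0; i < xml.length; i++) {
    // Multiply by 33, XOR in the next code unit, and keep the result as unsigned 32 bits.
    hash = ((hash * 33) ^ xml.charCodeAt(i)) >>> 0;
  }
  // Render as a fixed-width, 8-character hex string.
  return ("0000000" + hash.toString(16)).slice(-8);
}

const before = fingerprint("<urlset><url><loc>https://example.com/</loc></url></urlset>");
const after = fingerprint("<urlset><url><loc>https://example.com/a</loc></url></urlset>");
console.log(before === after ? "input unchanged" : "input changed");
```

Comparing the value before and after a fix is a quick way to confirm that the re-validated XML is not a stale copy of the previous paste.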

MCP / API

Call sitemap_validator directly from any MCP-compatible agent:

// MCP TypeScript SDK
const result = await client.callTool({
  name: "sitemap_validator",
  arguments: {
    "xml": "..."
  }
});

// curl
curl -X POST https://obfus.link/mcp \
  -H "Authorization: Bearer <SPT>" \
  -H "Content-Type: application/json" \
  -d '{"method":"tools/call","params":{"name":"sitemap_validator","arguments":{"xml":"..."}}}'

Related tools

Robots.txt Generator
Generate robots.txt with SEO Impact Simulator and pre-deploy URL testing
URL Parser
Parse any URL into components with deep query decoding and security analysis
Header Inspector
OWASP-graded HTTP security headers scorecard with CORS issue detection

FAQ

What's the difference between an error and a warning?

Errors are spec violations that block the sitemap from being marked valid: a missing <loc>, an out-of-range priority, an invalid changefreq value, a malformed lastmod date, or a root element that is not <urlset>. Warnings are non-blocking quality issues, such as overly long URLs that crawlers may truncate or an empty <urlset> with no <url> entries.
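
The error/warning split can be sketched as a per-entry check. This is a minimal illustration, not the validator's actual code: the `UrlEntry` shape, the `checkEntry` name, and the use of the sitemaps.org 2,048-character <loc> limit as the "overly long" threshold are all assumptions. The allowed changefreq values and the 0.0 to 1.0 priority range come from the sitemaps.org protocol.

```typescript
type Severity = "error" | "warning";
interface Issue { severity: Severity; message: string; }

// Hypothetical input shape: one parsed <url> entry with raw string values.
interface UrlEntry { loc?: string; priority?: string; changefreq?: string; lastmod?: string; }

const CHANGEFREQ = ["always", "hourly", "daily", "weekly", "monthly", "yearly", "never"];
// Simplified W3C Datetime check: a full date with an optional time and timezone.
// The format also allows year-only and year-month values, which this sketch rejects.
const W3C_DATETIME = /^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?$/;
// Assumed warning threshold, taken from the sitemaps.org limit on <loc> length.
const MAX_LOC_LENGTH = 2048;

function checkEntry(entry: UrlEntry): Issue[] {
  const issues: Issue[] = [];
  if (!entry.loc || entry.loc.trim() === "") {
    issues.push({ severity: "error", message: "missing <loc>" });
  } else if (entry.loc.length >= MAX_LOC_LENGTH) {
    issues.push({ severity: "warning", message: "URL is very long and may be truncated" });
  }
  if (entry.priority !== undefined) {
    const p = Number(entry.priority);
    if (entry.priority.trim() === "" || isNaN(p) || p < 0 || p > 1) {
      issues.push({ severity: "error", message: "priority must be between 0.0 and 1.0" });
    }
  }
  if (entry.changefreq !== undefined && CHANGEFREQ.indexOf(entry.changefreq) === -1) {
    issues.push({ severity: "error", message: "invalid changefreq value" });
  }
  if (entry.lastmod !== undefined && !W3C_DATETIME.test(entry.lastmod)) {
    issues.push({ severity: "error", message: "malformed lastmod date" });
  }
  return issues;
}

console.log(checkEntry({ loc: "https://example.com/", priority: "1.5", changefreq: "sometimes" }));
```

Under this sketch a sitemap would count as valid only when no issue with severity "error" is returned for any entry; warnings are reported but do not change that result.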

Does the validator handle sitemap index files (<sitemapindex>)?

No. This tool validates <urlset> sitemaps only. If you have a sitemap index that points at child sitemaps, validate each child sitemap separately. A sitemap-index validator may be added as a separate tool — file an issue if you need it.
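
To validate each child separately, you first need the child sitemap URLs from the index. The sketch below pulls them out with a regular expression; the `childSitemaps` name is hypothetical, and a regex is used only to keep the example short. A real script should use an XML parser, since this version does not decode entities such as &amp; or handle namespaced tags.

```typescript
// Extracts the <loc> of every <sitemap> entry in a sitemap index document.
// Assumes plain, un-prefixed <sitemap> and <loc> tags.
function childSitemaps(indexXml: string): string[] {
  const pattern = /<sitemap>[\s\S]*?<loc>\s*([^<]+?)\s*<\/loc>[\s\S]*?<\/sitemap>/g;
  const urls: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(indexXml)) !== null) {
    urls.push(match[1]);
  }
  return urls;
}

const indexXml = [
  '<?xml version="1.0" encoding="UTF-8"?>',
  '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
  "  <sitemap><loc>https://example.com/sitemap-1.xml</loc></sitemap>",
  "  <sitemap><loc> https://example.com/sitemap-2.xml </loc><lastmod>2024-01-15</lastmod></sitemap>",
  "</sitemapindex>",
].join("\n");

console.log(childSitemaps(indexXml));
```

Each returned URL can then be fetched and its XML pasted into the validator, or passed as the xml argument of the MCP tool.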

Why does the Crawl Budget Analyzer flag priority dilution?

Priority is a relative signal — when most or all URLs are set to 1.0, crawlers can no longer distinguish which pages are most important. The recommended pattern is to reserve 1.0 for the single most important page (typically the homepage), use 0.8 for primary section pages, 0.5 for content pages, and lower values for deep or low-priority pages.
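
A dilution check along these lines can be sketched by measuring the share of URLs set to 1.0. The `priorityDilution` name and the 0.5 cut-off are assumptions: the page says only "most or all", and the analyzer's real threshold is not documented.

```typescript
interface DilutionReport { share: number; diluted: boolean; }

// Flags dilution when the share of 1.0 priorities exceeds `threshold`.
// The default threshold is an illustrative guess, not the tool's documented value.
function priorityDilution(priorities: number[], threshold: number = 0.5): DilutionReport {
  if (priorities.length === 0) {
    return { share: 0, diluted: false };
  }
  let top = 0;
  for (let i = 0; i < priorities.length; i++) {
    if (priorities[i] === 1.0) top++;
  }
  const share = top / priorities.length;
  return { share: share, diluted: share > threshold };
}

// Homepage at 1.0, section pages at 0.8, content at 0.5, deep pages lower.
console.log(priorityDilution([1.0, 0.8, 0.8, 0.5, 0.5, 0.3]));
// Nearly everything at 1.0.
console.log(priorityDilution([1.0, 1.0, 1.0, 0.5]));
```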

What is the 50,000 URL limit?

The sitemaps.org protocol, which Google follows, allows up to 50,000 URLs per sitemap file (and up to 50 MB uncompressed). A sitemap that exceeds either limit may be rejected or only partially processed by Googlebot. The Crawl Budget Analyzer warns when you cross 40,000 URLs (soft limit) so you have time to split the file into multiple sitemaps referenced from a <sitemapindex> file before you reach the hard limit.
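
The two thresholds, and the split that follows from them, can be sketched as below. The function names are hypothetical, and treating exactly 40,000 URLs as still "ok" is an assumption based on the word "cross"; the 50,000 and 40,000 figures are the ones stated above.

```typescript
const HARD_LIMIT = 50000; // maximum URLs per sitemap file
const SOFT_LIMIT = 40000; // point past which the analyzer warns

type LimitStatus = "ok" | "warning" | "over-limit";

function urlCountStatus(count: number): LimitStatus {
  if (count > HARD_LIMIT) return "over-limit";
  if (count > SOFT_LIMIT) return "warning";
  return "ok";
}

// Splits a URL list into chunks, one chunk per child sitemap.
// The default chunk size keeps each child at or below the soft limit.
function splitUrls(urls: string[], chunkSize: number = SOFT_LIMIT): string[][] {
  const chunks: string[][] = [];
  for (let start = 0; start < urls.length; start += chunkSize) {
    chunks.push(urls.slice(start, start + chunkSize));
  }
  return chunks;
}

console.log(urlCountStatus(45000)); // "warning"
```

Each chunk would be written out as its own <urlset> file and listed in a <sitemapindex>, which, as noted elsewhere in this FAQ, this tool does not validate directly.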

How is "stale changefreq" detected?

A URL with changefreq "yearly" or "never" that also has a <lastmod> within the last seven days is flagged as a contradiction — the page changed recently but advertises infrequent updates, which confuses crawl scheduling. Either update changefreq to reflect actual update cadence or remove it entirely and let crawlers infer cadence from <lastmod>.
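
The rule can be written as a small predicate. This is a sketch of the check as described here, with the `isStaleChangefreq` name and the handling of future or unparseable dates being assumptions; the current time is passed in explicitly so the result is reproducible.

```typescript
const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

// True when changefreq advertises rare updates but lastmod falls within the last seven days.
function isStaleChangefreq(changefreq: string, lastmod: string, now: Date): boolean {
  if (changefreq !== "yearly" && changefreq !== "never") {
    return false;
  }
  const modified = Date.parse(lastmod);
  if (isNaN(modified)) {
    return false; // assumed: a malformed lastmod is reported as a separate validation error
  }
  const ageMs = now.getTime() - modified;
  // Assumed: lastmod values in the future are not treated as "recent".
  return ageMs >= 0 && ageMs <= SEVEN_DAYS_MS;
}

const now = new Date("2024-06-12T00:00:00Z");
console.log(isStaleChangefreq("yearly", "2024-06-10", now)); // true: changed two days ago
console.log(isStaleChangefreq("yearly", "2024-01-01", now)); // false: last change was months ago
console.log(isStaleChangefreq("daily", "2024-06-10", now));  // false: changefreq is not yearly/never
```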

Can I use this tool via the MCP API?

Yes. The tool is registered on the obfus.link MCP server at https://obfus.link/mcp. Call it from any MCP-compatible agent with a Shared Payment Token. The MCP tool name matches the snake_case slug shown in the integration snippet.