Sitemap Validator parses XML sitemaps against the sitemaps.org schema and returns structured errors and warnings with line numbers. The Crawl Budget Analyzer estimates crawl frequency per URL, flags priority dilution caused by too many 1.0 priorities, detects stale changefreq values that contradict recent lastmod dates, and warns before deployment when a sitemap approaches Google's 50,000-URL hard limit.
Sitemap Validator
Validate XML sitemaps with a crawl budget analyzer for SEO health
How to use
- Paste your sitemap.xml: Drop the XML content into the textarea. The validator accepts the full document, including the <?xml?> declaration.
- Toggle Crawl Budget: Leave the Crawl Budget Analyzer ON to get priority and changefreq distributions plus recommendations. Turn it OFF for raw validation only.
- Run and review errors: Click Validate. Errors mark the sitemap as invalid; warnings are non-blocking quality issues. Each entry includes a line number pointing back to the source.
- Read recommendations: The Crawl Budget Analyzer's recommendations call out priority dilution (too many 1.0s), stale changefreq versus recent lastmod, and approaching Google's 50,000-URL limit.
- Fix and re-validate: Apply the suggested fixes to your sitemap generator, paste the new XML, and run again. The deterministic hash in the footer lets you verify that the input changed.
MCP / API
Call sitemap_validator directly from any MCP-compatible agent:
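// MCP TypeScript SDK
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";
// Assumes obfus.link/mcp speaks Streamable HTTP; <SPT> is your Shared Payment Token
const transport = new StreamableHTTPClientTransport(new URL("https://obfus.link/mcp"), {
  requestInit: { headers: { Authorization: "Bearer <SPT>" } },
});
const client = new Client({ name: "sitemap-client", version: "1.0.0" });
await client.connect(transport);
const result = await client.callTool({
  name: "sitemap_validator",
  arguments: { xml: "..." },
});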
# curl
curl -X POST https://obfus.link/mcp \
  -H "Authorization: Bearer <SPT>" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"sitemap_validator","arguments":{"xml":"..."}}}'
FAQ
What's the difference between an error and a warning?
Errors are spec violations that prevent the sitemap from being marked valid: a missing <loc>, a priority outside the 0.0 to 1.0 range, a changefreq value outside the allowed set, a malformed lastmod date, or a root element that is not <urlset>. Warnings are non-blocking quality issues, such as overly long URLs that crawlers may truncate or an empty <urlset> with no <url> entries.
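The per-<url> error rules above can be sketched in TypeScript as a check over one already-parsed entry. This is a minimal illustration under stated assumptions, not the tool's implementation: the UrlEntry shape and the entryErrors function are hypothetical, XML parsing and the root-element check are left out, and the lastmod test only checks W3C Datetime format, not calendar validity. The seven changefreq values and the 0.0 to 1.0 priority range come from the sitemaps.org protocol.

```typescript
// Hypothetical shape of one parsed <url> entry (child element text, if present).
interface UrlEntry {
  loc?: string;
  lastmod?: string;
  changefreq?: string;
  priority?: string;
}

// The seven changefreq values defined by the sitemaps.org protocol.
const CHANGEFREQ_VALUES = new Set([
  "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
]);

// W3C Datetime format: YYYY, YYYY-MM, YYYY-MM-DD, or a date-time that ends in a
// timezone designator. Checks the format only, not calendar validity.
const W3C_DATETIME =
  /^\d{4}(-\d{2}(-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?)?)?$/;

// Returns one message per blocking error found in the entry.
function entryErrors(entry: UrlEntry): string[] {
  const errors: string[] = [];
  if (entry.loc === undefined || entry.loc.trim() === "") {
    errors.push("missing <loc>");
  }
  if (entry.priority !== undefined) {
    const p = Number(entry.priority);
    // Number("") is 0, so reject blank values explicitly.
    if (entry.priority.trim() === "" || Number.isNaN(p) || p < 0 || p > 1) {
      errors.push(`priority outside 0.0-1.0: ${entry.priority}`);
    }
  }
  if (entry.changefreq !== undefined && !CHANGEFREQ_VALUES.has(entry.changefreq)) {
    errors.push(`invalid changefreq: ${entry.changefreq}`);
  }
  if (entry.lastmod !== undefined && !W3C_DATETIME.test(entry.lastmod)) {
    errors.push(`malformed lastmod: ${entry.lastmod}`);
  }
  return errors;
}

// Reports the out-of-range priority and the unknown changefreq.
console.log(entryErrors({ loc: "https://example.com/", priority: "1.5", changefreq: "sometimes" }));
```

Note that a date-time lastmod without a timezone (for example 2024-05-01T10:00) is rejected, because the W3C Datetime profile requires a timezone designator whenever a time is present.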
Does the validator handle sitemap index files (<sitemapindex>)?
No. This tool validates <urlset> sitemaps only. If you have a sitemap index that points at child sitemaps, validate each child sitemap separately. A sitemap-index validator may be added as a separate tool — file an issue if you need it.
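Before pasting a file, you can tell a <urlset> sitemap from a <sitemapindex> by its root element. The sketch below is a hypothetical helper (rootElementName is not part of the tool) that uses regular expressions rather than a real XML parser, so treat it as a quick routing check under that assumption.

```typescript
// Hypothetical helper: returns the local name of the document's root element,
// or null if none is found. Regex-based, so it is a routing check, not a parser.
function rootElementName(xml: string): string | null {
  // Drop the XML declaration, other processing instructions, and comments,
  // which may all appear before the root element.
  const body = xml.replace(/<\?[\s\S]*?\?>/g, "").replace(/<!--[\s\S]*?-->/g, "");
  // First tag that starts with a name character ("<!DOCTYPE" is skipped because of "!").
  // An optional "prefix:" is captured separately so only the local name is returned.
  const match = body.match(/<([A-Za-z_][\w.-]*:)?([A-Za-z_][\w.-]*)/);
  return match?.[2] ?? null;
}

const xml = `<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemap-posts.xml</loc></sitemap>
</sitemapindex>`;

const root = rootElementName(xml);
if (root === "sitemapindex") {
  // Not accepted by this tool: validate each child sitemap listed in <loc> on its own.
  console.log("sitemap index: validate each child sitemap separately");
} else if (root === "urlset") {
  console.log("urlset: ready to validate");
}
```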
Why does the Crawl Budget Analyzer flag priority dilution?
Priority is a relative signal among the URLs on your own site. When most or all URLs are set to 1.0, crawlers can no longer distinguish which pages are most important. The recommended pattern is to reserve 1.0 for the single most important page (typically the homepage), use 0.8 for primary section pages, 0.5 for content pages, and lower values for deep or low-priority pages.
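A dilution check along these lines might count how many URLs sit at 1.0 and compare their share of the sitemap to a threshold. This is an assumption about how such a check could work, not the analyzer's actual rule: the priorityDilution function and its 10% threshold are hypothetical, and so is the 0.3 value for deep pages in the tier table (the guidance above only says "lower values").

```typescript
// Result of the hypothetical dilution check.
interface DilutionReport {
  topShare: number; // fraction of URLs at priority 1.0
  diluted: boolean;
}

// Flags dilution when more than one URL is at 1.0 and those URLs make up more
// than maxTopShare of the sitemap. The 0.1 default is an assumed threshold.
function priorityDilution(priorities: number[], maxTopShare = 0.1): DilutionReport {
  if (priorities.length === 0) return { topShare: 0, diluted: false };
  const top = priorities.filter((p) => p === 1).length;
  const topShare = top / priorities.length;
  return { topShare, diluted: top > 1 && topShare > maxTopShare };
}

// Tiers from the recommended pattern; 0.3 for deep pages is an illustrative choice.
type PageKind = "home" | "section" | "content" | "deep";
const PRIORITY_TIERS: Record<PageKind, number> = {
  home: 1.0,
  section: 0.8,
  content: 0.5,
  deep: 0.3,
};

console.log(priorityDilution([1, 1, 1, 0.8])); // { topShare: 0.75, diluted: true }

const pages: PageKind[] = ["home", "section", "content", "content"];
console.log(priorityDilution(pages.map((k) => PRIORITY_TIERS[k]))); // { topShare: 0.25, diluted: false }
```

Requiring more than one 1.0 before flagging keeps a small site, where the homepage alone can be a large share of the URLs, from being reported as diluted.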
What is the 50,000 URL limit?
The sitemaps.org protocol, which Google follows, allows up to 50,000 URLs per sitemap file and up to 50 MB (52,428,800 bytes) uncompressed. Sitemaps that exceed this limit are silently truncated by Googlebot. The Crawl Budget Analyzer warns once you cross 40,000 URLs (a soft limit), giving you time to split the URLs across multiple sitemaps referenced from a <sitemapindex> file before the hard limit causes crawl truncation.
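The count check and the split into child sitemaps can be sketched as two small functions. Both names (urlCountStatus, splitForIndex) are hypothetical; the 40,000 and 50,000 thresholds come from the answer above, and defaulting the chunk size to the soft limit is a design assumption that leaves each child file room to grow. The 50 MB size limit is not checked here.

```typescript
// Limits from the FAQ: warn above 40,000 URLs, error above the 50,000-URL protocol limit.
const SOFT_LIMIT = 40_000;
const HARD_LIMIT = 50_000;

type UrlCountStatus = "ok" | "approaching-limit" | "over-limit";

// Hypothetical classifier for a sitemap's <url> count.
function urlCountStatus(count: number): UrlCountStatus {
  if (count > HARD_LIMIT) return "over-limit";
  if (count > SOFT_LIMIT) return "approaching-limit";
  return "ok";
}

// Hypothetical splitter: chunks URLs into child sitemaps for a <sitemapindex>.
// Defaults to the soft limit so each child file keeps headroom for growth.
function splitForIndex<T>(urls: readonly T[], chunkSize: number = SOFT_LIMIT): T[][] {
  if (!Number.isInteger(chunkSize) || chunkSize < 1 || chunkSize > HARD_LIMIT) {
    throw new RangeError(`chunkSize must be an integer between 1 and ${HARD_LIMIT}`);
  }
  const chunks: T[][] = [];
  for (let i = 0; i < urls.length; i += chunkSize) {
    chunks.push(urls.slice(i, i + chunkSize));
  }
  return chunks;
}

const urls = Array.from({ length: 95_000 }, (_, i) => `https://example.com/page/${i}`);
console.log(urlCountStatus(urls.length)); // over-limit
console.log(splitForIndex(urls).map((c) => c.length)); // [ 40000, 40000, 15000 ]
```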
How is "stale changefreq" detected?
A URL with changefreq "yearly" or "never" that also has a <lastmod> within the last seven days is flagged as a contradiction — the page changed recently but advertises infrequent updates, which confuses crawl scheduling. Either update changefreq to reflect actual update cadence or remove it entirely and let crawlers infer cadence from <lastmod>.
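The stale-changefreq rule can be sketched as a single predicate. This is a hedged illustration: isStaleChangefreq is a hypothetical name, lastmod is parsed with JavaScript's Date.parse (which accepts the common W3C Datetime forms), `now` is passed in so the result is deterministic, and a lastmod dated in the future is deliberately not counted as recent.

```typescript
// changefreq values that advertise infrequent updates.
const INFREQUENT = new Set(["yearly", "never"]);
const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

// Returns true when an infrequent changefreq is paired with a lastmod from the
// last seven days.
function isStaleChangefreq(
  changefreq: string | undefined,
  lastmod: string | undefined,
  now: Date = new Date(),
): boolean {
  if (changefreq === undefined || lastmod === undefined) return false;
  if (!INFREQUENT.has(changefreq)) return false;
  const modified = Date.parse(lastmod);
  if (Number.isNaN(modified)) return false; // unparseable dates are left to the lastmod format check
  const ageMs = now.getTime() - modified;
  // Future-dated lastmod values (negative age) are not treated as recent here.
  return ageMs >= 0 && ageMs <= SEVEN_DAYS_MS;
}

const now = new Date("2024-06-01T00:00:00Z");
console.log(isStaleChangefreq("yearly", "2024-05-28", now)); // true
console.log(isStaleChangefreq("yearly", "2024-04-01", now)); // false
console.log(isStaleChangefreq("weekly", "2024-05-28", now)); // false
```

Returning false for unparseable dates keeps this warning separate from the malformed-lastmod error.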
Can I use this tool via the MCP API?
Yes. The tool is registered on the obfus.link MCP server at https://obfus.link/mcp. Call it from any MCP-compatible agent using a Shared Payment Token (SPT). The MCP tool name is sitemap_validator, the snake_case slug used in the integration snippet.