Sitemap Validator
Verify your XML sitemap against the sitemaps.org schema — catch format errors, missing fields, and best-practice violations
What Does “Verify Sitemap” Actually Mean?
Verifying a sitemap means checking that the file complies with the sitemaps.org schema and that the URLs inside it actually represent the structure of your site. The schema check catches malformed XML — unclosed tags, missing namespaces, invalid date formats, or non-standard fields. The structural check catches mistakes a schema validator cannot see: 404 URLs, 301 redirects pointing somewhere unexpected, orphaned URLs that no internal link points to, or duplicate entries.
This validator runs the schema-level check against the sitemaps.org spec for `urlset` and `sitemapindex` roots. For URL-level health checks (status codes, redirect chains), use Google Search Console or Screaming Frog after this validator passes.
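The core of the schema-level check can be sketched in a few lines with Python's standard library: parse the XML and confirm the root element is `urlset` or `sitemapindex` in the sitemaps.org namespace. (The function and variable names here are illustrative, not this validator's actual implementation.)

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def check_root(xml_text: str) -> str:
    """Return 'urlset' or 'sitemapindex' if the root element and
    namespace are valid; raise otherwise."""
    root = ET.fromstring(xml_text)  # raises ParseError on malformed XML
    # ElementTree reports namespaced tags as '{namespace}tag'
    if root.tag == f"{{{SITEMAP_NS}}}urlset":
        return "urlset"
    if root.tag == f"{{{SITEMAP_NS}}}sitemapindex":
        return "sitemapindex"
    raise ValueError(f"invalid root element or namespace: {root.tag}")

sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
</urlset>"""
print(check_root(sample))  # urlset
```

A sitemap with the wrong namespace (or a root like `<urls>`) fails this check even when the XML itself is well-formed — which is exactly the "wrong namespace" error described below.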
How Do You Know If Your Sitemap Is Valid?
A valid sitemap meets all of these criteria:
- Well-formed XML. Every tag opens and closes; the namespace declaration matches sitemaps.org.
- Root element is one of two: `<urlset>` for a regular sitemap, or `<sitemapindex>` for a sitemap of sitemaps. Anything else is invalid.
- Every URL has a fully-qualified location. `<loc>` must include the protocol and domain — relative URLs are rejected.
- Optional fields use valid values. `<lastmod>` in ISO 8601; `<changefreq>` in the allowed enum (always, hourly, daily, weekly, monthly, yearly, never); `<priority>` between 0.0 and 1.0.
- Within size limits: at most 50,000 URLs and 50 MB uncompressed per file. Past those limits, split into multiple sitemaps and link them from a sitemap index.
What Are the Most Common Sitemap Validation Errors?
- Missing XML declaration. The first line must be `<?xml version="1.0" encoding="UTF-8"?>`. Some CMS exports skip this; add it back.
- Wrong namespace. The `<urlset>` element must declare `xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"`.
- Invalid lastmod format. Use full ISO 8601: `2026-01-15` or `2026-01-15T10:30:00+00:00`. US-style dates (1/15/26) are rejected.
- Invalid changefreq value. Only the enum values (always, hourly, daily, weekly, monthly, yearly, never) are allowed. "biweekly" or "quarterly" are not valid.
- Priority outside 0.0–1.0. Some tools emit priorities of 5 or 10; anything outside the spec range is invalid.
- Mixed protocols (http and https) in the same sitemap. Pick one — usually https — and stick with it.
Should Search Engines Be Able to Find Every URL on My Site?
Not necessarily. A sitemap should list URLs you want indexed — high-quality, canonical, public-facing pages. URLs you should keep out of the sitemap include:
- Pages with `noindex` meta directives (you're telling search engines to skip them anyway).
- Pages blocked by robots.txt.
- URL parameter variants of canonical pages (search results, tracking-tagged URLs, sort orders).
- Thin or low-quality pages that would dilute your overall site quality score.
- Internal pages (admin, staging, or pages restricted to legal access).
When you train a Hyperleap AI agent on your website, the agent reads your sitemap to discover what pages to learn from. A clean, validated sitemap means a clean knowledge base — your AI agent answers visitor questions from the right pages, not from internal admin pages or thin parameter variants. See how Hyperleap AI agents use your sitemap →