
XML Sitemaps for Modern SEO

Last updated: December 31, 2025

XML sitemaps are the primary communication channel between your website and search engine crawlers. This guide covers the complete sitemap protocol, including image/video/news extensions, Hreflang implementation, validation best practices, and how to structure sitemaps for sites with millions of URLs.

What is an XML sitemap and why does it matter?

An XML sitemap is a machine-readable file that tells search engines which URLs on your site you want crawled and indexed. Search engines also discover pages by following links, but this “crawl-based discovery” can miss pages or lag behind on large, fast-changing sites.

The XML Sitemap Protocol serves as the primary communication channel between your website’s database and search engine crawlers. It’s not just a compliance checkbox—it’s a tool for:

  • Crawl budget optimization — Direct crawlers to your most important pages
  • Content discovery — Surface pages that might be orphaned or hidden behind JavaScript
  • Freshness signals — Tell search engines when content was last updated
  • Internationalization — Declare language relationships between page variants

With AI-driven search experiences (Google’s AI Overviews, Bing Copilot), sitemaps arguably matter more than ever. Accurate <lastmod> timestamps help crawlers prioritize recently changed pages, which can shorten the time before fresh content reaches the systems that generate AI answers.


How do I structure a valid sitemap file?

A valid sitemap must be a UTF-8 encoded XML document. The root element is <urlset>, which contains namespace declarations governing the entire file.

Basic sitemap header

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- URL entries go here -->
</urlset>

Header with all extensions

To use images, video, news, or Hreflang features, declare their namespaces:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
        xmlns:video="http://www.google.com/schemas/sitemap-video/1.1"
        xmlns:news="http://www.google.com/schemas/sitemap-news/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <!-- URL entries go here -->
</urlset>

Failing to declare a namespace while using its tags (e.g., using <image:image> without xmlns:image) causes validation errors.

Hard limits

Constraint | Limit
URLs per file | 50,000 maximum
File size | 50 MB uncompressed
URL length | 2,048 characters

If your site exceeds these limits, use a Sitemap Index file to reference multiple child sitemaps.
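As a rough sketch of how the 50,000-URL limit translates into multiple files, the helper below splits a flat URL list into sitemap-sized groups. The function name `chunkUrls` and the example URLs are illustrative assumptions, not part of any library.

```javascript
// Protocol limit cited above: at most 50,000 URLs per sitemap file.
const MAX_URLS_PER_SITEMAP = 50000;

// Split a flat list of URLs into groups that each fit in one sitemap file.
function chunkUrls(urls, limit = MAX_URLS_PER_SITEMAP) {
  const chunks = [];
  for (let i = 0; i < urls.length; i += limit) {
    chunks.push(urls.slice(i, i + limit));
  }
  return chunks;
}

// Example: 120,001 hypothetical URLs need three child sitemaps.
const exampleUrls = Array.from(
  { length: 120001 },
  (_, i) => `https://www.example.com/p/${i}`
);
console.log(chunkUrls(exampleUrls).map((chunk) => chunk.length));
// [ 50000, 50000, 20001 ]
```

This only enforces the URL count. A production generator would also need to watch the 50 MB uncompressed size limit, which long URLs or media extensions can hit first.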


Which sitemap tags does Google actually use?

Inside <urlset>, each URL is wrapped in a <url> element with four possible child tags.

<loc> (Required)

The absolute URL of the page. Must match the sitemap’s protocol (HTTPS sitemap = HTTPS URLs).

<url>
  <loc>https://www.example.com/page</loc>
</url>

Requirements:

  • Absolute URLs only (not relative paths)
  • Must be the canonical version
  • No redirects, 404s, or noindex pages
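
The offline parts of these requirements can be checked before a URL is written to the file. The sketch below assumes a hypothetical helper named `checkLoc`. It covers only the absolute-URL rule, the protocol match, and the 2,048-character limit; detecting redirects, 404s, or noindex pages needs an actual HTTP request and is left out.

```javascript
// Check a candidate <loc> value against the rules that need no network access.
// sitemapUrl is the address the sitemap file itself is served from.
function checkLoc(loc, sitemapUrl) {
  let parsed;
  try {
    parsed = new URL(loc); // throws on relative paths such as "/page"
  } catch {
    return ['not an absolute URL'];
  }
  const problems = [];
  if (parsed.protocol !== new URL(sitemapUrl).protocol) {
    problems.push('protocol differs from the sitemap');
  }
  if (loc.length > 2048) {
    problems.push('longer than 2,048 characters');
  }
  return problems;
}

console.log(checkLoc('/page', 'https://www.example.com/sitemap.xml'));
// [ 'not an absolute URL' ]
```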

<lastmod> (Optional but critical)

The date the page was last meaningfully modified, in W3C Datetime format.

<lastmod>2025-12-31T15:30:00+00:00</lastmod>

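Google has said it uses this value only when it is consistently and verifiably accurate, for example when it matches what the crawler observes on the page. If you report updates while the content stays the same, your <lastmod> signals are likely to be discounted. Only update this when content actually changes.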
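
One way to keep <lastmod> honest is to store a hash of each page's content at build time and move the timestamp only when the hash changes. The following is a minimal sketch under that assumption. The names `contentHash` and `nextLastmod` and the `{ hash, lastmod }` record shape are hypothetical, and the FNV-1a-style hash is a stand-in for whatever hash your build already uses.

```javascript
// Small non-cryptographic hash (FNV-1a style, 32-bit, over UTF-16 code units).
// It is only meant to detect that content changed between builds.
function contentHash(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// previous: the { hash, lastmod } record saved by the last build,
// or undefined for a page that is new.
// Returns the record to save and to write into <lastmod>.
function nextLastmod(previous, content, now = new Date()) {
  const hash = contentHash(content);
  if (previous && previous.hash === hash) {
    return previous; // content unchanged: keep the old timestamp
  }
  return { hash, lastmod: now.toISOString() };
}
```

`toISOString()` produces a UTC timestamp such as 2025-12-31T15:30:00.000Z, which is a valid W3C Datetime value.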

<changefreq> (Ignored)

Suggests how often the page changes (daily, weekly, monthly, etc.). Google explicitly ignores this tag because webmasters historically set everything to “daily” regardless of reality.

<priority> (Ignored)

A value from 0.0 to 1.0 indicating relative importance. Google ignores this value and relies on its own signals, such as link analysis, rather than self-declared importance.

Recommendation: Omit <changefreq> and <priority> to save file size.


How do I add images and videos to my sitemap?

Image sitemaps

Image sitemaps help Google discover images hidden by JavaScript, lazy loading, or carousels.

Required tags:

  • <image:image> — Container for a single image (up to 1,000 per URL)
  • <image:loc> — Direct URL to the image file

<url>
  <loc>https://www.example.com/product-page</loc>
  <image:image>
    <image:loc>https://cdn.example.com/images/product.jpg</image:loc>
  </image:image>
</url>

The <image:loc> can point to a CDN domain—this is common for modern architectures.

Deprecated tags (May 2022): Google removed support for <image:caption>, <image:geo_location>, <image:title>, and <image:license>. Google can typically derive this information from the page itself and from image metadata.

Video sitemaps

Video sitemaps enable rich results (thumbnails, key moments, timestamps) in search results.

Required tags:

  • <video:thumbnail_loc> — Thumbnail image URL (minimum 160×90 pixels)
  • <video:title> — Video title
  • <video:description> — Description (max 2,048 characters)
  • <video:player_loc> OR <video:content_loc> — Player URL or direct media file URL

Optional tags:

  • <video:duration> — Length in seconds (1–28,800)
  • <video:publication_date> — When the video was published
  • <video:tag> — Keywords (up to 32 per video)
  • <video:rating> — Rating from 0.0 to 5.0

Deprecated tags: <video:category>, <video:gallery_loc>, <video:price>, <video:tvshow>
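
The required tags and numeric limits listed above lend themselves to a pre-flight check before a video entry is serialized. This is a sketch only: `checkVideo` and the camelCase field names (`thumbnailLoc`, `playerLoc`, `contentLoc`, and so on) are assumptions chosen for illustration, not an existing API.

```javascript
// Validate a plain object describing one video against the limits listed above.
// Returns a list of human-readable problems; an empty list means it passed.
function checkVideo(video) {
  const problems = [];
  if (!video.thumbnailLoc) problems.push('missing thumbnail_loc');
  if (!video.title) problems.push('missing title');
  if (!video.description) {
    problems.push('missing description');
  } else if (video.description.length > 2048) {
    problems.push('description over 2,048 characters');
  }
  if (!video.playerLoc && !video.contentLoc) {
    problems.push('need player_loc or content_loc');
  }
  if (video.duration !== undefined && (video.duration < 1 || video.duration > 28800)) {
    problems.push('duration outside 1 to 28,800 seconds');
  }
  if (video.tags && video.tags.length > 32) {
    problems.push('more than 32 tags');
  }
  if (video.rating !== undefined && (video.rating < 0 || video.rating > 5)) {
    problems.push('rating outside 0.0 to 5.0');
  }
  return problems;
}
```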


How do I create a Google News sitemap?

News sitemaps are specifically for articles published in the last 48 hours. They function as a “breaking news” feed for Google. After 48 hours, URLs should be removed from the News sitemap (but remain in your standard sitemap).
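
The 48-hour window can be applied as a simple filter when the News sitemap is regenerated. The sketch below assumes each article is an object with a `publishedAt` ISO 8601 string; that shape and the name `recentArticles` are hypothetical.

```javascript
const NEWS_WINDOW_MS = 48 * 60 * 60 * 1000; // 48 hours in milliseconds

// Keep only the articles published within the last 48 hours.
// Articles dated in the future are dropped as well.
function recentArticles(articles, now = new Date()) {
  return articles.filter((article) => {
    const age = now.getTime() - new Date(article.publishedAt).getTime();
    return age >= 0 && age <= NEWS_WINDOW_MS;
  });
}
```

Articles that fall out of this list stay in the standard sitemap; only the News sitemap drops them.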

Required tags

<url>
  <loc>https://www.example.com/news/article</loc>
  <news:news>
    <news:publication>
      <news:name>The Daily Times</news:name>
      <news:language>en</news:language>
    </news:publication>
    <news:publication_date>2025-12-31T10:00:00+00:00</news:publication_date>
    <news:title>Breaking News Headline</news:title>
  </news:news>
</url>

Critical: The <news:name> must exactly match your name in Google News Publisher Center. “The Daily Times” ≠ “Daily Times”.


How do I implement Hreflang in sitemaps?

For multilingual or multi-region sites, Hreflang prevents duplicate content issues and ensures users see the correct language version.

Implementation rules

  1. Self-referencing — Every page must list itself as an alternate
  2. Bi-directional — If Page A links to Page B, Page B must link back to Page A
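  3. X-default — Specify a fallback for users whose language isn’t matched

<url>
  <loc>https://www.example.com/english-page</loc>
  <xhtml:link rel="alternate" hreflang="en" href="https://www.example.com/english-page"/>
  <xhtml:link rel="alternate" hreflang="de" href="https://www.example.com/deutsch-page"/>
  <xhtml:link rel="alternate" hreflang="x-default" href="https://www.example.com/"/>
</url>
<url>
  <loc>https://www.example.com/deutsch-page</loc>
  <xhtml:link rel="alternate" hreflang="en" href="https://www.example.com/english-page"/>
  <xhtml:link rel="alternate" hreflang="de" href="https://www.example.com/deutsch-page"/>
  <xhtml:link rel="alternate" hreflang="x-default" href="https://www.example.com/"/>
</url>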

Implementing Hreflang in sitemaps (rather than HTML <link> tags) reduces page weight and centralizes language management.
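
Because every variant must carry the same full set of alternates, generating the entries from one language-to-URL map makes the self-referencing and bi-directional rules hard to break by accident. The sketch below assumes a hypothetical helper `hreflangEntries` and URLs that are already XML-escaped.

```javascript
// variants: map of hreflang code to URL, e.g. { en: '...', de: '...' }.
// xDefault: optional fallback URL for unmatched languages.
// Emits one <url> entry per variant, each listing every variant (itself included).
function hreflangEntries(variants, xDefault) {
  const links = Object.entries(variants).map(
    ([lang, href]) => `  <xhtml:link rel="alternate" hreflang="${lang}" href="${href}"/>`
  );
  if (xDefault) {
    links.push(`  <xhtml:link rel="alternate" hreflang="x-default" href="${xDefault}"/>`);
  }
  return Object.values(variants)
    .map((loc) => ['<url>', `  <loc>${loc}</loc>`, ...links, '</url>'].join('\n'))
    .join('\n');
}

console.log(
  hreflangEntries(
    {
      en: 'https://www.example.com/english-page',
      de: 'https://www.example.com/deutsch-page',
    },
    'https://www.example.com/'
  )
);
```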


How do I scale sitemaps for large websites?

Sitemap Index files

When you exceed 50,000 URLs or 50MB, use a Sitemap Index—a parent file that lists child sitemaps:

<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/sitemap-products.xml</loc>
    <lastmod>2025-10-01</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-blog.xml</loc>
    <lastmod>2025-09-21</lastmod>
  </sitemap>
</sitemapindex>

Segmentation strategies

Don’t split sitemaps arbitrarily (sitemap1.xml, sitemap2.xml). Instead, segment by category to enable debugging in Google Search Console:

Strategy | Example Files | Benefit
By page type | sitemap-products.xml, sitemap-blog.xml | Identify if indexation issues affect specific templates
By freshness | sitemap-news.xml, sitemap-archive-2024.xml | Ensure fresh content is crawled frequently
By media | sitemap-video.xml | Track Rich Results performance separately

Mega-sites (millions of URLs)

A single Sitemap Index can reference up to 50,000 sitemaps. Since each sitemap holds 50,000 URLs, one Index file supports 2.5 billion URLs—sufficient for virtually any website.
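
To make the arithmetic and the index format concrete, here is a small sketch that builds a Sitemap Index from a list of child sitemaps and enforces the 50,000-entry cap. `buildSitemapIndex` is a hypothetical helper, and the `loc` values are assumed to be XML-escaped already.

```javascript
const MAX_SITEMAPS_PER_INDEX = 50000;

// children: [{ loc, lastmod? }] describing each child sitemap file.
function buildSitemapIndex(children) {
  if (children.length > MAX_SITEMAPS_PER_INDEX) {
    throw new Error('A sitemap index can list at most 50,000 sitemaps');
  }
  const entries = children.map(({ loc, lastmod }) =>
    [
      '  <sitemap>',
      `    <loc>${loc}</loc>`,
      lastmod ? `    <lastmod>${lastmod}</lastmod>` : null, // lastmod is optional
      '  </sitemap>',
    ]
      .filter(Boolean)
      .join('\n')
  );
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    '</sitemapindex>',
  ].join('\n');
}

// Capacity of one index: 50,000 sitemaps x 50,000 URLs each.
console.log(50000 * 50000); // 2500000000
```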


What are the most common sitemap errors?

Entity escaping

XML requires special characters to be escaped. Unescaped characters, especially ampersands in query strings, are a common cause of sitemap parsing failures.

Character | Escape Code
& | &amp;
' | &apos;
" | &quot;
> | &gt;
< | &lt;

Wrong: https://example.com/product?id=1&sort=asc
Correct: https://example.com/product?id=1&amp;sort=asc
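
The five escapes above can be applied with a single replace call when URLs are written into the file. The helper name `escapeXml` is an assumption for illustration. It expects raw, unescaped input, so running it twice on the same string would double-escape the ampersands.

```javascript
// The five characters that must be escaped, mapped to their entity codes.
const XML_ESCAPES = {
  '&': '&amp;',
  "'": '&apos;',
  '"': '&quot;',
  '>': '&gt;',
  '<': '&lt;',
};

// Escape a raw string for use inside an XML element or attribute.
function escapeXml(value) {
  return value.replace(/[&'"<>]/g, (ch) => XML_ESCAPES[ch]);
}

console.log(escapeXml('https://example.com/product?id=1&sort=asc'));
// https://example.com/product?id=1&amp;sort=asc
```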

Namespace declaration errors

Using extension tags without defining the namespace in the header causes parsing failures. Always declare xmlns:image, xmlns:video, etc.
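
A quick way to catch this before submission is to compare the prefixes used in tags against the prefixes declared with xmlns. The sketch below does this with regular expressions rather than a real XML parser, so treat it as a lint-style check under that assumption. The name `undeclaredPrefixes` is hypothetical.

```javascript
// Return the tag prefixes that are used in the XML but never declared
// with an xmlns:prefix="..." attribute. Regex-based, not a full XML parser.
function undeclaredPrefixes(xml) {
  const declared = new Set();
  for (const match of xml.matchAll(/xmlns:([A-Za-z_][\w.-]*)\s*=/g)) {
    declared.add(match[1]);
  }
  const used = new Set();
  for (const match of xml.matchAll(/<\/?([A-Za-z_][\w.-]*):[A-Za-z_]/g)) {
    used.add(match[1]);
  }
  return [...used].filter((prefix) => !declared.has(prefix)).sort();
}

const brokenSitemap = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/product-page</loc>
    <image:image>
      <image:loc>https://cdn.example.com/images/product.jpg</image:loc>
    </image:image>
  </url>
</urlset>`;
console.log(undeclaredPrefixes(brokenSitemap)); // [ 'image' ]
```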

Mixed signals

Including URLs that are blocked by robots.txt or tagged with noindex creates conflicting signals. Your sitemap should only contain canonical, indexable, 200 OK URLs.
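
If you already have crawl or CMS data per page, the rule can be applied as a filter before the sitemap is written. The record shape below (`status`, `noindex`, `blockedByRobots`, `canonical`) is a hypothetical export format chosen for illustration, as is the name `indexableUrls`.

```javascript
// pages: [{ url, status, noindex, blockedByRobots, canonical }]
// Keep only URLs that return 200, are indexable, are crawlable,
// and are their own canonical (or declare no canonical at all).
function indexableUrls(pages) {
  return pages
    .filter(
      (page) =>
        page.status === 200 &&
        !page.noindex &&
        !page.blockedByRobots &&
        (!page.canonical || page.canonical === page.url)
    )
    .map((page) => page.url);
}
```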

Compression and submission

  • Use gzip compression (.xml.gz) to reduce bandwidth
  • Submit sitemaps via robots.txt: Sitemap: https://example.com/sitemap.xml
  • Or submit through Google Search Console

Note: Google deprecated the sitemap ping endpoint (google.com/ping?sitemap=...) in June 2023. It now returns 404.


How do I implement sitemaps in Astro?

Astro provides an official @astrojs/sitemap integration that automatically generates sitemaps at build time.

Installation

npx astro add sitemap

Configuration

In astro.config.mjs, you must set your site URL:

import { defineConfig } from 'astro/config';
import sitemap from '@astrojs/sitemap';

export default defineConfig({
  site: 'https://www.example.com',
  integrations: [sitemap()],
});

Output files

The integration generates:

  • sitemap-index.xml — Links to all numbered sitemap files
  • sitemap-0.xml — Contains your page URLs

For extremely large sites, additional files (sitemap-1.xml, etc.) are created automatically.

Configuration options

sitemap({
  // Base name for output files (default: 'sitemap', which produces
  // sitemap-index.xml and sitemap-0.xml)
  filenameBase: 'sitemap',

  // Maximum entries per file (default: 45000)
  entryLimit: 10000,

  // Exclude unused namespaces for smaller files
  namespaces: {
    news: false,
    video: false,
  },

  // Transform entries before writing
  serialize(item) {
    // Stamps every entry with the build time. Replace this with each page's
    // real modification date where you have one, so <lastmod> changes only
    // when the content does.
    item.lastmod = new Date().toISOString();
    return item;
  },
})

Filtering pages

Use the filter option to exclude pages:

sitemap({
  filter: (page) => !page.includes('/private/'),
})
