Headless WordPress sitemap and canonical: one source of truth, served from the front
Two of the seven SEO patterns for headless WordPress deserve their own article because they break first and break silently. The sitemap and the canonical URL are the two signals Google trusts most for “what is this site, and which URL is the real one”. A headless build that gets either wrong loses the rankings the migration was meant to protect.
This article makes the pattern concrete. It assumes the architectural decision (Astro or Next.js per the decision matrix) is already made.
The pattern, in one paragraph
Generate the sitemap from the front-end framework, with URLs that match the actual public site. Render the canonical URL as <link rel="canonical"> in the HTML head, sourced from WordPress (Yoast or Rank Math) and emitted by the front. Disable or 301-redirect the WordPress origin's own sitemap, and keep the origin's own canonical output from competing with the front's. One sitemap, one canonical per page, both rendered server-side.
Why duplicate sitemaps are the default failure mode
WordPress 5.5 introduced /wp-sitemap.xml as a core feature. Every WordPress install since has it on by default. SEO plugins (Yoast, Rank Math) generate their own sitemaps that override or supplement the core one. A headless build that ignores this ends up with three sitemaps on the same hostname:
- /wp-sitemap.xml from WordPress core.
- /sitemap_index.xml from Yoast or Rank Math.
- /sitemap.xml from the front-end framework.
Search Console sees overlap, sometimes flags inconsistency, and the actual indexed URLs become a function of which sitemap Google reads first that day. The fix is mechanical:
- The front-end framework generates the canonical sitemap at one well-known path (we use /sitemap-index.xml because Cloudflare Pages serves it cleanly).
- The WordPress origin sitemap is disabled (Yoast and Rank Math both have a toggle) or 301-redirected to the front-end sitemap.
- The WordPress core sitemap at /wp-sitemap.xml is also 301’d to the front-end equivalent.
After the cutover, only one sitemap responds 200 OK. The rest 301 or 404.
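The redirect side of the cutover fits in one small rule. A minimal sketch in TypeScript, so it could sit in a Cloudflare Worker or framework middleware; legacySitemapRedirect is a hypothetical helper name, and the child-sitemap patterns (core's /wp-sitemap-posts-post-1.xml style, the plugins' /post-sitemap.xml style) are assumptions about typical naming that should be checked against what your install actually emits.

```typescript
// Public sitemap path on the front; the article's build uses /sitemap-index.xml.
const FRONT_SITEMAP = "/sitemap-index.xml";

// Legacy index paths that must stop answering 200 on the public hostname.
const LEGACY_INDEXES = new Set(["/wp-sitemap.xml", "/sitemap_index.xml"]);

// Assumed child-sitemap naming: core uses /wp-sitemap-<type>-<n>.xml,
// Yoast and Rank Math use /<type>-sitemap[<n>].xml. Verify on your install.
const LEGACY_CHILD = /^\/(wp-sitemap-[\w-]+|[\w-]+-sitemap\d*)\.xml$/;

// Returns the 301 target for a legacy sitemap path, or null so the request
// falls through to the normal front-end route (including the front's own
// sitemap-index.xml and sitemap-0.xml style children).
function legacySitemapRedirect(pathname: string): string | null {
  if (LEGACY_INDEXES.has(pathname) || LEGACY_CHILD.test(pathname)) {
    return FRONT_SITEMAP;
  }
  return null;
}
```

On Cloudflare Pages the fixed paths can also be expressed without code as _redirects lines, for example /wp-sitemap.xml /sitemap-index.xml 301.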
How the front-end sitemap is built
Two real options for an Astro or Next.js front:
Build-time generation. The front-end build pulls every published post, page, and term URL from the WordPress origin during the build, sorts them, and emits the XML. This works for sites with a predictable publishing cadence (most sites). Cache invalidation is handled by triggering a rebuild on publish.
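A minimal sketch of the emit step, assuming the build has already collected the link field of every published item from the WordPress REST collections (for example /wp-json/wp/v2/posts, which returns only published items to an unauthenticated request). The hostname is a placeholder, and buildSitemapXml and toPublicUrl are hypothetical names. The one non-obvious step is the host rewrite: WordPress reports permalinks on its own origin host, while the sitemap must list public URLs.

```typescript
// Placeholder public base; WordPress permalinks arrive on the origin host.
const PUBLIC_BASE = "https://example.com";

// Escape the five XML special characters for use inside <loc>.
function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Move an origin permalink onto the public host, keeping path and query.
function toPublicUrl(originLink: string): string {
  const u = new URL(originLink);
  const pub = new URL(PUBLIC_BASE);
  u.protocol = pub.protocol;
  u.hostname = pub.hostname;
  u.port = pub.port;
  return u.toString();
}

// Dedupe, sort, and serialise as a sitemaps.org <urlset>.
function buildSitemapXml(originLinks: string[]): string {
  const urls = Array.from(new Set(originLinks.map(toPublicUrl))).sort();
  const entries = urls.map((loc) => `  <url><loc>${escapeXml(loc)}</loc></url>`);
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</urlset>",
  ].join("\n");
}
```

The sitemaps.org protocol caps one file at 50,000 URLs and 50 MB uncompressed, so larger sites split the list into several files referenced from the index; that split is omitted here.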
On-demand at the edge. A Cloudflare Worker route generates the sitemap on request, reading from a cached list of URLs that the WordPress origin pushes via webhook on publish. This works for sites with high publish frequency where rebuild latency would be a problem.
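The moving part in the edge option is the stored URL list. A minimal sketch of folding publish events into it, assuming a hypothetical payload shape: core WordPress does not send outbound webhooks, so the event would come from a small plugin hooked on transition_post_status or similar. The Worker would persist the returned list (Workers KV is one option) and render the XML from it on request, the same way the build-time step does.

```typescript
// Hypothetical payload sent by the origin on publish or unpublish.
type PublishEvent = { action: "publish" | "unpublish"; url: string };

// Fold a batch of events into the current URL list. Replaying the same
// event is harmless, which matters because webhooks can be delivered twice.
function applyPublishEvents(current: string[], events: PublishEvent[]): string[] {
  const urls = new Set(current);
  for (const event of events) {
    if (event.action === "publish") {
      urls.add(event.url);
    } else {
      urls.delete(event.url);
    }
  }
  return Array.from(urls).sort();
}
```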
We default to build-time generation. The Worker pattern is reserved for sites publishing more than a few times per hour.
How the canonical URL is rendered
The canonical URL must be in the HTML head, in the initial server response, before any client-side script runs. The pattern:
<link rel="canonical" href="https://example.com/headless-wordpress-for-woocommerce/" />
Three rules.
One, render server-side. Astro renders this from the page frontmatter or from the layout. Next.js renders it from the metadata API (App Router, via alternates.canonical) or from the <Head> component of next/head (Pages Router, including getServerSideProps pages). The thing to avoid is setting or updating the canonical URL in a client-side effect; generative engines and many AEO surfaces parse only the initial HTML.
Two, source from WordPress. Yoast and Rank Math both expose the canonical URL per post via REST. The front fetches it during build (or per request) and renders it in HTML. WordPress remains the source of truth.
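A minimal sketch of the Yoast side, assuming the post comes from the WP REST API with Yoast's yoast_head_json field present (Rank Math exposes its head output through a separate endpoint and would need its own fetch). The hostnames and canonicalFor are placeholders. The host check matters: a canonical that WordPress reports on the origin host is moved to the public host, while a deliberate cross-domain canonical (syndicated content) is left alone.

```typescript
// Only the fields this sketch reads; the real REST response has many more.
interface WpPost {
  link: string;
  yoast_head_json?: { canonical?: string };
}

const ORIGIN_HOSTNAME = "cms.example.com"; // placeholder WordPress origin
const PUBLIC_HOSTNAME = "example.com"; // placeholder public front

function canonicalFor(post: WpPost): string {
  // Prefer the SEO plugin's value; fall back to the permalink.
  const raw = post.yoast_head_json?.canonical || post.link;
  const u = new URL(raw);
  // Rewrite only origin-host URLs; cross-domain canonicals pass through.
  if (u.hostname === ORIGIN_HOSTNAME) {
    u.protocol = "https:";
    u.hostname = PUBLIC_HOSTNAME;
    u.port = "";
  }
  return u.toString();
}
```

In an Astro layout the result is rendered as <link rel="canonical" href={canonical} />; in the Next.js App Router it goes into the metadata object as alternates: { canonical }. Either way it is part of the server response.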
Three, self-referential by default. Every URL declares itself as canonical unless there is an explicit reason to point elsewhere (parameterised filtered URLs, syndicated content). When a page points elsewhere, the destination page's canonical points at the destination itself, never onward to a third URL.
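The rule that a destination points at itself is easy to check mechanically once the declared canonicals have been collected, for instance by crawling the built site. A minimal sketch over a hypothetical map from page URL to declared canonical; findCanonicalChains is an illustrative name. A target missing from the map is reported too, because the check cannot confirm it, so cross-domain targets will show up there and need verifying separately.

```typescript
// declared: page URL -> canonical URL found in that page's <head>.
function findCanonicalChains(declared: Map<string, string>): string[] {
  const problems: string[] = [];
  declared.forEach((target, page) => {
    if (target === page) return; // self-referential: the default case
    const targetCanonical = declared.get(target);
    if (targetCanonical === undefined) {
      problems.push(`${page} -> ${target}: target not in crawl`);
    } else if (targetCanonical !== target) {
      problems.push(`${page} -> ${target} -> ${targetCanonical}: chain`);
    }
  });
  return problems;
}
```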
Edge cases that bite
- Trailing slash inconsistency. WordPress permalinks usually end with /. The front-end framework may default to no trailing slash. Pick one, redirect the other, and never let both exist.
- HTTP vs HTTPS, www vs apex. Usually solved at the CDN, but the canonical URL must declare the chosen variant. We declare https://apex; everything else 301s to it.
- Filtered URLs (faceted catalogue search). These often produce thousands of thin URL variants. Their canonical points to the unfiltered base; they are also marked noindex and left out of the sitemap.
- Paginated archives. Page 2, page 3, and so on each declare themselves canonical, with rel="prev" and rel="next" for clarity (Google said in 2019 that it no longer uses these for indexing). Some teams point the canonical to page 1; that drops unique pages from the index. We do not recommend it.
- Translated content. Each language variant is canonical to itself, with <link rel="alternate" hreflang="..."> for its siblings. The hreflang map is self-referential (each page lists itself) and must agree across all language variants.
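Most of these rules reduce to one normalisation function, applied both when rendering the canonical and when deciding redirects. A minimal sketch, assuming the choices this article makes (https, apex host, trailing slash as in WordPress permalinks) and a hypothetical list of facet parameters; decideCanonical and FILTER_PARAMS are illustrative names. Paginated paths such as /blog/page/2/ pass through unchanged, so each page stays canonical to itself; hreflang is out of scope here.

```typescript
const CANONICAL_HOSTNAME = "example.com"; // placeholder apex host
// Assumed facet parameters; replace with the ones your catalogue uses.
const FILTER_PARAMS = ["color", "size", "min_price", "max_price", "orderby"];

interface CanonicalDecision {
  canonical: string;
  noindex: boolean; // true for filtered variants
}

function decideCanonical(requested: string): CanonicalDecision {
  const u = new URL(requested);
  // One scheme and one host: https on the apex (folds www and http variants).
  u.protocol = "https:";
  u.hostname = CANONICAL_HOSTNAME;
  u.port = "";
  u.hash = "";
  // Trailing slash, matching WordPress permalinks; leave file-like paths alone.
  if (!u.pathname.endsWith("/") && !/\.[a-z0-9]+$/i.test(u.pathname)) {
    u.pathname += "/";
  }
  // Faceted filters: drop them so the canonical is the unfiltered base.
  const filtered = FILTER_PARAMS.some((p) => u.searchParams.has(p));
  const kept = new URLSearchParams();
  u.searchParams.forEach((value, key) => {
    if (FILTER_PARAMS.indexOf(key) < 0) kept.append(key, value);
  });
  const query = kept.toString();
  u.search = query ? `?${query}` : "";
  return { canonical: u.toString(), noindex: filtered };
}
```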
Validation before going live
Two checks we run on every headless WordPress build:
Sitemap diff. Generate the new sitemap, compare against the legacy WordPress sitemap by URL set. Anything missing from the new one is a content gap. Anything new is a regression suspect (often a draft or a private post leaking).
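A minimal sketch of the diff, assuming both sitemaps have already been fetched and any sitemap index flattened into its child <urlset> files (the legacy WordPress sitemap is usually an index). URLs are moved onto the public host before comparing, in case the legacy sitemap lists origin-host URLs; PUBLIC_HOST and the function names are placeholders. The <loc> extraction uses a regular expression, which is adequate for well-formed sitemaps but no substitute for an XML parser.

```typescript
interface SitemapDiff {
  missing: string[]; // legacy only: content gaps
  unexpected: string[]; // new only: regression suspects
}

const PUBLIC_HOST = "example.com"; // placeholder

function unescapeXml(s: string): string {
  return s
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&"); // last, so "&amp;lt;" becomes "&lt;", not "<"
}

// Collect <loc> values from a <urlset>, normalised to the public host.
function locSet(xml: string): Set<string> {
  const out = new Set<string>();
  const re = /<loc>\s*([^<]+?)\s*<\/loc>/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(xml)) !== null) {
    const u = new URL(unescapeXml(m[1]));
    u.protocol = "https:";
    u.hostname = PUBLIC_HOST;
    out.add(u.toString());
  }
  return out;
}

function diffSitemaps(legacyXml: string, newXml: string): SitemapDiff {
  const legacy = locSet(legacyXml);
  const fresh = locSet(newXml);
  return {
    missing: Array.from(legacy).filter((u) => !fresh.has(u)).sort(),
    unexpected: Array.from(fresh).filter((u) => !legacy.has(u)).sort(),
  };
}
```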
Canonical sample. For 50 high-traffic pages, request the URL on the new front and assert the canonical in HTML head matches the URL itself (or matches the expected target if intentionally cross-canonical). One mismatch is a bug; ten mismatches is a pattern that needs the front-end build re-checked.
Both checks run in CI. A new build that fails either one does not deploy.
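A minimal sketch of the canonical assertion, assuming the CI job has already fetched the raw server HTML for each sampled URL with no JavaScript executed, since that first response is what crawlers and answer engines read. extractCanonical and checkCanonicals are illustrative names, and the tag matching is regex-based, so treat it as a smoke test rather than a full HTML parse. Intentionally cross-canonical pages carry an expected target; every other sample must be self-canonical.

```typescript
interface CanonicalSample {
  url: string; // the URL that was requested
  html: string; // raw server response body
  expected?: string; // set only for intentionally cross-canonical pages
}

// Return the href of the first <link rel="canonical"> in <head>, or null.
function extractCanonical(html: string): string | null {
  const head = html.split(/<\/head>/i)[0];
  const tags = head.match(/<link\b[^>]*>/gi) || [];
  for (const tag of tags) {
    const rel = /\brel\s*=\s*["']([^"']*)["']/i.exec(tag);
    const href = /\bhref\s*=\s*["']([^"']*)["']/i.exec(tag);
    if (rel && href && rel[1].toLowerCase().split(/\s+/).indexOf("canonical") >= 0) {
      return href[1].replace(/&amp;/g, "&");
    }
  }
  return null;
}

// One message per failing sample; an empty array means the check passed.
function checkCanonicals(samples: CanonicalSample[]): string[] {
  const failures: string[] = [];
  for (const s of samples) {
    const want = s.expected ?? s.url;
    const got = extractCanonical(s.html);
    if (got !== want) {
      failures.push(`${s.url}: expected ${want}, got ${got ?? "no canonical"}`);
    }
  }
  return failures;
}
```

The CI step then fails the build whenever the returned array is non-empty.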
Where this fits
Anchored to the SEO patterns for headless WordPress checklist. Pairs with the Headless WordPress service pillar and the Next.js vs Astro decision matrix for the broader build-time decisions.
