Extract URLs from Sitemap

Extract Every URL From Any XML Sitemap

"How do I extract all URLs from a sitemap?" If that's what brought you here, this tool does it in one click. Paste any website or direct sitemap URL and the extractor parses the XML, follows nested sitemap index files, and returns a clean, filterable list of every page URL — ready to copy or export as a .txt.

How to Extract URLs From a Sitemap

There are three reliable ways to pull URLs out of an XML sitemap: the extractor on this page, a manual search in your browser, and a command-line one-liner.

1. Use the extractor above (fastest)

Paste the sitemap URL (e.g. example.com/sitemap.xml) or just the domain into the input at the top of this page and click Extract. The tool auto-detects the sitemap location if you paste a bare domain, follows sitemap index files, and returns every <loc> URL in a searchable table. From there you can filter, copy all to clipboard, or export as a text file.

2. View source and search for the <loc> tags

Works in any browser and handles small, single-file sitemaps well.

  1. Open the sitemap URL directly in your browser.
  2. Right-click the page and choose View Page Source (or press Ctrl+U / Cmd+Option+U).
  3. Use Ctrl+F / Cmd+F to search for <loc>. Every match wraps a page URL, as in this entry:

<url>
  <loc>https://example.com/about</loc>
  <lastmod>2025-01-12</lastmod>
</url>

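Copy the URLs between the <loc> tags into a spreadsheet. This gets tedious beyond a few dozen entries, and it doesn't scale to sitemap index files: in an index, each <loc> points to another sitemap rather than a page, so you would have to repeat the process for every child file.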
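If you'd rather not copy by hand, the same lookup can be scripted. The following is a minimal Python sketch, not the extractor's own code: it assumes you have the full sitemap text (including the standard sitemaps.org namespace on the root element), and the names SAMPLE and extract_locs are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample data standing in for a page source you copied.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/about</loc>
    <lastmod>2025-01-12</lastmod>
  </url>
  <url>
    <loc>https://example.com/search?q=a&amp;page=2</loc>
  </url>
</urlset>"""


def extract_locs(xml_text: str) -> list[str]:
    """Return the text of every <loc> element, in document order."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iterfind(".//sm:loc", NS) if el.text]


print("\n".join(extract_locs(SAMPLE)))
```

Because this uses a real XML parser, escaped characters such as &amp; come back decoded as &, which a plain text search would not do for you.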

3. Use curl and a regex one-liner

For developers who want URLs in a terminal:

curl -s https://example.com/sitemap.xml | grep -oE '<loc>[^<]+' | sed 's/<loc>//'

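This prints one URL per line, with XML escapes such as &amp; left undecoded. It doesn't recurse into nested sitemap indexes: run it against an index and you get the child sitemap URLs instead of page URLs. The tool above follows those automatically.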
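For anyone who wants the recursion in a script, here is a rough Python sketch of walking a sitemap index. It is an illustration under assumptions, not how the tool is implemented: the function names are hypothetical, the fetcher is passed in so it can be swapped out, and gzip handling is left out.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Namespace from the sitemaps.org protocol, in ElementTree's {uri}tag form.
SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def http_fetch(url):
    """Download a URL and return the raw response body as bytes."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def collect_page_urls(sitemap_url, fetch=http_fetch, seen=None):
    """Return page URLs from a sitemap, recursing into sitemap index files."""
    seen = set() if seen is None else seen
    if sitemap_url in seen:  # skip files we have already visited
        return []
    seen.add(sitemap_url)

    root = ET.fromstring(fetch(sitemap_url))
    if root.tag == SM + "sitemapindex":
        pages = []
        for loc in root.iterfind(f"{SM}sitemap/{SM}loc"):
            if loc.text:
                pages.extend(collect_page_urls(loc.text.strip(), fetch, seen))
        return pages
    # Anything else is treated as a plain <urlset>.
    return [loc.text.strip() for loc in root.iterfind(f"{SM}url/{SM}loc") if loc.text]
```

Calling collect_page_urls("https://example.com/sitemap.xml") would fetch over the network. Passing your own fetch function (for example, one that reads from a dictionary of saved files) lets you try it offline.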

What the Extractor Handles

  • Plain XML sitemaps — the standard sitemap.xml format used by most sites.
  • Sitemap index files — files that list other sitemaps (common on large sites like ecommerce stores or news sites). The tool walks the index and extracts URLs from every referenced child sitemap.
  • Gzipped sitemaps (.xml.gz) — decompressed automatically.
  • Bare domains — paste example.com and the extractor looks for the sitemap at /sitemap.xml and checks /robots.txt for a Sitemap: directive.
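If you script your own extraction, gzipped sitemaps need one extra step. The sketch below shows one possible approach in Python and is an assumption about how you might do it, not a description of the tool's internals: it checks the first two bytes for the gzip signature instead of trusting the .gz extension, since some servers and HTTP clients decompress the body before you see it. The name maybe_gunzip is hypothetical.

```python
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def maybe_gunzip(body: bytes) -> bytes:
    """Decompress gzip data; return any other body unchanged."""
    if body[:2] == GZIP_MAGIC:
        return gzip.decompress(body)
    return body
```

The returned bytes can then be handed to any XML parser exactly as if the sitemap had been served uncompressed.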

How to Find a Website's Sitemap URL

If you don't know where a site's sitemap lives, try these in order:

  1. Append /sitemap.xml to the domain — example.com/sitemap.xml. This is the convention most CMSs follow.
  2. Check /robots.txt — open example.com/robots.txt and look for a line starting with Sitemap:. It points to the real sitemap location, which can be non-standard (e.g. /sitemap_index.xml on WordPress + Yoast).
  3. Use the Sitemap Finder — if both fail, our sister tool scans common sitemap paths for you.
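The first two steps can be sketched in a few lines of Python. Treat this as an illustration only: the function names are hypothetical, it assumes the site is served over https, and it works on a robots.txt body you have already downloaded.

```python
def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Return the URLs named in 'Sitemap:' lines of a robots.txt body."""
    found = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found


def candidate_sitemap_urls(domain: str, robots_txt: str = "") -> list[str]:
    """List places to look: /sitemap.xml first, then robots.txt entries."""
    base = "https://" + domain.strip("/")  # assumption: the site uses https
    candidates = [base + "/sitemap.xml"]
    for url in sitemaps_from_robots(robots_txt):
        if url not in candidates:
            candidates.append(url)
    return candidates
```

The directive name is matched case-insensitively because robots.txt files in the wild write it as Sitemap:, sitemap: or SITEMAP:. On Python 3.8 and later, the standard library's urllib.robotparser.RobotFileParser also exposes a site_maps() method if you prefer not to parse the file yourself.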

Why Extract URLs From a Sitemap

A sitemap is the site owner's own list of the pages they want indexed, which makes it the fastest way to get a clean URL inventory for:

  • SEO audits — feed the list into Screaming Frog, Ahrefs, or a custom crawler to check status codes, titles, and meta descriptions at scale. Pair it with the Image Alt Checker to audit accessibility page by page.
  • Content migrations — map old URLs to new ones before switching CMS or domain.
  • Competitor research — pull a competitor's page inventory to see what content they publish and want indexed.
  • Broken link sweeps — compare the sitemap against your live site to find orphaned or missing pages.

Not every URL on a site appears in its sitemap — webmasters sometimes exclude thin or paginated pages on purpose — so for exhaustive coverage, pair the sitemap extract with a full crawl.
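Once you have both lists, comparing them is a simple set operation. Here is a small Python sketch with hypothetical names. It assumes both lists are already normalized the same way (same scheme, same trailing-slash convention), which real data often is not.

```python
def compare_inventories(sitemap_urls, crawled_urls):
    """Split URLs into those found in only one of the two sources."""
    sitemap_set, crawled_set = set(sitemap_urls), set(crawled_urls)
    return {
        # Listed in the sitemap but never reached by the crawl.
        "only_in_sitemap": sorted(sitemap_set - crawled_set),
        # Reached by the crawl but missing from the sitemap.
        "only_in_crawl": sorted(crawled_set - sitemap_set),
    }


report = compare_inventories(
    ["https://example.com/", "https://example.com/about"],
    ["https://example.com/", "https://example.com/blog?page=2"],
)
print(report)
```

URLs that appear only in the sitemap are candidates for orphaned pages, while URLs that appear only in the crawl are pages the sitemap leaves out, whether on purpose or by accident.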

Frequently Asked Questions