Auto-discover sitemaps via robots.txt and the /sitemap.xml fallback, recurse sitemap indexes one level deep, probe every URL, and bucket the results into seven categories: ok, redirect, client_error, server_error, noindex, canonicalised_away, network_error.
The actor takes a site root (or an explicit sitemap URL) and discovers every sitemap via two parallel paths: a robots.txt read for Sitemap: directives, and a fallback HEAD on /sitemap.xml. Sitemap indexes are recursed one level deep automatically. The combined URL list is then probed in parallel, with a configurable concurrency limit.
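The discovery step described above can be sketched offline. This is a minimal illustration, not the actor's implementation: the function names are hypothetical, `fetch` is a caller-supplied function that returns a document body for a URL, and de-duplicating the final list is an assumption.

```python
import xml.etree.ElementTree as ET
from typing import Callable


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Return the URLs named in `Sitemap:` directives (field name is case-insensitive)."""
    found = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop robots.txt comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found


def _local(tag: str) -> str:
    """Strip the XML namespace from a tag such as '{ns}urlset'."""
    return tag.rsplit("}", 1)[-1]


def parse_sitemap(xml_text: str) -> tuple[str, list[str]]:
    """Return ("index" or "urlset", list of <loc> values) for one sitemap document."""
    root = ET.fromstring(xml_text)
    kind = "index" if _local(root.tag) == "sitemapindex" else "urlset"
    locs = []
    for entry in root:            # <sitemap> or <url> elements
        for child in entry:       # only direct <loc> children, not image/video extensions
            if _local(child.tag) == "loc" and child.text:
                locs.append(child.text.strip())
    return kind, locs


def collect_urls(sitemap_urls: list[str], fetch: Callable[[str], str]) -> list[str]:
    """Expand sitemap indexes one level deep and return the page URLs found."""
    page_urls = []
    for sitemap_url in sitemap_urls:
        kind, locs = parse_sitemap(fetch(sitemap_url))
        if kind == "urlset":
            page_urls.extend(locs)
            continue
        for child_url in locs:    # one level of recursion only
            child_kind, child_locs = parse_sitemap(fetch(child_url))
            if child_kind == "urlset":   # an index nested inside an index is not followed
                page_urls.extend(child_locs)
    return list(dict.fromkeys(page_urls))  # assumption: de-duplicate, keep first-seen order
```

In practice `fetch` would wrap an HTTP client, and the starting list would be the robots.txt directives, or the site's /sitemap.xml when robots.txt names none.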
# Preview first — sees what's in the sitemap, charges nothing
curl -X POST https://api.crawlcrawl.com/v1/actors/sitemap-audit \
  -H "Authorization: Bearer crk_..." \
  -d '{"url":"https://your-client.com","dry_run":true}'
# → 200 — no probing, no billing
{
  "data": {
    "sitemaps_found": [ "https://.../sitemap.xml", "https://.../sitemap-blog.xml" ],
    "total_urls_in_sitemap": 4382,
    "pages_billed": 0
  }
}
# Now run for real — 1 credit per URL probed
curl -X POST https://api.crawlcrawl.com/v1/actors/sitemap-audit \
  -H "Authorization: Bearer crk_..." \
  -d '{"url":"https://your-client.com","max_urls":5000}'
# → 200
{
  "data": {
    "sitemaps_found": [ ... ],
    "total_urls_in_sitemap": 4382,
    "total_urls_probed": 4382,
    "pages_billed": 4382,
    "summary": {
      "ok": 4189, "redirect": 124, "client_error": 38,
      "server_error": 3, "noindex": 17, "canonicalised_away": 9,
      "network_error": 2
    },
    "items": [ ... ]
  }
}
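To make the seven summary buckets concrete, here is one way a single probe result could be assigned to a bucket. It is a sketch under assumptions: the precedence order (status code first, then noindex, then canonical), the trailing-slash-insensitive canonical comparison, and every name in the code are hypothetical, and the actor's actual rules may differ.

```python
from collections import Counter
from html.parser import HTMLParser

BUCKETS = ("ok", "redirect", "client_error", "server_error",
           "noindex", "canonicalised_away", "network_error")


class _HeadSignals(HTMLParser):
    """Collect the meta robots content and the canonical link from a page."""

    def __init__(self):
        super().__init__()
        self.robots = ""
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and (a.get("name") or "").lower() == "robots":
            self.robots = (a.get("content") or "").lower()
        elif tag == "link" and (a.get("rel") or "").lower() == "canonical":
            self.canonical = a.get("href")


def classify(url, status, headers=None, html=""):
    """Return one bucket name; `status` is None when no response was received."""
    if status is None:
        return "network_error"
    if 300 <= status < 400:
        return "redirect"
    if 400 <= status < 500:
        return "client_error"
    if status >= 500:
        return "server_error"
    headers = {k.lower(): v for k, v in (headers or {}).items()}
    signals = _HeadSignals()
    signals.feed(html)
    # noindex can arrive as a response header or as a meta robots tag
    if "noindex" in headers.get("x-robots-tag", "").lower() or "noindex" in signals.robots:
        return "noindex"
    # assumption: a canonical pointing at a different URL means "canonicalised away"
    if signals.canonical and signals.canonical.rstrip("/") != url.rstrip("/"):
        return "canonicalised_away"
    return "ok"


def summarise(buckets):
    """Tally bucket names into a dict that always lists all seven buckets."""
    counts = Counter(buckets)
    return {name: counts.get(name, 0) for name in BUCKETS}
```

Because each URL lands in exactly one bucket, the counts in a summary add up to the number of URLs probed, as they do in the sample response above (4,382).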
noindex: the page carries a meta robots noindex tag or an X-Robots-Tag: noindex header. Why is it in the sitemap?
Sitemap audits can be expensive: every URL probed costs 1 page-credit. To prevent surprise bills we:
Default max_urls to 100. You opt in to bigger audits explicitly.
Support dry_run: true, which discovers sitemaps and counts URLs without probing or billing. Use this before deciding on a real audit.
Return pages_billed in every response as an explicit billing signal.
Weekly SEO report deliverable. The seven-bucket summary is exactly the table SEO clients want to see. Schedule it as a Monitor with a webhook to push the diff to your reporting tool.
Post-migration verification. After a CMS migration, the sitemap is one of the first things to lie. A sitemap-audit catches the "we shipped a sitemap with 3,000 URLs but only 1,700 actually resolve" failure mode in minutes.
Crawl-budget triage. If Google is hitting your canonicalised_away URLs, that's crawl budget wasted. Drop those from the sitemap.
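For the post-migration and crawl-budget cases, the summary object alone already answers two questions: what share of the sitemap resolves cleanly, and how many URLs are candidates for removal. A minimal sketch, assuming only the summary keys shown in the sample response; the `triage` function and its output keys are hypothetical.

```python
def triage(summary: dict[str, int]) -> dict:
    """Condense a sitemap-audit summary into a few headline numbers."""
    total = sum(summary.values())
    return {
        "total": total,
        # share of probed URLs that landed in the ok bucket
        "ok_share": summary.get("ok", 0) / total if total else 0.0,
        # URLs the text above suggests questioning or dropping from the sitemap
        "drop_candidates": {
            key: summary[key]
            for key in ("noindex", "canonicalised_away")
            if summary.get(key)
        },
        # URLs that returned an error status
        "broken": summary.get("client_error", 0) + summary.get("server_error", 0),
    }


sample = {
    "ok": 4189, "redirect": 124, "client_error": 38,
    "server_error": 3, "noindex": 17, "canonicalised_away": 9,
    "network_error": 2,
}
report = triage(sample)
```

A sharp drop in `ok_share` between two runs is the kind of signal a post-migration check would alert on.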
1 page-credit per probed URL. dry_run calls are free. The $42 Studio tier covers 50,000 page-credits/month, enough to audit a 5K-URL sitemap every Sunday all year (a five-Sunday month uses 25,000 credits).
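The pricing arithmetic is simple enough to check before committing to a real run. The sketch below assumes that max_urls caps the number of URLs probed (and therefore billed); the function names are hypothetical, and the numbers come from the text and the sample dry-run response.

```python
DEFAULT_MAX_URLS = 100   # default cap stated in the text
CREDITS_PER_URL = 1      # 1 page-credit per probed URL


def audit_credits(total_urls_in_sitemap: int, max_urls: int = DEFAULT_MAX_URLS) -> int:
    """Credits one real run would use, assuming max_urls caps the URLs probed."""
    return min(total_urls_in_sitemap, max_urls) * CREDITS_PER_URL


def fits_monthly_quota(urls_per_run: int, runs_per_month: int, monthly_credits: int) -> bool:
    """True when the planned runs stay within a monthly credit allowance."""
    return urls_per_run * runs_per_month * CREDITS_PER_URL <= monthly_credits


# The dry run above reported 4,382 URLs; with max_urls=5000 every URL is probed.
full_run = audit_credits(4382, max_urls=5000)

# A 5K-URL sitemap audited on each of five Sundays against 50,000 credits/month.
weekly_ok = fits_monthly_quota(5000, 5, 50_000)
```

Feeding total_urls_in_sitemap from a dry_run response into `audit_credits` gives the expected pages_billed for the real run under the stated assumption.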
Preview free with dry_run, then run for real at 1 credit per URL.