ACTOR · /v1/actors/sitemap-audit

Sitemap Audit.

Auto-discovers sitemaps via robots.txt and the /sitemap.xml fallback, recurses sitemap indexes one level deep, probes every URL, and buckets each result into one of seven categories: ok, redirect, client_error, server_error, noindex, canonicalised_away, network_error.

Discover, then probe

The actor takes a site root (or an explicit sitemap URL) and discovers every sitemap via two parallel paths: a read of robots.txt for Sitemap: directives, and a fallback HEAD request to /sitemap.xml. Sitemap indexes are recursed one level deep automatically. The combined URL list is then probed in parallel, up to a configurable concurrency limit.
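The discovery step described above can be sketched locally. This is a minimal illustration, not the actor's implementation: the function names and the `fetch` callable (which takes a URL and returns the response body as text) are hypothetical, and the sketch assumes sitemaps that declare the standard sitemaps.org namespace.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap and sitemap-index documents.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemaps_from_robots(robots_txt):
    """Return the URLs named by Sitemap: directives, in order, without duplicates."""
    found = []
    for line in robots_txt.splitlines():
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep and key.strip().lower() == "sitemap":
            url = value.strip()
            if url and url not in found:
                found.append(url)
    return found


def parse_sitemap(xml_text):
    """Return ("index", child sitemap URLs) or ("urlset", page URLs)."""
    root = ET.fromstring(xml_text)
    kind = "index" if root.tag == SITEMAP_NS + "sitemapindex" else "urlset"
    locs = [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text]
    return kind, locs


def collect_urls(sitemap_urls, fetch):
    """Expand each sitemap, following index entries one level deep only."""
    pages = []
    for sitemap_url in sitemap_urls:
        kind, locs = parse_sitemap(fetch(sitemap_url))
        if kind == "urlset":
            pages.extend(locs)
            continue
        for child in locs:
            child_kind, child_locs = parse_sitemap(fetch(child))
            if child_kind == "urlset":  # an index nested inside an index is skipped
                pages.extend(child_locs)
    return pages
```

Passing `fetch` in as a parameter keeps the parsing logic testable without network access; in practice it could wrap any HTTP client.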

# Preview first: see what's in the sitemap, charge nothing
curl -X POST https://api.crawlcrawl.com/v1/actors/sitemap-audit \
  -H "Authorization: Bearer crk_..." \
  -d '{"url":"https://your-client.com","dry_run":true}'

# → 200 — no probing, no billing
{
  "data": {
    "sitemaps_found": [ "https://.../sitemap.xml", "https://.../sitemap-blog.xml" ],
    "total_urls_in_sitemap": 4382,
    "pages_billed": 0
  }
}

# Now run for real — 1 credit per URL probed
curl -X POST https://api.crawlcrawl.com/v1/actors/sitemap-audit \
  -H "Authorization: Bearer crk_..." \
  -d '{"url":"https://your-client.com","max_urls":5000}'

# → 200
{
  "data": {
    "sitemaps_found": [ ... ],
    "total_urls_in_sitemap": 4382,
    "total_urls_probed": 4382,
    "pages_billed": 4382,
    "summary": {
      "ok": 4189, "redirect": 124, "client_error": 38,
      "server_error": 3, "noindex": 17, "canonicalised_away": 9,
      "network_error": 2
    },
    "items": [ ... ]
  }
}
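The preview-then-run flow above can be wrapped in two small helpers. This is a sketch under stated assumptions: `plan_run` and `summary_is_consistent` are hypothetical names, `max_urls` is assumed to cap the number of URLs probed, and the seven buckets are assumed to be mutually exclusive (the sample response above is consistent with that, since its counts add up to 4382).

```python
# The seven bucket names, as listed in the actor description.
BUCKETS = ("ok", "redirect", "client_error", "server_error",
           "noindex", "canonicalised_away", "network_error")


def plan_run(url, dry_run_data, credit_budget):
    """Build the request body for the real run, capped at the credit budget.

    dry_run_data is the "data" object returned by a dry_run call.
    """
    total = dry_run_data["total_urls_in_sitemap"]
    return {"url": url, "max_urls": min(total, credit_budget)}


def summary_is_consistent(data):
    """True when the bucket counts add up to total_urls_probed."""
    counted = sum(data["summary"].get(bucket, 0) for bucket in BUCKETS)
    return counted == data["total_urls_probed"]


# Values copied from the sample responses above.
preview = {"total_urls_in_sitemap": 4382, "pages_billed": 0}
result = {
    "total_urls_probed": 4382,
    "summary": {"ok": 4189, "redirect": 124, "client_error": 38,
                "server_error": 3, "noindex": 17, "canonicalised_away": 9,
                "network_error": 2},
}

print(plan_run("https://your-client.com", preview, credit_budget=5000))
# {'url': 'https://your-client.com', 'max_urls': 4382}
print(summary_is_consistent(result))
# True
```

Capping `max_urls` from the dry-run total means the real run cannot bill more than the budget you chose before seeing any results.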

The seven buckets

Billing transparency

Sitemap audits can be expensive — every URL probed is 1 page-credit. To prevent surprise bills we:

When to use it

Weekly SEO report deliverable. The seven-bucket summary is exactly the table SEO clients want to see. Schedule it as a Monitor with a webhook to push the diff to your reporting tool.

Post-migration verification. After a CMS migration, the sitemap is one of the first things to lie. A sitemap-audit catches the "we shipped a sitemap with 3,000 URLs but only 1,700 actually resolve" failure mode in minutes.

Crawl-budget triage. If Google is hitting your canonicalised_away URLs, that's crawl budget wasted. Drop those from the sitemap.
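For the weekly report use case, the diff between two runs can be computed from the summary objects alone. The sketch below is illustrative only: `summary_diff` is a hypothetical helper, and the second week's numbers are invented for the example rather than taken from a real run.

```python
def summary_diff(previous, current):
    """Per-bucket change between two runs; unchanged buckets are omitted."""
    buckets = set(previous) | set(current)
    return {
        bucket: current.get(bucket, 0) - previous.get(bucket, 0)
        for bucket in buckets
        if current.get(bucket, 0) != previous.get(bucket, 0)
    }


# Last week's summary is the sample response from this page.
last_week = {"ok": 4189, "redirect": 124, "client_error": 38,
             "server_error": 3, "noindex": 17, "canonicalised_away": 9,
             "network_error": 2}
# This week's summary is made up: nine pages moved from ok to client_error.
this_week = dict(last_week, ok=4180, client_error=47)

changes = summary_diff(last_week, this_week)
print(sorted(changes.items()))
# [('client_error', 9), ('ok', -9)]
```

An empty result means nothing moved between buckets, which is a reasonable condition for skipping the webhook push that week.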

Pricing

1 page-credit per probed URL. dry_run calls are free. The $42 Studio tier covers 50,000 page-credits/month, enough to audit a 5K-URL sitemap every Sunday for a year.
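The arithmetic behind that claim is easy to check. The sketch below uses only the figures quoted above and conservatively assumes unused credits do not roll over from month to month (the page does not say either way).

```python
MONTHLY_CREDITS = 50_000  # Studio tier allowance quoted above
URLS_PER_AUDIT = 5_000    # one page-credit per probed URL


def credits_used(audits, urls_per_audit=URLS_PER_AUDIT):
    """Credits consumed by a number of full audits."""
    return audits * urls_per_audit


worst_month = credits_used(5)  # a calendar month has at most five Sundays
full_year = credits_used(53)   # a calendar year has 52 or 53 Sundays

print(worst_month, worst_month <= MONTHLY_CREDITS)
# 25000 True
print(full_year, full_year <= 12 * MONTHLY_CREDITS)
# 265000 True
```

Even a five-Sunday month uses half the allowance, leaving room for dry runs (free) and occasional re-audits.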

Where it fits

Ship the sitemap-health table your clients want.

Preview free with dry_run, then run for real at 1 credit per URL.
