Per-Site Scrapers — Pre-Configured Scrapers for Popular Sites

When a general crawler is not enough

Most public sites work cleanly with the regular crawl or scrape endpoints. Some sites have enough structure and specific value that a purpose-built scraper for that exact domain delivers higher-quality output. The per-site scraper endpoint gives you those purpose-built scrapers without writing or maintaining them.

The pattern is familiar to anyone who has built a scraping pipeline: a domain-specific parser knows the page structure, extracts the right fields with names that match the domain's vocabulary, and handles the edge cases that generic parsers fumble. The cost of that quality is ongoing maintenance because site layouts change. The per-site scraper endpoint absorbs that maintenance into a managed service so your code does not need to track it.

How it works

POST to /v1/cloud/fetch/{domain}/{path} to run a pre-configured scraper for the target. The response is structured JSON with the domain's natural field names.

curl -X POST \
  "https://api.crawlcrawl.com/v1/cloud/fetch/example-site.com/products" \
  -H "Authorization: Bearer crk_..." \
  -H "Content-Type: application/json"

# returns structured fields the domain naturally exposes
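
For callers working from Python rather than curl, the same request could be assembled roughly as follows. This is a minimal sketch using only the standard library: the helper name build_fetch_request is hypothetical, the host and headers are copied from the curl example, and the request is built but not sent, since the response fields depend on the target domain.

```python
from urllib.parse import quote
from urllib.request import Request

API_BASE = "https://api.crawlcrawl.com"  # host taken from the curl example


def build_fetch_request(domain: str, path: str, api_key: str) -> Request:
    """Build (but do not send) the POST request for a per-site scraper.

    Hypothetical helper: only the URL shape and headers come from the docs.
    """
    # Drop the leading slash so domain and path are joined by exactly one "/".
    clean_path = quote(path.lstrip("/"), safe="/")
    url = f"{API_BASE}/v1/cloud/fetch/{quote(domain, safe='')}/{clean_path}"
    return Request(
        url,
        method="POST",  # no body, as in the curl example
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


request = build_fetch_request("example-site.com", "/products", "crk_...")
print(request.get_method(), request.full_url)
```

Passing the result to urllib.request.urlopen would send it; how the JSON body is parsed from there depends on the fields that particular scraper returns.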

The honest catalog today

The catalog is a pass-through of the spider.cloud scraper directory. As of 2026-05-19, the live directory exposes 3 vetted scrapers (github.com, quotes.toscrape.com, httpbin.org). The first is genuinely useful for trending-repo monitoring; the other two are test fixtures. The catalog grows as upstream adds entries.

If you need a site-specific scraper that is not in the catalog (Amazon, LinkedIn, Indeed, Zillow, and so on), the right path today is the generic crawler with our structured-data + extract-article actors layered on top, or /v1/cloud/render plus your own parser. We are upfront about this rather than promising sites we do not actually support.

Browse the live catalog

GET /v1/cloud/scrapers returns the current catalog of supported domains, the paths each one handles, and a confidence score for the parser's current quality.

curl https://api.crawlcrawl.com/v1/cloud/scrapers \
  -H "Authorization: Bearer crk_..."

# returns
{
  "scrapers": [
    {
      "domain": "example-site.com",
      "path_pattern": "/products",
      "description": "Product catalog with prices and ratings",
      "confidence_score": 0.95
    },
    ...
  ],
  "pagination": { "next": "/v1/cloud/scrapers?page=2" }
}

The catalog is curated and grows as we add support for high-demand domains. The confidence score reflects how reliably the parser extracts complete data; teams typically filter for scores above 0.8 in production pipelines.
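
The 0.8 cutoff mentioned above can be applied client-side once a catalog page has been fetched. The sketch below assumes the response shape shown in the example; the function name production_ready and the second sample entry are made up for illustration.

```python
from typing import Any


def production_ready(catalog: dict[str, Any], min_score: float = 0.8) -> list[dict[str, Any]]:
    """Return catalog entries whose confidence_score is above min_score."""
    return [
        entry
        for entry in catalog.get("scrapers", [])
        # Entries without a score are treated as 0.0 and filtered out.
        if entry.get("confidence_score", 0.0) > min_score
    ]


# Sample page shaped like the documented response; the second entry is hypothetical.
sample_page = {
    "scrapers": [
        {"domain": "example-site.com", "path_pattern": "/products",
         "description": "Product catalog with prices and ratings",
         "confidence_score": 0.95},
        {"domain": "low-quality.example", "path_pattern": "/items",
         "description": "Hypothetical entry below the cutoff",
         "confidence_score": 0.6},
    ],
    "pagination": {"next": "/v1/cloud/scrapers?page=2"},
}

kept = production_ready(sample_page)
print([entry["domain"] for entry in kept])  # ['example-site.com']
```

A full listing would presumably repeat this for each page reachable through pagination.next; how the final page is signalled is not documented here, so that loop is left out.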

When to use this vs the regular crawler

Use the per-site scraper when:

The target domain has a published scraper in the catalog
You want named fields specific to that domain rather than generic markdown
You want the parser maintained for you when the source site changes layouts

Use the regular crawler when:

The target domain is not in the catalog (most arbitrary sites are not)
You need link discovery and multi-page crawling
You want clean markdown rather than domain-specific structured fields

Many production pipelines use both: the regular crawler for general ingestion and per-site scrapers for the handful of domains where the catalog match delivers materially higher quality.
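
A pipeline that uses both could route each target URL with a small lookup against the catalog. The sketch below is one possible approach, not a documented algorithm: the function name choose_route is hypothetical, and it assumes path_pattern can be treated as a plain path prefix, which the catalog response does not confirm. It returns a label rather than an endpoint because the regular crawler's path is not given in this page.

```python
from urllib.parse import urlsplit


def choose_route(target_url: str, catalog_entries: list[dict]) -> str:
    """Return "per_site" when a catalog entry covers the URL, else "crawler"."""
    parts = urlsplit(target_url)
    for entry in catalog_entries:
        same_domain = parts.hostname == entry["domain"]
        # Assumption: path_pattern is a simple prefix, not a glob or regex.
        if same_domain and parts.path.startswith(entry["path_pattern"]):
            return "per_site"
    return "crawler"


entries = [{"domain": "example-site.com", "path_pattern": "/products"}]
print(choose_route("https://example-site.com/products/42", entries))  # per_site
print(choose_route("https://unknown.example/blog", entries))          # crawler
```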

Request a new scraper

If you need a scraper for a site not in the catalog, email [email protected] with the domain and the paths you need. We prioritize additions based on customer demand and add new scrapers regularly.

Pricing

Per-site scraper requests consume credits at the standard rate and are included in every paid tier. See the full pricing page for details.
