# crawlcrawl

> The crawler API built for RAG ingestion. Markdown, structured signals, scheduled crawls, dataset storage, and change-detection diff are all included at every paid tier. Start free at 1,500 pages per month, then scale to enterprise. The Firecrawl alternative at half the price at every paid tier.

Base URL: https://api.crawlcrawl.com
Auth: `Authorization: Bearer crk_...`
Spec: https://crawlcrawl.com/openapi.json
Signup: https://crawlcrawl.com/signup (no card required)
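
A minimal sketch of an authenticated call, assuming a JSON body with a `url` field (the body shape is an assumption; the spec at the URL above is authoritative). The base URL, the Bearer header, and the `POST /v1/crawls` path come from this file. The helper only builds the request and does not send it.

```python
import json
import urllib.request

BASE_URL = "https://api.crawlcrawl.com"


def build_crawl_request(api_key: str, start_url: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST /v1/crawls request."""
    # Assumption: the endpoint accepts JSON with a "url" field.
    body = json.dumps({"url": start_url}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/crawls",
        data=body,
        method="POST",
        headers={
            # Auth scheme from this file: Authorization: Bearer crk_...
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; any HTTP client that sets the same header works equally well.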

## What crawlcrawl does

crawlcrawl is a single REST API that turns any URL into clean, LLM-ready markdown with structured-data signals (schema.org, Open Graph, JSON-LD, hreflang, canonical) in the same response. Scheduled crawls, change-detection diffs, dataset storage, and HMAC-signed webhooks are included at every paid tier. Requests route globally across 190+ countries.

## Products

- [Crawler](https://crawlcrawl.com/products/crawler): Multi-page crawl with link discovery and orphan detection. Returns markdown for every page. `POST /v1/crawls`.
- [Monitor](https://crawlcrawl.com/products/monitor): Cron-scheduled recurring crawls. HMAC-SHA256 signed webhooks fire only on real changes.
- [Scrape](https://crawlcrawl.com/products/scrape): Single-URL fetch with markdown and structured signals.
- [Search](https://crawlcrawl.com/products/search): Live web search returning structured results. `POST /v1/cloud/search`.
- [Transform](https://crawlcrawl.com/products/transform): HTML or PDF to clean markdown.
- [Render](https://crawlcrawl.com/products/render): Browser-rendered HTML for sites that need JavaScript execution.
- [Unblock](https://crawlcrawl.com/products/unblock): One API call bypasses Cloudflare, Akamai, PerimeterX, DataDome, and reCAPTCHA. Drop-in for any existing scraper. `POST /v1/cloud/unblock`.
- [LLMs.txt builder](https://crawlcrawl.com/products/llms-txt): Generate /llms.txt for any site.
- [AI-bot audit](https://crawlcrawl.com/products/ai-bot-audit): Resolve which AI crawlers a site allows.
- [Per-site scrapers](https://crawlcrawl.com/products/per-site-scrapers): Pre-built configs for popular sites.
- [Proxy fetch](https://crawlcrawl.com/products/proxy-fetch): Bytes-billed HTML fetch via spider.cloud proxy. ~13x cheaper than Chrome rendering. 4 pools, 199 countries. No JS execution. `POST /v1/cloud/proxy-fetch`.
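
Monitor webhooks are HMAC-SHA256 signed, so a receiver can check each delivery before trusting it. The sketch below assumes the signature is the hex digest of the raw request body under a shared secret; the encoding, the header that carries it, and the function names are assumptions, so confirm them against the Webhooks docs.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, raw_body: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw request body (assumed format)."""
    return hmac.new(secret, raw_body, hashlib.sha256).hexdigest()


def verify_webhook(secret: bytes, raw_body: bytes, received_signature: str) -> bool:
    """Return True only if the received signature matches the body."""
    expected = sign_payload(secret, raw_body)
    # compare_digest takes time independent of where the inputs differ,
    # which avoids leaking the expected signature through timing.
    return hmac.compare_digest(
        expected.encode("utf-8"), received_signature.encode("utf-8")
    )
```

Verify against the raw bytes as received, before any JSON parsing, because re-serializing the body can change whitespace and break the signature.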

## Actors

- [audit-onpage](https://crawlcrawl.com/products/audit-onpage): ~30 on-page SEO rules per call. Errors, warnings, info. `POST /v1/actors/audit-onpage`.
- [extract-article](https://crawlcrawl.com/products/extract-article): Trafilatura body extraction with author + date metadata. `POST /v1/actors/extract-article`.
- [check-links](https://crawlcrawl.com/products/check-links): Lychee-based broken-link validation with optional Chrome or proxy retry. `POST /v1/actors/check-links`.
- [structured-data](https://crawlcrawl.com/products/structured-data): JSON-LD, Microdata, RDFa, Open Graph, Dublin Core, Microformats in one response. `POST /v1/actors/structured-data`.
- [render-diff](https://crawlcrawl.com/products/render-diff): Static vs JS-rendered DOM diff. Returns `ai_bot_blind_pct`, the 2026 AEO metric. `POST /v1/actors/render-diff`.
- [internal-link-graph](https://crawlcrawl.com/products/internal-link-graph): PageRank + WCC + orphan detection on any existing `crawl_id`. `POST /v1/actors/internal-link-graph`.
- [sitemap-audit](https://crawlcrawl.com/products/sitemap-audit): 7-bucket sitemap health. Supports `dry_run` for free cost preview. `POST /v1/actors/sitemap-audit`.

## Changelog

- [Changelog](https://crawlcrawl.com/changelog): Every meaningful API or product change, dated.

## Documentation

- [Quickstart](https://crawlcrawl.com/docs/quickstart): Crawl your first URL in 5 minutes.
- [Authentication](https://crawlcrawl.com/docs/authentication): Bearer keys, rotation, audit log.
- [Webhooks](https://crawlcrawl.com/docs/webhooks): HMAC-SHA256 signed deliveries.
- [Errors](https://crawlcrawl.com/docs/errors): Status codes and error envelope.
- [Rate limits](https://crawlcrawl.com/docs/rate-limits): Per-tier caps and concurrency.
- [API reference](https://crawlcrawl.com/api): Full endpoint reference.

## Use cases

- [RAG ingestion](https://crawlcrawl.com/use-cases/rag-ingestion): Crawl docs sites to clean markdown for vector databases.
- [Competitor monitoring](https://crawlcrawl.com/use-cases/competitor-monitoring): Webhook on pricing-page changes.
- [SEO audit at scale](https://crawlcrawl.com/use-cases/seo-audit): Orphan pages, broken links, weekly site diff.

## Comparison

- [crawlcrawl vs Firecrawl](https://crawlcrawl.com/compare/firecrawl): Half of Firecrawl's price at every paid tier, rounded up to the whole dollar. Same page allowances, same concurrency, half the bill.
- [crawlcrawl vs TinyFish](https://crawlcrawl.com/compare/tinyfish): The crawler for RAG ingestion, not an agent platform.
- [crawlcrawl vs Apify](https://crawlcrawl.com/compare/apify): API-first vs marketplace.
- [crawlcrawl vs ScrapingBee](https://crawlcrawl.com/compare/scrapingbee): Crawler vs single-page scrape API.
- [crawlcrawl vs Screaming Frog](https://crawlcrawl.com/compare/screaming-frog): Hosted vs desktop SEO tool.
- [crawlcrawl vs ScrapeGraphAI](https://crawlcrawl.com/compare/scrapegraphai): Same credits at every tier, more features included. Multi-page crawl + dataset storage where ScrapeGraphAI focuses on LLM-driven extraction.
- [crawlcrawl vs Olostep](https://crawlcrawl.com/compare/olostep): Workload-comparison economics + integrated platform breadth at every tier.

## Blog

- [10 Best Web Crawlers for LLM and RAG Pipelines in 2026](https://crawlcrawl.com/blog/top-10-web-crawlers-llm-rag.html): Ranked comparison of the 10 best crawlers for retrieval-augmented generation. Verified 2026-05-16.
- [robots.txt for AI Crawlers in 2026: The Complete Guide](https://crawlcrawl.com/blog/robots-txt-for-ai-crawlers.html): Complete reference for configuring robots.txt for GPTBot, ClaudeBot, PerplexityBot, Google-Extended, and every other AI crawler worth knowing. Copy-paste templates. Verified 2026-05-16.
- [AEO vs SEO in 2026: What Changes When AI Becomes the Search Engine](https://crawlcrawl.com/blog/aeo-vs-seo.html): Answer engine optimization vs traditional SEO. How they differ, where they overlap, technical setup checklist, and practical moves for the quarter. Verified 2026-05-16.
- [10 Best Open-Source Web Crawlers in 2026](https://crawlcrawl.com/blog/open-source-web-crawlers.html): Ranked comparison of Scrapy, Crawl4AI, Katana, Colly, Playwright, Puppeteer, Apache Nutch, Heritrix, node-crawler, MechanicalSoup. When to self-host versus pick managed. Verified 2026-05-16.
- [10 Best Bright Data Alternatives in 2026](https://crawlcrawl.com/blog/bright-data-alternatives.html): Ranked comparison of Bright Data alternatives for crawling, scraping, and RAG ingestion. Includes Oxylabs, ScraperAPI, Apify, ZenRows, Scrapfly, Zyte, Crawlbase, NetNut, Decodo. Real-workload pricing math. Verified 2026-05-16.
- [10 Best ScrapingBee Alternatives in 2026](https://crawlcrawl.com/blog/scrapingbee-alternatives.html): Ranked comparison of ScrapingBee alternatives for single-page scraping, multi-page crawling, and RAG ingestion. Includes Scrape.do, ScraperAPI, ZenRows, Scrapfly, Firecrawl, Apify, Crawlbase, ScrapingDog, Zyte. Verified 2026-05-16.
- [Firecrawl Pricing in 2026: Every Tier Explained, Real Costs, and How to Pick One](https://crawlcrawl.com/blog/firecrawl-pricing.html): Tier-by-tier breakdown of Firecrawl's six 2026 pricing plans (Free, Hobby $16, Standard $83, Growth $333, Scale $599, Enterprise). Real-world workload math and how to pick a tier. Verified 2026-05-16.
- [The Firecrawl Alternative for Teams Scaling Past the Hobby Tier](https://crawlcrawl.com/blog/firecrawl-alternative.html): Half the price of Firecrawl at every tier. Verified 2026-05-19.

## Legal

- [Privacy](https://crawlcrawl.com/privacy)
- [Terms of service](https://crawlcrawl.com/terms)
- [Security](https://crawlcrawl.com/security)
- [Sub-processors](https://crawlcrawl.com/sub-processors)

## Pricing

Every paid tier includes every feature: JavaScript rendering, 190+ country routing, markdown output, structured-data extraction, scheduled crawls, HMAC-signed webhooks, dataset storage, change-detection diff, LLMs.txt generation, screenshot API, search API, key rotation, robots policy management. No add-ons. No surcharges.

- Free: 1,500 pages per month, 2 concurrent, $0, permanent, no card required. Basic crawl + scrape only (no cloud or anti-bot features).
- Pro: 5,000 pages per month, 5 concurrent, $8/month. All features unlocked. (Half of Firecrawl Hobby $16.)
- Studio: 100,000 pages per month, 50 concurrent, $42/month. Most popular. (Half of Firecrawl Standard $83.)
- Agency: 500,000 pages per month, 100 concurrent, $167/month. (Half of Firecrawl Growth $333.)
- Scale: 1,000,000 pages per month, 150 concurrent, $300/month. (Half of Firecrawl Scale $599.)
- Enterprise: custom caps, custom SLA, dedicated capacity, named support.
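
Each tier caps concurrent requests, so a client may want to throttle itself to stay under the cap. This is a sketch of one client-side approach using `asyncio.Semaphore`; the tier keys, `run_limited`, and the `fetch` callable are hypothetical names, and the limits are copied from the list above.

```python
import asyncio

# Concurrency caps copied from the tier list above.
TIER_CONCURRENCY = {"free": 2, "pro": 5, "studio": 50, "agency": 100, "scale": 150}


async def run_limited(urls, fetch, tier="pro"):
    """Call `fetch(url)` for every URL with at most the tier's cap in flight."""
    limit = asyncio.Semaphore(TIER_CONCURRENCY[tier])

    async def one(url):
        async with limit:  # waits while the cap is already reached
            return await fetch(url)

    # gather preserves input order in its results.
    return await asyncio.gather(*(one(u) for u in urls))
```

Here `fetch` is any coroutine function that performs one API call; throttling on the client mainly avoids spending retries on rate-limit responses.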

Billing model: each project tracks two integer counters, `pages` (standard fetches) and `cloud_pages` (anti-bot edge network). No multipliers. Search bills per result returned (a 50-result query consumes 50 cloud pages, a 5-result query consumes 5). Intelligence endpoints (link graph, orphan detection, change-detection diff, AI-bot policy audit) are zero-cost on every paid tier. Every chargeable call records a row in an append-only ledger with an idempotency token; retries never double-bill.
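
The billing rules above can be illustrated with a toy in-memory ledger. This is a sketch of the idea only, not crawlcrawl's implementation: the `Ledger` class, its method names, and the token strings are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Ledger:
    """Toy append-only ledger with the two integer counters."""
    pages: int = 0
    cloud_pages: int = 0
    rows: list = field(default_factory=list)
    _seen: set = field(default_factory=set)

    def charge(self, token: str, counter: str, amount: int) -> bool:
        """Record a charge once per idempotency token; False on a retry."""
        if counter not in ("pages", "cloud_pages"):
            raise ValueError(f"unknown counter: {counter}")
        if token in self._seen:
            return False  # retry of an already-billed call: no new row
        self._seen.add(token)
        self.rows.append((token, counter, amount))  # rows are only appended
        setattr(self, counter, getattr(self, counter) + amount)
        return True


ledger = Ledger()
ledger.charge("search-1", "cloud_pages", 50)  # 50-result search
ledger.charge("search-1", "cloud_pages", 50)  # same token retried: ignored
ledger.charge("fetch-1", "pages", 1)          # one standard fetch
```

After these three calls the ledger holds two rows, with `cloud_pages` at 50 and `pages` at 1, because the retried token is not billed again.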

The headline: crawlcrawl costs half of Firecrawl's price at every paid self-serve tier, rounded up to the whole dollar. Same page allowances, same concurrency, same features unlocked, at roughly 50% of the bill. Pro $8 vs Firecrawl Hobby $16 (5K pages). Studio $42 vs Standard $83 (100K). Agency $167 vs Growth $333 (500K). Scale $300 vs Scale $599 (1M). crawlcrawl bills `cloud_pages` as an integer counter with no multipliers; Firecrawl's anti-bot multiplier is +4 credits per scrape.
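
The tier prices quoted above can be checked with a few lines of arithmetic. The sketch below uses only the dollar figures from this section; the dictionary and function names are illustrative.

```python
# Monthly prices in USD, as quoted in this section.
FIRECRAWL = {"Hobby": 16, "Standard": 83, "Growth": 333, "Scale": 599}
CRAWLCRAWL = {"Pro": 8, "Studio": 42, "Agency": 167, "Scale": 300}

# crawlcrawl tier -> the Firecrawl tier it is compared against.
COUNTERPART = {"Pro": "Hobby", "Studio": "Standard", "Agency": "Growth", "Scale": "Scale"}


def half_rounded_up(price: int) -> int:
    """Half of a whole-dollar price, rounded up to the next whole dollar."""
    return (price + 1) // 2


def matches_half_price() -> bool:
    """True if every crawlcrawl price equals half its counterpart, rounded up."""
    return all(
        CRAWLCRAWL[ours] == half_rounded_up(FIRECRAWL[theirs])
        for ours, theirs in COUNTERPART.items()
    )
```

Only the Pro and Hobby pair divides evenly; the other three Firecrawl prices are odd, so half of each lands on a 50-cent boundary and rounds up by that amount.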