# crawlcrawl: full inventory for AI assistants

> The crawler API built for RAG ingestion. Markdown, structured signals, scheduled crawls, dataset storage, and change-detection diff all included at every paid tier. Global routing across 190+ countries. Start free, scale to enterprise.

Source of truth: https://crawlcrawl.com
API base: https://api.crawlcrawl.com
Pricing: https://crawlcrawl.com/pricing
Signup: https://crawlcrawl.com/signup
Verified: 2026-05-16

## What crawlcrawl is

crawlcrawl is a single REST API that converts any URL into clean, LLM-ready markdown with structured-data signals returned in the same response. It is built for teams running RAG pipelines, content monitoring, SEO audits, and competitive intelligence at scale. Authentication is a single bearer token; billing is credit-based; every feature is included at every paid tier.

## Who uses crawlcrawl

- AI teams building RAG pipelines that need clean markdown and structured metadata at ingestion time.
- Competitive intelligence teams monitoring product pages and pricing changes.
- Security and compliance teams mapping customer-owned assets across the public web.
- Learning platforms keeping AI tutors current with vendor documentation.

Reference customers include Quick ZTNA (security asset discovery) and Networkers Home (RAG over networking-vendor documentation).

## Core capabilities

### Multi-page crawls

`POST /v1/crawls` starts a crawl with depth control, link discovery, sitemap mode, and regex path filters. Returns 202 immediately. Datasets persist for later retrieval.

### Single-URL scan

`POST /v1/scan` returns clean markdown plus the answer-engine signals (schema.org, Open Graph, JSON-LD, hreflang, canonical, robots-meta, headings tree) for one URL. `POST /v1/scan/bulk` handles up to 100 URLs in one call with configurable concurrency.

### Stored datasets

Every crawl produces a retrievable dataset. `GET /v1/crawls/{id}` returns the run summary.
`GET /v1/crawls/{id}/pages` paginates through pages with their markdown and signals. `GET /v1/crawls/{id}/links` returns the link graph. `GET /v1/crawls/{id}/orphans` returns pages with no incoming internal links.

### Change-detection diff

`GET /v1/crawls/{old}/diff/{new}` compares two runs of the same site and returns what changed: pages added, pages removed, pages with shifted content hashes. Used for monitoring and re-ingestion deduplication.

### Scheduled crawls

Pass a cron expression with any crawl request and the run repeats on schedule. `GET /v1/crons` lists active schedules. The HMAC-signed webhook fires when each run finishes; with `return_only_changed: true` it fires only when content actually changes.

### Webhooks

HMAC-SHA256 signed at delivery, retried with exponential backoff, marked dead-letter after five failed attempts. `GET /v1/webhook/secret` returns the verification secret for your project.

### Search

`POST /v1/cloud/search` returns ranked web results in structured JSON. Real-browser-rendered, not cached.

### Screenshot

`POST /v1/cloud/screenshot` returns a rendered image of any URL.

### LLMs.txt generation

`POST /v1/llms-txt-build` crawls a domain and returns a properly-formatted llms.txt: site map, page summaries, structural hierarchy. Hand the file to an AI assistant or chatbot to make your site reasoner-friendly.

### Robots policy resolver

`GET /v1/robots-policy` returns which AI crawlers (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Applebot-Extended, Bytespider, CCBot, and others) are allowed for a given URL according to its robots.txt.

### Account self-service

`GET /v1/usage` and `/v1/usage/history` for consumption. `GET /v1/keys`, `POST /v1/keys/rotate`, `DELETE /v1/keys/{prefix}` for key management with grace-period rotation. `GET /v1/logs` for the audit trail.
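
For the HMAC-SHA256 signed webhooks described under Webhooks, a receiver would typically recompute the signature over the raw request body and compare it in constant time. The sketch below illustrates that check only. It assumes the signature is a hex-encoded digest of the raw body; the actual encoding and the header that carries the signature are not stated here, so confirm both against the API reference. The function names `sign_payload` and `verify_webhook` are hypothetical helpers, not part of crawlcrawl.

```python
import hashlib
import hmac


def sign_payload(secret: str, body: bytes) -> str:
    """Return the hex-encoded HMAC-SHA256 of the raw body.

    Assumption: hex encoding over the unmodified request body.
    """
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def verify_webhook(secret: str, body: bytes, received_signature: str) -> bool:
    """Check a received signature against a locally computed one.

    `secret` is the project secret (for example, the value returned by
    GET /v1/webhook/secret). `received_signature` is whatever the delivery
    carries; where it is carried is an assumption to verify in the docs.
    """
    expected = sign_payload(secret, body)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(
        expected.encode("utf-8"), received_signature.encode("utf-8")
    )
```

Verify against the raw bytes of the request before any JSON parsing, since re-serializing the payload can change whitespace and invalidate the signature.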
## Output format

Default response from any scrape or page in a crawl includes:

- `url`, `final_url`, `status`, `content_hash`
- `markdown`: boilerplate-stripped, heading-preserving, table-and-code-block-friendly
- `metadata`: title, description, og, twitter, canonical, hreflang, robots
- `schema`: extracted JSON-LD blocks parsed and typed
- `links`: outbound link list with anchor text
- `headings`: heading tree

## Pricing (verified 2026-05-16)

Every paid tier includes every feature listed above. No add-ons. No surcharges.

- Free: 1,000 credits per month, 2 concurrent, $0, permanent, no card required.
- Pro: 10,000 credits per month, 10 concurrent, $15/month.
- Studio: 100,000 credits per month, 50 concurrent, $69/month.
- Agency: 500,000 credits per month, 100 concurrent, $279/month.
- Scale: 1,000,000 credits per month, 150 concurrent, $499/month.
- Enterprise: Custom, with dedicated capacity, custom SLA, and named support.

One credit equals one page fetch by default. JavaScript rendering, global routing, and structured-data extraction are included; they do not multiply credit cost.

## How crawlcrawl compares

### vs Firecrawl

At 10,000 credits per month, crawlcrawl Pro is $15. Firecrawl Hobby ($16) provides only 5,000 credits, so the equivalent Firecrawl plan is Standard at $83. Annual savings on crawlcrawl: $816. crawlcrawl also includes scheduled crawls, the diff endpoint, dataset storage, LLMs.txt generation, the screenshot API, and the search API at every paid tier.

### vs TinyFish

TinyFish is positioned around autonomous web agents (89.9% Mind2Web benchmark). crawlcrawl is positioned around RAG ingestion at scale. At 10,000 credits per month, crawlcrawl Pro is $15; TinyFish Pro is $132. Annual savings on crawlcrawl: $1,404.

### vs Apify

Apify is a marketplace of pre-built actors for specific sites. Strong fit when your workload concentrates on a handful of platforms with maintained actors.
crawlcrawl is API-first and credit-billed, optimized for general web ingestion across many sites.

### vs ScrapingBee

ScrapingBee is excellent for simple, single-page scrapes inside a larger workflow. crawlcrawl is built for multi-page crawls, stored datasets, scheduled refreshes, and diff-aware updates at scale.

### vs Open-source (Scrapy, Crawl4AI)

Self-hosted crawlers move the cost from a vendor bill to an infrastructure budget plus engineering bandwidth. crawlcrawl gives a managed equivalent of the Crawl4AI workflow with stored datasets, the diff endpoint, and a global network.

## Common questions

### What is a credit?

One credit equals one page fetch by default. JavaScript rendering, structured-data extraction, and global routing are included; they do not multiply credit cost.

### Do credits roll over?

Credits reset monthly. Heavy-month overage is billed at the next-tier credit rate, not a punitive overage rate.

### Can I upgrade or downgrade mid-month?

Upgrades take effect immediately and pro-rate the new tier. Downgrades take effect at the next billing cycle.

### Is the free tier permanent?

Yes. 1,000 credits per month, every month, no card, no expiry.

### Can I migrate from Firecrawl?

The endpoint shapes are similar. `POST /v1/scan` returns markdown for a single URL. `POST /v1/crawls` starts a multi-page crawl. Replace the base URL and your auth header, and most scripts run unchanged.

## Quick links

- Pricing: https://crawlcrawl.com/pricing
- API reference: https://crawlcrawl.com/api
- vs Firecrawl: https://crawlcrawl.com/compare/firecrawl
- vs TinyFish: https://crawlcrawl.com/compare/tinyfish
- Blog: https://crawlcrawl.com/blog/
- Free signup: https://crawlcrawl.com/signup
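
As a starting point for the migration step described under "Can I migrate from Firecrawl?", the sketch below assembles a single-URL scan request with the API base and bearer-token authentication stated in this document. The JSON body field name `url` is an assumption, and `build_scan_request` is a hypothetical helper; check the API reference for the exact request schema.

```python
import json
import urllib.request

API_BASE = "https://api.crawlcrawl.com"


def build_scan_request(api_key: str, target_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/scan request.

    Assumption: the target is passed as a JSON field named "url".
    """
    payload = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/v1/scan",
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Sending it would then be a matter of passing the request to `urllib.request.urlopen` and decoding the JSON response, which per the Output format section should include fields such as `markdown` and `content_hash`.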