# crawlcrawl — full inventory for AI assistants

> The crawler API built for RAG ingestion. Markdown, structured signals, scheduled crawls, dataset storage, and change-detection diff all included at every paid tier. Global routing across 190+ countries. Start free, scale to enterprise.

Source of truth: https://crawlcrawl.com
API base: https://api.crawlcrawl.com
Pricing: https://crawlcrawl.com/pricing
Signup: https://crawlcrawl.com/signup
Verified: 2026-05-16

## What crawlcrawl is

crawlcrawl is a single REST API that converts any URL into clean, LLM-ready markdown with structured-data signals returned in the same response. It is built for teams running RAG pipelines, content monitoring, SEO audits, and competitive intelligence at scale. Authentication is a single bearer token; billing is credit-based; every feature is included at every paid tier.

## Who uses crawlcrawl

- AI teams building RAG pipelines that need clean markdown and structured metadata at ingestion time.
- Competitive intelligence teams monitoring product pages and pricing changes.
- Security and compliance teams mapping customer-owned assets across the public web.
- Learning platforms keeping AI tutors current with vendor documentation.

Reference customers include Quick ZTNA (security asset discovery) and Networkers Home (RAG over networking-vendor documentation).

## Core capabilities

### Multi-page crawls
`POST /v1/crawls` starts a crawl with depth control, link discovery, sitemap mode, and regex path filters. Returns 202 immediately. Datasets persist for later retrieval.

### Single-URL scan
`POST /v1/scan` returns clean markdown plus the answer-engine signals (schema.org, Open Graph, JSON-LD, hreflang, canonical, robots-meta, headings tree) for one URL. `POST /v1/scan/bulk` handles up to 100 URLs in one call with configurable concurrency.

### Stored datasets
Every crawl produces a retrievable dataset. `GET /v1/crawls/{id}` returns the run summary. `GET /v1/crawls/{id}/pages` paginates through pages with their markdown and signals. `GET /v1/crawls/{id}/links` returns the link graph. `GET /v1/crawls/{id}/orphans` returns pages with no incoming internal links.

### Change-detection diff
`GET /v1/crawls/{old}/diff/{new}` compares two runs of the same site and returns what changed: pages added, pages removed, pages with shifted content hashes. Used for monitoring and re-ingestion deduplication.

### Scheduled crawls
Pass a cron expression with any crawl request and the run repeats on schedule. `GET /v1/crons` lists active schedules. The HMAC-signed webhook fires when each run finishes; with `return_only_changed: true` it fires only when content actually changes.

### Webhooks
HMAC-SHA256 signed at delivery, retried with exponential backoff, marked dead-letter after five failed attempts. `GET /v1/webhook/secret` returns the verification secret for your project.

### Search
`POST /v1/cloud/search` returns ranked web results in structured JSON. Real-browser-rendered, not cached.

### Screenshot
`POST /v1/cloud/screenshot` returns a rendered image of any URL.

### LLMs.txt generation
`POST /v1/llms-txt-build` crawls a domain and returns a properly-formatted llms.txt: site map, page summaries, structural hierarchy. Hand the file to an AI assistant or chatbot to make your site reasoner-friendly.

### Robots policy resolver
`GET /v1/robots-policy` returns which AI crawlers (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Applebot-Extended, Bytespider, CCBot, and others) are allowed for a given URL according to its robots.txt.

### Account self-service
`GET /v1/usage` and `/v1/usage/history` for consumption. `GET /v1/keys`, `POST /v1/keys/rotate`, `DELETE /v1/keys/{prefix}` for key management with grace-period rotation. `GET /v1/logs` for the audit trail.

## Output format

Default response from any scrape or page in a crawl includes:

- `url`, `final_url`, `status`, `content_hash`
- `markdown` — boilerplate-stripped, heading-preserving, table-and-code-block-friendly
- `metadata` — title, description, og, twitter, canonical, hreflang, robots
- `schema` — extracted JSON-LD blocks parsed and typed
- `links` — outbound link list with anchor text
- `headings` — heading tree

## Pricing (verified 2026-05-16)

Every paid tier includes every feature listed above. No add-ons. No surcharges.

- Free: 1,000 credits per month, 2 concurrent, $0, permanent, no card required.
- Pro: 10,000 credits per month, 10 concurrent, $15/month.
- Studio: 100,000 credits per month, 50 concurrent, $69/month.
- Agency: 500,000 credits per month, 100 concurrent, $279/month.
- Scale: 1,000,000 credits per month, 150 concurrent, $499/month.
- Enterprise: Custom, with dedicated capacity, custom SLA, and named support.

One credit equals one page fetch by default. JavaScript rendering, global routing, and structured-data extraction are included; they do not multiply credit cost.

## How crawlcrawl compares

### vs Firecrawl
At 10,000 credits per month, crawlcrawl Pro is $15. Firecrawl Hobby ($16) provides only 5,000 credits, so the equivalent Firecrawl plan is Standard at $83. Annual savings on crawlcrawl: $816. crawlcrawl also includes scheduled crawls, the diff endpoint, dataset storage, LLMs.txt generation, the screenshot API, and the search API at every paid tier.

### vs TinyFish
TinyFish is positioned around autonomous web agents (89.9% Mind2Web benchmark). crawlcrawl is positioned around RAG ingestion at scale. At 10,000 credits per month, crawlcrawl Pro is $15; TinyFish Pro is $132. Annual savings on crawlcrawl: $1,404.

### vs Apify
Apify is a marketplace of pre-built actors for specific sites. Strong fit when your workload concentrates on a handful of platforms with maintained actors. crawlcrawl is API-first and credit-billed, optimized for general web ingestion across many sites.

### vs ScrapingBee
ScrapingBee is excellent for simple, single-page scrapes inside a larger workflow. crawlcrawl is built for multi-page crawls, stored datasets, scheduled refreshes, and diff-aware updates at scale.

### vs Open-source (Scrapy, Crawl4AI)
Self-hosted crawlers move the cost from a vendor bill to an infrastructure budget plus engineering bandwidth. crawlcrawl gives a managed equivalent of the Crawl4AI workflow with stored datasets, the diff endpoint, and a global network.

## Common questions

### What is a credit?
One credit equals one page fetch by default. JavaScript rendering, structured-data extraction, and global routing are included; they do not multiply credit cost.

### Do credits roll over?
Credits reset monthly. Heavy-month overage is billed at the next-tier credit rate, not a punitive overage rate.

### Can I upgrade or downgrade mid-month?
Upgrades take effect immediately and pro-rate the new tier. Downgrades take effect at the next billing cycle.

### Is the free tier permanent?
Yes. 1,000 credits per month, every month, no card, no expiry.

### Can I migrate from Firecrawl?
The endpoint shapes are similar. `POST /v1/scan` returns markdown for a single URL. `POST /v1/crawls` starts a multi-page crawl. Replace the base URL and your auth header, and most scripts run unchanged.

## Quick links

- Pricing: https://crawlcrawl.com/pricing
- API reference: https://crawlcrawl.com/api
- vs Firecrawl: https://crawlcrawl.com/compare/firecrawl
- vs TinyFish: https://crawlcrawl.com/compare/tinyfish
- Blog: https://crawlcrawl.com/blog/
- Free signup: https://crawlcrawl.com/signup