CHANGELOG

What we ship.

Every meaningful change to the public API or product. Reverse-chronological, dated, honest.

2026-05-19

feat Pricing reset: half of Firecrawl's price at every tier

  • Re-anchored every paid tier to match Firecrawl's published volumes at half the price, rounded up to the nearest dollar. Verified live on firecrawl.dev/pricing on 2026-05-19.
  • Pro: $8/mo · 5,000 pages · 5 concurrent — half of Firecrawl Hobby ($16).
  • Studio · most popular: $42/mo · 100,000 pages · 50 concurrent — half of Firecrawl Standard ($83).
  • Agency: $167/mo · 500,000 pages · 100 concurrent — half of Firecrawl Growth ($333).
  • Scale: $300/mo · 1,000,000 pages · 150 concurrent — half of Firecrawl Scale ($599).
  • Enterprise: custom (SSO, SLA, custom region).
  • Free tier unchanged (1,500 pages/mo, 2 concurrent, $0, no card).
  • Tagline: "Same page allowances. Same concurrency. Same features. Half the bill."
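A quick way to check the tier prices above against Firecrawl's. This is a sketch: the prices are the ones quoted in this entry, and "half, with $0.50 rounded up" is inferred from the published numbers ($83 / 2 = $41.50, listed as $42) rather than a stated pricing rule.

```python
from decimal import Decimal, ROUND_HALF_UP

# Monthly prices quoted in this entry: (our tier, Firecrawl tier, Firecrawl $/mo, our $/mo).
TIERS = [
    ("Pro", "Hobby", 16, 8),
    ("Studio", "Standard", 83, 42),
    ("Agency", "Growth", 333, 167),
    ("Scale", "Scale", 599, 300),
]


def half_price(firecrawl_usd: int) -> int:
    """Half of a Firecrawl price, rounded to a whole dollar (exact halves round up)."""
    half = Decimal(firecrawl_usd) / 2
    return int(half.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


for ours, theirs, fc_price, our_price in TIERS:
    status = "ok" if half_price(fc_price) == our_price else "MISMATCH"
    print(f"{ours:<7} ${our_price:>4} vs Firecrawl {theirs} ${fc_price}: {status}")
```

Decimal is used so the .5 cases round predictably instead of relying on float behavior.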
2026-05-19

feat Proxy mode in 3 features + new /v1/cloud/proxy-fetch

  • New: POST /v1/cloud/proxy-fetch — bytes-billed HTML scrape via spider.cloud proxy. 4 pools (datacenter, isp, residential, mobile), 199 countries. ~13× cheaper than /v1/cloud/scrape for plain HTML.
  • Wired: /v1/cloud/scrape gains force_backend: "proxy" for cheap fetches.
  • Wired: /v1/actors/check-links gains cloud_retry_mode: "proxy" — 30× cheaper, 12× faster retry path.
  • Wired: /v1/crawls accepts proxy_pool + proxy_country top-level. Server resolves credentials; customer never sees a vendor key.
  • Calibrated: per-pool cost estimates verified against live billing within ±5%.
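A minimal sketch of request bodies that use the new proxy fields. force_backend, cloud_retry_mode, proxy_pool, proxy_country, and the four pool names come from this entry; the "url"/"urls" fields and the two-letter country-code format are assumptions for illustration. No vendor credentials appear in any body, since the server resolves them.

```python
import json
from typing import Dict, List

# Pool names listed in this entry.
PROXY_POOLS = {"datacenter", "isp", "residential", "mobile"}


def crawl_body(start_url: str, pool: str, country: str) -> Dict[str, str]:
    """Body for POST /v1/crawls with the top-level proxy fields.

    Assumption: "url" is the start-URL field and proxy_country is an
    ISO 3166-1 alpha-2 code. Neither is confirmed by the changelog.
    """
    if pool not in PROXY_POOLS:
        raise ValueError(f"unknown proxy_pool {pool!r}; expected one of {sorted(PROXY_POOLS)}")
    if len(country) != 2 or not country.isalpha():
        raise ValueError("proxy_country assumed to be a two-letter country code")
    return {"url": start_url, "proxy_pool": pool, "proxy_country": country.upper()}


def scrape_body(url: str) -> Dict[str, str]:
    """Body for POST /v1/cloud/scrape that opts into the cheap proxy backend."""
    return {"url": url, "force_backend": "proxy"}


def check_links_body(urls: List[str]) -> Dict[str, object]:
    """Body for POST /v1/actors/check-links using the proxy retry path ("urls" is assumed)."""
    return {"urls": urls, "cloud_retry_mode": "proxy"}


print(json.dumps(crawl_body("https://example.com", "residential", "de")))
```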
2026-05-18

feat R2 sidecar HTML storage + sitemap-audit billing transparency

  • New: Crawled HTML is now mirrored to Cloudflare R2 in the background. Postgres stays canonical; R2 is additive. Failed uploads leave the columns NULL, and a backfill binary fills them in later.
  • Changed: sitemap-audit default max_urls lowered 1000 → 100. The old default was a silent billing landmine: every probed URL bills one page credit, so 200 URLs probed = 200 page credits per call.
  • New: sitemap-audit dry_run: true discovers sitemaps and counts URLs without probing or billing.
  • New: sitemap-audit response gains pages_billed as an explicit billing signal.
  • Fix: /v1/cloud/balance USD display was off by 10× (showed $1.55 instead of $15.46). Per-call cost values were already correct.
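How sitemap-audit billing plays out under the new defaults, as a hypothetical model: one page credit per probed URL, probing capped at max_urls, and dry_run probing nothing. The helper and its field names are illustrative, not the actor's code; pages_billed mirrors the new response field.

```python
from dataclasses import dataclass

DEFAULT_MAX_URLS = 100  # new default from this entry (previously 1000)


@dataclass
class AuditEstimate:
    urls_discovered: int
    urls_probed: int
    pages_billed: int


def estimate_sitemap_audit(urls_discovered: int, max_urls: int = DEFAULT_MAX_URLS,
                           dry_run: bool = False) -> AuditEstimate:
    """Assumed model: 1 page credit per probed URL; dry_run only counts URLs."""
    if dry_run:
        return AuditEstimate(urls_discovered, 0, 0)
    probed = min(urls_discovered, max_urls)
    return AuditEstimate(urls_discovered, probed, probed)


# A 5,000-URL sitemap under the old and new defaults, and as a dry run.
print(estimate_sitemap_audit(5000, max_urls=1000))
print(estimate_sitemap_audit(5000))
print(estimate_sitemap_audit(5000, dry_run=True))
```

Running dry_run first gives the URL count for free, so max_urls can be set deliberately before paying for probes.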
2026-05-18

fix Multi-process safety + worker_id retention

  • The cron scheduler and Stripe drainer now claim work atomically via FOR UPDATE SKIP LOCKED, so 3+ processes can safely share one Postgres primary.
  • crawl_runs.worker_id retained on completion so "which worker handled this run" is answerable post-mortem.
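FOR UPDATE SKIP LOCKED lets each process lock a pending row and skip rows another process already holds, instead of waiting on them. The in-memory analogue below is a sketch only: Python locks stand in for Postgres row locks, and the job list and worker count are made up. The comment shows the common shape of such a claim query; its table and column names are hypothetical.

```python
import threading
from typing import List, Optional, Tuple

# Common shape of a Postgres claim query (table/column names hypothetical):
#   UPDATE runs SET state = 'running', worker_id = $1
#   WHERE id = (SELECT id FROM runs WHERE state = 'pending'
#               ORDER BY id LIMIT 1 FOR UPDATE SKIP LOCKED)
#   RETURNING id;


class Job:
    def __init__(self, job_id: int) -> None:
        self.id = job_id
        self.lock = threading.Lock()  # stands in for the Postgres row lock
        self.worker_id: Optional[str] = None
        self.done = False


def claim_next(jobs: List[Job]) -> Optional[Job]:
    """Lock the first unfinished job nobody else holds; never wait on a held lock."""
    for job in jobs:
        if job.done:
            continue
        if not job.lock.acquire(blocking=False):  # SKIP LOCKED: move on instead of blocking
            continue
        if job.done:  # finished by another worker between our check and our lock
            job.lock.release()
            continue
        return job
    return None


def worker(jobs: List[Job], worker_id: str, log: List[Tuple[str, int]]) -> None:
    while (job := claim_next(jobs)) is not None:
        log.append((worker_id, job.id))
        job.worker_id = worker_id  # kept after completion, like crawl_runs.worker_id
        job.done = True
        job.lock.release()


jobs = [Job(i) for i in range(50)]
log: List[Tuple[str, int]] = []
threads = [threading.Thread(target=worker, args=(jobs, f"w{n}", log)) for n in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(log), "claims for", len(jobs), "jobs")
```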
2026-05-17

feat Three new SEO actors

  • New: /v1/actors/render-diff compares static HTML with the JS-rendered DOM and reports the AI-bot blind %: how much of the page a bot that doesn't execute JavaScript never sees. The 2026 AEO metric.
  • New: /v1/actors/internal-link-graph runs PageRank, weakly connected components (WCC), and orphan detection on any existing crawl_id. 1 flat credit per call.
  • New: /v1/actors/sitemap-audit reports 7-bucket sitemap health (ok / redirect / 4xx / 5xx / noindex / canonicalised away / network error).
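A compact sketch of the three analyses internal-link-graph names, over an edge list of (source, target) page pairs. Assumptions: damping 0.85, a fixed iteration count, edges already deduplicated and pointing only at known pages, and "orphan" meaning no inbound internal link, with the root page exempt. The actor's actual parameters aren't documented here.

```python
from collections import defaultdict


def pagerank(nodes, edges, damping=0.85, iters=100):
    """Power-iteration PageRank; pages with no outlinks spread their rank evenly."""
    n = len(nodes)
    out = {v: [] for v in nodes}
    for src, dst in edges:
        out[src].append(dst)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            if out[src]:
                share = damping * rank[src] / len(out[src])
                for dst in out[src]:
                    new[dst] += share
        rank = new
    return rank


def weakly_connected_components(nodes, edges):
    """Union-find over links with direction ignored; largest component first."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for src, dst in edges:
        a, b = find(src), find(dst)
        if a != b:
            parent[a] = b
    groups = defaultdict(set)
    for v in nodes:
        groups[find(v)].add(v)
    return sorted(groups.values(), key=len, reverse=True)


def orphans(nodes, edges, root):
    """Pages other than the root that no other page links to."""
    linked = {dst for src, dst in edges if src != dst}
    return sorted(v for v in nodes if v not in linked and v != root)


nodes = ["/", "/a", "/b", "/c", "/lonely"]
edges = {("/", "/a"), ("/", "/b"), ("/a", "/b"), ("/b", "/"), ("/c", "/a")}
rank = pagerank(nodes, edges)
print(max(rank, key=rank.get))
print(weakly_connected_components(nodes, edges))
print(orphans(nodes, edges, root="/"))
```

In the toy graph, "/c" links out but nothing links to it, so it is an orphan even though it sits in the main component; "/lonely" is both an orphan and its own component.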
2026-05-16

feat Cloud retry on check-links + CF email-protection filter

  • cloud_retry: true re-checks links flagged as broken with a Chrome render and reclassifies them. Rescues LinkedIn 999, Cloudflare anti-bot, paywalled news.
  • Cloudflare email-protection URLs are now filtered out automatically (in the browser they decode to mailto: links).
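The email-protection filter in sketch form. Cloudflare's email-protection links point at /cdn-cgi/l/email-protection with the address hex-encoded in the fragment; the decoder follows the widely documented XOR scheme (first byte is the key) and is not an official Cloudflare API. The function names are hypothetical.

```python
from typing import List
from urllib.parse import urlsplit

CF_EMAIL_PATH = "/cdn-cgi/l/email-protection"


def is_cf_email_link(url: str) -> bool:
    return urlsplit(url).path.startswith(CF_EMAIL_PATH)


def drop_cf_email_links(urls: List[str]) -> List[str]:
    """Remove links a link checker should never probe."""
    return [u for u in urls if not is_cf_email_link(u)]


def decode_cf_email(encoded_hex: str) -> str:
    """First hex byte is the XOR key; each following byte is one character XOR key."""
    key = int(encoded_hex[:2], 16)
    return "".join(
        chr(int(encoded_hex[i:i + 2], 16) ^ key) for i in range(2, len(encoded_hex), 2)
    )


def encode_cf_email(email: str, key: int = 0x42) -> str:
    """Inverse of decode_cf_email, used here only to build example input."""
    return f"{key:02x}" + "".join(f"{ord(c) ^ key:02x}" for c in email)


protected = "https://example.com" + CF_EMAIL_PATH + "#" + encode_cf_email("hi@example.com")
print(drop_cf_email_links(["https://example.com/about", protected]))
```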
2026-05-15

feat Structured data extraction

  • New: /v1/actors/structured-data — JSON-LD, Microdata, RDFa, OpenGraph, Dublin Core, Microformats in one call. Backed by extruct.
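The entry says the actor is backed by extruct, which covers all six syntaxes. To show only the JSON-LD part without third-party dependencies, here is a stdlib-only sketch; it is a simplified stand-in, not how the actor calls extruct, and it ignores Microdata, RDFa, and the rest.

```python
import json
from html.parser import HTMLParser
from typing import Any, List


class JsonLdParser(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buf: List[str] = []
        self.items: List[Any] = []

    def handle_starttag(self, tag, attrs):
        mime = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and mime == "application/ld+json":
            self._in_jsonld = True
            self._buf = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buf)))
            except json.JSONDecodeError:
                pass  # skip a malformed block rather than fail the whole page


def extract_json_ld(html: str) -> List[Any]:
    parser = JsonLdParser()
    parser.feed(html)
    parser.close()
    return parser.items


page = '<head><script type="application/ld+json">{"@type": "Article"}</script></head>'
print(extract_json_ld(page))
```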
2026-05-14

feat Actors lineup (audit-onpage, check-links, extract-article)

  • New: /v1/actors/audit-onpage — ~30 on-page SEO rules per call.
  • New: /v1/actors/check-links — Lychee-based broken-link check, up to 200 URLs per call.
  • New: /v1/actors/extract-article — Trafilatura body extraction with author + date metadata.
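Link lists longer than the 200-URL check-links limit have to be split across calls. A minimal sketch of that batching; de-duplicating first is a choice made here, not documented actor behavior, and the request itself is not shown.

```python
from typing import Iterable, Iterator, List

MAX_URLS_PER_CALL = 200  # check-links limit stated in this entry


def batches(urls: Iterable[str], size: int = MAX_URLS_PER_CALL) -> Iterator[List[str]]:
    """De-duplicate while keeping first-seen order, then yield chunks of <= size URLs."""
    unique = list(dict.fromkeys(urls))
    for start in range(0, len(unique), size):
        yield unique[start:start + size]


links = [f"https://example.com/{i}" for i in range(450)] + ["https://example.com/0"]
for batch in batches(links):
    print(len(batch))  # one check-links call per batch
```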
2026-05-14

fix Cloud API hardening (3 security fixes) + worker latency cut

  • SSRF coverage: IPv6 bracket-form (e.g. http://[::1]/) and IPv4-mapped IPv6 (e.g. ::ffff:127.0.0.1) bypasses are now blocked. 21 regression tests added.
  • /v1/cloud/screenshot now checks that the returned bytes start with a known image magic number (PNG, JPEG), catching vendor-side response corruption.
  • Worker p50 latency reduced 4.0s → 1.25s (-69%) via spider.cloud connection warmup and pool tuning.
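Both checks in sketch form, using only the Python standard library. The set of address classes treated as internal is an assumption, and a complete SSRF guard also has to resolve hostnames and check every resulting address, which is not shown. The magic numbers are the standard PNG and JPEG signatures.

```python
import ipaddress
from typing import Optional
from urllib.parse import urlsplit

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
JPEG_MAGIC = b"\xff\xd8\xff"


def _is_internal(ip) -> bool:
    # Judge ::ffff:a.b.c.d by its embedded IPv4 address, closing the v4-mapped bypass.
    if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return (ip.is_private or ip.is_loopback or ip.is_link_local
            or ip.is_reserved or ip.is_multicast or ip.is_unspecified)


def blocked_ip_literal(url: str) -> bool:
    """True if the URL targets an internal IP literal (or has no host at all)."""
    host = urlsplit(url).hostname  # strips the [] around IPv6 literals and lowercases
    if host is None:
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name: a real guard must also check every resolved address.
        return False
    return _is_internal(ip)


def image_kind(data: bytes) -> Optional[str]:
    """Name the image type from its leading bytes, or None if unrecognised."""
    if data.startswith(PNG_MAGIC):
        return "png"
    if data.startswith(JPEG_MAGIC):
        return "jpeg"
    return None


print(blocked_ip_literal("http://[::ffff:127.0.0.1]/admin"))
print(image_kind(b"<html>error page</html>"))
```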
2026-05-13

feat Perf: jemalloc + spawn_blocking + worker parallelism

  • Linux builds now use tikv-jemallocator, significantly reducing memory fragmentation on worker hosts.
  • CPU-heavy producer work moved onto tokio's blocking thread pool via spawn_blocking.
  • WORKER_PARALLEL_JOBS default 4 → 8; tier-aware per-request concurrency caps (internal = 64).
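A per-request concurrency cap behaves like a counting semaphore. This asyncio sketch is a Python analogue, not the worker's Rust code: only internal = 64 comes from this entry; free = 2 assumes the cap equals the Free plan's 2 concurrent requests, and the sleep stands in for a real fetch.

```python
import asyncio
from typing import Dict, List, Tuple

# "internal" is from this entry; "free" is an assumption (Free plan = 2 concurrent).
TIER_CAPS = {"internal": 64, "free": 2}


async def fetch(url: str, sem: asyncio.Semaphore, stats: Dict[str, int]) -> str:
    async with sem:  # at most TIER_CAPS[tier] fetches run at once
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # stand-in for the real fetch
        stats["active"] -= 1
        return url


async def run_request(tier: str, urls: List[str]) -> Tuple[List[str], int]:
    sem = asyncio.Semaphore(TIER_CAPS[tier])
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(*(fetch(u, sem, stats) for u in urls))
    return list(results), stats["peak"]


urls = [f"https://example.com/{i}" for i in range(10)]
results, peak = asyncio.run(run_request("free", urls))
print(len(results), "fetched, peak concurrency", peak)
```

A cap per request, rather than one global limit, keeps a single large job from starving other tenants on the same worker.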
2026-05-13

docs Pricing tiers locked

  • Canonical tier names: Free / Pro $15 / Studio $69 / Agency $279 / Scale $499 / Enterprise.
  • Two integer counters: pages + cloud_pages. No multipliers, no credit dust.

Want the engineering-level detail? Daily session logs live in vikasswaminh/crawlcrawl/docs.