Drop in a URL. Get back author, publish date, and the article body — boilerplate-stripped, sidebar-free, comment-free. Built on trafilatura, the highest-precision body extractor in production today.
Generic markdown converters keep navigation menus, footers, ad blocks, "related posts" sidebars, and Disqus comments. For RAG ingestion, news aggregation, or sentiment analysis, that noise hurts recall and inflates token cost. extract-article runs trafilatura against the page and returns just the body plus structured metadata.
curl -X POST https://api.crawlcrawl.com/v1/actors/extract-article \
  -H "Authorization: Bearer crk_..." \
  -d '{"url":"https://www.geeksforgeeks.org/dsa/binary-search/"}'
# → 200
{
  "actor": "extract-article",
  "url": "https://www.geeksforgeeks.org/dsa/binary-search/",
  "elapsed_ms": 129,
  "data": {
    "metadata": {
      "title": "Binary Search - GeeksforGeeks",
      "author": "Sandeep Jain",
      "date": "2014-01-28"
    },
    "word_count": 2798,
    "text": "Binary Search is a searching algorithm ..."
  }
}
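The same call can be made from Python. The following is a minimal sketch using only the standard library, assuming the endpoint, bearer-token header, and response shape shown in the example above. The function names are hypothetical, and the Content-Type header is an assumption (the curl example does not set one).

```python
import json
import urllib.request

API_URL = "https://api.crawlcrawl.com/v1/actors/extract-article"


def build_request(page_url: str, api_key: str) -> urllib.request.Request:
    """Build the POST request that mirrors the curl example."""
    body = json.dumps({"url": page_url}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: the curl example sets no Content-Type; JSON is a guess.
            "Content-Type": "application/json",
        },
    )


def parse_article(payload: dict) -> dict:
    """Flatten the response into the fields most pipelines need."""
    data = payload["data"]
    meta = data.get("metadata", {})
    return {
        "url": payload["url"],
        "title": meta.get("title"),
        "author": meta.get("author"),
        "date": meta.get("date"),
        "word_count": data.get("word_count"),
        "text": data.get("text", ""),
    }


def extract_article(page_url: str, api_key: str, timeout: float = 30.0) -> dict:
    """Call the endpoint and return the flattened article (performs a network call)."""
    request = build_request(page_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return parse_article(json.load(resp))
```

Keeping request construction and response parsing separate from the network call makes both easy to unit-test against a saved response.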
Trafilatura is widely regarded as the academic gold standard for body extraction. On the standardised CleanEval benchmark it outperforms readability.js, Newspaper3k, and Goose in precision and recall. We use the Rust port (rs-trafilatura) so a single call returns in ~130 ms with no Python sidecar.
The publish date is resolved from datePublished, OpenGraph article:published_time, or visible byline dates, in fallback order.
RAG ingestion. Chunking the body without nav/footer noise cuts token usage 30–60% on most news sites and produces materially better retrieval. Pair with /v1/crawls for full-site ingest.
Editorial monitoring. Author and date in the same response make it trivial to detect new-publication events on a competitor blog without parsing HTML yourself.
Sentiment / classification pipelines. A clean body is a clean input; you avoid the failure mode where your classifier learns to read footer copyright notices.
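Two of the use cases above can be sketched in a few lines of Python. This is an illustration, not part of the API: chunk_words and is_new_publication are hypothetical helpers, the window sizes are arbitrary defaults, and the date is assumed to arrive as an ISO string like the "2014-01-28" in the example response.

```python
from datetime import date
from typing import Optional


def chunk_words(text: str, size: int = 200, overlap: int = 20) -> list:
    """Split text into windows of `size` words that overlap by `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


def is_new_publication(article_date: Optional[str], last_seen: date) -> bool:
    """True when the extracted ISO date is strictly later than the last one seen."""
    if not article_date:
        return False  # no date was extracted; treat as "not new" rather than guess
    return date.fromisoformat(article_date) > last_seen
```

For RAG ingestion, feed the "text" field to chunk_words and embed each chunk. For editorial monitoring, store the latest date seen per site and compare it with the "date" field of each new response.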
One page-credit per call. The $42 Studio tier includes 50,000 page-credits a month, which works out to about $0.00084 per call. Compare to Diffbot's Article API at $0.10–$0.25/call: extract-article is roughly 120–300× cheaper.
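The arithmetic behind that comparison, using only the figures quoted in this section (the Diffbot prices are the ones stated here, not independently checked):

```python
STUDIO_PRICE_USD = 42.0      # monthly price of the Studio tier
STUDIO_CREDITS = 50_000      # page-credits included; one credit per call
DIFFBOT_LOW, DIFFBOT_HIGH = 0.10, 0.25  # quoted per-call price range

cost_per_call = STUDIO_PRICE_USD / STUDIO_CREDITS
ratio_low = DIFFBOT_LOW / cost_per_call
ratio_high = DIFFBOT_HIGH / cost_per_call

print(f"${cost_per_call:.5f} per call")                    # $0.00084 per call
print(f"{ratio_low:.0f}x to {ratio_high:.0f}x cheaper")    # 119x to 298x cheaper
```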