
Changelog

New updates and improvements at Cloudflare.


Crawl entire websites with a single API call using Browser Rendering

Edit: this post has been edited to clarify crawling behavior with respect to site guidance.

You can now crawl an entire website with a single API call using Browser Rendering's new /crawl endpoint, available in open beta. Submit a starting URL, and pages are automatically discovered, rendered in a headless browser, and returned in multiple formats, including HTML, Markdown, and structured JSON. The endpoint acts as a signed agent that respects robots.txt and AI Crawl Control by default, making it easy for developers to comply with website rules and harder for crawlers to ignore site-owner guidance. This makes it a good fit for training models, building RAG pipelines, and researching or monitoring content across a site.

Crawl jobs run asynchronously. You submit a URL, receive a job ID, and check back for results as pages are processed.

```sh
# Initiate a crawl
curl -X POST 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl' \
  -H 'Authorization: Bearer <apiToken>' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://blog.cloudflare.com/"
  }'

# Check results
curl -X GET 'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl/{job_id}' \
  -H 'Authorization: Bearer <apiToken>'
```

Key features:

  • Multiple output formats - Return crawled content as HTML, Markdown, and structured JSON (powered by Workers AI)
  • Crawl scope controls - Configure crawl depth, page limits, and wildcard patterns to include or exclude specific URL paths
  • Automatic page discovery - Discovers URLs from sitemaps, page links, or both
  • Incremental crawling - Use modifiedSince and maxAge to skip pages that haven't changed or were recently fetched, saving time and cost on repeated crawls
  • Static mode - Set render: false to fetch static HTML without spinning up a browser, for faster crawling of static sites
  • Well-behaved bot - Honors robots.txt directives, including crawl-delay
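
The options above can be combined in a single request body. A sketch of what that might look like, with the caveat that only `url`, `modifiedSince`, `maxAge`, and `render` are named in this post; `depth`, `limit`, and `include` are illustrative placeholders for the scope controls, so check the crawl endpoint documentation for the exact parameter names and value formats:

```json
{
  "url": "https://blog.cloudflare.com/",
  "render": false,
  "modifiedSince": "2026-01-01T00:00:00Z",
  "maxAge": 86400,
  "depth": 2,
  "limit": 100,
  "include": ["https://blog.cloudflare.com/tag/*"]
}
```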

Available on both the Workers Free and Paid plans.

Note: the /crawl endpoint self-identifies as a bot and cannot bypass Cloudflare bot detection or CAPTCHAs.

To get started, refer to the crawl endpoint documentation. If you are setting up your own site to be crawled, review the robots.txt and sitemaps best practices.
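
For site owners preparing to be crawled, a minimal robots.txt illustrating the directives the crawler honors (the paths, delay value, and sitemap URL here are placeholders, not recommendations from this post):

```
User-agent: *
Crawl-delay: 10
Disallow: /private/
Sitemap: https://example.com/sitemap.xml
```

Listing a sitemap here also helps the endpoint's automatic page discovery, which can read URLs from sitemaps as well as page links.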