Run a single Docker command and Crawlstack is ready. The container includes headless Chrome with Crawlstack built in. Dashboard at localhost:3002.
Prefer your own browser? Install the Chrome extension instead.
Dashboard at localhost:3002. Includes headless Chrome with anti-detection.
If you can write document.querySelector(), you can use Crawlstack. No new framework. No special SDK. Your script runs directly in the page with full access to the DOM, cookies, and session state.
Use runner.publishItems() to store data and runner.addTasks() to follow pages. That's the entire API.
SQLite via WASM runs directly in the browser. No network round-trips to a database server. Millions of rows without browser slowdown. Export as CSV, JSON, or raw SQLite.
Scale to multiple nodes with Turso or libSQL when you need to.
isTrusted: falseisTrusted: trueHardware-level trusted clicks via CDP Input.dispatchMouseEvent
Cloudflare Turnstile checks if clicks are hardware-generated. Crawlstack dispatches clicks through Chrome DevTools Protocol at the OS level, so isTrusted is always true.
Built for Cloudflare Turnstile. Works with any challenge that checks event trust level.
Full DOM snapshots, network logs, console output, and screenshots captured automatically. Replay any run offline to understand exactly what happened.
No more guessing why a crawler failed. See every step it took.
Random mouse paths with natural Bezier curves, variable speed, and timing jitter. Every click, scroll, and hover generates real isTrusted input events.
Bot detection that relies on mouse behavior analysis sees a real person.
Crawlstack detects new, changed, and deleted items automatically then pushes the diff to your endpoints in real-time. Build live data pipelines without polling.
Integrate with Zapier, n8n, Make, or your own backend.
Crawlstack exposes its full API as Model Context Protocol tools. Any AI agent (Claude, GPT, custom) can discover nodes, preview scripts, launch runs, and retrieve data.
Your agent gets a real browser it can control programmatically.
The relay server tunnels your local dashboard to a public URL. Monitor runs, view data, and manage crawlers from any device without port forwarding or VPN.
Secure WebSocket tunnel over TLS. No data stored on relay.
Same crawler script, multiple Docker nodes. Tasks are distributed automatically with shared queues. Built-in deduplication ensures no page is crawled twice, even across nodes.
Zero code changes to go from one node to a cluster. Just add containers.
Point an LLM at the whole page or just the parts you care about and get structured data back. No brittle CSS selectors that break when the site changes.
Works with OpenAI, Anthropic, Google, or any OpenAI-compatible endpoint. Pass a JSON schema for typed output.
Every item you crawl can be automatically embedded. Query your data by meaning, not keywords. Built-in vector storage and cosine similarity search with no external vector DB needed.
Ships with e5-small for instant embeddings out of the box. Plug in OpenAI, Cohere, or any embedding API for production workloads.
Extract M MacBooks (description, params) for finding new offers with like new condition using AI filtering.
Plain JavaScript. Built-in SQLite. Webhooks. Docker. No new framework to learn.