Playwright is a test automation framework repurposed for scraping. Crawlstack is built for scraping from the ground up. Compare their approaches to stealth, auth, data pipelines, and developer experience.
Playwright is Microsoft's cross-browser automation library. It supports Chromium, Firefox, and WebKit, has excellent auto-wait APIs, and is the industry standard for end-to-end testing. Many developers have adopted it for scraping because of its modern API and reliability.
Crawlstack is purpose-built for web scraping. It runs inside a real Chrome browser as a MV3 extension, with full scraping infrastructure — scheduling, deduplication, webhook delivery, distributed crawling, and visual debugging — built in.
Playwright is a fantastic testing tool that can scrape. Crawlstack is a scraping platform that understands what production crawling actually requires.
Playwright controls browsers from outside. Your Node.js/Python/Java/.NET process sends commands via the browser's automation protocol:

```
Your Process → Automation Protocol → Browser Instance(s)
                                          ↓
                                     Page Content
```

Crawlstack runs inside the browser. The extension's service worker coordinates crawlers that execute directly in tab contexts:

```
Chrome Browser
└── Crawlstack Extension
    └── Service Worker → Tab Workers
        └── Your script runs IN the page
```

This means Crawlstack scripts have direct DOM access, run with a real browser fingerprint, and inherit your existing browser sessions (cookies, auth tokens, localStorage).
| Feature | Crawlstack | Playwright |
|---|---|---|
| Primary purpose | Web scraping platform | Test automation framework |
| Runtime | Real Chrome (MV3 extension) | Headless Chromium/Firefox/WebKit |
| Browser support | Chrome only | Chromium, Firefox, WebKit |
| Stealth | Undetectable — real browser | Detectable headless fingerprint |
| Anti-bot bypass | Built-in Cloudflare Turnstile solver | None |
| Auto-wait | runner.onLoad(), runner.waitFor() | Built-in auto-wait on locators |
| Auth handling | Uses existing browser sessions | Manual context/cookie management |
| Human simulation | Built-in Bézier mouse, realistic scroll | page.click() with optional delay |
| Data storage | SQLite with dedup and versioning | None |
| Scheduling | Built-in cron scheduling | None |
| Webhook delivery | Built-in per-item | None |
| Distributed crawling | Multi-node Docker cluster | None — use 3rd-party tools |
| Debugging | DevTools-native + flight recorder | Trace viewer, video recording |
| Network interception | runner.getRequests(), runner.fetch() | page.route(), page.on('request') |
| REST API | 40+ endpoints | None |
| MCP tools | 18 tools for AI-driven development | None |
| Languages | JavaScript | Node.js, Python, Java, .NET |
| License | Free, self-hosted | Apache-2.0 |
Playwright's cross-browser support is great for testing — you verify your app works in Chrome, Firefox, and Safari. For scraping, it's less useful. Anti-bot systems don't care which headless browser you use; they detect headless.
Common detection vectors for Playwright:

- `navigator.webdriver` is set to `true`
- a `HeadlessChrome` token in the user agent (older headless builds)
- an empty `navigator.plugins` list
- automation-protocol artifacts that fingerprinting scripts probe for

Crawlstack avoids all of these by running in a real Chrome installation. The Cloudflare Turnstile solver works because the browser environment is genuinely authentic — there's nothing to detect.
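To make the detection side concrete, here is a minimal sketch of the kind of fingerprint probe an anti-bot script runs. The function name and the exact signal list are illustrative assumptions, not any vendor's actual check:

```js
// Sketch of a fingerprint probe an anti-bot script might run.
// These signals are commonly cited examples, not an exhaustive list.
function looksAutomated(nav, win) {
  const signals = [];
  if (nav.webdriver === true) signals.push('navigator.webdriver');
  if (/HeadlessChrome/.test(nav.userAgent || '')) signals.push('headless user agent');
  if ((nav.plugins || []).length === 0) signals.push('empty plugin list');
  if (!win.chrome) signals.push('missing window.chrome');
  return signals;
}

// A stock headless launch trips several signals at once:
console.log(looksAutomated(
  { webdriver: true, userAgent: 'HeadlessChrome/120.0.0.0', plugins: [] },
  {}
));
// → ['navigator.webdriver', 'headless user agent', 'empty plugin list', 'missing window.chrome']
```

Spoofing one signal is easy; keeping dozens of them mutually consistent is the hard part, which is why running in a real browser sidesteps the arms race entirely.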
Playwright's auto-wait locator API is genuinely excellent for testing:

```js
// Playwright automatically waits for the element to be visible and actionable
await page.getByRole('button', { name: 'Submit' }).click();
await page.getByText('Success').waitFor();
```

For scraping, auto-wait on individual elements is less relevant. You typically need to wait for the page to finish loading, then extract everything at once. Crawlstack's API reflects this:
```js
// Wait for page load and network idle
await runner.onLoad();

// Or wait for a specific condition
await runner.waitFor(() => document.querySelectorAll('.product').length > 0);
```

Both work, but Crawlstack's approach maps better to scraping workflows.
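Under the hood, a predicate wait boils down to polling until the condition holds or a timeout expires. A minimal sketch of that general pattern (illustrative only; the option names are assumptions, and Crawlstack's real implementation may differ):

```js
// Generic predicate polling, the pattern behind waitFor-style APIs.
// Illustrative sketch; not Crawlstack's actual implementation.
async function waitForCondition(predicate, { timeoutMs = 10000, intervalMs = 100 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await predicate()) return true;                // condition met
    await new Promise(r => setTimeout(r, intervalMs)); // back off, re-check
  }
  throw new Error(`condition not met within ${timeoutMs} ms`);
}

// Usage mirrors the scraping example:
// await waitForCondition(() => document.querySelectorAll('.product').length > 0);
```

The predicate runs against the live DOM, so one wait can express "the data I care about is present" rather than chaining per-element waits.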
This is where the architectural difference really shows. Scraping behind a login with Playwright means managing authentication yourself.
```js
const { chromium } = require('playwright');

const browser = await chromium.launch();
const context = await browser.newContext();
const page = await context.newPage();

// Must handle login manually
await page.goto('https://example.com/login');
await page.fill('#email', '[email protected]');
await page.fill('#password', 'password123');
await page.click('button[type="submit"]');
await page.waitForURL('**/dashboard');

// Now scrape authenticated content
await page.goto('https://example.com/dashboard/data');
const data = await page.evaluate(() => {
  return [...document.querySelectorAll('.data-row')].map(el => ({
    name: el.querySelector('.name').innerText,
    value: el.querySelector('.value').innerText,
  }));
});

await browser.close();
// Now what? Save to file? Database? No built-in pipeline.
```

With Playwright, you need to:
- script the login flow and keep credentials somewhere safe
- persist `storageState` to reuse sessions across runs
- detect expired sessions and re-authenticate

Crawlstack inherits your existing browser session instead:

```js
// Already logged in — your browser session is the crawler's session
await runner.onLoad();

const data = [...document.querySelectorAll('.data-row')].map(el => ({
  id: el.querySelector('.name').innerText,
  data: {
    name: el.querySelector('.name').innerText,
    value: el.querySelector('.value').innerText,
  }
}));

await runner.publishItems(data);
// Items are stored, deduped, and optionally sent to your webhook
```

With Crawlstack:

- no login scripting: you're already logged in
- no session management: the browser keeps the session alive
- `runner.publishItems()` stores, deduplicates, and delivers

This is the killer feature for scraping behind auth. You don't need to reverse-engineer login flows or manage session state. You just browse to the page and let the crawler extract data from whatever your browser can see.
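The dedup step is worth dwelling on: keying items by `id` and fingerprinting their content lets a pipeline skip unchanged records and keep versions of changed ones across runs. A toy sketch of that bookkeeping (hypothetical helper; not Crawlstack's actual storage layer):

```js
// Toy id-based dedup of the kind publishItems implies.
// Hypothetical helper, not Crawlstack's actual storage code.
function dedupeItems(items, seen = new Map()) {
  const fresh = [];
  for (const item of items) {
    const fingerprint = JSON.stringify(item.data);
    if (seen.get(item.id) !== fingerprint) { // new item, or its data changed
      seen.set(item.id, fingerprint);        // remember the latest version
      fresh.push(item);
    }
  }
  return fresh; // only new or changed items move downstream
}
```

Run twice over the same crawl output, the second pass yields only items whose data actually changed, which is exactly what you want feeding a webhook.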
Playwright has a genuinely good debugging story. Its Trace Viewer lets you see screenshots, DOM snapshots, and action logs for test runs:
```js
const context = await browser.newContext({
  recordVideo: { dir: 'videos/' },
});

// Tracing must be enabled explicitly, per context
await context.tracing.start({ screenshots: true, snapshots: true });
// ...run the session...
await context.tracing.stop({ path: 'trace.zip' });

// After the run, open the trace:
// npx playwright show-trace trace.zip
```

Crawlstack's flight recorder is conceptually similar but designed for scraping:
The key difference is that Crawlstack records every run by default, so when a production crawler fails at 3 AM, you can replay exactly what happened without having enabled tracing ahead of time.
Both tools support network interception, but with different ergonomics.
Playwright intercepts from outside the browser:
```js
await page.route('**/api/**', route => {
  // Modify, abort, or continue requests
});

page.on('response', async response => {
  if (response.url().includes('/api/data')) {
    const json = await response.json();
    // Process API response
  }
});
```

Crawlstack intercepts from inside:
```js
const requests = runner.getRequests({ urlFilter: /api\/data/ });
// Access captured request/response data directly

// Or use runner.fetch() for stealth requests that bypass CORS
const data = await runner.fetch('https://api.example.com/data');
```

runner.fetch() is particularly useful — it makes requests through the browser's network stack with the page's cookies and headers, bypassing CORS restrictions. Playwright's APIRequestContext can reuse context cookies, but its requests are issued from a separate HTTP client outside the page, not through the page's own environment.
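To make the capture model concrete, the URL filtering that `runner.getRequests()` exposes amounts to matching recorded entries against a pattern. A toy sketch (the request shape here is a hypothetical illustration, not Crawlstack's internals):

```js
// Toy URL filtering over captured requests, mirroring the kind of
// lookup runner.getRequests({ urlFilter }) suggests.
// The request objects are a hypothetical shape for illustration.
function filterRequests(captured, urlFilter) {
  return captured.filter(req => urlFilter.test(req.url));
}

const captured = [
  { url: 'https://example.com/api/data', status: 200 },
  { url: 'https://example.com/styles.css', status: 200 },
];
console.log(filterRequests(captured, /api\/data/));
// → [{ url: 'https://example.com/api/data', status: 200 }]
```

Because the crawler already sits inside the page, this is a lookup over traffic the browser made anyway, not a proxy layer you have to wire up.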
This is where the comparison shifts from "different approach" to "different category." Playwright gives you data in memory. What you do with it is your problem:

- persistence (files, databases, queues)
- deduplication across runs
- scheduling recurring crawls
- delivering results downstream

Crawlstack handles all of this:

- `runner.publishItems()` → stored in SQLite with version history
- duplicate items are detected and skipped automatically
- built-in cron scheduling runs crawlers on a timetable
- each item can be delivered to your webhook as it's published

Choose Playwright when:

- you're writing end-to-end tests (its actual purpose)
- you need cross-browser coverage (Firefox, WebKit)
- your targets are simple and unprotected
- you want Python, Java, or .NET bindings

Choose Crawlstack when:

- targets deploy bot detection or Cloudflare Turnstile
- you need to scrape behind logins without scripting auth
- you want storage, dedup, scheduling, and webhooks built in
- you want to debug production failures with the flight recorder
Playwright is an excellent automation library that many developers repurpose for scraping. It works — especially for simple, unprotected targets. But scraping is not testing, and the things scraping needs (stealth, auth reuse, data pipelines, scheduling, dedup) aren't things a test framework provides.
Crawlstack is built for scraping from the ground up. The browser-native architecture solves stealth. The existing browser session solves auth. The built-in pipeline solves data management. And the flight recorder solves "what went wrong at 3 AM."
If you're already using Playwright for scraping and hitting limitations — bot detection, auth complexity, data pipeline boilerplate — Crawlstack addresses exactly those pain points.
Crawlstack is a self-hosted scraping infrastructure that runs inside your browser or Docker. Get started with Crawlstack for free today.