Scraper Builder
by @meirk-brd
Build production-ready web scrapers for any website using Bright Data infrastructure. Guides you through site analysis, API selection, selector extraction, p...
Example 1: E-commerce product scraper (pre-built exists)
User says: "Build a scraper for Amazon product pages, I have a list of 200 ASINs"
Actions:
1. Check supported-domains.md → Amazon has pre-built scrapers
2. 200 URLs → use async trigger/poll/fetch (over the 20-URL sync limit)
3. Use client.scrape.amazon.products_trigger() with batch of URLs
4. Poll until ready, download structured JSON
5. No HTML parsing needed: the data comes pre-structured
Result: Complete Python script with async batch scraping, progress logging, JSON output.
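
The trigger/poll/fetch flow from Example 1 can be sketched without committing to exact SDK signatures. In this sketch, trigger, is_ready, and download are hypothetical callables standing in for the SDK's trigger, status, and download calls (for instance, a thin wrapper around client.scrape.amazon.products_trigger()). The 20-URL sync limit comes from step 2; the poll interval and timeout are assumed values, not documented defaults.

```python
import time
from typing import Any, Callable, List, Sequence

SYNC_URL_LIMIT = 20  # from Example 1: batches above this size go async


def pick_mode(urls: Sequence[str]) -> str:
    """Return 'sync' for small batches and 'async' above the sync limit."""
    return "sync" if len(urls) <= SYNC_URL_LIMIT else "async"


def run_async_batch(
    urls: Sequence[str],
    trigger: Callable[[List[str]], str],   # hypothetical: starts a job, returns its id
    is_ready: Callable[[str], bool],       # hypothetical: True once results exist
    download: Callable[[str], Any],        # hypothetical: returns the structured JSON
    poll_seconds: float = 10.0,            # assumed interval
    timeout_seconds: float = 1800.0,       # assumed upper bound
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Trigger one job for all URLs, poll until it is ready, then download."""
    job_id = trigger(list(urls))
    deadline = time.monotonic() + timeout_seconds
    polls = 0
    while not is_ready(job_id):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} not ready after {timeout_seconds}s")
        polls += 1
        # Progress logging, as mentioned in the example's result.
        print(f"poll {polls}: job {job_id} not ready, waiting {poll_seconds}s")
        sleep(poll_seconds)
    return download(job_id)
```

Passing the three calls in as parameters keeps the polling logic testable without network access; in a real script they would be bound to whatever the SDK actually exposes.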
Example 2: Custom site scraper (no pre-built)
User says: "I need to scrape all job listings from jobs.customsite.com including pagination"
Actions:
1. Check supported-domains.md → not listed → query Dataset List API → not found
2. Fetch page HTML via Web Unlocker to analyze structure
3. Content is in the HTML (SSR) → Web Unlocker approach
4. Identify selectors: .job-card, .job-title, .company-name, .salary
5. Pagination via ?page=N URL parameter
6. Build complete scraper with fetcher + parser + paginator
Result: Complete Python script using Web Unlocker + BeautifulSoup with URL-based pagination.
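
The URL-based pagination from step 5 can be sketched with the standard library alone. Here fetch_html and parse_jobs are hypothetical callables: the first would wrap the Web Unlocker request, and the second would apply the selectors from step 4 (for example with BeautifulSoup). The max_pages cap and the rule of stopping at the first empty page are assumptions; the example itself does not say how the last page is detected.

```python
from typing import Callable, Dict, Iterator, List
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_url(base_url: str, page: int) -> str:
    """Return base_url with its ?page= query parameter set to `page`."""
    parts = urlsplit(base_url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["page"] = str(page)  # overwrite any existing page value
    return urlunsplit(parts._replace(query=urlencode(query)))


def crawl_pages(
    base_url: str,
    fetch_html: Callable[[str], str],               # hypothetical fetcher
    parse_jobs: Callable[[str], List[Dict[str, str]]],  # hypothetical parser
    max_pages: int = 50,                            # assumed safety cap
) -> Iterator[Dict[str, str]]:
    """Yield job records page by page until a page yields nothing."""
    for page in range(1, max_pages + 1):
        jobs = parse_jobs(fetch_html(page_url(base_url, page)))
        if not jobs:
            break  # assumed end-of-listing signal
        yield from jobs
```

Keeping the fetcher, parser, and paginator separate mirrors step 6 and lets each piece be swapped or tested on its own.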
Example 3: JS-heavy site (Browser API needed)
User says: "Scrape product prices from a React SPA that loads data on scroll"
Actions:
1. Fetch HTML via Web Unlocker → body is empty div#root → client-rendered
2. Check for hidden API in page source β no API found
3. Escalate to Browser API with Playwright
4. Implement infinite scroll pattern with resource blocking
5. Extract data via page.evaluate() after content loads
Result: Async Playwright script with Browser API, infinite scroll handling, bandwidth optimization.
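
Steps 3 to 5 of Example 3 might look roughly like the following with Playwright's async API. This is a sketch under several assumptions: the Browser API is reached over a CDP WebSocket endpoint passed in as cdp_endpoint, item_selector is a hypothetical CSS selector for the product elements, and the blocked resource types, scroll pause, and round limit are illustrative values rather than recommended settings.

```python
import asyncio

# Assumed set of resource types to skip in order to save bandwidth.
BLOCKED_TYPES = {"image", "media", "font"}


async def block_heavy_resources(route):
    """Abort requests for heavy assets and let everything else through."""
    if route.request.resource_type in BLOCKED_TYPES:
        await route.abort()
    else:
        await route.continue_()


async def scroll_until_stable(page, max_rounds=30, pause_ms=1500):
    """Scroll to the bottom until the page height stops growing.

    Returns the number of scrolls performed.
    """
    last_height = 0
    for round_no in range(max_rounds):
        height = await page.evaluate("document.body.scrollHeight")
        if height == last_height:
            return round_no  # nothing new was loaded by the last scroll
        last_height = height
        await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        await page.wait_for_timeout(pause_ms)  # give lazy content time to load
    return max_rounds


async def scrape(url, cdp_endpoint, item_selector):
    """Connect to a remote browser, scroll the page, and return item texts."""
    # Imported here so the helpers above can be used without Playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.connect_over_cdp(cdp_endpoint)
        try:
            page = await browser.new_page()
            page.set_default_navigation_timeout(120_000)
            await page.route("**/*", block_heavy_resources)
            await page.goto(url, wait_until="domcontentloaded")
            await scroll_until_stable(page)
            # Extract data inside the page after the content has loaded.
            return await page.evaluate(
                "sel => Array.from(document.querySelectorAll(sel))"
                ".map(el => el.innerText)",
                item_selector,
            )
        finally:
            await browser.close()
```

A call such as asyncio.run(scrape(url, cdp_endpoint, ".product-price")) would drive it, where .product-price is a placeholder selector. Comparing page heights is only one way to detect the end of an infinite scroll; counting matched elements between rounds is a common alternative.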
Web Unlocker returns empty or blocked page
Cause: Site requires JavaScript rendering or has aggressive bot detection.
Solution: Escalate to Browser API. Also try adding data_format: "markdown" to see if the content is there but in a different format.
Selectors work locally but fail in production
Cause: Site serves different HTML to different regions or user agents.
Solution: Add the country parameter to the Web Unlocker request to target the same region. Verify selectors against the actual HTML returned by the API, not browser DevTools.
Scraper returns duplicate items across pages
Cause: Pagination logic is wrapping around or site uses inconsistent pagination.
Solution: Track seen item IDs in a set. Break when duplicates appear. Verify the pagination URL pattern is correct.
Browser API session times out
Cause: Navigation timeout is too short or the site is slow to unblock.
Solution: Always call set_default_navigation_timeout(120_000). Use wait_until="domcontentloaded", not networkidle. Check whether the site requires premium domains to be enabled on your zone.
API returns 401 Unauthorized
Cause: Missing or invalid BRIGHTDATA_API_KEY environment variable.
Solution: Verify the key is set: echo $BRIGHTDATA_API_KEY. Get a fresh key from https://brightdata.com/cp/setting/users.
clawhub install brightdata-scraper-builder
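
For the duplicate-items issue in the troubleshooting notes above, the "track seen item IDs in a set" advice could be sketched as follows. The item_id callable is hypothetical (it returns whatever uniquely identifies an item, such as a URL or SKU), and stopping when a page contributes nothing new is one assumed reading of "break when duplicates appear".

```python
from typing import Callable, Hashable, Iterable, List, Set, TypeVar

T = TypeVar("T")


def collect_unique(
    pages: Iterable[List[T]],
    item_id: Callable[[T], Hashable],  # hypothetical: extracts a unique key
) -> List[T]:
    """Collect items across pages, skipping IDs that were already seen.

    Stops at the first page that adds no new item, which usually means the
    pagination has wrapped around or run past the last page.
    """
    seen: Set[Hashable] = set()
    unique: List[T] = []
    for page_items in pages:
        added = 0
        for item in page_items:
            key = item_id(item)
            if key in seen:
                continue  # duplicate from an earlier page (or the same page)
            seen.add(key)
            unique.append(item)
            added += 1
        if added == 0:
            break  # nothing new on this page: stop paginating
    return unique
```

Because pages can be a generator, breaking out of the loop also stops any further page requests from being made.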