
Scraper Builder

by @meirk-brd

Build production-ready web scrapers for any website using Bright Data infrastructure. Guides you through site analysis, API selection, selector extraction, p...

💡 Examples

Example 1: E-commerce product scraper (pre-built exists)

User says: "Build a scraper for Amazon product pages, I have a list of 200 ASINs"

Actions:
1. Check supported-domains.md → Amazon has pre-built scrapers.
2. 200 URLs → use async trigger/poll/fetch (over the 20-URL sync limit).
3. Use client.scrape.amazon.products_trigger() with the batch of URLs.
4. Poll until ready, then download the structured JSON.
5. No HTML parsing needed: data comes pre-structured.

Result: Complete Python script with async batch scraping, progress logging, JSON output.
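A minimal sketch of the batch decision and the trigger/poll/fetch loop, assuming you wrap the real SDK calls (such as client.scrape.amazon.products_trigger()) in three callables. The names trigger, is_ready, download, and job_id are hypothetical stand-ins, not the SDK's API; check the SDK documentation for the actual method signatures and status values.

```python
import time
from typing import Any, Callable

# From the guidance above: batches larger than 20 URLs go through async trigger/poll/fetch.
SYNC_URL_LIMIT = 20


def choose_mode(urls: list[str], sync_limit: int = SYNC_URL_LIMIT) -> str:
    """Return "sync" for small batches, "async" once the batch exceeds the sync limit."""
    return "sync" if len(urls) <= sync_limit else "async"


def poll_until_ready(
    is_ready: Callable[[], bool],
    interval_s: float = 10.0,
    timeout_s: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Call is_ready() every interval_s seconds until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while not is_ready():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"results not ready after {timeout_s} s")
        sleep(interval_s)


def run_batch(
    urls: list[str],
    trigger: Callable[[list[str]], str],  # hypothetical: wraps the SDK trigger call, returns a job ID
    is_ready: Callable[[str], bool],      # hypothetical: wraps the SDK status check
    download: Callable[[str], Any],       # hypothetical: wraps the SDK result download
    interval_s: float = 10.0,
    timeout_s: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Trigger one async job for the whole batch, wait for it, then download the results."""
    job_id = trigger(urls)
    print(f"triggered job {job_id} for {len(urls)} URLs")
    poll_until_ready(lambda: is_ready(job_id), interval_s, timeout_s, sleep)
    print(f"job {job_id} ready, downloading results")
    return download(job_id)
```

Injecting sleep keeps the loop testable without real waits; in production the defaults (10 s interval, 30 min timeout) are arbitrary starting points to tune.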

Example 2: Custom site scraper (no pre-built)

User says: "I need to scrape all job listings from jobs.customsite.com including pagination"

Actions:
1. Check supported-domains.md → not listed → query Dataset List API → not found.
2. Fetch page HTML via Web Unlocker to analyze structure.
3. Content is in the HTML (SSR) → Web Unlocker approach.
4. Identify selectors: .job-card, .job-title, .company-name, .salary.
5. Pagination via ?page=N URL parameter.
6. Build complete scraper with fetcher + parser + paginator.

Result: Complete Python script using Web Unlocker + BeautifulSoup with URL-based pagination.
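The paginator part of that fetcher + parser + paginator split can be sketched with the standard library alone. Here fetch_items(url) is a hypothetical callable that would fetch one page through Web Unlocker and parse its .job-card elements (with BeautifulSoup in the real script); the sketch assumes the listing ends when a page returns no items.

```python
from typing import Callable, Iterator
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def page_url(base_url: str, page: int) -> str:
    """Return base_url with its ?page= query parameter set to `page` (other params kept)."""
    parts = urlparse(base_url)
    query = dict(parse_qsl(parts.query))
    query["page"] = str(page)
    return urlunparse(parts._replace(query=urlencode(query)))


def paginate(
    base_url: str,
    fetch_items: Callable[[str], list[dict]],  # hypothetical: Web Unlocker fetch + .job-card parsing
    max_pages: int = 100,
) -> Iterator[dict]:
    """Yield items from ?page=1, ?page=2, ... until a page returns no items."""
    for n in range(1, max_pages + 1):
        items = fetch_items(page_url(base_url, n))
        if not items:
            break  # ran past the last page
        yield from items
```

max_pages is a safety cap for sites that never return an empty page (some repeat the last page instead, which calls for tracking seen item IDs).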

Example 3: JS-heavy site (Browser API needed)

User says: "Scrape product prices from a React SPA that loads data on scroll"

Actions:
1. Fetch HTML via Web Unlocker → body is empty div#root → client-rendered.
2. Check for hidden API in page source → no API found.
3. Escalate to Browser API with Playwright.
4. Implement infinite scroll pattern with resource blocking.
5. Extract data via page.evaluate() after content loads.

Result: Async Playwright script with Browser API, infinite scroll handling, bandwidth optimization.
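A hedged sketch of that Browser API pattern using Playwright's async API. It assumes the Browser API is reached with chromium.connect_over_cdp() using a WebSocket endpoint stored in a hypothetical BRIGHTDATA_BROWSER_WS environment variable, that request interception via page.route() is permitted on your zone, and that item_selector is a placeholder for the site's real selector. The scroll loop stops once a scroll adds no new items.

```python
import os

# Resource types that rarely carry the data we want; aborting them saves bandwidth.
BLOCKED_RESOURCE_TYPES = {"image", "media", "font"}


def should_block(resource_type: str) -> bool:
    """Return True for requests the scraper can safely abort."""
    return resource_type in BLOCKED_RESOURCE_TYPES


async def scrape_infinite_scroll(url: str, item_selector: str, max_rounds: int = 20) -> list[str]:
    """Scroll until no new items appear (or max_rounds), then return each item's text."""
    # Imported lazily so should_block() works without Playwright installed.
    from playwright.async_api import async_playwright

    # Assumption: the Browser API WebSocket endpoint lives in this (hypothetical) env var.
    endpoint = os.environ["BRIGHTDATA_BROWSER_WS"]
    async with async_playwright() as p:
        browser = await p.chromium.connect_over_cdp(endpoint)
        try:
            page = await browser.new_page()
            page.set_default_navigation_timeout(120_000)  # 120 s: unblocking can be slow

            async def handle_route(route):
                if should_block(route.request.resource_type):
                    await route.abort()
                else:
                    await route.continue_()

            await page.route("**/*", handle_route)
            await page.goto(url, wait_until="domcontentloaded")
            await page.wait_for_selector(item_selector)  # first batch rendered by JS

            previous_count = -1
            for _ in range(max_rounds):
                count = await page.locator(item_selector).count()
                if count == previous_count:
                    break  # last scroll loaded nothing new
                previous_count = count
                await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
                await page.wait_for_timeout(1_500)  # arbitrary pause for lazy-loaded content

            # Extract in the page context once loading has settled.
            return await page.evaluate(
                "(sel) => Array.from(document.querySelectorAll(sel), el => el.textContent.trim())",
                item_selector,
            )
        finally:
            await browser.close()
```

Run it with asyncio.run(scrape_infinite_scroll(url, ".product-price")), where ".product-price" is a placeholder selector. Stylesheets are deliberately not blocked, since some infinite-scroll triggers depend on layout.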


📋 Tips & Best Practices

Web Unlocker returns empty or blocked page

Cause: Site requires JavaScript rendering or has aggressive bot detection.
Solution: Escalate to Browser API. Also try adding data_format: "markdown" to see if the content is there but in a different format.
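Before escalating, a quick heuristic can flag the empty-shell case from Example 3 (an empty div#root with almost no visible text). This sketch uses only the standard library's html.parser; the 200-character threshold is an arbitrary assumption to tune per site, and it will not catch block pages that do contain text, such as CAPTCHA pages.

```python
from html.parser import HTMLParser

_SKIPPED_TAGS = {"script", "style", "noscript"}


class _VisibleTextCollector(HTMLParser):
    """Collects text nodes outside <script>, <style> and <noscript>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in _SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in _SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def looks_client_rendered(html: str, min_text_chars: int = 200) -> bool:
    """Heuristic: True if the page has almost no visible text (e.g. an empty div#root shell)."""
    collector = _VisibleTextCollector()
    collector.feed(html)
    collector.close()
    return len(" ".join(collector.chunks)) < min_text_chars
```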

Selectors work locally but fail in production

Cause: Site serves different HTML to different regions or user agents.
Solution: Add country parameter to Web Unlocker request to target the same region. Verify selectors on the actual HTML returned by the API, not browser DevTools.
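A sketch of a Web Unlocker request that pins the country and can optionally ask for markdown output. It assumes the REST endpoint https://api.brightdata.com/request with zone, url, format, country, and data_format fields, as we understand Bright Data's Web Unlocker docs; the zone name is a placeholder, so verify the field names against the current API reference.

```python
import json
import os
import urllib.request
from typing import Optional

# Assumed endpoint for Bright Data's Web Unlocker REST API; verify in the current docs.
UNLOCKER_ENDPOINT = "https://api.brightdata.com/request"


def build_unlocker_request(
    url: str,
    zone: str,
    country: Optional[str] = None,
    data_format: Optional[str] = None,
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for one page fetch."""
    payload = {"zone": zone, "url": url, "format": "raw"}
    if country:
        payload["country"] = country  # e.g. "us": match the region you tested selectors in
    if data_format:
        payload["data_format"] = data_format  # e.g. "markdown"
    key = api_key or os.environ["BRIGHTDATA_API_KEY"]
    return urllib.request.Request(
        UNLOCKER_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {key}"},
        method="POST",
    )


def fetch(req: urllib.request.Request, timeout_s: float = 120.0) -> str:
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Saving the text returned by fetch(build_unlocker_request(url, "my_unlocker_zone", country="us")) to a file gives you the exact HTML to test selectors against.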

Scraper returns duplicate items across pages

Cause: Pagination logic is wrapping around or the site uses inconsistent pagination.
Solution: Track seen item IDs in a set. Break when duplicates appear. Verify the pagination URL pattern is correct.
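A minimal sketch of seen-ID tracking. It is a slightly more lenient variant of "break when duplicates appear": it drops individual repeats (pages sometimes overlap by an item or two) and stops only when a whole page adds nothing new. pages can be any iterable yielding one list of parsed items per page; the key function (here an "id" field) is an assumption about your item shape.

```python
from typing import Callable, Hashable, Iterable, Iterator


def dedupe_pages(
    pages: Iterable[list[dict]],
    key: Callable[[dict], Hashable],
) -> Iterator[dict]:
    """Yield each item once; stop at the first page that contributes no new item."""
    seen: set = set()
    for items in pages:
        added = 0
        for item in items:
            item_id = key(item)
            if item_id in seen:
                continue  # repeat from an earlier page (or earlier on this page)
            seen.add(item_id)
            added += 1
            yield item
        if added == 0:
            break  # whole page was repeats (or empty): pagination likely wrapped around
```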

Browser API session times out

Cause: Navigation timeout is too short or the site is slow to unblock.
Solution: Always call page.set_default_navigation_timeout(120_000) (120 seconds). Use wait_until="domcontentloaded", not "networkidle". Check whether the site requires premium domains to be enabled on your zone.

API returns 401 Unauthorized

Cause: Missing or invalid BRIGHTDATA_API_KEY environment variable.
Solution: Verify the key is set: echo $BRIGHTDATA_API_KEY. Get a fresh key from https://brightdata.com/cp/setting/users.
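To fail fast with a clear message instead of a 401 deep inside a run, a script can check the variable at startup. This minimal sketch only detects a missing or blank key; an invalid or revoked key still surfaces as a 401 from the API.

```python
import os


def require_api_key(var: str = "BRIGHTDATA_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is missing or blank."""
    key = os.environ.get(var, "").strip()
    if not key:
        raise RuntimeError(f"{var} is not set; export it before running the scraper.")
    return key
```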


Install from ClawHub:

clawhub install brightdata-scraper-builder
