BytesAgain is a curated directory of 60,000+ AI agent skills from ClawHub, GitHub, LobeHub, and Dify. Search skills by keyword in 7 languages, browse by role (developer, creator, trader, marketer) or by use case.

How do I find AI skills on BytesAgain?

Use the search bar on BytesAgain.com to search by keyword in 7 languages. You can also browse by role (developer, creator, trader, marketer) or by use case. Each skill shows install instructions for Claude, Cursor, OpenClaw, Continue, and more.

Yes, BytesAgain is completely free. No registration required for searching skills. The MCP API is also free with rate limits.

Does BytesAgain have an API for AI agents?

Yes! BytesAgain provides a free MCP SSE endpoint at /api/mcp/sse for AI agents, plus a REST API at /api/mcp?action=search&q= . No authentication needed.

Can I request a new AI skill on BytesAgain?

Yes! Visit the Requests page on BytesAgain.com to submit a skill request. Your request will be visible to the community and notified to the site admin.

🦀 ClawHub

Python Sdk Best Practices

by @meirk-brd

Guide for writing correct Bright Data Python SDK code. Always use this skill when writing, modifying, debugging, or reviewing Python code that uses the brigh...

Versionv1.0.0

Downloads273

#coding #api #writing #legal #gaming

View on ClawHub →

TERMINAL

clawhub install brightdata-python-sdk-best-practices

📖 About This Skill

name: python-sdk-best-practices description: Guide for writing correct Bright Data Python SDK code. Always use this skill when writing, modifying, debugging, or reviewing Python code that uses the brightdata-sdk package, imports from brightdata, or interacts with Bright Data APIs. Use when the user asks to scrape websites, search Google/Bing, access datasets, or automate browsers via Bright Data in Python.

Bright Data Python SDK - Best Practices for Coding Agents

You are writing code that uses the brightdata-sdk Python package. Follow these rules precisely.

Installation

pip install brightdata-sdk

Critical Rules

1. Always use context managers. The client MUST be used with async with (or with for sync). Forgetting this causes RuntimeError: BrightDataClient not initialized. 2. Async is the default. The primary client is BrightDataClient (async). Use SyncBrightDataClient only when you cannot use async. 3. Never use SyncBrightDataClient inside async functions. It raises RuntimeError. Use BrightDataClient instead. 4. Token auto-loads from environment. Set BRIGHTDATA_API_TOKEN env var or pass token= param. Do not hardcode tokens. 5. All scraper methods are awaitable. Every call on the async client must be awaited.

Authentication

# Option 1: Environment variable (preferred)
export BRIGHTDATA_API_TOKEN="your_token"
async with BrightDataClient() as client:
    ...
Option 2: Explicit token
async with BrightDataClient(token="your_token") as client:
    ...
Option 3: .env file (requires python-dotenv)
BRIGHTDATA_API_TOKEN=your_token
async with BrightDataClient() as client:
    ...

Imports

# Main clients
from brightdata import BrightDataClient          # Async (primary)
from brightdata import SyncBrightDataClient       # Sync wrapper
Result models
from brightdata import ScrapeResult, SearchResult, CrawlResult
Job model (for manual trigger/poll/fetch)
from brightdata import ScrapeJob
Exceptions
from brightdata import (
    BrightDataError,       # Base exception
    ValidationError,       # Invalid input
    AuthenticationError,   # Bad/missing token
    APIError,              # API request failed (has .status_code, .response_text)
    ZoneError,             # Zone operation failed
    NetworkError,          # Network issue
    SSLError,              # SSL cert error
)
Scraper Studio
from brightdata import ScraperStudioJob, JobStatus
Dataset export utility
from brightdata.datasets import export

Core Patterns

Pattern 1: Web Scraping (Web Unlocker)

Scrapes any URL through Bright Data's proxy network, bypassing bot detection.

import asyncio
from brightdata import BrightDataClient
async def main():
    async with BrightDataClient() as client:
        # Single URL - returns ScrapeResult
        result = await client.scrape_url("https://example.com")
        print(result.success)  # bool
        print(result.data)     # HTML string or parsed data
        print(result.cost)     # float, USD
        # With options
        result = await client.scrape_url(
            url="https://example.com",
            country="us",              # Proxy country
            response_format="raw",     # "raw" (HTML) or "json"
            method="GET",              # HTTP method
            timeout=60,                # Request timeout seconds
        )asyncio.run(main())

Async mode (non-blocking, for batch/background):

result = await client.scrape_url(
    url="https://example.com",
    mode="async",           # Triggers, polls, returns when ready
    poll_interval=5,        # Seconds between polls
    poll_timeout=180,       # Max wait (Web Unlocker async ~2 min)
)
Batch: pass a list of URLs
results = await client.scrape_url(
    url=["https://example.com/1", "https://example.com/2"],
    mode="async",
    poll_timeout=180,
)
Returns List[ScrapeResult]

Pattern 2: Platform-Specific Scrapers

Structured data extraction from major platforms. Pattern: client.scrape..(url=...).

async with BrightDataClient() as client:
    # Amazon
    product = await client.scrape.amazon.products(url="https://amazon.com/dp/B0CRMZHDG8")
    reviews = await client.scrape.amazon.reviews(url="https://amazon.com/dp/B0CRMZHDG8")
    sellers = await client.scrape.amazon.sellers(url="https://amazon.com/dp/B0CRMZHDG8")
    # LinkedIn
    profile = await client.scrape.linkedin.profiles(url="https://linkedin.com/in/username")
    company = await client.scrape.linkedin.companies(url="https://linkedin.com/company/name")
    posts   = await client.scrape.linkedin.posts(url="https://linkedin.com/posts/...")
    # Instagram
    ig_profile  = await client.scrape.instagram.profiles(url="https://instagram.com/username")
    ig_posts    = await client.scrape.instagram.posts(url="https://instagram.com/p/...")
    ig_comments = await client.scrape.instagram.comments(url="https://instagram.com/p/...")
    ig_reels    = await client.scrape.instagram.reels(url="https://instagram.com/reel/...")
    # Facebook
    fb_posts    = await client.scrape.facebook.posts_by_profile(url="https://facebook.com/user", num_of_posts=10)
    fb_group    = await client.scrape.facebook.posts_by_group(url="https://facebook.com/groups/...", num_of_posts=10)
    fb_comments = await client.scrape.facebook.comments(url="https://facebook.com/post/...", num_of_comments=20)
    fb_reels    = await client.scrape.facebook.reels(url="https://facebook.com/reel/...")
    # YouTube
    yt_profile  = await client.scrape.youtube.profiles(url="https://youtube.com/@channel")
    yt_video    = await client.scrape.youtube.videos(url="https://youtube.com/watch?v=...")
    yt_comments = await client.scrape.youtube.comments(url="https://youtube.com/watch?v=...")
    # ChatGPT
    response = await client.scrape.chatgpt.prompt(prompt="What is Python?")
    # Batch prompts
    responses = await client.scrape.chatgpt.prompts(prompts=["Q1", "Q2", "Q3"])
    # TikTok
    tt_profile = await client.scrape.tiktok.profiles(url="https://tiktok.com/@user")    # Reddit
    reddit_post = await client.scrape.reddit.posts(url="https://reddit.com/r/...")

All scraper methods return ScrapeResult with .success, .data, .cost, .status.

Pattern 3: Search Discovery (keyword-based)

Find content by keyword instead of URL:

async with BrightDataClient() as client:
    # Amazon product search
    results = await client.scrape.amazon.products_search(keyword="wireless headphones")
    # LinkedIn searches
    profiles = await client.scrape.linkedin.profiles_search(keyword="data engineer", location="San Francisco")
    jobs     = await client.scrape.linkedin.jobs_search(keyword="python developer", location="New York")
    companies = await client.scrape.linkedin.companies_search(keyword="AI startup")
    # Instagram search
    ig_profiles = await client.scrape.instagram.profiles_search(user_name="photography")
    ig_posts    = await client.scrape.instagram.posts_search(url="https://instagram.com/user", num_of_posts=20)
    ig_reels    = await client.scrape.instagram.reels_search(url="https://instagram.com/user", num_of_posts=10)    # YouTube search
    videos = await client.scrape.youtube.videos_search(keyword="python tutorial", num_of_videos=10)

Pattern 4: SERP (Search Engine Results)

async with BrightDataClient() as client:
    # Google
    result = await client.search.google(
        query="python web scraping",
        location="United States",    # Optional
        language="en",               # Default: "en"
        device="desktop",            # "desktop" or "mobile"
        num_results=10,              # Number of results
    )
    for item in result.data:
        print(item["title"], item["link"])
    # Bing
    result = await client.search.bing(query="python tutorial", num_results=10)    # Yandex
    result = await client.search.yandex(query="python", num_results=10)

SERP async mode:

result = await client.search.google(
    query="python",
    mode="async",
    poll_interval=2,
    poll_timeout=30,
)

SERP returns SearchResult with .data (list of dicts), .query, .search_engine.

Pattern 5: Datasets API

Access 175+ pre-collected, structured datasets.

async with BrightDataClient() as client:
    # Filter a dataset - returns snapshot_id (string)
    snapshot_id = await client.datasets.imdb_movies(
        filter={"name": "title", "operator": "includes", "value": "black"},
        records_limit=5,
    )
    # Download results (polls until ready)
    data = await client.datasets.imdb_movies.download(snapshot_id)
    print(f"Got {len(data)} records")
    # Quick sample (no filter needed)
    snapshot_id = await client.datasets.amazon_products.sample(records_limit=10)
    data = await client.datasets.amazon_products.download(snapshot_id)    # Get field metadata
    metadata = await client.datasets.imdb_movies.get_metadata()
    for name, field in metadata.fields.items():
        print(f"{name}: {field.type}")

Export to file:

from brightdata.datasets import exportexport(data, "results.json")    # JSON
export(data, "results.csv")     # CSV
export(data, "results.jsonl")   # JSONL

Available datasets include: amazon_products, amazon_reviews, linkedin_profiles, linkedin_companies, linkedin_jobs, airbnb_properties, imdb_movies, google_maps_reviews, yelp_businesses, glassdoor_companies, zillow_properties, instagram_profiles, tiktok_profiles, facebook_pages_posts, reddit_posts, goodreads_books, nba_players_stats, and 150+ more.

Pattern 6: Scraper Studio (Custom Scrapers)

Run custom scrapers built in Bright Data's Scraper Studio.

async with BrightDataClient() as client:
    # High-level: trigger + poll + return
    data = await client.scraper_studio.run(
        collector="c_abc123",                          # Collector ID from dashboard
        input={"url": "https://example.com/page"},     # Input for the scraper
        timeout=180,                                    # Max wait seconds
        poll_interval=10,                               # Poll frequency
    )    # Manual control
    job = await client.scraper_studio.trigger(
        collector="c_abc123",
        input={"url": "https://example.com/page"},
    )
    print(job.response_id)
    status = await job.status()        # Returns JobStatus enum
    data = await job.wait_and_fetch(timeout=120, poll_interval=10)

Pattern 7: Browser API (CDP)

Connect to Bright Data cloud browsers via Chrome DevTools Protocol.

from brightdata import BrightDataClient
client = BrightDataClient(
    browser_username="brd-customer-hl_xxx-zone-scraping_browser1",
    browser_password="your_password",
)
Or use env vars: BRIGHTDATA_BROWSERAPI_USERNAME, BRIGHTDATA_BROWSERAPI_PASSWORD
url = client.browser.get_connect_url(country="us")  # Optional country
With Playwright
from playwright.async_api import async_playwrightasync with async_playwright() as pw:
    browser = await pw.chromium.connect_over_cdp(url)
    page = await browser.new_page()
    await page.goto("https://example.com")
    content = await page.content()
    await browser.close()

Pattern 8: Manual Trigger/Poll/Fetch

For fine-grained control over long-running scrapes:

async with BrightDataClient() as client:
    # Step 1: Trigger (non-blocking)
    job = await client.scrape.amazon.products_trigger(url="https://amazon.com/dp/B123")
    print(f"Snapshot ID: {job.snapshot_id}")
    # Step 2: Check status
    status = await job.status()  # "ready", "running", etc.
    # Step 3: Wait for completion
    await job.wait(timeout=180, poll_interval=10, verbose=True)
    # Step 4: Fetch results
    data = await job.fetch()    # Or combine wait + fetch into ScrapeResult:
    result = await job.to_result(timeout=180)
    print(result.data)

Pattern 9: Concurrent Batch Operations

import asyncio
from brightdata import BrightDataClient
async def main():
    async with BrightDataClient() as client:
        # Concurrent scraping
        urls = [
            "https://amazon.com/dp/B001",
            "https://amazon.com/dp/B002",
            "https://amazon.com/dp/B003",
        ]
        tasks = [client.scrape.amazon.products(url=u) for u in urls]
        results = await asyncio.gather(*tasks)
        for r in results:
            print(f"{r.url}: success={r.success}, cost=${r.cost:.4f}")
        # Concurrent SERP queries
        queries = ["python", "javascript", "rust"]
        search_tasks = [client.search.google(query=q) for q in queries]
        search_results = await asyncio.gather(*search_tasks)asyncio.run(main())

Pattern 10: Sync Client

For scripts, notebooks, or non-async codebases:

from brightdata import SyncBrightDataClient
with SyncBrightDataClient() as client:
    # All methods are synchronous - no await needed
    result = client.scrape_url("https://example.com")
    print(result.data)
    result = client.scrape.amazon.products(url="https://amazon.com/dp/B123")
    result = client.search.google(query="python")    # Datasets
    snapshot_id = client.datasets.imdb_movies(
        filter={"name": "title", "operator": "includes", "value": "black"},
        records_limit=5,
    )
    data = client.datasets.imdb_movies.download(snapshot_id)

WARNING: Never use SyncBrightDataClient inside an async def function. It will raise a RuntimeError.

Result Objects Reference

All results inherit from BaseResult:

result.success          # bool - operation succeeded
result.cost             # Optional[float] - cost in USD
result.error            # Optional[str] - error message if failed
result.elapsed_ms()     # Optional[float] - total time in ms
result.to_dict()        # Dict - serializable dictionary
result.to_json(indent=2)  # str - JSON string
result.save_to_file("out.json")  # Save to file

ScrapeResult additional fields:

result.url              # str - original URL
result.status           # "ready" | "error" | "timeout" | "in_progress"
result.data             # Any - scraped data (dict, list, or HTML string)
result.snapshot_id      # Optional[str] - Bright Data snapshot ID
result.platform         # Optional[str] - "amazon", "linkedin", etc.
result.row_count        # Optional[int] - number of data rows

SearchResult additional fields:

result.query            # Dict - original query params
result.data             # List[Dict] - search results
result.search_engine    # "google" | "bing" | "yandex"
result.total_found      # Optional[int] - total results found

Error Handling

from brightdata import (
    BrightDataClient,
    BrightDataError,
    ValidationError,
    AuthenticationError,
    APIError,
    NetworkError,
)async with BrightDataClient() as client:
    try:
        result = await client.scrape_url("https://example.com")
    except AuthenticationError:
        print("Invalid API token")
    except APIError as e:
        print(f"API error {e.status_code}: {e.message}")
        print(f"Response: {e.response_text}")
    except NetworkError:
        print("Network connectivity issue")
    except ValidationError:
        print("Invalid input parameters")
    except BrightDataError as e:
        print(f"Bright Data error: {e.message}")

Client Configuration

client = BrightDataClient(
    token="...",                    # API token (or use env var)
    timeout=30,                     # Default request timeout (seconds)
    web_unlocker_zone="sdk_unlocker",  # Web Unlocker zone name
    serp_zone="sdk_serp",          # SERP zone name
    auto_create_zones=True,        # Auto-create zones if missing
    validate_token=False,          # Validate token on init
    rate_limit=10.0,               # Max requests per rate_period (None to disable)
    rate_period=1.0,               # Rate limit window (seconds)
)

Zone auto-creation: By default, the SDK creates sdk_unlocker and sdk_serp zones on first use. Set auto_create_zones=False to disable.

Zone Management

async with BrightDataClient() as client:
    # List all active zones
    zones = await client.list_zones()
    for zone in zones:
        print(f"{zone['name']}: {zone.get('type', 'unknown')}")
    # Delete a zone
    await client.delete_zone("test_zone")    # Test connection
    is_valid = await client.test_connection()

Common Mistakes to Avoid

1. Forgetting the context manager:

   # WRONG - will raise RuntimeError
   client = BrightDataClient()
   result = await client.scrape_url("https://example.com")   # CORRECT
   async with BrightDataClient() as client:
       result = await client.scrape_url("https://example.com")

2. Using sync client in async code:

   # WRONG - will raise RuntimeError
   async def main():
       with SyncBrightDataClient() as client:
           result = client.scrape_url("...")   # CORRECT
   async def main():
       async with BrightDataClient() as client:
           result = await client.scrape_url("...")

3. Forgetting await:

   # WRONG - returns coroutine, not result
   result = client.scrape_url("https://example.com")   # CORRECT
   result = await client.scrape_url("https://example.com")

4. Not checking result.success:

   result = await client.scrape_url("https://example.com")
   # Always check success before using data
   if result.success:
       process(result.data)
   else:
       print(f"Failed: {result.error}")

5. Hardcoding API tokens:

   # WRONG
   client = BrightDataClient(token="abc123secret")   # CORRECT - use environment variable
   # export BRIGHTDATA_API_TOKEN=abc123secret
   client = BrightDataClient()

Environment Variables

| Variable | Purpose | Default | |----------|---------|---------| | BRIGHTDATA_API_TOKEN | API authentication token | Required | | WEB_UNLOCKER_ZONE | Web Unlocker zone name | sdk_unlocker | | SERP_ZONE | SERP zone name | sdk_serp | | BRIGHTDATA_BROWSERAPI_USERNAME | Browser API username | None | | BRIGHTDATA_BROWSERAPI_PASSWORD | Browser API password | None |

Quick Decision Guide

| Task | Method | |------|--------| | Scrape any URL (HTML) | client.scrape_url(url) | | Scrape Amazon/LinkedIn/etc. (structured) | client.scrape..(url=...) | | Search Google/Bing/Yandex | client.search.google(query=...) | | Find products/profiles by keyword | client.scrape.._search(keyword=...) | | Access pre-collected datasets | client.datasets.(filter=..., records_limit=...) | | Run custom Scraper Studio scraper | client.scraper_studio.run(collector=..., input=...) | | Automate browser (Playwright/Puppeteer) | client.browser.get_connect_url() | | Long-running scrape with manual control | client.scrape.._trigger(url=...) then job.wait() + job.fetch() |

For the full API surface and advanced patterns, read references/api-reference.md.