
Crawler

by @bytesagain3

Web crawling and scraping reference — robots.txt protocol, Scrapy framework, anti-bot detection, headless browsers, and legal considerations

Version: v3.0.0
Downloads: 1,323
Installs: 3
Install from the terminal: `clawhub install crawler`

📖 About This Skill


name: "crawler"
version: "3.0.0"
description: "Web crawling and scraping reference — robots.txt protocol, Scrapy framework, anti-bot detection, headless browsers, and legal considerations"
author: "BytesAgain"
homepage: "https://bytesagain.com"
source: "https://github.com/bytesagain/ai-skills"
tags: [web-scraping, scrapy, crawler, robots-txt, selenium]
category: "devtools"

Crawler

Web crawling and scraping reference — robots.txt protocol, Scrapy framework, anti-bot detection, headless browsers, and legal considerations. No API keys or credentials required — outputs reference documentation only.
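As a rough illustration of the robots.txt protocol, the sketch below uses Python's standard-library `urllib.robotparser`; it is not taken from the skill's own output. The robots.txt text, the `example.com` URLs, the `MyCrawler/1.0` user-agent, and the `build_parser` helper are all hypothetical. The rules are parsed from a string, so no network access is needed (`site_maps()` assumes Python 3.8 or later).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real crawler would fetch it from
# https://<host>/robots.txt, e.g. with RobotFileParser.set_url() and read().
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2

Sitemap: https://example.com/sitemap.xml
"""


def build_parser(text: str) -> RobotFileParser:
    """Hypothetical helper: parse robots.txt rules from a string, offline."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    agent = "MyCrawler/1.0"  # hypothetical user-agent string
    print(rp.can_fetch(agent, "https://example.com/index.html"))      # True
    print(rp.can_fetch(agent, "https://example.com/private/a.html"))  # False
    print(rp.crawl_delay(agent))  # 2
    print(rp.site_maps())         # ['https://example.com/sitemap.xml']
```

In a live crawler you would typically load the file with `set_url()` and `read()` instead of `parse()`, then wait at least `crawl_delay()` seconds between requests to the same host.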

Commands

| Command | Description |
|---------|-------------|
| intro | Crawling vs scraping, robots.txt, sitemap |
| standards | HTTP caching, structured data, meta tags |
| troubleshooting | Anti-bot detection, JS rendering, encoding |
| performance | Concurrency, dedup, incremental, distributed |
| security | Legal landscape, ethical guidelines, proxies |
| migration | BeautifulSoup to Scrapy, requests to Playwright |
| cheatsheet | Scrapy commands, CSS/XPath, curl, user-agents |
| faq | Legality, JS pages, blocking, storage |
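The dedup and incremental-crawl topics generally rest on a frontier that never schedules the same page twice. The following is a minimal single-process sketch, assuming a hand-rolled crawler (Scrapy ships its own request-fingerprint duplicate filter). The `canonicalize` rules (lowercase scheme and host, drop default ports and fragments, sort query parameters) are illustrative choices, not a standard, and `Frontier` is a hypothetical class.

```python
from collections import deque
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize(url: str) -> str:
    """Map trivially different spellings of a URL to one dedup key.

    Illustrative rules (an assumption, not a standard): lowercase the scheme
    and host, drop default ports and the fragment, sort query parameters.
    User info and IPv6 literals are not handled.
    """
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    if port is not None and DEFAULT_PORTS.get(scheme) != port:
        host = f"{host}:{port}"
    path = parts.path or "/"
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    return urlunsplit((scheme, host, path, query, ""))


class Frontier:
    """Hypothetical FIFO crawl frontier that schedules each canonical URL once."""

    def __init__(self) -> None:
        self._queue: deque = deque()
        self._seen: set = set()

    def add(self, url: str) -> bool:
        """Enqueue the canonical form of url unless it was already seen."""
        key = canonicalize(url)
        if key in self._seen:
            return False
        self._seen.add(key)
        self._queue.append(key)
        return True

    def pop(self) -> Optional[str]:
        """Return the next URL to fetch, or None when the frontier is empty."""
        return self._queue.popleft() if self._queue else None

    def __len__(self) -> int:
        return len(self._queue)


if __name__ == "__main__":
    frontier = Frontier()
    print(frontier.add("HTTP://Example.com:80/a?b=2&a=1#top"))  # True
    print(frontier.add("http://example.com/a?a=1&b=2"))         # False (duplicate)
    print(frontier.pop())  # http://example.com/a?a=1&b=2
```

Sorting query parameters is only safe when the target site treats parameter order as insignificant. Keeping the seen-set in memory also limits the crawl to what fits in RAM, which is one reason distributed crawls move it into a shared store.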

Output Format

All commands output plain-text reference documentation via heredoc. No external API calls, no credentials needed, no network access.


*Powered by BytesAgain | bytesagain.com | hello@bytesagain.com*