BytesAgain is a curated directory of 60,000+ AI agent skills from ClawHub, GitHub, LobeHub, and Dify. Search skills by keyword in 7 languages, browse by role (developer, creator, trader, marketer) or by use case.

How do I find AI skills on BytesAgain?

Use the search bar on BytesAgain.com to search by keyword in 7 languages. You can also browse by role (developer, creator, trader, marketer) or by use case. Each skill shows install instructions for Claude, Cursor, OpenClaw, Continue, and more.

Is BytesAgain free?

Yes, BytesAgain is completely free. No registration is required to search skills. The MCP API is also free, subject to rate limits.

Does BytesAgain have an API for AI agents?

Yes! BytesAgain provides a free MCP SSE endpoint at /api/mcp/sse for AI agents, plus a REST API at /api/mcp?action=search&q= . No authentication needed.
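As a rough illustration of the REST search endpoint described above, the following sketch only builds the request URL. The host `https://bytesagain.com` and the helper name `build_search_url` are assumptions made for this example; the response format is not documented here, so no request is sent.

```python
from urllib.parse import urlencode

# Assumption: the site is served from https://bytesagain.com. Only the path
# /api/mcp and the action/q parameters come from the text above.
BASE_URL = "https://bytesagain.com"


def build_search_url(query: str, base_url: str = BASE_URL) -> str:
    """Return the REST search URL for a keyword query, URL-encoding the keyword."""
    params = urlencode({"action": "search", "q": query})
    return f"{base_url}/api/mcp?{params}"


print(build_search_url("pdf extract"))
```

Encoding the keyword with `urlencode` keeps spaces and non-ASCII search terms valid in the query string.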

Can I request a new AI skill on BytesAgain?

Yes! Visit the Requests page on BytesAgain.com to submit a skill request. Your request will be visible to the community, and the site admin will be notified.

🦀 ClawHub

pdf-miner

by @baichenwzj

Extract text and tables from PDF files with robust support for global market data formats (currencies, percentages, units). Use when: (1) User asks to read/e...

Version v1.0.2

#clawhub

💡 Examples

Run commands from this skill directory.

Basic Extraction

# Full extraction (text + tables)
python scripts/extract_pdf.py input.pdf
# Output to custom path
python scripts/extract_pdf.py input.pdf output.md
# Specific pages
python scripts/extract_pdf.py input.pdf --pages 1-5,10,15-20
# Text or tables only
python scripts/extract_pdf.py input.pdf --text-only
python scripts/extract_pdf.py input.pdf --tables-only
python scripts/extract_pdf.py input.pdf --tables-only --json
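The `--pages 1-5,10,15-20` value above mixes ranges and single pages. The sketch below shows one plausible way such a spec could be expanded into page numbers. It is not taken from `extract_pdf.py`; the function name `parse_page_spec` and the choice to treat ranges as inclusive and 1-based are assumptions.

```python
def parse_page_spec(spec: str) -> list[int]:
    """Expand a spec such as "1-5,10,15-20" into a sorted list of page numbers.

    Assumption: ranges are inclusive on both ends and duplicates are merged.
    """
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_s, end_s = part.split("-", 1)
            start, end = int(start_s), int(end_s)
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return sorted(pages)


print(parse_page_spec("1-5,10,15-20"))
```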

Advanced Modes

# Search: find pages containing keywords with context
python scripts/extract_pdf.py report.pdf --search "Vietnam export penetration"
# Metrics: extract lines with keywords + numeric values
python scripts/extract_pdf.py report.pdf --metrics "market size growth export penetration"
# TOC: extract table of contents / chapter structure (robust, multi-format)
python scripts/extract_pdf.py report.pdf --toc
# Optionally adjust sensitivity (default: 3 entries per page required)
python scripts/extract_pdf.py report.pdf --toc --toc-min-entries 2
# Diff: compare two PDFs, show pages unique to each
python scripts/extract_pdf.py old_version.pdf new_version.pdf --diff
# Chunk: split output into LLM-friendly chunks
python scripts/extract_pdf.py report.pdf --chunk             # single file, 8000 chars each
python scripts/extract_pdf.py report.pdf --chunk --max-chars 4000
python scripts/extract_pdf.py report.pdf --chunk --output-dir ./chunks   # separate files
# Clean headers/footers
python scripts/extract_pdf.py report.pdf --clean-headers
# Batch: process multiple PDFs
python scripts/extract_pdf.py file1.pdf file2.pdf file3.pdf --output-dir ./extracted
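To make the `--chunk` and `--max-chars` options above more concrete, here is a minimal sketch of character-limited chunking. It is an illustration only: how the script actually chooses chunk boundaries is not documented here, so breaking on blank lines and the name `chunk_text` are assumptions.

```python
def chunk_text(text: str, max_chars: int = 8000) -> list[str]:
    """Split text into chunks of at most max_chars characters.

    Assumption: paragraphs (separated by blank lines) are kept together when
    they fit; a paragraph longer than max_chars is hard-split.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        if not para.strip():
            continue  # skip empty paragraphs
        # Break an oversized paragraph into pieces that each fit the limit.
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)]
        for piece in pieces:
            candidate = piece if not current else current + "\n\n" + piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "a" * 10 + "\n\n" + "b" * 10 + "\n\n" + "c" * 25
for i, chunk in enumerate(chunk_text(sample, max_chars=22), start=1):
    print(i, len(chunk))
```

Keeping paragraphs intact where possible avoids cutting sentences or table rows in the middle, which tends to matter when chunks are fed to an LLM one at a time.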

OCR for Scanned/Image PDFs (Automatic by Default)

OCR is automatically triggered for pages with very little extractable text (default threshold: 100 characters). This helps handle scanned or image-based PDFs without requiring the --ocr flag.
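The low-text check described above can be sketched as a simple length comparison. This is an assumed reading of the behavior, not the script's code: whether whitespace is stripped first and whether the comparison is strict are guesses, and `needs_ocr` is a hypothetical name.

```python
def needs_ocr(page_text: str, threshold: int = 100) -> bool:
    """Return True when a page has too little extractable text.

    Assumption: surrounding whitespace is ignored and a page with exactly
    `threshold` characters counts as sufficient.
    """
    return len(page_text.strip()) < threshold


# Hypothetical extracted text per page number.
pages = {1: "x" * 500, 2: "Page 2", 3: "", 4: "y" * 150}

auto_pages = [n for n, text in pages.items() if needs_ocr(text)]
wider_pages = [n for n, text in pages.items() if needs_ocr(text, threshold=200)]
print(auto_pages)   # pages flagged at the default threshold
print(wider_pages)  # a higher threshold flags more pages
```

Under this reading, raising the threshold sends more pages to OCR and lowering it sends fewer.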

#### Usage Examples

# Automatic OCR (default behavior)
python scripts/extract_pdf.py scanned.pdf
# Force OCR on all pages (ignore text length)
python scripts/extract_pdf.py scanned.pdf --ocr
# Force OCR only on specific pages
python scripts/extract_pdf.py scanned.pdf --ocr --ocr-pages 1-5,10
# Adjust OCR quality (DPI)
python scripts/extract_pdf.py scanned.pdf --ocr --ocr-dpi 300
# Use a different vision model
python scripts/extract_pdf.py scanned.pdf --ocr --ocr-model "stepfun/step-3.5-flash:free"
# Disable automatic OCR detection (if you want pure extraction only)
python scripts/extract_pdf.py file.pdf --no-auto-ocr
# Change the low-text threshold (default 100 chars)
python scripts/extract_pdf.py file.pdf --ocr-threshold 200

#### Configuration

OCR requires a vision API key. See Initial Setup for OCR.

| Option | Default | Description |
|--------|---------|-------------|
| --ocr | off | Force OCR on pages (with auto-detect or --ocr-pages) |
| --auto-ocr | on | Automatically OCR low-text pages (hidden; use --no-auto-ocr to disable) |
| --no-auto-ocr | - | Disable automatic OCR detection |
| --ocr-pages | - | Comma-separated pages/ranges to OCR (requires --ocr) |
| --ocr-threshold | 100 | Minimum text length to consider a page as "sufficient" (characters) |
| --ocr-dpi | 200 | Image DPI for OCR rendering |
| --ocr-api-key | from env/config | Override API key |
| --ocr-base-url | from env/config | Override API base URL |
| --ocr-model | from env/config | Override vision model |
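The "from env/config" defaults suggest a lookup chain for the API key. The sketch below shows one way that resolution might work. Several details are assumptions: the precedence of the environment variable over config.json, the config key name `ocr_api_key`, and the helper name `resolve_ocr_api_key`. Only the `--ocr-api-key` flag, the `OCR_API_KEY` variable, and the config.json file name appear in this document.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_ocr_api_key(
    cli_value: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    config_path: str = "config.json",
) -> Optional[str]:
    """Resolve the OCR API key: CLI override, then OCR_API_KEY, then config.json.

    Assumptions: the env variable wins over the config file, and the config
    file stores the key under "ocr_api_key" (a hypothetical field name).
    """
    if cli_value:
        return cli_value
    env = os.environ if env is None else env
    if env.get("OCR_API_KEY"):
        return env["OCR_API_KEY"]
    path = Path(config_path)
    if path.is_file():
        config = json.loads(path.read_text(encoding="utf-8"))
        if isinstance(config, dict):
            return config.get("ocr_api_key")
    return None


key = resolve_ocr_api_key(env={"OCR_API_KEY": "sk-example"})
print("key found" if key else "No API key")
```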

#### Troubleshooting

OCR failed with "No API key" → Configure your API key in config.json or via OCR_API_KEY env var.

OCR model rejects images → The configured model might not support vision. Choose a vision-capable model (e.g., qwen/qwen3.6-plus:free, stepfun/step-3.5-flash:free). The script will attempt to auto-fallback to a known good model if the configured one lacks vision support.

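Too many pages being OCR'd → Lower the threshold below the default of 100 (for example, --ocr-threshold 50), since only pages with less extractable text than the threshold are OCR'd; or pass --no-auto-ocr and use --ocr with --ocr-pages to select pages.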

Rate limit errors → Reduce concurrent OCR calls, switch to a paid model tier, or try a different provider.
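For the "reduce concurrent OCR calls" suggestion above, a bounded worker pool is one common approach when OCR is driven from custom code. This sketch is not part of the skill: `ocr_pages_limited` and the `ocr_one` callback are hypothetical names, and the document does not say how the script schedules its own OCR requests.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def ocr_pages_limited(
    pages: Iterable[int],
    ocr_one: Callable[[int], str],
    max_workers: int = 2,
) -> list[str]:
    """Run ocr_one over pages with at most max_workers calls in flight.

    `ocr_one` is a hypothetical callback that OCRs a single page and returns
    its text. Results come back in the same order as `pages`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(ocr_one, pages))


# Stand-in for a real OCR call, used only to show the call shape.
def fake_ocr(page: int) -> str:
    return f"text of page {page}"


print(ocr_pages_limited([1, 2, 3], fake_ocr, max_workers=2))
```

Lowering `max_workers` trades throughput for fewer simultaneous requests, which is usually the first knob to try against provider rate limits.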

Install via the ClawHub CLI:

clawhub install pdf-miner

🧪 Use this skill with your agent

Most visitors already have an agent. Pick your environment, install or copy the workflow, then run the smoke-test prompt.

Manus: Task-oriented agent. Great for testing AI skills end-to-end.

OpenClaw: Local-first agent. Install skills via ClawHub CLI.

Claude Code: Anthropic's coding agent. Paste the prompt or SKILL.md into your session.

Cursor: AI-powered IDE. Use the smoke-test prompt in Cursor Agent.

Continue.dev: Open-source AI code assistant. Add SKILL.md as a custom tool.

Windsurf: Agentic IDE by Codeium. Paste the prompt into Cascade.

Cline: VS Code extension for autonomous coding with MCP tools.

Copilot Workspace: GitHub's AI dev environment. Suitable for code-generation skills.