Find the Right AI Skill for Any Job

Browse 338+ curated AI agent skills. Search by use case, filter by category, get the right tool instantly.

All Skills

338 skills total matching "extraction"

Use when parsing PDFs, DOCX, PPTX, XLSX, or images locally. Supports text extraction, JSON output with bounding boxes, batch processing, and page screenshots...

🦀 ClawHub

Scan To Markdown

OCR document extraction - extract text from scanned documents, photos, and images using OCR. Use when reading scanned PDFs, photographed pages, handwritten n...

🦀 ClawHub

InfoQuest Web Search

AI-optimized web search, image search and content extraction via BytePlus InfoQuest API. Use this skill when you need to gather concise and up-to-date inform...

🦀 ClawHub

PulpMiner Web Scraper - Convert Any Webpage to Realtime JSON API

Convert any webpage into structured JSON data using AI. Scrape websites, extract data into custom JSON schemas, and call saved APIs programmatically. Useful for web scraping, data extraction, content monitoring, lead generation, price tracking, and building data pipelines.

🦀 ClawHub

Akashic Doc Analyzer

Parse, analyze, and extract content from documents (PDF, DOCX, PPTX, audio). Supports OCR, table extraction, and semantic chunking.

🦀 ClawHub

Invoice Scan

AI-powered invoice OCR, scanning, and data extraction. Use when: (1) user needs OCR or text extraction from invoice images, scanned documents, or PDFs, (2) s...

🦀 ClawHub

MinerU zero-setup document extraction — convert PDFs, images, Word, and PowerPoint to Markdown instantly. No login, no token, no configuration. Just run and get results

Zero-setup document extraction — convert PDFs, images, Word, and PowerPoint to Markdown. No login, no token, no configuration. Just run and get results.

🦀 ClawHub

Power Search

Self-hosted research tool combining Brave Search API + Browserless content fetching. Search the web with optional full-page content extraction and HTML parsing.

🦀 ClawHub

file-processor

Automatically detects and processes files including PDF, Excel, CSV, Word, images, and text for extraction, OCR, data analysis, and summarization.

🦀 ClawHub

Deep Research Pro v5.0.1

Performs deep research using a three-stage process: data extraction, thematic insight briefs with contradiction analysis, and narrative-driven strategic repo...

🦀 ClawHub

Browser Web Extract

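Extract the text and image content of any web page via its URL. When the user provides a URL, web page link, or web address and wants to read, extract, scrape, summarize, or analyze page content, web page extraction, URL scraping, web content reading, website data collection, link parsing, web...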

🦀 ClawHub

Web Researcher Mini

Firecrawl CLI for web scraping, crawling, and search. Scrape single pages or entire websites, map site URLs, and search the web with full content extraction....

🦀 ClawHub

maasv Memory

Provides structured long-term memory with semantic, keyword, and knowledge graph retrieval, entity extraction, temporal versioning, and experiential learning.

🦀 ClawHub

Lightpanda Scraper

Fast headless browser web scraping using Lightpanda (0.5s page loads, 90x faster than Chromium). Perfect for OSINT recon, link extraction, and content scrapi...

🦀 ClawHub

Fast Browser Use Local

Rust-based browser automation using local Chrome for ultra-fast DOM extraction, session management, screenshots, scraping, and site structure analysis.

🦀 ClawHub

Ride Receipts

Build a local SQLite ride-history database from Gmail ride receipt emails using gog for fetch and OpenClaw Gateway /v1/responses for extraction. Use when you...

🦀 ClawHub

Video Transcript

Extract full transcripts from video content for analysis, summarization, note-taking, or research. Use when the user wants a written version of video content, asks to "transcribe this", "get the text from this video", "convert video to text", or shares a video URL for content extraction.

🦀 ClawHub

StartClaw-Optimizer

Master optimization system - APPLIES TO EVERY RESPONSE. Before responding, classify task complexity (simple question vs analysis vs coding). Use Haiku for simple/navigation/extraction/status. Use Sonnet ONLY for writing/analysis/planning/debugging. Monitor context size - if >50k tokens, recommend /compact. For automations, use scheduler wrapper. Never load full conversation history for simple tasks. Heartbeats always Haiku, single-line only. Never use Opus. This skill MUST run before every respo

🦀 ClawHub

Highlight Reels

Scenario-focused Sparki skill for highlight extraction while using the latest official Sparki setup, API-key, and upload workflow guidance.

🦀 ClawHub

nanobanana2-apiyi

Generate images via APIYI (Gemini 3.1 Flash Image Preview). Use when user wants to generate images from text descriptions. Supports keyword extraction, promp...

🦀 ClawHub

WiseOCR

PDF & Image OCR — Convert a single PDF or image to Markdown via WiseDiag cloud API, with high-accuracy text extraction, table recognition, and multi-column l...

🦀 ClawHub

Web Scraping & Data Extraction Engine

Complete web scraping methodology — legal compliance, architecture design, anti-detection, data pipelines, and production operations. Use when building scrap...

🦀 ClawHub

ucloud-deepseek-ocr

OCR text recognition using DeepSeek-OCR model. Use when user asks for OCR, text recognition, image text extraction, screenshot recognition, or converting ima...

🦀 ClawHub

Automated daily memory backfill for OpenClaw sessions

Scrape and analyze OpenClaw JSONL session logs to reconstruct and backfill agent memory files. Use when: (1) Memory appears incomplete after model switches, (2) Verifying memory coverage, (3) Reconstructing lost memory, (4) Automated daily memory sync via cron/heartbeat. Supports simple extraction and LLM-based narrative summaries with automatic secret sanitization.

🦀 ClawHub

MiniMax PDF Analysis V2

Analyze PDF files using MiniMax API. Supports text extraction, keyword search, and image-based VLM analysis (converts PDF pages to images first). Requires Mi...

🦀 ClawHub

ClawMemory

Sovereign agent memory engine — self-hosted, privacy-first SQLite store with LLM-based fact extraction (GLM-4.7), hybrid BM25+vector search, contradiction re...

🔧 Dify

Firecrawl (Dify)

**Firecrawl** is a powerful API integration for web crawling and data scraping. It allows users to extract URLs, scrape website content, and retrieve structured data from web pages. With its modular tools, Firecrawl simplifies the process of gathering web data efficiently. You can now use it in your application workflows for automated web data extraction and analysis. To set up Firecrawl, follow t

🦀 ClawHub

Tavily Skill.Bak

Use Tavily API for real-time web search and content extraction. Use when: user needs real-time web search results, research, or current information from the...

🦀 ClawHub

Lark/Feishu Sheets & Cloud File Download (with PDF extraction)

Read, write and manage Lark/Feishu Sheets (spreadsheets) and download Lark/Feishu cloud files via Lark OpenAPI. Reads Feishu app credentials (appId/appSecret...

🦀 ClawHub

math-guide-solver

Complete mathematical problem solving workflow with OCR, LaTeX formula extraction, PNG rendering, and guided solutions. Use this skill when users want to: -...

🦀 ClawHub

frompdf

PDF extraction API for AI agents and LLM pipelines. Converts any PDF into semantic AST, markdown, HTML, plain text, or LLM-ready chunks — no page limit. Also...

🦀 ClawHub

local_memory

Manage AI conversation memory locally with automatic extraction, retrieval, and manual commands, ensuring privacy without external APIs or fees.

🦀 ClawHub

Ocr Document

OCR document extraction - extract text from scanned documents, photos, and images using OCR. Use when reading scanned PDFs, photographed pages, handwritten n...

🦀 ClawHub

X (Twitter) Data Scraper

X (Twitter) data extraction and analysis. Use when user asks to "get tweets from @username", "search X for", "analyze Twitter data", "fetch tweets about [top...

🦀 ClawHub

Endpoints

Endpoints document management API toolkit. Scan documents with AI extraction and organize structured data into categorized endpoints. Use when the user asks to: scan a document, upload a file, list endpoints, inspect endpoint data, check usage stats, create or delete endpoints, get file URLs, or manage document metadata. Requires ENDPOINTS_API_KEY from endpoints.work dashboard.

🦀 ClawHub

CorpusGraph Document ETL and entity relationship engine for AI agents

Document ETL, entity extraction, and relationship graphing engine. Convert 1,000+ file formats into searchable, structured data with automatic entity and rel...

⭐ GitHub

OrangeViolin/skill-evolve

Evolutionary Skill improvement: a Claude Code Skill that improves other skills through observation, pattern extraction, and iterative refinement. Based on OTF + JIT + Bootstrap methodology.

🦀 ClawHub

markdown-extract

Extract clean markdown from any URL using auto, AI, or browser methods via the markdown.new API with error handling and flexible extraction options.

🔧 Dify

Paddleocr (Dify)

**[PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) is an industry-leading, production-ready OCR and document AI engine, offering end-to-end solutions from text extraction to intelligent document understanding.** This plugin provides several capabilities from PaddleOCR, including text recognition, document parsing, and more. Open the Plugin Marketplace, search for the PaddleOCR plugin, and in

🔧 Dify

Mineru (Dify)

MinerU is a tool that converts PDFs into machine-readable formats (e.g., markdown, JSON), allowing for easy extraction into any format. MinerU is a document parser that can parse complex document data for any downstream LLM use case (RAG, agents) [GitHub - opendatalab/MinerU: A high-quality tool for convert PDF to Markdown and JSON.](https://github.com/opendatalab/MinerU) - Remove headers, footers

⭐ GitHub ⭐ 120

udayanwalvekar/clearshot

Structured screenshot analysis for UI implementation and critique. Analyzes every UI screenshot with a 5×5 spatial grid, full element inventory, and design system extraction: facts and taste together, every time. Escalates to full implementation blueprint when building. Trigger on any digital interface image file (png, jpg, gif, webp: websites, apps, dashboards, mockups, wireframes) or commands like 'analyse this screenshot,' 'rebuild this,' 'match this design

🦀 ClawHub

Brave Search Old

Web search and content extraction via Brave Search API. Use for searching documentation, facts, or any web content. Lightweight, no browser required.

🦀 ClawHub

Ingestigate Investigative intelligence for AI agents

Investigative intelligence — document search, entity extraction, and relationship graphing. Analyze document corpuses to find connections between people, org...

🦀 ClawHub

document-parser

Extract structured data from PDFs, images, and Word files with layout analysis, table recognition, OCR, seal detection, and directory extraction.

🦀 ClawHub

Smart Memory (Zero Dep)

Enhanced memory system for agentic workflows. Automatic memory extraction from conversations, memory type classification (preference/project/technical/lesson...

🦀 ClawHub

Meta Video Ad Analyzer

Extract and analyze content from video ads using Gemini Vision AI. Supports frame extraction, OCR text detection, audio transcription, and AI-powered scene analysis. Use when analyzing video creative content, extracting text overlays, or generating scene-by-scene descriptions.

🦀 ClawHub

Pget

Parallel file download and optional tar extraction using the pget CLI (single URL or multifile manifest). Use when you need high‑throughput downloads from HTTP(S)/S3/GCS, want to split a large file into chunks for speed, or want to download and extract a .tar/.tar.gz in one step.

🦀 ClawHub

AnyCrawl-API

Perform high-performance web scraping, crawling, and Google search with multi-engine support and structured data extraction via AnyCrawl API.

Page 5 / 8 (338 skills)