Find the Right AI Skill for Any Job
Browse 28+ curated AI agent skills. Search by use case, filter by category, get the right tool instantly.
All Skills — data
28 skills in "data" matching "extraction"
🌐 All · coding · devops · api · database · security · data · research · writing · image-gen · video · audio · translation · seo · social-media · email-marketing · advertising · finance · crypto-defi · ecommerce · legal · hr · real-estate · health · education · cooking · travel · gaming · automation · communication · productivity · clawhub · lobehub · dify · mcp
🦀 ClawHub
MinerU PDF Parser Clawdbot Skill
Parse PDFs locally (CPU) into Markdown/JSON using MinerU. Assumes MinerU creates per‑doc output folders; supports table/image extraction.
🦀 ClawHub
Online Analysis
Online (real-time) data analysis, rule extraction, and pattern recognition for testing scenarios. Activate when user mentions test online analysis, real-time...
🦀 ClawHub
Glancify
A proxy service that wraps external web pages with Glancify, enabling interactive content visualization and keyword extraction.
🦀 ClawHub
Pdf Parser Agent
Parses local PDF files into structured Markdown and JSON using opendataloader-pdf for deterministic, local document content extraction.
🦀 ClawHub
Pdf Intelligence Suite
PDF Intelligence Suite - Text extraction, table recognition, OCR, PDF to Word/Excel conversion, and more
🦀 ClawHub
Web Scraper Seller
Build custom web scrapers for websites, offering single-page, multi-page, and real-time data extraction with CSV/JSON export and API delivery options.
🦀 ClawHub
Book Capture Obsidian
Capture and normalize book metadata into Obsidian Markdown notes from photos or Goodreads CSV exports. Use for barcode and OCR ISBN extraction, metadata enri...
🦀 ClawHub
Nanonets OCR
Document extraction API by Nanonets. Convert PDFs and images to markdown, JSON, or CSV with confidence scoring. Use when you need to OCR documents, extract invoice fields, parse receipts, or convert tables to structured data.
🦀 ClawHub
Tabstack Extractor
Extract structured data from websites using Tabstack API. Use when you need to scrape job listings, news articles, product pages, or any structured web content. Provides JSON schema-based extraction and clean markdown conversion. Requires TABSTACK_API_KEY environment variable.
🦀 ClawHub
wechat-article-extraction-mp-weixin-qq-com news-webpage-cleaning blog-post-parsing metadata-extraction-title-author-date multiple-output-formats-markdown-json-plain-text batch-processing-support
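Built on a three-engine design, this skill extracts clean content from WeChat articles, news pages, and blog posts, with support for title, author, and date metadata, multiple output formats, and batch processing.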
🦀 ClawHub
XCrawl Scrape
Use this skill for XCrawl scrape tasks, including single-URL fetch, format selection, sync or async execution, and JSON extraction with prompt or json_schema.
🦀 ClawHub
Data Spider
Scrape any webpage and extract structured data as JSON, table, or list. Supports schema-guided extraction.
🦀 ClawHub
S.H.I.T底刊摘要
Automates extraction and AI-based analysis of research papers from shitjournal.org, capturing titles, abstracts, DOIs, and publication dates in JSON format.
🦀 ClawHub
mineru precision extract PDF、Document、Images
Precision document extraction with full feature set — table recognition, formula recognition, OCR, multi-format output (Markdown, HTML, LaTeX, DOCX, JSON), b...
🦀 ClawHub
LiteParse Document Parser
Use when parsing PDFs, DOCX, PPTX, XLSX, or images locally. Supports text extraction, JSON output with bounding boxes, batch processing, and page screenshots...
🦀 ClawHub
PulpMiner Web Scraper - Convert Any Webpage to Realtime JSON API
Convert any webpage into structured JSON data using AI. Scrape websites, extract data into custom JSON schemas, and call saved APIs programmatically. Useful for web scraping, data extraction, content monitoring, lead generation, price tracking, and building data pipelines.
🦀 ClawHub
file-processor
Automatically detects and processes files including PDF, Excel, CSV, Word, images, and text for extraction, OCR, data analysis, and summarization.
🦀 ClawHub
Automated daily memory backfill for OpenClaw sessions
Scrape and analyze OpenClaw JSONL session logs to reconstruct and backfill agent memory files. Use when: (1) Memory appears incomplete after model switches, (2) Verifying memory coverage, (3) Reconstructing lost memory, (4) Automated daily memory sync via cron/heartbeat. Supports simple extraction and LLM-based narrative summaries with automatic secret sanitization.
🦀 ClawHub
Endpoints
Endpoints document management API toolkit. Scan documents with AI extraction and organize structured data into categorized endpoints. Use when the user asks to: scan a document, upload a file, list endpoints, inspect endpoint data, check usage stats, create or delete endpoints, get file URLs, or manage document metadata. Requires ENDPOINTS_API_KEY from endpoints.work dashboard.
🔧 Dify
Mineru (Dify)
MinerU is a document parser that converts PDFs into machine-readable formats (e.g., Markdown, JSON) for easy extraction into any format, and can parse complex document data for any downstream LLM use case (RAG, agents). [GitHub - opendatalab/MinerU: A high-quality tool for convert PDF to Markdown and JSON.](https://github.com/opendatalab/MinerU) - Remove headers, footers...
⭐ GitHub · ⭐ 120
udayanwalvekar/clearshot
Structured screenshot analysis for UI implementation and critique. Analyzes every UI screenshot with a 5×5 spatial grid, full element inventory, and design system extraction: facts and taste together, every time. Escalates to full implementation blueprint when building. Trigger on any digital interface image file (png, jpg, gif, webp; websites, apps, dashboards, mockups, wireframes) or commands like 'analyse this screenshot,' 'rebuild this,' 'match this design...
🦀 ClawHub
Apify Ultimate Scraper
Universal AI-powered web scraper for any platform. Scrape data from Instagram, Facebook, TikTok, YouTube, Google Maps, Google Search, Google Trends, Booking.com, and TripAdvisor. Use for lead generation, brand monitoring, competitor analysis, influencer discovery, trend research, content analytics, audience analysis, or any data extraction task.
🦀 ClawHub
UniFuncs Reader
Use the UniFuncs Reader API to read web pages and documents (PDF, Word, Excel, and PPTX URLs) with AI-powered content extraction. Use this skill when...
⭐ GitHub
Cortex Memory
A complete solution for agent memory, from extraction and vector search to automated optimization, with an insights dashboard out of the box.
🦀 ClawHub
Swipenode
Lightning-fast web extraction for AI agents. Extracts structured JSON from Next.js, Nuxt.js, Gatsby, Remix without headless browsers. TLS spoofing bypasses C...
🦀 ClawHub
Parallel AI search
Use Parallel's parallel-cli to do live web search, URL extraction (clean markdown), deep research reports, bulk data enrichment (CSV/JSON), FindAll entity di...
🦀 ClawHub
OpenDataLoader PDF Parser (乌贼版)
PDF parsing tool for AI/RAG. Convert PDF to Markdown, JSON, HTML with layout preservation, bounding boxes, and image extraction. Use when you need to extract...
🦀 ClawHub
Chatgpt Memory Extraction
Extract structured personal memories from ChatGPT export data (conversations JSON). Produces organized timeline, people profiles, and thematic records by dee...