
MinerU PDF Extractor

by @a-i-r

Extracts PDF content to Markdown using the MinerU API. Supports formulas, tables, and OCR, and can parse both local files and online URLs.

Version: v1.0.5
Installs: 2
💡 Examples

cd scripts/

Step 1: Apply for upload URL

./local_file_step1_apply_upload_url.sh /path/to/your.pdf

Output: BATCH_ID=xxx UPLOAD_URL=xxx

Step 2: Upload file

./local_file_step2_upload_file.sh "$UPLOAD_URL" /path/to/your.pdf

Step 3: Poll for results

./local_file_step3_poll_result.sh "$BATCH_ID"

Output: FULL_ZIP_URL=xxx

Step 4: Download results

./local_file_step4_download.sh "$FULL_ZIP_URL" result.zip extracted/
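
The four steps pass values to each other through `KEY=value` lines, so they can be chained in one script. The following is a minimal sketch, assuming each script prints its results one `KEY=value` per line on stdout and exits non-zero on failure. `get_value` and `run_pipeline` are hypothetical helpers for illustration, not part of the skill.

```shell
#!/bin/sh
# get_value KEY: read text on stdin and print the value of the first
# line of the form KEY=value. Hypothetical helper, not part of the skill.
get_value() {
  sed -n "s/^$1=//p" | head -n 1
}

# run_pipeline PDF: assumed wiring of the four scripts (run from scripts/).
# Assumes each script exits non-zero when its step fails.
run_pipeline() {
  pdf=$1

  step1_out=$(./local_file_step1_apply_upload_url.sh "$pdf") || return 1
  BATCH_ID=$(printf '%s\n' "$step1_out" | get_value BATCH_ID)
  UPLOAD_URL=$(printf '%s\n' "$step1_out" | get_value UPLOAD_URL)

  ./local_file_step2_upload_file.sh "$UPLOAD_URL" "$pdf" || return 1

  step3_out=$(./local_file_step3_poll_result.sh "$BATCH_ID") || return 1
  FULL_ZIP_URL=$(printf '%s\n' "$step3_out" | get_value FULL_ZIP_URL)

  ./local_file_step4_download.sh "$FULL_ZIP_URL" result.zip extracted/
}
```

With these definitions loaded, `run_pipeline /path/to/your.pdf` would run all four steps in order and stop at the first one that fails.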

Script Descriptions

#### local_file_step1_apply_upload_url.sh

Request an upload URL and a batch_id.

Usage:

./local_file_step1_apply_upload_url.sh <pdf_file_path> [language] [layout_model]

Parameters:

  • language: ch (Chinese), en (English), auto (auto-detect), default ch
  • layout_model: doclayout_yolo (fast), layoutlmv3 (accurate), default doclayout_yolo
Output:

BATCH_ID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
UPLOAD_URL=https://mineru.oss-cn-shanghai.aliyuncs.com/...
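
Because both optional parameters accept only a fixed set of values, it can help to check them before calling the script. The following is a small sketch of such a pre-check; `check_step1_args` is a hypothetical helper, and whether the script itself validates its arguments is not stated here.

```shell
#!/bin/sh
# check_step1_args [language] [layout_model]
# Hypothetical pre-check: fills in the documented defaults and rejects
# values outside the documented lists. Prints the resolved pair.
check_step1_args() {
  language=${1:-ch}
  layout_model=${2:-doclayout_yolo}

  case "$language" in
    ch|en|auto) ;;
    *) echo "unsupported language: $language" >&2; return 1 ;;
  esac

  case "$layout_model" in
    doclayout_yolo|layoutlmv3) ;;
    *) echo "unsupported layout_model: $layout_model" >&2; return 1 ;;
  esac

  echo "$language $layout_model"
}
```

For example, `check_step1_args en layoutlmv3` prints `en layoutlmv3`, while `check_step1_args fr` prints an error to stderr and returns a non-zero status.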
    


#### local_file_step2_upload_file.sh

Upload the PDF file to the presigned URL.

Usage:

./local_file_step2_upload_file.sh <upload_url> <pdf_file_path>
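
Since the upload URL is issued for a single file, a quick local check that the file really is a PDF can avoid a wasted upload. This sketch relies only on the fact that PDF files begin with the bytes `%PDF-`; `is_pdf` is a hypothetical helper, and it is an assumption that the script does no such check itself.

```shell
#!/bin/sh
# is_pdf FILE: succeed if FILE is readable, non-empty, and starts with
# the "%PDF-" signature. Hypothetical helper, not part of the skill.
is_pdf() {
  [ -r "$1" ] && [ -s "$1" ] || return 1
  # Read only the first 5 bytes of the file.
  magic=$(dd if="$1" bs=5 count=1 2>/dev/null)
  [ "$magic" = "%PDF-" ]
}
```

A guarded call could then look like `is_pdf /path/to/your.pdf && ./local_file_step2_upload_file.sh "$UPLOAD_URL" /path/to/your.pdf`.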
    


#### local_file_step3_poll_result.sh

Poll for the extraction result until the job completes or fails.

Usage:

./local_file_step3_poll_result.sh <batch_id> [max_retries] [retry_interval_seconds]

Output:

FULL_ZIP_URL=https://cdn-mineru.openxlab.org.cn/pdf/.../xxx.zip
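
The `max_retries` and `retry_interval_seconds` parameters suggest a bounded retry loop. The sketch below shows the general shape of such a loop. It is an illustration only: `poll_until` is a hypothetical helper, and the script's actual status check and default values are not documented here.

```shell
#!/bin/sh
# poll_until MAX_RETRIES INTERVAL_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_RETRIES attempts have been made,
# sleeping INTERVAL_SECONDS between attempts. Hypothetical helper.
poll_until() {
  poll_max=$1
  poll_interval=$2
  shift 2

  poll_attempt=1
  while [ "$poll_attempt" -le "$poll_max" ]; do
    if "$@"; then
      return 0
    fi
    # Skip the sleep after the final failed attempt.
    if [ "$poll_attempt" -lt "$poll_max" ]; then
      sleep "$poll_interval"
    fi
    poll_attempt=$((poll_attempt + 1))
  done
  return 1
}
```

Here the command passed to `poll_until` stands in for whatever check reports that the job is done; the loop returns 0 as soon as that check succeeds and 1 once the retry budget is used up.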
    


#### local_file_step4_download.sh

Download the result ZIP and extract it.

Usage:

./local_file_step4_download.sh <zip_url> [output_zip_filename] [extract_directory_name]

Output Structure:

extracted/
├── full.md              # 📄 Markdown document (main result)
├── images/              # 🖼️ Extracted images
├── content_list.json    # Structured content
└── layout.json          # Layout analysis data
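
After extraction, the listed layout can be verified before any downstream processing. This is a minimal sketch that checks for exactly the entries named above; `check_extracted` is a hypothetical helper, and it assumes every entry is always present (a PDF without images might, for instance, produce no `images/` directory).

```shell
#!/bin/sh
# check_extracted [DIR]: verify that DIR (default: extracted) contains the
# entries listed in the output structure. Hypothetical helper.
check_extracted() {
  dir=${1:-extracted}
  missing=0

  for f in full.md content_list.json layout.json; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing file: $dir/$f" >&2
      missing=1
    fi
  done

  if [ ! -d "$dir/images" ]; then
    echo "missing directory: $dir/images" >&2
    missing=1
  fi

  return "$missing"
}
```

Running `check_extracted extracted/` returns 0 when all four entries exist and otherwise lists the missing ones on stderr.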
    

Detailed Documentation

📚 Complete Guide: See docs/Local_File_Parsing_Guide.md


Install via ClawHub:

clawhub install mineru-pdf-extractor
