ClawHub
Hwp Extract Pipeline
by @heoboong
HWP/HWPX/PDF extraction pipeline: attempt hwp-reader, then pyhwp, then OCR, with safe fallbacks. Use when agent needs reliable text extraction from Korean HW...
TERMINAL
clawhub install hwp-extract-pipeline
About This Skill
name: hwp-extract-pipeline
description: "HWP/HWPX/PDF extraction pipeline: attempt hwp-reader, then pyhwp, then OCR, with safe fallbacks. Use when agent needs reliable text extraction from Korean HWP/HWPX or PDF/scan attachments."
hwp-extract-pipeline
A simple HWP/HWPX/PDF extraction pipeline skill. Its core goal is to reliably convert locally stored public notices (Hangul word processor files) to text and return the result as JSON.
Quick usage
Priority order (fallback chain)
1. Call hwp-reader (when calling an external skill is possible)
2. pyhwp (venv) based extraction
3. System OCR (poppler + tesseract); these may need to be installed on the system
4. strings-based fallback
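The fallback order above can be sketched as a small chain runner. This is a minimal illustration, not the skill's actual code: the function names (run_fallback_chain, strings_fallback), the extractor signature, and the JSON field names (path, method, text, errors) are all assumptions made for the example. Only the last stage is implemented here, as a pure-Python approximation of what the strings tool does; the hwp-reader, pyhwp, and OCR stages would be passed in as callables.

```python
import json
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical extractor signature: takes a file path, returns text or None.
Extractor = Callable[[str], Optional[str]]


def run_fallback_chain(path: str, extractors: List[Tuple[str, Extractor]]) -> str:
    """Try each extractor in order and return a JSON string for the first success.

    The JSON layout (path/method/text/errors) is an assumed shape, not a
    documented schema of the skill.
    """
    errors = []
    for name, extract in extractors:
        try:
            text = extract(path)
        except Exception as exc:  # a failing stage must not abort the chain
            errors.append({"method": name, "error": str(exc)})
            continue
        if text and text.strip():
            result = {"path": path, "method": name, "text": text, "errors": errors}
            return json.dumps(result, ensure_ascii=False)
        errors.append({"method": name, "error": "empty output"})
    # Every stage failed: still return well-formed JSON so callers can inspect why.
    result = {"path": path, "method": None, "text": "", "errors": errors}
    return json.dumps(result, ensure_ascii=False)


def strings_fallback(path: str, min_len: int = 4) -> str:
    """Last resort: collect runs of printable ASCII bytes, like the `strings` tool.

    This recovers only ASCII fragments, so Korean text is generally lost.
    """
    with open(path, "rb") as fh:
        data = fh.read()
    runs = re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)
    return "\n".join(run.decode("ascii") for run in runs)
```

A caller would pass the stages in priority order, for example [("hwp-reader", call_hwp_reader), ("pyhwp", call_pyhwp), ("ocr", call_ocr), ("strings", strings_fallback)], where the first three callables are placeholders the caller has to supply. Recording each failure in errors keeps the fallback "safe": a missing tool or unreadable file is reported in the output instead of raising.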
Reference documents