safety-kb-import
by @cyz9827
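Import tool for workplace-safety (安全生产) laws, regulations, and standards. Use it when the user needs to import new regulations or standards into the knowledge base, extract text from PDFs, split documents into clauses, run batch imports, or validate data quality. Trigger words: 导入法规 (import regulations), 添加标准 (add a standard), 入库 (add to the database), 导入知识库 (import into the knowledge base), 补充标准 (supplement standards), PDF提取文本 (extract text from PDF), 拆分条款 (split clauses), KB导入 (KB import), safety-review import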
1. Detect which Python command is available (if `python` is not found, try `python3`):
```bash
python --version
```
2. Install the required package for PDF extraction:
```bash
pip install pdfplumber
```
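A minimal sketch of text-layer extraction with pdfplumber, assuming one PDF per document. The function names `join_page_texts` and `extract_pdf_text` are hypothetical and not part of the tool. Pages without a text layer (typically scans) come back empty and are skipped, which is the usual sign that OCR is needed.

```python
from typing import Iterable, Optional


def join_page_texts(page_texts: Iterable[Optional[str]]) -> str:
    """Join per-page text, skipping pages where nothing was extracted."""
    # Depending on the pdfplumber version, extract_text() may return None
    # or an empty string for pages with no text layer (e.g. scanned pages).
    return "\n".join(t.strip() for t in page_texts if t and t.strip())


def extract_pdf_text(pdf_path: str) -> str:
    """Extract the text layer of every page in a PDF with pdfplumber."""
    import pdfplumber  # imported lazily so join_page_texts works without it

    with pdfplumber.open(pdf_path) as pdf:
        # The generator is fully consumed by join() before the file is closed.
        return join_page_texts(page.extract_text() for page in pdf.pages)
```

For example, `extract_pdf_text("example.pdf")` returns the whole document as one string (the file name is illustrative). If the result is empty or nearly empty, the PDF is probably a scan.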
For OCR of scanned PDFs:
```bash
pip install pdf2image pytesseract
# Also requires the Tesseract OCR engine, plus Poppler (used by pdf2image), installed on the system
```
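When a PDF is a scan, pdfplumber finds little or no text. A minimal fallback sketch, assuming Simplified Chinese documents (Tesseract's `chi_sim` language data must be installed) and a hypothetical 20-character threshold for deciding that a document is a scan. `looks_scanned` and `ocr_pdf` are illustrative names, not the tool's API.

```python
from typing import List


def looks_scanned(text: str, min_chars: int = 20) -> bool:
    """Heuristic: treat the document as scanned if almost no text was extracted."""
    # The 20-character threshold is an arbitrary assumption; tune it per corpus.
    return len("".join(text.split())) < min_chars


def ocr_pdf(pdf_path: str, lang: str = "chi_sim", dpi: int = 300) -> str:
    """Rasterize each page with pdf2image, then OCR it with pytesseract."""
    from pdf2image import convert_from_path  # needs Poppler on the system
    import pytesseract  # needs the Tesseract engine and the chi_sim data

    images = convert_from_path(pdf_path, dpi=dpi)  # one PIL image per page
    pages: List[str] = [pytesseract.image_to_string(img, lang=lang) for img in images]
    return "\n".join(pages)
```

A typical flow runs pdfplumber first and calls `ocr_pdf` only when `looks_scanned` is true, since OCR is much slower than reading the text layer.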
| Error | Cause | Solution |
|-------|-------|----------|
| Database not found | Wrong database path | Set the `KB_PATH` environment variable or update `DEFAULT_DB_PATH` |
| no such column: X | Schema changed | Run the schema command to verify the column names |
| UNIQUE constraint failed | Attempt to insert a document that already exists | The tool should update existing records instead of inserting; check that every manifest entry has a unique document number |
| clause_count: 0 after import | Empty text or a clause pattern that does not match | Try a different `clause_split_pattern`; verify that the `full_text` field is not empty |
| Garbled Chinese in output | Encoding issue | Run the script under a UTF-8 locale; on Windows, run `chcp 65001` first |
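The database messages in this table match SQLite's wording. A minimal sketch of how an importer can avoid them, assuming a hypothetical `documents` table with a UNIQUE `doc_number` column and a manifest that is a list of dicts; the tool's real schema, manifest format, and `DEFAULT_DB_PATH` value are not given here. It resolves the path from `KB_PATH`, lists the actual columns, flags duplicate doc numbers, and uses an upsert so that re-importing a document updates it instead of failing the UNIQUE constraint.

```python
import os
import sqlite3
from typing import Dict, List, Sequence

# Hypothetical placeholder: the tool's real DEFAULT_DB_PATH is not shown here.
DEFAULT_DB_PATH = "safety_kb.db"


def resolve_db_path() -> str:
    """Prefer the KB_PATH environment variable, else fall back to DEFAULT_DB_PATH."""
    return os.environ.get("KB_PATH", DEFAULT_DB_PATH)


def table_columns(conn: sqlite3.Connection, table: str) -> List[str]:
    """List column names; PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)."""
    # PRAGMA arguments cannot be bound as parameters; only pass trusted table names.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def duplicate_doc_numbers(manifest: Sequence[Dict[str, str]]) -> List[str]:
    """Return doc numbers that appear more than once (hypothetical manifest format)."""
    seen, dups = set(), []
    for entry in manifest:
        num = entry["doc_number"]
        if num in seen and num not in dups:
            dups.append(num)
        seen.add(num)
    return dups


def upsert_document(conn: sqlite3.Connection, doc: Dict[str, str]) -> None:
    """Insert a document, or update it in place if its doc_number already exists."""
    # Assumed schema: documents(doc_number TEXT UNIQUE, title TEXT, full_text TEXT).
    # ON CONFLICT ... DO UPDATE ("upsert") requires SQLite 3.24 or newer.
    conn.execute(
        """
        INSERT INTO documents (doc_number, title, full_text)
        VALUES (:doc_number, :title, :full_text)
        ON CONFLICT(doc_number) DO UPDATE SET
            title = excluded.title,
            full_text = excluded.full_text
        """,
        doc,
    )
```

In use, one would open `sqlite3.connect(resolve_db_path())`, stop early if `duplicate_doc_numbers(manifest)` is non-empty, and call `upsert_document` per entry before committing.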
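A `clause_count` of 0 usually means the split pattern never matched. A minimal sketch of pattern-based splitting with two hypothetical patterns: one for `第N条` articles in laws and regulations, one for dotted numbers such as `4.1.2` in standards. Both are anchored to the start of a line so that in-text references like `本法第十条` do not start a new clause. The tool's actual `clause_split_pattern` syntax may differ.

```python
import re
from typing import List

# Hypothetical example patterns; the tool's own clause_split_pattern may differ.
# Articles in laws/regulations: lines starting with 第一条, 第十二条, 第3条 ...
ARTICLE_PATTERN = r"(?m)^[ \t\u3000]*第[一二三四五六七八九十百千零〇\d]+条"
# Numbered clauses in standards: lines starting with 4.1, 4.1.2 ... then a space
SECTION_PATTERN = r"(?m)^\d+(?:\.\d+)+[ \t\u3000]"


def split_clauses(full_text: str, pattern: str = ARTICLE_PATTERN) -> List[str]:
    """Cut full_text at every line-start match of pattern; one chunk per clause."""
    text = full_text or ""
    starts = [m.start() for m in re.finditer(pattern, text)]
    # Text before the first match (title, chapter headings) is dropped.
    # No match, or empty text, yields []: the "clause_count: 0" symptom.
    bounds = starts + [len(text)]
    return [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]
```

Trying the other pattern on a document that yields no clauses is a quick way to tell a pattern mismatch apart from an empty `full_text`.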
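Besides `chcp 65001`, Python itself can be told to write UTF-8 when Chinese text comes out garbled. A minimal sketch, assuming Python 3.7+ (where `io.TextIOWrapper.reconfigure` exists); setting the `PYTHONIOENCODING=utf-8` environment variable before launching the script has a similar effect. `force_utf8_stdio` is a hypothetical helper name.

```python
import sys


def force_utf8_stdio() -> None:
    """Switch stdout and stderr to UTF-8 so Chinese clause text prints correctly."""
    for stream in (sys.stdout, sys.stderr):
        # reconfigure() exists on io.TextIOWrapper (Python 3.7+); redirected
        # or wrapped streams may not provide it, so they are left unchanged.
        if hasattr(stream, "reconfigure"):
            stream.reconfigure(encoding="utf-8")
```

Calling it once at startup, before any output is printed, keeps the rest of the script unchanged.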
```bash
clawhub install safety-kb-import
```