ClawHub
Semantic Consistency Auditor
by @aipoch-ai
Use Semantic Consistency Auditor in academic writing workflows to score how closely AI-generated text matches a gold-standard reference with BERTScore and COMET, and to mark each pair as passed or failed against configurable thresholds.
⚡ When to Use
💡 Examples
Command Line
# Evaluate single case pair
python scripts/main.py \
  --ai-generated "Patient presented with fever for 3 days, highest temperature 39°C, accompanied by cough." \
  --gold-standard "Patient chief complaint of fever for 3 days, highest temperature 39°C, accompanied by cough symptoms." \
  --output results.json

# Batch evaluation from JSON file
python scripts/main.py \
  --input-file batch_cases.json \
  --output results.json \
  --format detailed

# Use specific model
python scripts/main.py \
  --ai-generated "..." \
  --gold-standard "..." \
  --bert-model "bert-base-chinese" \
  --comet-model "Unbabel/wmt20-comet-da"
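The layout of batch_cases.json is not specified on this page. As a minimal sketch, assuming the file holds a JSON list of objects with the same "ai" and "gold" keys that the Python API's evaluate_batch accepts, a batch file could be produced as follows. The helper write_batch_file is hypothetical and not part of the skill, and the key names should be checked against the skill's own documentation.

```python
import json
from pathlib import Path


def write_batch_file(pairs, path="batch_cases.json"):
    """Write (ai_text, gold_text) pairs as a JSON list of {"ai", "gold"} objects.

    ASSUMPTION: the key names are borrowed from the Python API example;
    the real schema expected by --input-file may differ.
    """
    cases = [{"ai": ai, "gold": gold} for ai, gold in pairs]
    # ensure_ascii=False keeps non-ASCII text (Chinese, "°C") readable in the file.
    Path(path).write_text(
        json.dumps(cases, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return cases
```

The resulting file can then be passed to the batch command via --input-file.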
Python API
from semantic_consistency_auditor import SemanticConsistencyAuditor

# Initialize evaluator
auditor = SemanticConsistencyAuditor(
    bert_model="microsoft/deberta-xlarge-mnli",
    comet_model="Unbabel/wmt22-comet-da",
    lang="zh"
)

# Evaluate single case
result = auditor.evaluate(
    ai_text="Patient presented with fever for 3 days...",
    gold_text="Patient chief complaint of fever for 3 days..."
)

print(f"BERTScore F1: {result['bertscore']['f1']:.4f}")
print(f"COMET Score: {result['comet']['score']:.4f}")
print(f"Consistency: {result['consistency']:.4f}")
print(f"Passed: {result['passed']}")

# Batch evaluation
results = auditor.evaluate_batch([
    {"ai": "...", "gold": "..."},
    {"ai": "...", "gold": "..."}
])
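Batch output usually needs to be aggregated before it is reported. The sketch below assumes that evaluate_batch returns a list of dicts shaped like the single-case result (bertscore.f1, comet.score, consistency, passed). That return shape is an assumption, and summarize is a hypothetical helper, not part of the skill.

```python
from statistics import mean


def summarize(results):
    """Aggregate per-case results into mean scores and a pass rate.

    ASSUMPTION: each item has the same keys as the single-case result.
    """
    if not results:
        raise ValueError("results is empty")
    return {
        "n": len(results),
        "mean_bertscore_f1": mean(r["bertscore"]["f1"] for r in results),
        "mean_comet": mean(r["comet"]["score"] for r in results),
        "mean_consistency": mean(r["consistency"] for r in results),
        "pass_rate": sum(bool(r["passed"]) for r in results) / len(results),
    }


# Hand-written stand-ins for real results, for illustration only.
example = [
    {"bertscore": {"f1": 0.9}, "comet": {"score": 0.8}, "consistency": 0.85, "passed": True},
    {"bertscore": {"f1": 0.7}, "comet": {"score": 0.6}, "consistency": 0.65, "passed": False},
]
print(summarize(example)["pass_rate"])  # 0.5
```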
⚙️ Configuration
Configure in ~/.openclaw/skills/semantic-consistency-auditor/config.yaml:
# BERTScore Configuration
bertscore:
  model: "microsoft/deberta-xlarge-mnli"  # Or "bert-base-chinese" for Chinese
  lang: "zh"  # Language code: zh, en, etc.
  rescale_with_baseline: true
  device: "auto"  # auto, cpu, cuda

# COMET Configuration
comet:
  model: "Unbabel/wmt22-comet-da"  # COMET model
  batch_size: 8
  device: "auto"

# Evaluation Thresholds
thresholds:
  bertscore_f1: 0.85
  comet_score: 0.75
  semantic_consistency: 0.80  # Comprehensive score threshold
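How the three thresholds combine into the passed flag is not stated on this page. The sketch below assumes that a case passes only when every score reaches its threshold. Both that rule and the passes helper are illustrative assumptions, not the skill's confirmed logic.

```python
# Threshold values copied from the thresholds section of the config.
THRESHOLDS = {
    "bertscore_f1": 0.85,
    "comet_score": 0.75,
    "semantic_consistency": 0.80,
}


def passes(bertscore_f1, comet_score, semantic_consistency, thresholds=THRESHOLDS):
    """Return True only if all three scores reach their thresholds.

    ASSUMPTION: an all-must-pass rule; the skill may combine scores differently.
    """
    scores = {
        "bertscore_f1": bertscore_f1,
        "comet_score": comet_score,
        "semantic_consistency": semantic_consistency,
    }
    return all(scores[name] >= limit for name, limit in thresholds.items())


print(passes(0.91, 0.80, 0.86))  # True
print(passes(0.91, 0.70, 0.86))  # False: COMET is below 0.75
```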
Install from the terminal:
clawhub install semantic-consistency-auditor