Agent Scorecard
by @theshadowrose
Configurable quality evaluation for AI agent outputs. Define criteria, run evaluations, track quality over time. No LLM-as-judge, no API calls, pattern-based...
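Scoring relies on pattern matching rather than model calls. As a rough illustration of what one pattern-based check can look like (a hypothetical sketch, not the skill's actual code; `check_hedging` and all of its parameters are invented here):

```python
import re

# Hypothetical sketch of a pattern-based auto-check: count hedging markers
# and deduct a fixed penalty for each occurrence beyond an allowed budget.
def check_hedging(text: str,
                  markers=("probably", "might", "maybe", "possibly"),
                  max_allowed: int = 2,
                  penalty: float = 0.5) -> float:
    """Score 0-5: start at 5, deduct `penalty` per marker beyond `max_allowed`."""
    hits = sum(len(re.findall(rf"\b{re.escape(m)}\b", text, re.IGNORECASE))
               for m in markers)
    excess = max(0, hits - max_allowed)
    return max(0.0, 5.0 - penalty * excess)

print(check_hedging("This might work, but it could possibly fail."))  # 5.0 (2 hits, within limit)
```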
⚡ When to Use
💡 Examples
```bash
# 1. Configure
cp config_example.json scorecard_config.json
# Edit dimensions, thresholds, and weights for your use case

# 2. Evaluate a response
python3 scorecard.py --config scorecard_config.json --input response.txt

# 3. Evaluate and save to history
python3 scorecard.py --config scorecard_config.json --input response.txt --save history.jsonl

# 4. Manual scoring mode
python3 scorecard.py --config scorecard_config.json --input response.txt --manual --save history.jsonl

# 5. View trends
python3 scorecard_track.py --history history.jsonl --summary

# 6. Compare before/after (last 10 vs previous 10)
python3 scorecard_track.py --history history.jsonl --compare 10

# 7. Generate a report
python3 scorecard_report.py --config scorecard_config.json --history history.jsonl
```
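Each `--save` run appends one evaluation record to `history.jsonl`, which `scorecard_track.py` and `scorecard_report.py` read back. The exact schema is defined by the skill; the record below is an assumed example for illustration only (all field names and values are invented):

```json
{"timestamp": "2025-01-15T10:32:00Z", "input": "response.txt", "overall": 4.2, "dimensions": {"accuracy": 4.5, "completeness": 4.0, "clarity": 4.1}}
```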
⚙️ Configuration
See config_example.json for the complete reference. Key areas:
- DIMENSIONS – Quality dimensions with rubrics, weights, thresholds, and auto-checks
- AUTO_CHECKS – Tuning for each pattern-based check (markers, thresholds, penalties)
- AGGREGATE_METHOD – How to combine dimension scores ("weighted_average", "minimum", "geometric_mean")
- HISTORY_FILE – Where to store evaluation history
- REPORT_OUTPUT_DIR – Where reports are saved
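A minimal sketch of what a `scorecard_config.json` covering these key areas might look like (the dimension names, rubric text, check parameters, and numeric values here are invented; consult `config_example.json` for the real schema):

```json
{
  "DIMENSIONS": {
    "accuracy": {
      "rubric": "Claims are correct and verifiable",
      "weight": 0.4,
      "threshold": 3.0,
      "auto_checks": ["unsupported_claims"]
    },
    "clarity": {
      "rubric": "Response is well-structured and easy to follow",
      "weight": 0.6,
      "threshold": 2.5,
      "auto_checks": ["hedging_markers"]
    }
  },
  "AUTO_CHECKS": {
    "hedging_markers": {"markers": ["probably", "maybe"], "max_allowed": 2, "penalty": 0.5}
  },
  "AGGREGATE_METHOD": "weighted_average",
  "HISTORY_FILE": "history.jsonl",
  "REPORT_OUTPUT_DIR": "reports/"
}
```

Of the named aggregates, "minimum" is presumably the strictest: the overall score is capped by the weakest dimension, which suits pass/fail gating more than trend tracking.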
```bash
clawhub install agent-scorecard
```