
Agent Scorecard

by @theshadowrose

Configurable quality evaluation for AI agent outputs. Define criteria, run evaluations, track quality over time. No LLM-as-judge, no API calls, pattern-based...

Version: v1.0.6
⚡ When to Use
- **Model comparison:** Same task, different models: which scores higher?
- **Agent regression testing:** Catch quality degradation before it ships
- **Team quality standards:** Define shared rubrics for consistent evaluation
- **Continuous monitoring:** Track quality trends over days/weeks/months
- **A/B testing:** Quantified before/after comparisons
💡 Examples

# 1. Configure
cp config_example.json scorecard_config.json
# Edit dimensions, thresholds, and weights for your use case

# 2. Evaluate a response
python3 scorecard.py --config scorecard_config.json --input response.txt

# 3. Evaluate and save to history
python3 scorecard.py --config scorecard_config.json --input response.txt --save history.jsonl

# 4. Manual scoring mode
python3 scorecard.py --config scorecard_config.json --input response.txt --manual --save history.jsonl

# 5. View trends
python3 scorecard_track.py --history history.jsonl --summary

# 6. Compare before/after (last 10 vs previous 10)
python3 scorecard_track.py --history history.jsonl --compare 10

# 7. Generate a report
python3 scorecard_report.py --config scorecard_config.json --history history.jsonl
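
To illustrate what a "last N vs previous N" comparison over a JSONL history could look like, here is a minimal sketch. It is not the code in scorecard_track.py: the record field name `overall` and the helper names `load_scores` and `compare_windows` are assumptions made for illustration, and the real history schema may differ.

```python
import json
from statistics import mean


def load_scores(path, field="overall"):
    """Read one JSON object per line and collect a numeric field.

    The field name "overall" is a hypothetical placeholder; check the
    actual history file for the real key.
    """
    scores = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines
                scores.append(float(json.loads(line)[field]))
    return scores


def compare_windows(scores, n):
    """Compare the mean of the last n scores with the n scores before them."""
    if n <= 0:
        raise ValueError("n must be positive")
    if len(scores) < 2 * n:
        raise ValueError(f"need at least {2 * n} scores, got {len(scores)}")
    previous = mean(scores[-2 * n:-n])
    recent = mean(scores[-n:])
    return {"previous": previous, "recent": recent, "delta": recent - previous}
```

Under these assumptions, `compare_windows(load_scores("history.jsonl"), 10)` would return the two window means and their difference; a negative `delta` would suggest a regression.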

βš™οΈ Configuration

See config_example.json for the complete reference. Key areas:

  • DIMENSIONS: Quality dimensions with rubrics, weights, thresholds, and auto-checks
  • AUTO_CHECKS: Tuning for each pattern-based check (markers, thresholds, penalties)
  • AGGREGATE_METHOD: How to combine dimension scores ("weighted_average", "minimum", "geometric_mean")
  • HISTORY_FILE: Where to store evaluation history
  • REPORT_OUTPUT_DIR: Where reports are saved
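
To make the AUTO_CHECKS and AGGREGATE_METHOD settings more concrete, the sketch below shows one plausible reading of a marker-based check and of the three aggregation methods. It is an illustration under stated assumptions, not the skill's actual implementation: the 0 to 5 score scale, the per-hit penalty rule, the weighted form of the geometric mean, and the function names `marker_check` and `aggregate` are all hypothetical.

```python
import math


def marker_check(text, markers, penalty, max_score=5.0):
    """Pattern-based check: subtract `penalty` for each marker occurrence.

    Assumes a 0..max_score scale and case-insensitive substring matching.
    """
    lowered = text.lower()
    hits = sum(lowered.count(marker.lower()) for marker in markers)
    return max(0.0, max_score - penalty * hits)


def aggregate(scores, weights, method="weighted_average"):
    """Combine per-dimension scores (name -> score) into a single number."""
    if not scores:
        raise ValueError("no dimension scores to aggregate")
    if method == "minimum":
        # The weakest dimension decides the overall score.
        return min(scores.values())
    total = sum(weights[name] for name in scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    if method == "weighted_average":
        return sum(scores[name] * weights[name] for name in scores) / total
    if method == "geometric_mean":
        if any(score < 0 for score in scores.values()):
            raise ValueError("geometric mean needs non-negative scores")
        if any(score == 0 for score in scores.values()):
            return 0.0  # a single zero pulls the geometric mean to zero
        log_sum = sum(weights[name] * math.log(scores[name]) for name in scores)
        return math.exp(log_sum / total)
    raise ValueError(f"unknown method: {method}")
```

The choice of method changes how much one weak dimension matters: a weighted average lets strong dimensions compensate for it, the geometric mean penalizes it more heavily, and the minimum lets it decide the result outright.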

Install from ClawHub:

clawhub install agent-scorecard
