Multi-Dim Eval Framework Designer by @tatsuko-tsukimi
Designs a multi-dimensional evaluation framework for AI systems where single-score benchmarks lose information. Use when comparing experiments/agents across...
When to Use
Activate when the user:
- Wants to evaluate AI systems (agents, deliberations, RAG, multi-step reasoning) across multiple qualitatively different dimensions
- Needs to compare instances with asymmetric data availability (some have canonical metrics, others have only narrative logs)
- Has noticed that single-score benchmarks miss important variation between systems
- Says "tradeoffs" and wants to make those tradeoffs explicit per dimension
- Wants a reusable scorecard format that survives infrastructure migrations

Don't activate when:
- The user wants a single comparable benchmark number; point them at HumanEval / MMLU / domain-specific benchmarks instead
- The system has a clear single quality metric (perplexity, accuracy on a labeled set)
- The user is asking how to design *one* metric, not a *framework* of metrics
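The per-dimension scorecard idea above can be sketched as a small data structure. This is an illustrative assumption, not the skill's actual format: the names `Dimension` and `Scorecard` and their fields are hypothetical, chosen to show how canonical metrics and narrative-only evidence can sit side by side without being collapsed into one number.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Dimension:
    """One qualitatively distinct axis of evaluation."""
    name: str                       # e.g. "accuracy", "robustness"
    score: Optional[float] = None   # canonical metric, when one exists
    evidence: str = ""              # narrative notes, when only logs exist


@dataclass
class Scorecard:
    """Per-system scorecard: dimensions are reported side by side, never averaged."""
    system: str
    dimensions: list[Dimension] = field(default_factory=list)

    def summary(self) -> dict[str, object]:
        # Keep numeric scores and narrative evidence distinct so that
        # asymmetric data availability stays visible in the output.
        return {
            d.name: d.score if d.score is not None else d.evidence
            for d in self.dimensions
        }


card = Scorecard("agent-a", [
    Dimension("accuracy", score=0.82),
    Dimension("robustness", evidence="recovered from 3/4 injected tool failures"),
])
print(card.summary())
```

Because `summary()` never aggregates across dimensions, two systems with the same mean score but different tradeoff profiles remain distinguishable, which is the point of the framework.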
Install: `clawhub install multi-dim-eval-framework`