LLM Eval Router by @nissan
Shadow-test local Ollama models against a cloud baseline with a multi-judge ensemble. Automatically promotes models when statistically proven equivalent — re...
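The promotion logic described above can be sketched roughly as follows. This is a hypothetical illustration, not the skill's actual implementation: the function name, the judge-score inputs, and the equivalence margin are all assumptions.

```python
from statistics import mean

def should_promote(local_scores, baseline_scores, margin=0.05):
    # Hypothetical sketch: promote the local Ollama model when its mean
    # judge score is within `margin` of the cloud baseline's mean score.
    # The real skill uses a multi-judge ensemble and a statistical
    # equivalence test; this only shows the shape of the decision.
    return mean(local_scores) >= mean(baseline_scores) - margin

# Example: three judges score each model's outputs on a 0-1 scale
local = [0.88, 0.91, 0.86]
baseline = [0.90, 0.92, 0.89]
print(should_promote(local, baseline))  # → True
```

In practice an equivalence claim would also need a minimum sample size and a confidence bound, not just a mean comparison.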
⚡ When to Use
- You have Ollama running locally with capable models (qwen2.5, phi4, mistral, etc.)
- You want evidence-based cost reduction, not blind routing
- You have defined task types: summarize, classify, extract, format, analyze, RAG
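A routing decision over the task types listed above might look like the sketch below. The routing table, model identifiers, and function names are illustrative assumptions, not the skill's real API.

```python
# Hypothetical: task types the router is willing to shadow-test locally
TASK_TYPES = {"summarize", "classify", "extract", "format", "analyze", "rag"}

def route(task_type: str, promoted: set[str]) -> str:
    # Send a request to the local model only if its task type has been
    # promoted (i.e. proven equivalent to the baseline); otherwise keep
    # it on the cloud baseline. Model names here are placeholders.
    if task_type in TASK_TYPES and task_type in promoted:
        return "ollama/qwen2.5"
    return "cloud-baseline"

print(route("summarize", {"summarize"}))  # → ollama/qwen2.5
print(route("analyze", set()))            # → cloud-baseline
```

The key property is that routing is gated on evidence (the `promoted` set) rather than routing blindly by task type.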
⚙️ Configuration
- Ollama installed and running (ollama.com)
- At least one capable model: ollama pull qwen2.5 or ollama pull phi4
- Python 3.10+
- API keys: Anthropic (ground truth) + OpenAI (judge) — Gemini optional (tiebreaker)
- Langfuse for observability (self-hosted or cloud) — optional but strongly recommended
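The setup steps above might be carried out as follows. The environment variable names are common conventions, not confirmed by this listing; check the skill's own documentation for the exact names it reads.

```shell
# Pull at least one capable local model (qwen2.5 shown; phi4 also works)
ollama pull qwen2.5

# Verify the Ollama server is running
ollama list

# Export API keys (variable names are assumptions based on common usage)
export ANTHROPIC_API_KEY="..."   # ground-truth baseline
export OPENAI_API_KEY="..."      # judge
export GOOGLE_API_KEY="..."      # optional: Gemini tiebreaker
```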