
LLM Eval Router

by @nissan

Shadow-test local Ollama models against a cloud baseline with a multi-judge ensemble. Automatically promotes models when statistically proven equivalent — re...

Version: v1.2.2
Installs: 2
When to Use
- You have Ollama running locally with capable models (qwen2.5, phi4, mistral, etc.)
- You want evidence-based cost reduction, not blind routing
- You have defined task types: summarize, classify, extract, format, analyze, RAG
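
The sketch below illustrates one way evidence-based, per-task-type routing could work: collect paired judge scores for the local model and the cloud baseline, and route a task type locally only once a one-sided confidence bound shows the local model is not meaningfully worse. Everything in it is an assumption made for illustration. `ShadowStats`, `route`, `MARGIN`, `MIN_SAMPLES`, and the normal-approximation test are hypothetical and are not taken from the skill, whose actual statistical test and thresholds are not described here.

```python
from dataclasses import dataclass, field
from math import sqrt
from statistics import mean, stdev

# Hypothetical parameters: the skill's real thresholds and test are not documented here.
MARGIN = 0.05        # largest acceptable mean score drop versus the cloud baseline
MIN_SAMPLES = 30     # do not decide before this many shadow comparisons
Z_ONE_SIDED = 1.645  # normal quantile for a one-sided 95% bound


@dataclass
class ShadowStats:
    """Paired judge-score differences (local minus baseline) for one task type."""
    diffs: list[float] = field(default_factory=list)

    def record(self, local_score: float, baseline_score: float) -> None:
        self.diffs.append(local_score - baseline_score)

    def should_promote(self) -> bool:
        n = len(self.diffs)
        if n < MIN_SAMPLES:
            return False
        # Lower confidence bound on the mean difference (normal approximation).
        lower = mean(self.diffs) - Z_ONE_SIDED * stdev(self.diffs) / sqrt(n)
        return lower > -MARGIN


def route(task_type: str, stats: dict[str, ShadowStats]) -> str:
    """Return "local" only for task types whose shadow results justify promotion."""
    s = stats.get(task_type)
    return "local" if s is not None and s.should_promote() else "cloud"
```

Keeping separate statistics per task type means a local model can be promoted for, say, `classify` while `analyze` still goes to the cloud baseline. Task types with no shadow data default to the cloud.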
⚙️ Configuration

  • Ollama installed and running (ollama.com)
  • At least one capable model: ollama pull qwen2.5 or ollama pull phi4
  • Python 3.10+
  • API keys: Anthropic (ground truth) + OpenAI (judge) — Gemini optional (tiebreaker)
  • Langfuse for observability (self-hosted or cloud) — optional but strongly recommended

Install from ClawHub:

    clawhub install llm-eval-router
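
The Configuration list gives the API keys distinct roles (ground truth, judge, optional tiebreaker). As a rough illustration of how a judge ensemble with a tiebreaker might combine verdicts, here is a minimal sketch. It assumes each judge is a plain callable returning pass or fail. `ensemble_verdict`, `exact_match`, and `case_insensitive_match` are hypothetical names, the stand-in judges make no API calls, and the skill's real scoring scheme may differ.

```python
from typing import Callable, Optional, Sequence

# A judge takes (candidate, reference) and returns True if the candidate is acceptable.
Judge = Callable[[str, str], bool]


def ensemble_verdict(
    candidate: str,
    reference: str,
    judges: Sequence[Judge],
    tiebreaker: Optional[Judge] = None,
) -> Optional[bool]:
    """Majority vote across judges; consult the tiebreaker only on an even split.

    Returns None when the vote is tied and no tiebreaker is configured.
    """
    votes = [judge(candidate, reference) for judge in judges]
    passes = sum(votes)
    fails = len(votes) - passes
    if passes != fails:
        return passes > fails
    if tiebreaker is None:
        return None
    return tiebreaker(candidate, reference)


# Stand-in judges for illustration; real ones would call a model API.
def exact_match(candidate: str, reference: str) -> bool:
    return candidate.strip() == reference.strip()


def case_insensitive_match(candidate: str, reference: str) -> bool:
    return candidate.strip().lower() == reference.strip().lower()
```

Here `reference` stands for the ground-truth output and `candidate` for the local model's output. Calling the tiebreaker only on a split vote keeps the optional third provider out of the common case where the judges agree.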
