llmrouter
by @alexrudloff
Intelligent LLM proxy that routes requests to appropriate models based on complexity. Save money by using cheaper models for simple tasks. Tested with Anthropic, OpenAI, Gemini, Kimi/Moonshot, and Ollama.
Prerequisites
1. Python 3.10+ with pip
2. Ollama (optional - only if using local classification)
3. Anthropic API key or Claude Code OAuth token (or other provider key)
Setup
```bash
# Clone if not already present
git clone https://github.com/alexrudloff/llmrouter.git
cd llmrouter

# Create virtual environment (required on modern Python)
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Pull classifier model (if using local classification)
ollama pull qwen2.5:3b

# Copy and customize config
cp config.yaml.example config.yaml
```
Edit `config.yaml` with your API key and model preferences.
Verify Installation
```bash
# Start the server
source venv/bin/activate
python server.py

# In another terminal, test the health endpoint
curl http://localhost:4001/health
```
Should return: `{"status": "ok", ...}`
Start the Server
```bash
python server.py
```
Options:
- `--port PORT` - Port to listen on (default: 4001)
- `--host HOST` - Host to bind (default: 127.0.0.1)
- `--config PATH` - Config file path (default: config.yaml)
- `--log` - Enable verbose logging
- `--openclaw` - Enable OpenClaw compatibility (rewrites model name in system prompt)

Edit `config.yaml` to customize:
Model Routing
```yaml
# Anthropic routing
models:
  super_easy: "anthropic:claude-haiku-4-5-20251001"
  easy: "anthropic:claude-haiku-4-5-20251001"
  medium: "anthropic:claude-sonnet-4-20250514"
  hard: "anthropic:claude-opus-4-20250514"
  super_hard: "anthropic:claude-opus-4-20250514"
```
```yaml
# OpenAI routing
models:
  super_easy: "openai:gpt-4o-mini"
  easy: "openai:gpt-4o-mini"
  medium: "openai:gpt-4o"
  hard: "openai:o3-mini"
  super_hard: "openai:o3"
```
```yaml
# Google Gemini routing
models:
  super_easy: "google:gemini-2.0-flash"
  easy: "google:gemini-2.0-flash"
  medium: "google:gemini-2.0-flash"
  hard: "google:gemini-2.0-flash"
  super_hard: "google:gemini-2.0-flash"
```
Note: Reasoning models are auto-detected and use correct API params.
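Each tier value above is a `provider:model` string, so resolving a tier means a dictionary lookup plus a split on the first colon. A minimal sketch of that lookup (helper names are illustrative, not llmrouter's actual internals):

```python
# Hypothetical sketch of tier-to-model resolution; llmrouter's real
# code may differ. The mapping mirrors the Anthropic example above.
MODELS = {
    "super_easy": "anthropic:claude-haiku-4-5-20251001",
    "easy": "anthropic:claude-haiku-4-5-20251001",
    "medium": "anthropic:claude-sonnet-4-20250514",
    "hard": "anthropic:claude-opus-4-20250514",
    "super_hard": "anthropic:claude-opus-4-20250514",
}

def resolve(tier: str) -> tuple[str, str]:
    """Split the configured 'provider:model' entry into its parts."""
    provider, _, model = MODELS[tier].partition(":")
    return provider, model
```

For example, `resolve("medium")` yields `("anthropic", "claude-sonnet-4-20250514")`.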
Classifier
Three options for classifying request complexity:
Local (default) - Free, requires Ollama:
```yaml
classifier:
  provider: "local"
  model: "qwen2.5:3b"
```
Anthropic - Uses Haiku, fast and cheap:
```yaml
classifier:
  provider: "anthropic"
  model: "claude-haiku-4-5-20251001"
```
OpenAI - Uses GPT-4o-mini:
```yaml
classifier:
  provider: "openai"
  model: "gpt-4o-mini"
```
Google - Uses Gemini:
```yaml
classifier:
  provider: "google"
  model: "gemini-2.0-flash"
```
Kimi - Uses Moonshot:
```yaml
classifier:
  provider: "kimi"
  model: "moonshot-v1-8k"
```
Use remote (anthropic/openai/google/kimi) if your machine can't run local models.
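Whichever provider runs it, the classifier's only job is to produce one of the five tier labels used in the `models` map. As an illustration (not llmrouter's actual parsing code), a free-text reply from the classifier model could be normalized like this:

```python
# Toy sketch: map a classifier model's reply onto one of the five
# routing tiers. Illustrative only; the router's real logic may differ.
# Longer labels are checked first so "super_easy" isn't matched as "easy".
TIERS = ["super_hard", "super_easy", "hard", "medium", "easy"]

def parse_tier(reply: str, default: str = "medium") -> str:
    text = reply.lower()
    for tier in TIERS:
        if tier in text or tier.replace("_", " ") in text:
            return tier
    return default  # fall back to a mid-range model on unclear output
```

For example, `parse_tier("Complexity: super_easy")` returns `"super_easy"`, while unrecognizable output falls back to `"medium"` rather than failing the request.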
Supported Providers
- `anthropic:claude-*` - Anthropic Claude models (tested)
- `openai:gpt-*`, `openai:o1-*`, `openai:o3-*` - OpenAI models (tested)
- `google:gemini-*` - Google Gemini models (tested)
- `kimi:kimi-k2.5`, `kimi:moonshot-*` - Kimi/Moonshot models (tested)
- `local:model-name` - Local Ollama models (tested)

Troubleshooting

"externally-managed-environment" error
Python 3.11+ requires virtual environments. Create one:
```bash
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
"Connection refused" on port 4001
The server isn't running. Start it:
```bash
source venv/bin/activate && python server.py
```
Classification returns wrong complexity
Edit `ROUTES.md` to tune classification rules. The classifier reads this file to determine complexity levels.

Ollama errors / "model not found"
Ensure Ollama is running and the model is pulled:
```bash
ollama serve            # Start Ollama if not running
ollama pull qwen2.5:3b
```
OAuth token not working
Ensure your token in `config.yaml` starts with `sk-ant-oat`. The router auto-detects OAuth tokens and adds the required identity headers.

LaunchAgent not starting

Check logs and ensure paths are absolute:
```bash
cat ~/Library/LaunchAgents/com.llmrouter.plist   # Verify paths
cat /path/to/llmrouter/logs/stderr.log           # Check for errors
```
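The OAuth auto-detection described under "OAuth token not working" amounts to a prefix check before choosing auth headers. A sketch of the idea (the exact headers llmrouter sends are not documented here, so the values below are placeholders):

```python
# Sketch of OAuth-token detection by prefix, per the troubleshooting
# note above. Header names/values are placeholders, not necessarily
# what llmrouter actually sends.
def is_oauth_token(token: str) -> bool:
    return token.startswith("sk-ant-oat")

def auth_headers(token: str) -> dict[str, str]:
    if is_oauth_token(token):
        # OAuth tokens typically travel in an Authorization bearer header
        return {"Authorization": f"Bearer {token}"}
    # Plain Anthropic API keys use the x-api-key header
    return {"x-api-key": token}
```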