🦀 ClawHub
CUDA Ollama
by @twinsgeeks
CUDA Ollama — route Ollama LLM inference across NVIDIA GPUs with automatic CUDA load balancing. Build a CUDA Ollama cluster from mixed hardware such as RTX 4090, RTX 4080, A100, L40S, and H100 GPUs.
💡 Examples
pip install ollama-herd # PyPI: https://pypi.org/project/ollama-herd/
On your CUDA Ollama router machine:
herd # start the CUDA Ollama router (port 11435)
On every NVIDIA CUDA machine:
herd-node # auto-discovers the CUDA Ollama router via mDNS
Verify CUDA is available on each NVIDIA node:
nvidia-smi # confirm NVIDIA CUDA driver is loaded
ollama ps # confirm Ollama is using CUDA GPU
> No mDNS? Connect CUDA nodes directly: herd-node --router-url http://router-ip:11435
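Once the router and nodes are up, clients can treat the router like a single Ollama endpoint. Assuming the router on port 11435 speaks the standard Ollama HTTP API (an assumption based on the port noted above, not confirmed by this listing), a minimal Python sketch for building a generate request looks like this; the host and model name are placeholders:

```python
import json
from urllib.request import Request

# Assumed router address — replace with your router machine's IP.
ROUTER_URL = "http://localhost:11435"

def build_generate_request(model: str, prompt: str) -> Request:
    """Build an HTTP request for the Ollama-style /api/generate endpoint.
    The router is assumed to forward it to whichever CUDA node it selects."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return Request(
        url=f"{ROUTER_URL}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_generate_request("llama3", "Why is the sky blue?")
# urllib.request.urlopen(req) would send it once the router is running.
```

Because the request targets the router rather than any single node, the same client code keeps working as GPUs join or leave the cluster.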
clawhub install cuda-ollama