
CUDA Ollama

by @twinsgeeks

CUDA Ollama routes Ollama LLM inference across NVIDIA GPUs with automatic CUDA load balancing. Build a cluster from mixed hardware such as the RTX 4090, RTX 4080, A100, L40S, and H100.

Version: v1.0.0
Installs: 2
💡 Examples

pip install ollama-herd    # PyPI: https://pypi.org/project/ollama-herd/

On your CUDA Ollama router machine:

herd    # start the CUDA Ollama router (port 11435)

On every NVIDIA CUDA machine:

herd-node    # auto-discovers the CUDA Ollama router via mDNS

Verify CUDA is available on each NVIDIA node:

nvidia-smi    # confirm NVIDIA CUDA driver is loaded
ollama ps     # confirm Ollama is using CUDA GPU
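When checking many nodes, the output of `nvidia-smi --query-gpu` is easier to verify programmatically than the default table. A minimal sketch, parsing the CSV format into Python tuples; the sample output below is illustrative, not captured from a real node:

```python
import csv
import io

def parse_gpu_csv(output: str):
    """Parse `nvidia-smi --query-gpu=name,memory.total --format=csv`
    output into a list of (name, memory) tuples."""
    reader = csv.reader(io.StringIO(output.strip()))
    rows = list(reader)
    # The first row is the CSV header; each remaining row is one GPU.
    return [(name.strip(), mem.strip()) for name, mem in rows[1:]]

# Illustrative sample output (what a node with two GPUs might report):
sample = """name, memory.total [MiB]
NVIDIA GeForce RTX 4090, 24564 MiB
NVIDIA A100-SXM4-80GB, 81920 MiB
"""

gpus = parse_gpu_csv(sample)
print(gpus)
```

An empty list (header only) means the driver sees no CUDA devices, so the node cannot serve Ollama inference.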

> No mDNS? Connect CUDA nodes directly: herd-node --router-url http://router-ip:11435
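Since the router listens on an Ollama-style port (11435), a reasonable assumption is that clients talk to it with the standard Ollama HTTP API. A minimal sketch that builds a `/api/generate` request against a placeholder router address (`router-ip` and the model name are assumptions; actually sending it requires a running router, so the send is left commented out):

```python
import json

# Placeholder -- replace with your CUDA Ollama router's address.
ROUTER_URL = "http://router-ip:11435"

# Standard Ollama /api/generate request body; the router is assumed to
# accept it unchanged and dispatch it to one of the CUDA nodes.
payload = {
    "model": "llama3",          # any model pulled on the nodes
    "prompt": "Why is the sky blue?",
    "stream": False,            # return one JSON object, not a stream
}

body = json.dumps(payload)
endpoint = f"{ROUTER_URL}/api/generate"
print(endpoint)

# To actually send it (requires a running router):
# import urllib.request
# req = urllib.request.Request(
#     endpoint, data=body.encode(),
#     headers={"Content-Type": "application/json"})
# resp = json.loads(urllib.request.urlopen(req).read())
# print(resp["response"])
```

Because the request shape is plain Ollama, existing Ollama client libraries should also work by pointing their host at the router instead of a local instance.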

Install via ClawHub:
clawhub install cuda-ollama

