
Vision Helper — AI Image Analysis

by @ravenquasar

Analyze images using local or cloud vision models via Ollama to identify content, UI elements, screenshots, or extract text with OCR support.

Version: v1.0.0
💡 Examples

Basic

# Analyze an image (default: English description)
python3 /scripts/analyze_image.py /path/to/image.png

With a custom prompt

python3 /scripts/analyze_image.py /path/to/image.png "Is this a chess game? Describe the board state"

With a specific model

python3 /scripts/analyze_image.py /path/to/image.png "Describe content" kimi-k2.5:cloud

> The directory prefix in front of /scripts/ resolves to your OpenClaw skill installation directory, typically ~/.openclaw/workspace/skills/vision-helper/.

In Conversation

When you need to analyze an image, use the exec tool:

exec: python3 /scripts/analyze_image.py /path/to/image.png "What do you see?"

Important: Set the exec timeout to 120–180 seconds, because cloud vision models respond slowly.
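If you call the script from your own Python code rather than through the exec tool, the same argument order and timeout can be applied with `subprocess`. This is a minimal sketch, not part of the skill: `SKILL_DIR`, `build_command`, and `analyze` are hypothetical names, the install path is the typical one mentioned above, and it assumes the script prints its analysis to stdout.

```python
import subprocess
from pathlib import Path

# Assumption: the typical install location; adjust if your skill lives elsewhere.
SKILL_DIR = Path.home() / ".openclaw" / "workspace" / "skills" / "vision-helper"


def build_command(image_path, prompt=None, model=None, skill_dir=SKILL_DIR):
    """Assemble argv in the documented order: script, image, prompt, model."""
    if model is not None and prompt is None:
        # Arguments are positional, so a model can only follow a prompt.
        raise ValueError("a model can only be given together with a prompt")
    script = Path(skill_dir) / "scripts" / "analyze_image.py"
    cmd = ["python3", str(script), str(image_path)]
    if prompt is not None:
        cmd.append(prompt)
    if model is not None:
        cmd.append(model)
    return cmd


def analyze(image_path, prompt=None, model=None, timeout=180):
    """Run the script; raises subprocess.TimeoutExpired after `timeout` seconds.

    Assumption: the analysis text is written to stdout.
    """
    result = subprocess.run(
        build_command(image_path, prompt, model),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout
```

A call such as `analyze("/tmp/screen.png", "What do you see?")` would then wait up to 180 seconds before giving up, which matches the upper end of the recommended range.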

Screenshot + Analysis Workflow

#### Option A: Browser screenshot → analyze

1. browser(action="screenshot") → get screenshot path (MEDIA: xxx)
2. exec("python3 /scripts/analyze_image.py /path/to/screenshot.png 'Describe this UI'"), using the screenshot path returned in step 1
3. Act on the analysis result

#### Option B: Desktop screenshot → analyze

macOS:

1. exec("screencapture -x /tmp/screen.png")
2. exec("python3 /scripts/analyze_image.py /tmp/screen.png 'Describe the desktop'")

Linux:

1. exec("gnome-screenshot -f /tmp/screen.png")
   or
   exec("import -window root /tmp/screen.png")  # ImageMagick; -window root captures the whole screen without an interactive selection
   or
   exec("scrot /tmp/screen.png")
2. exec("python3 /scripts/analyze_image.py /tmp/screen.png 'Describe the desktop'")
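The per-platform choice in Option B could be automated along these lines. This is a sketch under stated assumptions: `screenshot_command` is a hypothetical helper, it only knows the tools listed above, and for ImageMagick it passes `-window root` so that `import` captures the whole screen instead of waiting for a mouse selection.

```python
import shutil
import sys


def screenshot_command(output_path, platform=sys.platform, which=shutil.which):
    """Return an argv list for the first available screenshot tool, or None."""
    if platform == "darwin":
        # -x suppresses the capture sound on macOS.
        return ["screencapture", "-x", output_path]
    if platform.startswith("linux"):
        candidates = [
            ["gnome-screenshot", "-f", output_path],
            ["import", "-window", "root", output_path],  # ImageMagick
            ["scrot", output_path],
        ]
        for argv in candidates:
            # Pick the first tool that is actually installed on PATH.
            if which(argv[0]):
                return argv
    return None
```

The returned list can be passed to `subprocess.run`, after which the saved file is handed to analyze_image.py as in step 2.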

#### Option C: Game/App UI → analyze → act

1. Screenshot the current screen
2. Use vision-helper to identify UI elements, buttons, text
3. Execute clicks/input based on the analysis

📋 Tips & Best Practices

Q: Can I use the built-in image tool instead?

A: It works for local models but will time out on cloud vision models. Always prefer this skill's script for reliable results.

Q: What image formats are supported?

A: PNG, JPG, JPEG, GIF, WebP, BMP, TIFF, SVG. Maximum file size: 20 MB.
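To avoid waiting on a request that is bound to fail, a caller could check these limits before invoking the script. The sketch below is illustrative only: `check_image` is a hypothetical helper, it judges the format by file extension (the script itself may detect formats differently), and it reads "20 MB" as 20 × 1024 × 1024 bytes.

```python
from pathlib import Path

# Extensions taken from the format list above.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp", ".tiff", ".svg"}
MAX_BYTES = 20 * 1024 * 1024  # assumption: "20 MB" interpreted as 20 MiB


def check_image(path):
    """Return None if the file looks acceptable, otherwise a short reason."""
    p = Path(path)
    if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
        return f"unsupported format: {p.suffix or '(none)'}"
    if not p.is_file():
        return "file not found"
    if p.stat().st_size > MAX_BYTES:
        return "file larger than 20 MB"
    return None
```

A `None` result means the image passed both checks; any other value is a message that can be shown to the user instead of starting the analysis.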

Q: Where should I save screenshots?

A: Any readable directory works — /tmp/, your workspace, etc. This script has no path restrictions.

Q: How do I use a Chinese prompt?

A: Pass it as the second argument: python3 /scripts/analyze_image.py /tmp/img.png "请描述这张图片的内容"

Install from ClawHub:

clawhub install vision-helper
