Desktop automation ultra
by @jordaneparis
Automate comprehensive desktop tasks on Windows/macOS/Linux with safe, logged mouse, keyboard, OCR, image recognition, macro recording, and replay features.
clawhub install desktop-automation-ultra
About This Skill
Desktop Automation Skill v2.0
 
Complete desktop automation for Windows/macOS/Linux. Zero-error edition.
Privacy & Security
CRITICAL: This skill captures ALL keyboard and mouse events.
recorded_macro/ directory
What It Does
Automate desktop interactions without APIs:
Safety Features (Built-In)
1. Safe Mode (Default: ON)
Blocks dangerous actions when enabled:
- type, press_key, click, drag are monitored
- Dangerous patterns: rm , del , C:\Windows\, /etc/, sudo, etc.
2. Dry-Run Mode
All actions support dry_run=true:
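As a rough illustration of how a dry_run flag can short-circuit an action, the sketch below validates its input and then reports what it would do instead of doing it. The `click` function here is a hypothetical stand-in, not the skill's actual implementation; the `perform` callback and the `"dry_run"` / `"success"` status strings are assumptions made for the example.

```python
from typing import Callable, Dict, Optional


def click(x: int, y: int, button: str = "left", dry_run: bool = False,
          perform: Optional[Callable[[int, int, str], None]] = None) -> Dict[str, object]:
    """Hypothetical click action: validate first, act only when dry_run is False."""
    if x < 0 or y < 0:
        return {"status": "error", "message": "coordinates must be non-negative"}
    if dry_run:
        # Report what would happen without touching the mouse.
        return {"status": "dry_run", "x": x, "y": y}
    if perform is not None:
        perform(x, y, button)  # e.g. a thin wrapper around a real mouse backend
    return {"status": "success", "x": x, "y": y}


# Usage: the dry run records nothing, the real call reaches the backend once.
calls = []
preview = click(100, 100, dry_run=True, perform=lambda *args: calls.append(args))
real = click(100, 100, perform=lambda *args: calls.append(args))
```

Running the validation before the dry-run check means a dry run surfaces the same input errors a real run would.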
3. Audit Logging
Every action is logged to ~/.openclaw/skills/desktop-automation-logs/automation_YYYY-MM-DD.log
4. Thread Safety
All modules use locks to prevent race conditions.
Installation
1. Extract Files
Place desktop-automation-ultra-local/ in:
- Windows: C:\Users\\.openclaw\workspace\skills\
- macOS/Linux: ~/.openclaw/workspace/skills/
2. Install Dependencies
pip install -r requirements.txt
3. Optional: Tesseract for OCR
For find_text_on_screen functionality:
sudo apt install tesseract-ocr   # Debian/Ubuntu
brew install tesseract           # macOS
4. Restart OpenClaw
openclaw gateway restart
Quick Start
Basic Click
action: click
params:
  x: 100
  y: 100
  dry_run: true  # Test first!
Type Text
action: type
params:
  text: "Hello World"
  interval: 0.05  # Delay between keys, in seconds
  dry_run: false
Find Image
action: find_image
params:
  template_path: "templates/button.png"
  confidence: 0.95
Extract Text (OCR)
action: read_text_ocr
params:
  lang: "fra"  # French
Core Actions
Mouse & Keyboard
| Action | Parameters | Returns |
|--------|------------|---------|
| click | x, y, button="left", dry_run | {status, x, y} |
| type | text, interval=0.05, dry_run | {status, text} |
| press_key | key, dry_run | {status, key} |
| move_mouse | x, y, duration=0.5, dry_run | {status, x, y} |
| scroll | amount=5, dry_run | {status, amount} |
| drag | start_x, start_y, end_x, end_y, duration=0.5, dry_run | {status} |
| copy_to_clipboard | text, dry_run | {status} |
| paste_from_clipboard | dry_run | {status, length} |
Screenshots & Windows
| Action | Parameters | Returns |
|--------|------------|---------|
| screenshot | path="~/Desktop/screenshot.png", dry_run | {status, path} |
| get_active_window | dry_run | {status, title, x, y, width, height} |
| list_windows | dry_run | {status, windows[], count} |
| activate_window | title_substring, dry_run | {status, title} |
Image Recognition (requires OpenCV)
| Action | Parameters | Returns |
|--------|------------|---------|
| find_image | template_path, confidence=0.9, dry_run | {status, x, y, confidence} |
| find_image_multiscale | template_path, confidence, scale_factors, dry_run | {status, x, y, confidence, scale} |
| wait_for_image | template_path, timeout=30.0, interval=0.5, confidence=0.9, dry_run | {status, x, y, confidence} |
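
The timeout and interval parameters of wait_for_image suggest a polling loop. The sketch below shows one way such a loop could be written, under the assumption that a matcher is available as a callable returning `(x, y, confidence)` or `None`. The `find` callable and the `"success"` / `"timeout"` status strings are hypothetical and are not taken from the skill's source.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Match = Tuple[int, int, float]  # (x, y, confidence)


def wait_for_image(find: Callable[[], Optional[Match]], timeout: float = 30.0,
                   interval: float = 0.5, confidence: float = 0.9) -> Dict[str, object]:
    """Poll `find` until it reports a match at or above `confidence`, or time out."""
    deadline = time.monotonic() + timeout
    while True:
        match = find()
        if match is not None and match[2] >= confidence:
            x, y, score = match
            return {"status": "success", "x": x, "y": y, "confidence": score}
        if time.monotonic() + interval > deadline:
            # The next poll would start after the deadline, so give up now.
            return {"status": "timeout"}
        time.sleep(interval)


# Usage with a fake matcher that succeeds on the third poll.
attempts = {"n": 0}


def fake_find() -> Optional[Match]:
    attempts["n"] += 1
    return (40, 60, 0.97) if attempts["n"] >= 3 else None


result = wait_for_image(fake_find, timeout=1.0, interval=0.01)
```

Using `time.monotonic()` rather than wall-clock time keeps the deadline stable if the system clock is adjusted while waiting.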
OCR / Text Recognition (requires Tesseract)
| Action | Parameters | Returns |
|--------|------------|---------|
| find_text_on_screen | text, lang="fra", dry_run | {status, locations[], count} |
| find_all_text_on_screen | text, lang="fra", dry_run | {status, data[], count} |
| read_text_ocr | lang="fra", dry_run | {status, text, length} |
| read_text_region | x, y, width, height, lang="fra", dry_run | {status, text, length} |
| extract_screen_data | region={}, output_format="json", lang="fra", dry_run | {status, data[], count} |
Macros
| Action | Parameters | Returns |
|--------|------------|---------|
| play_macro | macro_path, speed=1.0, dry_run | {status, executed, total, errors[]} |
| stop_macro | none | {status} |
| play_macro_with_subroutines | macro_path, speed=1.0, sub_macros_dir, dry_run | {status, executed, total, errors[]} |
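
To make the `executed`, `total` and `errors[]` return fields more concrete, here is a minimal playback sketch over the JSON event structure documented in the Macro Format section. It is an assumption-laden illustration, not the skill's macro_player.py: `dispatch` is a hypothetical callback that performs one action, `speed` is assumed to divide each wait, and the `"success"` / `"partial"` status strings are invented for the example.

```python
import json
import time
from typing import Any, Callable, Dict


def play_macro(macro: Dict[str, Any], dispatch: Callable[[str, dict], None],
               speed: float = 1.0, dry_run: bool = False) -> Dict[str, object]:
    """Replay events in order, scaling each event's `wait` (milliseconds) by `speed`."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    events = macro.get("events", [])
    executed = 0
    errors = []
    for index, event in enumerate(events):
        try:
            action = event["action"]
            if not dry_run:
                dispatch(action, event.get("params", {}))
            executed += 1
        except Exception as exc:  # record the failure and continue with the next event
            errors.append(f"event {index}: {exc!r}")
        if not dry_run:
            time.sleep(event.get("wait", 0) / 1000.0 / speed)
    return {"status": "partial" if errors else "success",
            "executed": executed, "total": len(events), "errors": errors}


# Usage: replay a two-event macro at double speed against a recording stub.
macro = json.loads(
    '{"events": ['
    '{"action": "click", "params": {"x": 100, "y": 50}, "wait": 5},'
    '{"action": "type", "params": {"text": "Hello"}, "wait": 2}]}'
)
seen = []
summary = play_macro(macro, lambda action, params: seen.append(action), speed=2.0)
```

Collecting errors instead of aborting on the first one matches the shape of the documented return value, though the real player may behave differently.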
Safety Management
| Action | Parameters | Returns |
|--------|------------|---------|
| set_safe_mode | enabled=true | {status, safe_mode} |
| get_safety_status | none | {status, safe_mode_enabled, dangerous_patterns, dangerous_actions[]} |
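
The `dangerous_patterns` field hints at pattern-based blocking. A minimal sketch of that idea follows, assuming a plain case-insensitive substring check; the real safety_manager.py may work differently. The pattern list is copied from the examples given under Safe Mode, and `check_text` and its status strings are hypothetical names.

```python
from typing import Dict, Iterable

# Illustrative pattern list taken from the examples in this document.
DANGEROUS_PATTERNS = ("rm ", "del ", "C:\\Windows\\", "/etc/", "sudo")


def check_text(text: str, safe_mode: bool = True,
               patterns: Iterable[str] = DANGEROUS_PATTERNS) -> Dict[str, object]:
    """Return a blocked status when safe mode is on and the text contains a listed pattern."""
    if safe_mode:
        lowered = text.lower()
        for pattern in patterns:
            if pattern.lower() in lowered:
                return {"status": "blocked", "pattern": pattern}
    return {"status": "allowed"}
```

A naive substring check like this produces false positives (for example, "confirm the order" contains "rm "), so a production check would likely need word boundaries or command parsing.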
Macro Format
Recorded macros are JSON with this structure:
{
  "events": [
    {
      "action": "click",
      "params": {"x": 100, "y": 50},
      "wait": 500
    },
    {
      "action": "type",
      "params": {"text": "Hello"},
      "wait": 200
    },
    {
      "action": "press_key",
      "params": {"key": "return"},
      "wait": 100
    }
  ]
}
- action: action name
- params: action parameters
- wait: milliseconds to wait before next action
Advanced: Mouse Move Debouncing
To avoid recording hundreds of move_mouse events during a smooth drag, the recorder uses debouncing:
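- Once the mouse has stopped moving for N seconds (default: 1 sec), the final position is recorded
Example:
- A continuous movement yields a single move_mouse event (end coordinates)
- A movement with pauses yields several move_mouse events (one per "stop")
Testing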
Run the unit test suite:
python scripts/test_automation.py
Output:
test_dry_run_click ... ok
test_get_active_window ... ok
test_safe_mode_blocks_dangerous ... ok
...
Ran 13 tests
OK
Logging
All actions logged to: ~/.openclaw/skills/desktop-automation-logs/automation_YYYY-MM-DD.log
Example:
[2026-03-15 10:23:45] [INFO] ActionManager: ActionManager initialized with safe_mode=True
[2026-03-15 10:23:46] [INFO] ActionManager: Clicked at (100, 50) with left button
[2026-03-15 10:23:47] [INFO] ActionManager: Typed: Hello World
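For readers who want to produce or parse the same layout, the sketch below configures Python's standard `logging` module to write lines in the format shown above, into a dated file named like the documented one. It is an assumption about how such a log could be produced, not a copy of the skill's utils.py; `make_logger` is a hypothetical helper name.

```python
import logging
import tempfile
from datetime import date
from pathlib import Path


def make_logger(log_dir: str, name: str = "ActionManager") -> logging.Logger:
    """Create a logger that appends to automation_YYYY-MM-DD.log inside log_dir."""
    directory = Path(log_dir).expanduser()
    directory.mkdir(parents=True, exist_ok=True)
    log_path = directory / f"automation_{date.today():%Y-%m-%d}.log"

    handler = logging.FileHandler(log_path, encoding="utf-8")
    handler.setFormatter(logging.Formatter(
        "[%(asctime)s] [%(levelname)s] %(name)s: %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S"))

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False  # keep entries out of the root logger's handlers
    return logger


# Usage: write one entry into a throwaway directory.
demo_dir = tempfile.mkdtemp()
logger = make_logger(demo_dir)
logger.info("Clicked at (100, 50) with left button")
```

The file name is fixed when the logger is created, so a long-running process would need to recreate the handler to roll over at midnight.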
Configuration
Environment Variables
# Override log directory
export AUTOMATION_LOG_DIR=~/my_logs
# Disable safe mode globally (NOT recommended)
export AUTOMATION_SAFE_MODE=false
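A minimal sketch of how these two variables might be read on the Python side is shown below. The fallback directory is the log path documented above; `Settings`, `load_settings`, and the rule that only explicit "off" values disable safe mode are assumptions for illustration, since the skill's actual parsing is not shown in this document.

```python
import os
from pathlib import Path
from typing import Mapping, NamedTuple

DEFAULT_LOG_DIR = "~/.openclaw/skills/desktop-automation-logs"


class Settings(NamedTuple):
    log_dir: Path
    safe_mode: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the two documented variables, falling back to the documented defaults."""
    log_dir = Path(env.get("AUTOMATION_LOG_DIR", DEFAULT_LOG_DIR)).expanduser()
    raw = env.get("AUTOMATION_SAFE_MODE", "true").strip().lower()
    # Assumption: only explicit "off" values disable safe mode; anything else keeps it on.
    safe_mode = raw not in {"false", "0", "no", "off"}
    return Settings(log_dir=log_dir, safe_mode=safe_mode)
```

Defaulting to safe mode on for unrecognised values means a typo in the variable cannot silently disable the safety checks.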
Troubleshooting
"pyautogui failsafe triggered"
This is PyAutoGUI's fail-safe: moving the mouse to a corner of the screen aborts the running automation.
OCR returns empty text
- Try read_text_ocr instead of find_text_on_screen
Image recognition not finding template
- Use find_image_multiscale to detect the template at different scales
Actions blocked by safe mode
This is intentional. To run dangerous actions:
action: set_safe_mode
params:
  enabled: false
Then execute your action. Re-enable safe mode immediately after:
action: set_safe_mode
params:
  enabled: true
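When driving the skill from a script, the disable/act/re-enable sequence can be wrapped so that safe mode is restored even if the action fails. The sketch below assumes a hypothetical `run_action(action, params)` dispatcher; it is not part of the skill's documented API.

```python
from contextlib import contextmanager
from typing import Callable, Dict, Iterator


@contextmanager
def safe_mode_disabled(run_action: Callable[[str, Dict[str, object]], object]) -> Iterator[None]:
    """Turn safe mode off for the enclosed block and always turn it back on."""
    run_action("set_safe_mode", {"enabled": False})
    try:
        yield
    finally:
        # Runs even if the enclosed action raises.
        run_action("set_safe_mode", {"enabled": True})


# Usage with a recording stub in place of a real dispatcher.
history = []


def stub(action: str, params: Dict[str, object]) -> None:
    history.append((action, dict(params)))


try:
    with safe_mode_disabled(stub):
        stub("type", {"text": "sudo apt update"})
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass
```

After this runs, the last entry in `history` is the call that re-enables safe mode, despite the simulated failure.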
License
MIT License. See LICENSE file.
File Structure
desktop-automation-ultra-local/
├── SKILL.md (This file)
├── requirements.txt (Python dependencies)
├── lib/
│   ├── actions.py (Core click/type/drag actions)
│   ├── image_recognition.py (OpenCV template matching)
│   ├── ocr_engine.py (Tesseract OCR)
│   ├── macro_player.py (Record/playback macros)
│   ├── safety_manager.py (Safe mode, blocking)
│   └── utils.py (Logging, helpers)
├── scripts/
│   └── test_automation.py (Unit tests)
└── recorded_macro/ (Output: saved macros)
Validation Checklist
Status: PRODUCTION READY
*Last updated: 2026-03-15* *Version: 2.0.0*