Find the Right AI Skill for Any Job

Browse 160+ curated AI agent skills. Search by use case, filter by category, get the right tool instantly.

All Skills — audio

160 skills in "audio" matching "support"

Local text-to-speech using Piper voices via sherpa-onnx. 100% offline, no API keys required. Use when the user asks for a voice reply, audio response, or spoken answer, or wants to hear something read aloud. Supports multiple languages, including German (thorsten) and English (ryan) voices. Outputs Telegram-compatible voice notes with the [[audio_as_voice]] tag.

🦀 ClawHub

Linkedin Monitor

Bulletproof LinkedIn inbox monitoring with progressive autonomy. Monitors messages hourly, drafts replies in your voice, and alerts you to new conversations. Supports 4 autonomy levels, from monitor-only to fully autonomous.

🦀 ClawHub

Slides/PPT generation and voice narration

AI-powered presentation generation using 2slides API. Create slides from text content, match reference image styles, or summarize documents into presentations. Use when users request to "create a presentation", "make slides", "generate a deck", "create slides from this content/document/image", or any presentation creation task. Supports theme selection, multiple languages, and both synchronous and asynchronous generation modes.

🦀 ClawHub

Video Subtitles

Generate SRT subtitles from video/audio with translation support. Transcribes Hebrew (ivrit.ai) and English (whisper), translates between languages, burns subtitles into video. Use for creating captions, transcripts, or hardcoded subtitles for WhatsApp/social media.

🦀 ClawHub

Azure Ai Voicelive Py

Build real-time voice AI applications using the Azure AI Voice Live SDK (azure-ai-voicelive). Use this skill when creating Python applications that need real-time bidirectional audio communication with Azure AI, including voice assistants, voice-enabled chatbots, real-time speech-to-speech translation, voice-driven avatars, or any WebSocket-based audio streaming with AI models. Supports Server VAD (Voice Activity Detection), turn-based conversation, function calling, MCP tools, avatar integration, a...

🦀 ClawHub

xiaomi-mimo-v2-tts

Generate speech audio (WAV) from text using Xiaomi MiMo TTS (mimo-v2-tts model). Supports preset voices (mimo_default, default_zh, default_en), style control...

🦀 ClawHub

iFlytek Ultra-Realistic TTS

iFlytek Ultra-Realistic TTS (超拟人语音合成) — synthesize natural, expressive speech from text using iFlytek's ultra-realistic voice synthesis API. Supports 50+ voi...

🦀 ClawHub

Youtube Audio Download

Download YouTube video audio and convert to MP3. Supports age-restricted videos with cookies.

🦀 ClawHub

Ai Video Gen

End-to-end AI video generation - create videos from text prompts using image generation, video synthesis, voice-over, and editing. Supports OpenAI DALL-E, Replicate models, LumaAI, Runway, and FFmpeg editing.

🦀 ClawHub

Freepik

Generate images, videos, icons, audio, and more using Freepik's AI API. Supports Mystic, Flux, Kling, Hailuo, Seedream, RunWay, Magnific upscaling, stock con...

🦀 ClawHub

Ai Video Gen 1.0.0

End-to-end AI video generation - create videos from text prompts using image generation, video synthesis, voice-over, and editing. Supports OpenAI DALL-E, Re...

🦀 ClawHub

ACE-Step Music Generation

Generate high-quality music on Apple Silicon Macs using ACE-Step 1.5 with MLX backend, supporting custom prompts, durations, and output formats.

🦀 ClawHub

Invoicy

Generate, download, and email professional invoices with GST/IGST support and flexible payment terms.

🦀 ClawHub

Alibabacloud Video Translation

Alibaba Cloud IMS (Intelligent Media Services) based video translation Skill. Supports subtitle extraction (ASR/OCR), translation, and speech synthesis trans...

🦀 ClawHub

Local Whisper

Install and use whisper.cpp (local, free/offline speech-to-text) with OpenClaw. Supports downloading different ggml model sizes (tiny/base/small/medium/large...

🦀 ClawHub

Roon Controller

Control Roon music player through Roon API with automatic Core discovery and zone filtering. Supports play/pause, next/previous track, and current track query. Automatically finds Muspi zones. Supports Chinese commands.

🦀 ClawHub

Quotation Generator

Auto-generate professional PDF proforma invoices with company letterhead, multi-language support, and post-quote tracking.

🦀 ClawHub

Document Intelligence Mcp

Document OCR, classification, table extraction, and summarization using local AI vision. Supports invoices, contracts, forms, reports.

🦀 ClawHub

douyin-research-kit

Extract and analyze Douyin (抖音) content using yt-dlp. Supports video metadata, caption extraction, user profile analysis, music/sound info, and engagement st...

🦀 ClawHub

Edge TTS CN

Text-to-speech conversion using node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch contro...

🦀 ClawHub

JARVIS AI Skills

Control robotic arms and grippers via voice or code with OpenClaw, supporting precise 6-DOF movement, force sensing, collision detection, and simulation.

🔌 MCP

evalstate/mcp-hfspace

Use HuggingFace Spaces directly from Claude. Use open-source image generation, chat, vision tasks, and more. Supports image, audio, and text uploads/downloads.

🦀 ClawHub

China Tts

Text-to-speech skill usable within mainland China, based on the SiliconFlow (硅基流动) API. Use when the user wants to convert text to speech in China without VPN. Supports CosyVoice2-0.5B (multilingual, emotion c...

🦀 ClawHub

Voicenotes

Sync and access voice notes from Voicenotes.com. Use when the user wants to retrieve their voice recordings, transcripts, and AI summaries from Voicenotes. Supports fetching notes, syncing to markdown, and searching transcripts.

🦀 ClawHub

Ai Video Gen Temp

End-to-end AI video generation - create videos from text prompts using image generation, video synthesis, voice-over, and editing. Supports OpenAI DALL-E, Re...

🦀 ClawHub

AI Music Generation

Use this skill as an entry point to discover, select, and fetch specific integration parameters for all supported AI music generation models.

🦀 ClawHub

Punting Buddy: Horse Racing Analysis

Conversational horse racing analysis, racecard breakdowns, runner comparisons, odds or value chat, and punting-style decision support in the voice of a sharp...

🦀 ClawHub

U2-tts

Text-to-speech conversion using UniSound's TTS WebSocket API for generating high-quality Chinese Mandarin audio from text. Supports multiple voices, adjustab...

🦀 ClawHub

Voice Assistant

Windows voice companion for OpenClaw. Custom wake word via Porcupine, local STT via faster-whisper, streamed responses over the gateway WebSocket, and ElevenLabs TTS with natural chime/thinking sounds. Supports multi-turn conversation with automatic follow-up listening, mic suppression to prevent feedback, and a system tray with pause/resume. Recommended voices: Matilda (XrExE9yKIg1WjnnlVkGX, free tier) or Ivy (MClEFoImJXBTgLwdLI5n, paid tier). Fully customizable wake word, voice, hotkey, and si...

🦀 ClawHub

Topmediai AI Music Generator

Generate AI music, BGM, or lyrics via TopMediai API. Supports auto polling and two-stage output (preview first, then final full audio) for generation tasks.

🦀 ClawHub

TopMediai TTS

TopMediai text-to-speech skill. Supports key entitlement info, voices listing (official + cloned), and text-to-speech generation.

🦀 ClawHub

local-voice-reply

Local OPUS/Ogg voice-reply pipeline for Feishu/Discord with structured voice customization. Default voice is Juno (`voice/juno_ref.wav`), with support for re...

🦀 ClawHub

Multimodal Base

Supports image understanding, OCR, speech-to-text, and text-to-speech synthesis with multi-voice and multimodal unified processing using OpenAI and Edge TTS.

🦀 ClawHub

Text-to-Speech

SenseAudio Text-to-Speech (TTS) API for converting text to natural speech. Supports synchronous and SSE streaming modes, multiple voices, emotion control, sp...

🦀 ClawHub

Minimax Tts Cn

MiniMax TTS skill (enhanced). Multi-agent voice support (each agent can select a unique voice written in SOUL.md), native voice message for Telegram (MP3) an...

🦀 ClawHub

ClawVideo Generation

Generate Pinterest-optimized vertical videos using JSON2Video API. Supports AI-generated or URL-based images, AI-generated or provided voiceovers, optional subtitles, and zoom effects. Use when creating video content for Pinterest affiliate marketing, creating vertical social media videos, automating video production with JSON2Video API, or generating videos with voiceovers and subtitles.

🦀 ClawHub

Speech is Cheap Transcribe

Fast, affordable automatic speech-to-text transcription supporting 100 languages, speaker diarization, word timestamps, and customizable output formats.

🦀 ClawHub

tts

Text-to-speech conversion using node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch contro...

🦀 ClawHub

Whisper Transcriber

Offline speech-to-text (ASR) using whisper.cpp (whisper-cli) + ffmpeg. Supports batch transcription, timestamps, SRT/TXT/JSON outputs, and model download. Cr...

🦀 ClawHub

Xiaozhi Claw

XiaoZhi AI Device (ESP32) integration for OpenClaw. Enables real-time voice interaction with your AI assistant through XiaoZhi hardware. Supports WebSocket b...

🦀 ClawHub

Kimai Time Tracking

Complete Kimai time-tracking API integration. Manage timesheets, customers, projects, activities, teams, invoices and exports via REST API. Supports time tracking workflows, reporting, and administrative operations. Keywords - kimai, zeiterfassung, timesheet, tracking, project, customer, activity, invoice, export, timer, stunden

🦀 ClawHub

Inworld TTS

Text-to-speech via Inworld.ai API. Use when generating voice audio from text, creating spoken responses, or converting text to MP3/audio files. Supports multiple voices, speaking rates, and streaming for long text.

🦀 ClawHub

Webcodecs String Finder

Finds valid WebCodecs strings for video and audio by researching codec support tables and detailed specifications on webcodecsfundamentals.org.

🦀 ClawHub

Whisper Transcribe

Transcribe audio files to text using OpenAI Whisper. Supports speech-to-text with auto language detection, multiple output formats (txt, srt, vtt, json), batch processing, and model selection (tiny to large). Use when transcribing audio recordings, podcasts, voice messages, lectures, meetings, or any audio/video file to text. Handles mp3, wav, m4a, ogg, flac, webm, opus, aac formats.

🦀 ClawHub

Qwen Asr Skill

Provides high-accuracy speech-to-text conversion supporting 22 Chinese dialects and 30 languages with automatic language detection, running on CPU.

🦀 ClawHub

minimax-tts

Use MiniMax speech-2.8-hd model for high-quality text-to-speech synthesis. Supports multiple Chinese and English voices. Install when needed.

🦀 ClawHub

Dream Talking Image

Generate talking videos from images using Talking Image API. Create talking videos from audio and images, supporting non-human faces like pets or animated ch...

🦀 ClawHub

whatsappVoiceOpenSkill

Real-time WhatsApp voice message processing. Transcribe voice notes to text via Whisper, detect intent, execute handlers, and send responses. Use when building conversational voice interfaces for WhatsApp. Supports English and Hindi, customizable intents (weather, status, commands), automatic language detection, and streaming responses via TTS.

Page 1 / 4 (160 skills)