BytesAgain

Find the Right AI Skill for Any Job

Browse 160+ curated AI agent skills. Search by use case, filter by category, get the right tool instantly.


All Skills - audio

160 skills in "audio" matching "support"

🦀 ClawHub
Voice Reply
Local text-to-speech using Piper voices via sherpa-onnx. 100% offline, no API keys required. Use when user asks for a voice reply, audio response, spoken answer, or wants to hear something read aloud. Supports multiple languages including German (thorsten) and English (ryan) voices. Outputs Telegram-compatible voice notes with [[audio_as_voice]] tag.
🦀 ClawHub
Linkedin Monitor
Bulletproof LinkedIn inbox monitoring with progressive autonomy. Monitors messages hourly, drafts replies in your voice, and alerts you to new conversations. Supports 4 autonomy levels from monitor-only to full autonomous.
🦀 ClawHub
Slides/PPT generation and voice narration
AI-powered presentation generation using 2slides API. Create slides from text content, match reference image styles, or summarize documents into presentations. Use when users request to "create a presentation", "make slides", "generate a deck", "create slides from this content/document/image", or any presentation creation task. Supports theme selection, multiple languages, and both synchronous and asynchronous generation modes.
🦀 ClawHub
Video Subtitles
Generate SRT subtitles from video/audio with translation support. Transcribes Hebrew (ivrit.ai) and English (whisper), translates between languages, burns subtitles into video. Use for creating captions, transcripts, or hardcoded subtitles for WhatsApp/social media.
🦀 ClawHub
Azure Ai Voicelive Py
Build real-time voice AI applications using Azure AI Voice Live SDK (azure-ai-voicelive). Use this skill when creating Python applications that need real-time bidirectional audio communication with Azure AI, including voice assistants, voice-enabled chatbots, real-time speech-to-speech translation, voice-driven avatars, or any WebSocket-based audio streaming with AI models. Supports Server VAD (Voice Activity Detection), turn-based conversation, function calling, MCP tools, avatar integration, a
🦀 ClawHub
xiaomi-mimo-v2-tts
Generate speech audio (WAV) from text using Xiaomi MiMo TTS (mimo-v2-tts model). Supports preset voices (mimo_default, default_zh, default_en), style control...
🦀 ClawHub
iFlytek Ultra-Realistic TTS
iFlytek Ultra-Realistic TTS (超拟人语音合成): synthesize natural, expressive speech from text using iFlytek's ultra-realistic voice synthesis API. Supports 50+ voi...
🦀 ClawHub
Youtube Audio Download
Download YouTube video audio and convert to MP3. Supports age-restricted videos with cookies.
🦀 ClawHub
Ai Video Gen
End-to-end AI video generation - create videos from text prompts using image generation, video synthesis, voice-over, and editing. Supports OpenAI DALL-E, Replicate models, LumaAI, Runway, and FFmpeg editing.
🦀 ClawHub
Freepik
Generate images, videos, icons, audio, and more using Freepik's AI API. Supports Mystic, Flux, Kling, Hailuo, Seedream, RunWay, Magnific upscaling, stock con...
🦀 ClawHub
Ai Video Gen 1.0.0
End-to-end AI video generation - create videos from text prompts using image generation, video synthesis, voice-over, and editing. Supports OpenAI DALL-E, Re...
🦀 ClawHub
ACE-Step Music Generation
Generate high-quality music on Apple Silicon Macs using ACE-Step 1.5 with MLX backend, supporting custom prompts, durations, and output formats.
🦀 ClawHub
Invoicy
Generate, download, and email professional invoices with GST/IGST support and flexible payment terms.
🦀 ClawHub
Alibabacloud Video Translation
Alibaba Cloud IMS (Intelligent Media Services) based video translation Skill. Supports subtitle extraction (ASR/OCR), translation, and speech synthesis trans...
🦀 ClawHub
Local Whisper
Install and use whisper.cpp (local, free/offline speech-to-text) with OpenClaw. Supports downloading different ggml model sizes (tiny/base/small/medium/large...
🦀 ClawHub
Roon Controller
Control Roon music player through Roon API with automatic Core discovery and zone filtering. Supports play/pause, next/previous track, and current track query. Automatically finds Muspi zones. Supports Chinese commands.
🦀 ClawHub
Quotation Generator
Auto-generate professional PDF proforma invoices with company letterhead, multi-language support, and post-quote tracking.
🦀 ClawHub
Document Intelligence Mcp
Document OCR, classification, table extraction, and summarization using local AI vision. Supports invoices, contracts, forms, reports.
🦀 ClawHub
douyin-research-kit
Extract and analyze Douyin (抖音) content using yt-dlp. Supports video metadata, caption extraction, user profile analysis, music/sound info, and engagement st...
🦀 ClawHub
Edge TTS CN
Text-to-speech conversion using node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch contro...
🦀 ClawHub
JARVIS AI Skills
Control robotic arms and grippers via voice or code with OpenClaw, supporting precise 6-DOF movement, force sensing, collision detection, and simulation.
🔌 MCP
evalstate/mcp-hfspace
📇 ☁️ - Use HuggingFace Spaces directly from Claude. Use Open Source Image Generation, Chat, Vision tasks and more. Supports Image, Audio and text uploads/downloads.
🦀 ClawHub
China Tts
Text-to-speech skill usable within mainland China, based on the SiliconFlow (硅基流动) API. Use when the user wants to convert text to speech in China without VPN. Supports CosyVoice2-0.5B (multilingual, emotion c...
🦀 ClawHub
Voicenotes
Sync and access voice notes from Voicenotes.com. Use when the user wants to retrieve their voice recordings, transcripts, and AI summaries from Voicenotes. Supports fetching notes, syncing to markdown, and searching transcripts.
🦀 ClawHub
Ai Video Gen Temp
End-to-end AI video generation - create videos from text prompts using image generation, video synthesis, voice-over, and editing. Supports OpenAI DALL-E, Re...
🦀 ClawHub
Al Music Generation
Use this skill as an entry point to discover, select, and fetch specific integration parameters for all supported AI music generation models.
🦀 ClawHub
Punting Buddy: Horse Racing Analysis
Conversational horse racing analysis, racecard breakdowns, runner comparisons, odds or value chat, and punting-style decision support in the voice of a sharp...
🦀 ClawHub
U2-tts
Text-to-speech conversion using UniSound's TTS WebSocket API for generating high-quality Chinese Mandarin audio from text. Supports multiple voices, adjustab...
🦀 ClawHub
Voice Assistant
Windows voice companion for OpenClaw. Custom wake word via Porcupine, local STT via faster-whisper, streamed responses over the gateway WebSocket, and ElevenLabs TTS with natural chime/thinking sounds. Supports multi-turn conversation with automatic follow-up listening, mic suppression to prevent feedback, and a system tray with pause/resume. Recommended voices: Matilda (XrExE9yKIg1WjnnlVkGX, free tier) or Ivy (MClEFoImJXBTgLwdLI5n, paid tier). Fully customizable wake word, voice, hotkey, and si
🦀 ClawHub
Topmediai AI Music Generator
Generate AI music, BGM, or lyrics via TopMediai API. Supports auto polling and two-stage output (preview first, then final full audio) for generation tasks.
🦀 ClawHub
TopMediai TTS
TopMediai text-to-speech skill. Supports key entitlement info, voices listing (official + cloned), and text-to-speech generation.
🦀 ClawHub
local-voice-reply
Local OPUS/Ogg voice-reply pipeline for Feishu/Discord with structured voice customization. Default voice is Juno (`voice/juno_ref.wav`), with support for re...
🦀 ClawHub
Multimodal Base
Supports image understanding, OCR, speech-to-text, and text-to-speech synthesis with multi-voice and multimodal unified processing using OpenAI and Edge TTS.
🦀 ClawHub
Text-to-Speech
SenseAudio Text-to-Speech (TTS) API for converting text to natural speech. Supports synchronous and SSE streaming modes, multiple voices, emotion control, sp...
🦀 ClawHub
Minimax Tts Cn
MiniMax TTS skill (enhanced). Multi-agent voice support (each agent can select a unique voice written in SOUL.md), native voice message for Telegram (MP3) an...
🦀 ClawHub
ClawVideo Generation
Generate Pinterest-optimized vertical videos using JSON2Video API. Supports AI-generated or URL-based images, AI-generated or provided voiceovers, optional subtitles, and zoom effects. Use when creating video content for Pinterest affiliate marketing, creating vertical social media videos, automating video production with JSON2Video API, or generating videos with voiceovers and subtitles.
🦀 ClawHub
Speech is Cheap Transcribe
Fast, affordable automatic speech-to-text transcription supporting 100 languages, speaker diarization, word timestamps, and customizable output formats.
🦀 ClawHub
tts
Text-to-speech conversion using node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch contro...
🦀 ClawHub
Whisper Transcriber
Offline speech-to-text (ASR) using whisper.cpp (whisper-cli) + ffmpeg. Supports batch transcription, timestamps, SRT/TXT/JSON outputs, and model download. Cr...
🦀 ClawHub
Xiaozhi Claw
XiaoZhi AI Device (ESP32) integration for OpenClaw. Enables real-time voice interaction with your AI assistant through XiaoZhi hardware. Supports WebSocket b...
🦀 ClawHub
Kimai Time Tracking
Complete Kimai time-tracking API integration. Manage timesheets, customers, projects, activities, teams, invoices and exports via REST API. Supports time tracking workflows, reporting, and administrative operations. Keywords - kimai, zeiterfassung, timesheet, tracking, project, customer, activity, invoice, export, timer, stunden
🦀 ClawHub
Inworld TTS
Text-to-speech via Inworld.ai API. Use when generating voice audio from text, creating spoken responses, or converting text to MP3/audio files. Supports multiple voices, speaking rates, and streaming for long text.
🦀 ClawHub
Webcodecs String Finder
Finds valid WebCodecs strings for video and audio by researching codec support tables and detailed specifications on webcodecsfundamentals.org.
🦀 ClawHub
Whisper Transcribe
Transcribe audio files to text using OpenAI Whisper. Supports speech-to-text with auto language detection, multiple output formats (txt, srt, vtt, json), batch processing, and model selection (tiny to large). Use when transcribing audio recordings, podcasts, voice messages, lectures, meetings, or any audio/video file to text. Handles mp3, wav, m4a, ogg, flac, webm, opus, aac formats.
🦀 ClawHub
Qwen Asr Skill
Provides high-accuracy speech-to-text conversion supporting 22 Chinese dialects and 30 languages with automatic language detection, running on CPU.
🦀 ClawHub
minimax-tts
Use MiniMax speech-2.8-hd model for high-quality text-to-speech synthesis. Supports multiple Chinese and English voices. Install when needed.
🦀 ClawHub
Dream Talking Image
Generate talking videos from images using Talking Image API. Create talking videos from audio and images, supporting non-human faces like pets or animated ch...
🦀 ClawHub
whatsappVoiceOpenSkill
Real-time WhatsApp voice message processing. Transcribe voice notes to text via Whisper, detect intent, execute handlers, and send responses. Use when building conversational voice interfaces for WhatsApp. Supports English and Hindi, customizable intents (weather, status, commands), automatic language detection, and streaming responses via TTS.
Page 1 / 4 (160 skills)