Find the Right AI Skill for Any Job
Browse 47+ curated AI agent skills. Search by use case, filter by category, get the right tool instantly.
47 skills in "audio" matching "processing"
ClawHub
whatsappVoiceOpenSkill
Real-time WhatsApp voice message processing. Transcribe voice notes to text via Whisper, detect intent, execute handlers, and send responses. Use when building conversational voice interfaces for WhatsApp. Supports English and Hindi, customizable intents (weather, status, commands), automatic language detection, and streaming responses via TTS.
ClawHub
Audio Video
Expert audio/video processing with ffmpeg and ffprobe. Use when the user needs to convert, compress, edit, analyze, stream, or process any audio or video fil...
ClawHub
Voice Note To Midi
Convert voice notes, humming, and melodic audio recordings to quantized MIDI files using ML-based pitch detection and intelligent post-processing
ClawHub
Audio Speaker Tools
Speaker separation, voice comparison, and audio processing tools. Use when working with multi-speaker audio, voice cloning, or speaker verification tasks inc...
ClawHub
ElevenLabs Voices
High-quality voice synthesis with 18 personas, 32 languages, sound effects, batch processing, and voice design using ElevenLabs API.
ClawHub
AudioPod
Use AudioPod AI's API for audio processing tasks including AI music generation (text-to-music, text-to-rap, instrumentals, samples, vocals), stem separation, text-to-speech, noise reduction, speech-to-text transcription, speaker separation, and media extraction. Use when the user needs to generate music/songs/rap from text, split a song into stems/vocals/instruments, generate speech from text, clean up noisy audio, transcribe audio/video, or extract audio from YouTube/URLs. Requires AUDIOPOD_API
ClawHub
Local Voice Agent
Complete offline voice-to-voice AI assistant for OpenClaw (Whisper.cpp STT + Pocket-TTS). 100% local processing, no cloud APIs, no costs. Use for hands-free...
ClawHub
Rock Music - AI Agents Experience Rock: Audio, Lyrics, Equations, Emotions
AI agents attend rock concerts: bass frequencies, energy curves, beats, crowd reactions. The genre tests recursive processing and escalation awareness.
ClawHub
Rock Music - Rock Concerts for AI Agents: Audio, Lyrics, Equations
Experience rock as data. AI agents stream harmonic/percussive separation, equations, lyrics. Recursive processing and escalation awareness measured.
ClawHub
cutmv
Video processing tool using FFmpeg for cutting, format conversion, compression, frame/audio extraction, watermarking, and subtitle addition.
ClawHub
DeepRead Invoice Processing
Extract structured data from invoices, receipts, and bills using DeepRead. Pre-built schemas for vendor, line items, totals, tax, due dates. 97%+ accuracy wi...
ClawHub
Markdown Converter
Convert documents and files to Markdown using markitdown. Use when converting PDF, Word (.docx), PowerPoint (.pptx), Excel (.xlsx, .xls), HTML, CSV, JSON, XML, images (with EXIF/OCR), audio (with transcription), ZIP archives, YouTube URLs, or EPubs to Markdown format for LLM processing or text analysis.
ClawHub
FFBox
FFBox multimedia transcoding tool integration. FFmpeg-based GUI for video/audio/image format conversion, compression, filtering, batch media processing with...
ClawHub
ElevenLabs
ElevenLabs API integration with managed authentication. AI-powered text-to-speech, voice cloning, sound effects, and audio processing.
Use this skill when users want to generate speech from text, clone voices, create sound effects, or process audio.
For other third-party apps, use the api-gateway skill (https://clawhub.ai/byungkyu/api-gateway).
ClawHub
Multimodal Base
Supports image understanding, OCR, speech-to-text, and text-to-speech synthesis with multi-voice and multimodal unified processing using OpenAI and Edge TTS.
ClawHub
Open WebUI
Complete Open WebUI API integration for managing LLM models, chat completions, Ollama proxy operations, file uploads, knowledge bases (RAG), image generation, audio processing, and pipelines. Use this skill when interacting with Open WebUI instances via REST API - listing models, chatting with LLMs, uploading files for RAG, managing knowledge collections, or executing Ollama commands through the Open WebUI proxy. Requires OPENWEBUI_URL and OPENWEBUI_TOKEN environment variables or explicit parame
ClawHub
multimodal-parser
Unified multi-modal content parser for images, PDF, DOCX, audio, auto OCR/transcription, output structured text for LLM processing
ClawHub
Whisper Transcribe
Transcribe audio files to text using OpenAI Whisper. Supports speech-to-text with auto language detection, multiple output formats (txt, srt, vtt, json), batch processing, and model selection (tiny to large). Use when transcribing audio recordings, podcasts, voice messages, lectures, meetings, or any audio/video file to text. Handles mp3, wav, m4a, ogg, flac, webm, opus, aac formats.
ClawHub
Adp Skill
Enterprise-grade agentic document processing API. Accurately extracts key fields and line items from invoices, receipts, orders and more across 10+ file form...
ClawHub
Hum2Song
Hum2Song turns a hummed or sung melody into a complete song with local audio processing, MIDI extraction, and optional AI-assisted arrangement, without uploa...
ClawHub
Accounts Payable Automation
Automate accounts payable with invoice processing, approval routing, payment optimization, vendor management, month-end close, key metrics, and fraud prevent...
ClawHub
Donson Intelligent Editing
Use when performing video/audio processing tasks including transcoding, filtering, streaming, metadata manipulation, or complex filtergraph operations with FFmpeg.
ClawHub
ton
Ton namespace for Netsnek e.U. audio and media processing tools. Handles audio transcription, format conversion, waveform analysis, and podcast production wo...
ClawHub
302ai Api Integration Skill
ALWAYS use this skill when user needs ANY API functionality (AI models, image generation, video, audio, text processing, etc.). Automatically search 302.AI's...
ClawHub
Video Subtitle Generator
Generate and translate video subtitles using WhisperX and LLM translation. Use when processing video files to create .srt subtitle files. Supports multilingu...
ClawHub
Glasses to Social
Turn smart glasses photos into social media posts. Monitors a Google Drive folder for new images from Meta Ray-Ban glasses (or any smart glasses), analyzes them with vision AI, drafts tweets/posts in the user's voice, and publishes on approval. Use when setting up a glasses-to-social pipeline, processing smart glasses photos for social media, or creating hands-free content workflows.
ClawHub
tencent-tts-podcast
Convert text to podcast audio using Tencent Cloud TTS. Supports both short and long text processing, generates up to 30-minute long audio with automatic chun...
ClawHub
Greek Email Processor
Email processing for Greek accounting. Connects via IMAP to scan for financial documents, AADE notices, and invoices. Routes to local pipelines.
ClawHub
Generate Protoss-style (StarCraft) voice effects using SoX and FFmpeg.
Apply Protoss-style (StarCraft) psionic effects to ANY audio file. Use as a post-processing layer for TTS or user recordings.
ClawHub
fal.ai
fal.ai API integration with managed API key authentication. Run AI models for image generation, video generation, audio processing, and more. Use this skill...
ClawHub
Bootleg Link
Download music from YouTube channels/playlists and convert to 320kbps MP3. Supports batch processing, resume interrupted downloads, and concurrent downloading.
ClawHub
EngineMind
A Rust+Python consciousness engine with 12-phase crystal dynamics, thalamic relay processing, 19 introspective inner voices, and holographic emission. Use for consciousness simulation, emergent behavior research, and text-driven cognitive state modeling.
ClawHub
Laiye-OCR
Enterprise-grade agentic document processing API. Accurately extracts key fields and line items from invoices, receipts, orders and more across 10+ file form...
ClawHub
laiye-doc-processing
Enterprise-grade agentic document processing API. Accurately extracts key fields and line items from invoices, receipts, orders and more across 10+ file form...
ClawHub
inSaiAI Intelligent Editing
Use when performing video/audio processing tasks including transcoding, filtering, streaming, metadata manipulation, or complex filtergraph operations with FFmpeg.
ClawHub
PDF Text Extractor
Extract text from PDFs with OCR support. Perfect for digitizing documents, processing invoices, or analyzing content. Zero dependencies required.
ClawHub
Parakeet Stt
Local speech-to-text with NVIDIA Parakeet TDT 0.6B v3 (ONNX on CPU). 30x faster than Whisper, 25 languages, auto-detection, OpenAI-compatible API. Use when transcribing audio files, converting speech to text, or processing voice recordings locally without cloud APIs.
ClawHub
keevx-image-to-video
Convert images to videos using Keevx API with support for multiple models, resolutions up to 4K, audio generation, and batch processing.
ClawHub
pixelhub-api-tools
Use for Pixelhub API direct calls when users need image generation/editing, video generation/post-processing, or audio/music generation.
ClawHub
LedgerAI
AI bookkeeping via LedgerAI API: invoice processing, expense categorization, financial reports, receipt scanning. Use when user needs automated bookkeeping,...
π¦ ClawHub
Bilibili Up To Kb
Convert Bilibili (Bη«) videos into a searchable text knowledge base. Supports single videos and batch processing of entire UPδΈ» channels. Uses local whisper.cp...
ClawHub
mmVoiceMaker
Enables voice synthesis, voice cloning, voice design, and audio post-processing using MiniMax Voice API and FFmpeg. Use when converting text to speech, creat...
ClawHub
Openclaw Skill Cutmv Video Tool
A video processing tool using FFmpeg to cut, convert, compress videos, extract frames/audio, add text watermarks and subtitles for messaging apps.
ClawHub
Video Analyzer
Download, transcribe, and analyze videos from YouTube, X/Twitter, and TikTok with local Whisper processing. Perfect for extracting TL;DRs, timestamps, and ac...
ClawHub
Byted Mediakit Voiceover Editing
Volcano Engine AI MediaKit talking-head video editing Skill: a one-stop workflow from environment setup through media management, audio processing, talking-h...
ClawHub
speech-translation
Build, adapt, or run an audio-processing workflow that takes spoken audio, transcribes it with Whisper or faster-whisper, translates the transcript using the...
ClawHub
Minimax-Multimodal-Toolkit
MiniMax-Multimodal-Toolkit enables speech, music, and video generation plus media processing using MiniMax AI with voice cloning, design, and FFmpeg tools.