Find the Right AI Skill for Any Job

Browse 160+ curated AI agent skills. Search by use case, filter by category, get the right tool instantly.

All Skills — audio

160 skills in "audio" matching "support"

Extract text from PDFs with OCR support. Perfect for digitizing documents, processing invoices, or analyzing content. Zero dependencies required.

🦀 ClawHub

keevx-video-translate

Translate videos into a specified target language using the Keevx API. Supports audio-only translation, subtitle generation, and dynamic duration adjustment....

🦀 ClawHub

Meta Video Ad Analyzer

Extract and analyze content from video ads using Gemini Vision AI. Supports frame extraction, OCR text detection, audio transcription, and AI-powered scene analysis. Use when analyzing video creative content, extracting text overlays, or generating scene-by-scene descriptions.

🦀 ClawHub

Talking Circle

Create animated talking-circle videos (Telegram-style round video messages) from avatar frame images and audio. Supports audio-to-video and text-to-video via...

🦀 ClawHub

Agentic Calling

Enable AI agents to autonomously make, receive, transcribe, route, and record phone calls using Twilio with customizable voice messages and IVR support.

🦀 ClawHub

xeon_tts

Local TTS skill using OpenVINO Qwen3-TTS for voice cloning and emotion style synthesis, supporting QQBOT workflows with strict audio length and file retentio...

🦀 ClawHub

ifly-ocr-invoice

Recognize and extract structured data from invoices, receipts, and bills using iFlytek OCR API (科大讯飞票据识别). Supports VAT invoices, taxi receipts, train ticket...

🦀 ClawHub

ifly-speed-transcription

Ultra-fast speech transcription using iFLYTEK Speed Transcription API. Transcribe audio files (WAV/PCM/MP3) up to 5 hours in ~20 seconds per hour. Supports C...

🔌 MCP

lfnovo/content-core

Extract content from URLs, documents, videos, and audio files using intelligent auto-engine selection. Supports web pages, PDFs, Word docs, YouTube transcripts, and more with structured JSON responses.

🦀 ClawHub

讯飞票据识别

Recognize and extract structured data from invoices, receipts, and bills using iFlytek OCR API (科大讯飞票据识别). Supports VAT invoices, taxi receipts, train ticket...

🦀 ClawHub

Aliyun Asr

Pure Aliyun ASR skill for voice message transcription. Supports multiple channels, including Feishu.

🦀 ClawHub

keevx-image-to-video

Convert images to videos using Keevx API with support for multiple models, resolutions up to 4K, audio generation, and batch processing.

🦀 ClawHub

Invoice-Recognition

Extract invoice information from images and PDF files using Baidu OCR API, export to Excel. Supports single file, multiple files, or entire directory process...

🦀 ClawHub

Vision Recognition Ocr

Vehicle/animal/plant recognition plus OCR for screenshots, photos, invoices, and tables. Use when users ask 识别车型/看图识别/提取文字/OCR. Supports local path, URL, and...

🦀 ClawHub

ifly-voiceclone-tts

iFlytek Voice Clone TTS (声音复刻, voice cloning): train a custom voice model from audio samples and synthesize speech with the cloned voice. Supports the full workflow: get tr...

🦀 ClawHub

Baidu Speech Synthesis

Baidu Intelligent Cloud Speech Synthesis (TTS), supporting multi-role dialogue audio generation, dual SSML/segment-merge modes, and speech rate/pitch adjustment.

🦀 ClawHub

Claw Use Android

Control and interact with real Android phones via HTTP and CLI without ADB or root, supporting screen reading, taps, typing, apps, calls, and voice.

🦀 ClawHub

muapi-media-generation

Generate AI images, videos, music, and audio from the terminal via muapi.ai. Supports 100+ models, including Flux, Midjourney v7, Kling 3.0, Veo3, and Suno V5.

🦀 ClawHub

GrabGrab

Use when the user wants to download a video or audio from a URL. Supports 20+ platforms including YouTube, X/Twitter, TikTok, Instagram, Facebook, Reddit, Bi...

⭐ GitHub

Pinepods

A Rust-based podcast management system with multi-user support. Pinepods uses a central database so that details like listen time and themes follow you from device to device. With clients built using Tauri, it's a full cross-platform listening solution!

⭐ GitHub

lifthrasiir/angolmois-rust

A minimalistic music video game that supports the BMS format.

🦀 ClawHub

Douyin Video Transcribe

Douyin video transcription suite. Extract audio from Douyin/TikTok China videos, transcribe with Whisper, and analyze content. Supports video links, local fi...

🦀 ClawHub

SenseVoice Transcribe

Transcribe audio files (WAV/MP3/M4A/FLAC) to timestamped text using SenseVoice-Small + FSMN-VAD. Supports single-file and batch mode with VAD-anchored per-se...

⭐ GitHub

pdeljanov/Symphonia

Audio decoding and media demuxing library supporting AAC, FLAC, MP3, MP4, OGG, Vorbis, and WAV.

🦀 ClawHub

U2-audio-file-transcriber

Transcribe audio files via UniCloud ASR (云知声语音识别, recorded audio → text) API from UniSound. Supports multiple formats, optimized for finance, customer servic...

🦀 ClawHub

Bailian Studio

Call Aliyun Bailian via DashScope; supports OCR, TTS, text-to-image, and image-to-image.

🦀 ClawHub

LTX-2.3 Video API

Generate videos via LTX-2.3 API (ltx.video). Supports text-to-video, image-to-video, audio-to-video (lip-sync from audio + image), extend, and retake. Use wh...

🦀 ClawHub

xeon_asr

Automatically converts received voice messages to text via an external ASR service, supporting multiple audio formats and integrating with OpenClaw.

🦀 ClawHub

spotify-control

macOS Spotify control skill for OpenClaw. Supports playback, volume, position, and metadata retrieval via AppleScript.

🦀 ClawHub

video-stt

Extract audio from video URLs and transcribe using STT (Speech-to-Text). Supports local Whisper or cloud APIs. Use when: user provides a video URL and wants...

🦀 ClawHub

Ecomm Ai Voice Agent

Complete AI voice agent system for eCommerce order confirmation, customer support, and outbound campaigns. 12 production-ready n8n workflows with Vapi AI voi...

🦀 ClawHub

Bilibili Up To Kb

Convert Bilibili (B站) videos into a searchable text knowledge base. Supports single videos and batch processing of entire UP主 channels. Uses local whisper.cp...

🦀 ClawHub

AIML Music Generator

Generate high-quality music/songs via AIMLAPI. Supports Suno, Udio, Minimax, and ElevenLabs music models. Use when the user asks for music, songs, or soundtr...

🦀 ClawHub

Phone Caller

Make AI-powered outbound phone calls using ElevenLabs voice + GPT brain + Twilio. Supports one-way pre-recorded messages AND live two-way conversations where...

🦀 ClawHub

Voice (Edge TTS)

Convert text to speech using Microsoft Edge TTS with real-time streaming, customizable voice settings, and support for multiple languages including Chinese a...

🦀 ClawHub

AnveVoice

Add AI voice assistants to your website. Engage visitors with natural voice conversations, capture leads, automate support, and boost conversions.

🦀 ClawHub

LH Edge TTS

Text-to-speech conversion using Python edge-tts for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and sub...

🦀 ClawHub

Youtube Transcript Api

Extract, transcribe, and translate YouTube video transcripts using the YouTubeTranscript.dev V2 API. Supports captions, ASR audio transcription, batch proces...

🦀 ClawHub

ComfyUI TTS

Convert text to speech audio via ComfyUI's Qwen-TTS API, supporting customizable voice, style, model, and output options.

🦀 ClawHub

SiliconFlow TTS Gen

Text-to-Speech using SiliconFlow API (CosyVoice2). Supports multiple voices, languages, and dialects.

🦀 ClawHub

Yt Dlp

A robust CLI wrapper for yt-dlp to download videos, playlists, and audio from YouTube and thousands of other sites. Supports format selection, quality control, metadata embedding, and cookie authentication.

🦀 ClawHub

Seedance Video Generation

Generate AI videos using ByteDance Seedance. Use when the user wants to: (1) generate videos from text prompts, (2) generate videos from images (first frame, first+last frame, reference images), or (3) query/manage video generation tasks. Supports Seedance 1.5 Pro (with audio), 1.0 Pro, 1.0 Pro Fast, and 1.0 Lite models.

🦀 ClawHub

minimax-tokenplan-tts

Generate speech audio from text using MiniMax speech-2.8-hd model. Supports multiple voice options, speed/pitch/volume control, WAV file output with automati...

🦀 ClawHub

AssemblyAI Transcriber

Transcribe audio files with speaker diarization (who speaks when). Supports 100+ languages, automatic language detection, and timestamps. Use for meetings, interviews, podcasts, or voice messages. Requires AssemblyAI API key.

🦀 ClawHub

Supercall

Make AI-powered phone calls with custom personas and goals. Uses OpenAI Realtime API + Twilio for ultra-low latency voice conversations. Supports DTMF/IVR na...

🦀 ClawHub

Spotify Playlist Curator

Create and refine Spotify playlists using the Spotify Web API, with support for track search, recent and top listening lookups, queueing selected tracks, and...

🦀 ClawHub

Zvukogram

Text-to-Speech via Zvukogram API with SSML support. Use when you need to generate speech from text, create podcasts, voice notifications, or work with audio....

🦀 ClawHub

MiniMax TTS Generator

Text-to-speech (TTS) generation using MiniMax API. Converts text into natural-sounding speech with support for multiple voices, adjustable speed and pitch, a...

Page 3 / 4 (160 skills)