BytesAgain

Find the Right AI Skill for Any Job

Browse 160+ curated AI agent skills. Search by use case, filter by category, get the right tool instantly.


All Skills — audio

160 skills in "audio" matching "support"

🦀 ClawHub
PDF Text Extractor
Extract text from PDFs with OCR support. Perfect for digitizing documents, processing invoices, or analyzing content. Zero dependencies required.
🦀 ClawHub
keevx-video-translate
Translate videos into a specified target language using the Keevx API. Supports audio-only translation, subtitle generation, and dynamic duration adjustment....
🦀 ClawHub
Meta Video Ad Analyzer
Extract and analyze content from video ads using Gemini Vision AI. Supports frame extraction, OCR text detection, audio transcription, and AI-powered scene analysis. Use when analyzing video creative content, extracting text overlays, or generating scene-by-scene descriptions.
🦀 ClawHub
Talking Circle
Create animated talking-circle videos (Telegram-style round video messages) from avatar frame images and audio. Supports audio-to-video and text-to-video via...
🦀 ClawHub
Agentic Calling
Enable AI agents to autonomously make, receive, transcribe, route, and record phone calls using Twilio with customizable voice messages and IVR support.
🦀 ClawHub
xeon_tts
Local TTS skill using OpenVINO Qwen3-TTS for voice cloning and emotion style synthesis, supporting QQBOT workflows with strict audio length and file retentio...
🦀 ClawHub
ifly-ocr-invoice
Recognize and extract structured data from invoices, receipts, and bills using iFlytek OCR API (科大讯飞票据识别). Supports VAT invoices, taxi receipts, train ticket...
🦀 ClawHub
ifly-speed-transcription
Ultra-fast speech transcription using iFLYTEK Speed Transcription API. Transcribe audio files (WAV/PCM/MP3) up to 5 hours in ~20 seconds per hour. Supports C...
🔌 MCP
lfnovo/content-core
🐍 🏠 - Extract content from URLs, documents, videos, and audio files using intelligent auto-engine selection. Supports web pages, PDFs, Word docs, YouTube transcripts, and more with structured JSON responses.
🦀 ClawHub
讯飞票据识别
Recognize and extract structured data from invoices, receipts, and bills using iFlytek OCR API (科大讯飞票据识别). Supports VAT invoices, taxi receipts, train ticket...
🦀 ClawHub
Aliyun Asr
Pure Aliyun ASR skill for voice message transcription. Supports multiple channels, including Feishu.
🦀 ClawHub
keevx-image-to-video
Convert images to videos using Keevx API with support for multiple models, resolutions up to 4K, audio generation, and batch processing.
🦀 ClawHub
Invoice-Recognition
Extract invoice information from images and PDF files using Baidu OCR API, export to Excel. Supports single file, multiple files, or entire directory process...
🦀 ClawHub
Vision Recognition Ocr
Vehicle/animal/plant recognition plus OCR for screenshots, photos, invoices, and tables. Use when users ask 识别车型/看图识别/提取文字/OCR. Supports local path, URL, and...
🦀 ClawHub
ifly-voiceclone-tts
iFlytek Voice Clone TTS (声音复刻): train a custom voice model from audio samples and synthesize speech with the cloned voice. Supports the full workflow: get tr...
🦀 ClawHub
Baidu Speech Synthesis
Baidu Intelligent Cloud Speech Synthesis (TTS), supporting multi-role dialogue audio generation, SSML/segment-merge dual modes, speech rate/pitch adjustment.
🦀 ClawHub
Claw Use Android
Control and interact with real Android phones via HTTP and CLI without ADB or root, supporting screen reading, taps, typing, apps, calls, and voice.
🦀 ClawHub
muapi-media-generation
Generate AI images, videos, music, and audio from the terminal via muapi.ai — supports 100+ models including Flux, Midjourney v7, Kling 3.0, Veo3, and Suno V5
🦀 ClawHub
GrabGrab
Use when the user wants to download a video or audio from a URL. Supports 20+ platforms including YouTube, X/Twitter, TikTok, Instagram, Facebook, Reddit, Bi...
GitHub
Pinepods
A Rust-based podcast management system with multi-user support. Pinepods uses a central database so that details like listen time and themes follow you from device to device. With clients built using Tauri, it's a full cross-platform listening solution!
GitHub
lifthrasiir/angolmois-rust
A minimalistic music video game which supports the BMS format
🦀 ClawHub
Douyin Video Transcribe
Douyin video transcription suite. Extract audio from Douyin/TikTok China videos, transcribe with Whisper, and analyze content. Supports video links, local fi...
🦀 ClawHub
SenseVoice Transcribe
Transcribe audio files (WAV/MP3/M4A/FLAC) to timestamped text using SenseVoice-Small + FSMN-VAD. Supports single-file and batch mode with VAD-anchored per-se...
GitHub
pdeljanov/Symphonia
Audio decoding and media demuxing library supporting AAC, FLAC, MP3, MP4, OGG, Vorbis, and WAV.
🦀 ClawHub
U2-audio-file-transcriber
Transcribe audio files via UniCloud ASR (云知声语音识别, recorded audio → text) API from UniSound. Supports multiple formats, optimized for finance, customer servic...
🦀 ClawHub
Bailian Studio
Call Aliyun Bailian via DashScope; supports OCR, TTS, text-to-image, and image-to-image.
🦀 ClawHub
LTX-2.3 Video API
Generate videos via LTX-2.3 API (ltx.video). Supports text-to-video, image-to-video, audio-to-video (lip-sync from audio + image), extend, and retake. Use wh...
🦀 ClawHub
xeon_asr
Automatically converts received voice messages to text via an external ASR service, supporting multiple audio formats and integrating with OpenClaw.
🦀 ClawHub
spotify-control
macOS Spotify control skill for OpenClaw. Supports playback, volume, position, and metadata retrieval via AppleScript.
🦀 ClawHub
video-stt
Extract audio from video URLs and transcribe using STT (Speech-to-Text). Supports local Whisper or cloud APIs. Use when: user provides a video URL and wants...
🦀 ClawHub
Ecomm Ai Voice Agent
Complete AI voice agent system for eCommerce order confirmation, customer support, and outbound campaigns. 12 production-ready n8n workflows with Vapi AI voi...
🦀 ClawHub
Bilibili Up To Kb
Convert Bilibili (B站) videos into a searchable text knowledge base. Supports single videos and batch processing of entire UP主 channels. Uses local whisper.cp...
🦀 ClawHub
AIML Music Generator
Generate high-quality music/songs via AIMLAPI. Supports Suno, Udio, Minimax, and ElevenLabs music models. Use when the user asks for music, songs, or soundtr...
🦀 ClawHub
Phone Caller
Make AI-powered outbound phone calls using ElevenLabs voice + GPT brain + Twilio. Supports one-way pre-recorded messages AND live two-way conversations where...
🦀 ClawHub
Voice (Edge TTS)
Convert text to speech using Microsoft Edge TTS with real-time streaming, customizable voice settings, and support for multiple languages including Chinese a...
🦀 ClawHub
AnveVoice
Add AI voice assistants to your website. Engage visitors with natural voice conversations, capture leads, automate support, and boost conversions.
🦀 ClawHub
LH Edge TTS
Text-to-speech conversion using Python edge-tts for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and sub...
🦀 ClawHub
Youtube Transcript Api
Extract, transcribe, and translate YouTube video transcripts using the YouTubeTranscript.dev V2 API. Supports captions, ASR audio transcription, batch proces...
🦀 ClawHub
ComfyUI TTS
Convert text to speech audio via ComfyUI's Qwen-TTS API, supporting customizable voice, style, model, and output options.
🦀 ClawHub
SiliconFlow TTS Gen
Text-to-Speech using SiliconFlow API (CosyVoice2). Supports multiple voices, languages, and dialects.
🦀 ClawHub
Yt Dlp
A robust CLI wrapper for yt-dlp to download videos, playlists, and audio from YouTube and thousands of other sites. Supports format selection, quality control, metadata embedding, and cookie authentication.
🦀 ClawHub
Seedance Video Generation
Generate AI videos using ByteDance Seedance. Use when the user wants to: (1) generate videos from text prompts, (2) generate videos from images (first frame, first+last frame, reference images), or (3) query/manage video generation tasks. Supports Seedance 1.5 Pro (with audio), 1.0 Pro, 1.0 Pro Fast, and 1.0 Lite models.
🦀 ClawHub
minimax-tokenplan-tts
Generate speech audio from text using MiniMax speech-2.8-hd model. Supports multiple voice options, speed/pitch/volume control, WAV file output with automati...
🦀 ClawHub
AssemblyAI Transcriber
Transcribe audio files with speaker diarization (who speaks when). Supports 100+ languages, automatic language detection, and timestamps. Use for meetings, interviews, podcasts, or voice messages. Requires AssemblyAI API key.
🦀 ClawHub
Supercall
Make AI-powered phone calls with custom personas and goals. Uses OpenAI Realtime API + Twilio for ultra-low latency voice conversations. Supports DTMF/IVR na...
🦀 ClawHub
Spotify Playlist Curator
Create and refine Spotify playlists using the Spotify Web API, with support for track search, recent and top listening lookups, queueing selected tracks, and...
🦀 ClawHub
Zvukogram
Text-to-Speech via Zvukogram API with SSML support. Use when you need to generate speech from text, create podcasts, voice notifications, or work with audio....
🦀 ClawHub
MiniMax TTS Generator
Text-to-speech (TTS) generation using MiniMax API. Converts text into natural-sounding speech with support for multiple voices, adjustable speed and pitch, a...
Page 3 / 4 (160 skills)