Find the Right AI Skill for Any Job

Browse 160+ curated AI agent skills. Search by use case, filter by category, get the right tool instantly.

All Skills: audio

160 skills in "audio" matching "support"

Transcribe audio files with ElevenLabs Speech-to-Text (Scribe v2) from the local CLI. Supports diarization, events, JSON output, webhooks, and advanced STT o...

🦀 ClawHub

Invoice Gen

Generate professional PDF invoices from simple text commands. Supports multiple currencies, tax calculation, CJK text, and customizable templates. No externa...

🦀 ClawHub

Seedance Video Generation BytePlus

Generate AI videos using the BytePlus Seedance API (International). Use when the user wants to: (1) generate videos from text prompts, (2) generate videos from images (first frame, first+last frame, reference images), or (3) query/manage video generation tasks. Supports Seedance 1.5 Pro (with audio & draft mode), 1.0 Pro, 1.0 Pro Fast, and 1.0 Lite models.

🦀 ClawHub

baml-codegen

Use when generating BAML code for type-safe LLM extraction, classification, RAG, or agent workflows - creates complete .baml files with types, functions, clients, tests, and framework integrations from natural language requirements. Queries official BoundaryML repositories via MCP for real-time patterns. Supports multimodal inputs (images, audio), Python/TypeScript/Ruby/Go, 10+ frameworks, 50-70% token optimization, 95%+ compilation success.

⭐ GitHub

ConvertAnything

The ultimate file converter for images, audio, video, documents, and more. It handles individual or batch uploads, supports ZIPs, and provides a download link. By [Pietro Schirano](https://x.com/skirano/status/1723026266608033888).

🦀 ClawHub

Edge TTS

Text-to-speech conversion using the node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation. Use when: (1) the user requests audio/voice output with the "tts" trigger or keyword; (2) content needs to be spoken rather than read (multitasking, accessibility, driving, cooking); (3) the user wants a specific voice, speed, pitch, or format for TTS output.

🦀 ClawHub

Zhipu AI TTS

Text-to-speech conversion using Zhipu AI (BigModel) GLM-TTS model. Use when you need to convert text to audio files with various voice options. Supports Chin...

🦀 ClawHub

yap

Fast on-device speech-to-text transcription on macOS 26+ using Apple Speech.framework, supporting multiple languages and output formats without model downloads.

🦀 ClawHub

Ai Podcast Pipeline

Create Korean AI podcast packages from QuickView trend notes. Use for dual-host script writing (Callie × Nick), Gemini multi-speaker TTS audio generation, subtitle timing/render fixes, thumbnail+MP4 packaging, and YouTube title/description output. Supports both full (15-20 min) and compressed (5-7 min) editions.

🦀 ClawHub

ElevenLabs Music

Generate music from text prompts using the ElevenLabs Eleven Music API. Use when creating songs, soundtracks, jingles, lullabies, or any audio music from descriptions. Supports vocals with AI-generated lyrics, instrumental tracks, and multiple genres/styles. Requires a paid ElevenLabs plan.

🦀 ClawHub

acestep

Use ACE-Step API to generate music, edit songs, and remix music. Supports text-to-music, lyrics generation, audio continuation, and audio repainting. Use thi...

🦀 ClawHub

MarkItDown Skill

OpenClaw agent skill for converting documents to Markdown. Documentation and utilities for Microsoft's MarkItDown library. Supports PDF, Word, PowerPoint, Excel, images (OCR), audio (transcription), HTML, YouTube.

🦀 ClawHub

Faster Whisper Local

Local speech-to-text using faster-whisper. High-performance transcription with GPU acceleration support. Includes word-level timestamps and distilled models....

🦀 ClawHub

It will help you send voice messages to your AI assistant and can also make it talk

Text-to-Speech and Speech-to-Text using ElevenLabs AI. Use when the user wants to convert text to speech, transcribe voice messages, or work with voice in multiple languages. Supports high-quality AI voices and accurate transcription.

🦀 ClawHub

Pywayne Tts

Text-to-speech conversion tool. Use when converting text to speech audio files (Opus or MP3 format). Supports the native macOS 'say' command and Google TTS (gTTS...

🦀 ClawHub

Tarot from Univoice

A reflective tarot draw for emotional support (presence-first, non-clinical, non-predictive).

🦀 ClawHub

Github Issue Creator

Convert raw notes, error logs, voice dictation, or screenshots into crisp GitHub-flavored markdown issue reports. Use when the user pastes bug info, error messages, or informal descriptions and wants a structured GitHub issue. Supports images/GIFs for visual evidence.

🦀 ClawHub

Qwen3-tts

Local text-to-speech using Qwen3-TTS-12Hz-1.7B-CustomVoice. Use when generating audio from text, creating voice messages, or when TTS is requested. Supports 10 languages including Italian, 9 premium speaker voices, and instruction-based voice control (emotion, tone, style). Alternative to cloud-based TTS services like ElevenLabs. Runs entirely offline after initial model download.

🦀 ClawHub

Audio Content Generator

Generate audiobooks, podcasts, or educational audio content on demand. The user provides an idea or topic, Claude AI writes a script, and ElevenLabs converts it to high-quality audio. Supports multiple formats (audiobook, podcast, educational), custom lengths, and voice effects. Use when asked to create audio content, make a podcast, generate an audiobook, or produce educational audio. Returns an MP3 audio file via a MEDIA token.

🦀 ClawHub

Video Transcript Downloader

Download videos, audio, subtitles, and clean paragraph-style transcripts from YouTube and any other yt-dlp-supported site. Use when asked to “download this video”, “save this clip”, “rip audio”, “get subtitles”, “get transcript”, or to troubleshoot yt-dlp/ffmpeg and formats/playlists.

🦀 ClawHub

Video Subtitle Generator

Generate and translate video subtitles using WhisperX and LLM translation. Use when processing video files to create .srt subtitle files. Supports multilingu...

🦀 ClawHub

Simple STT (sound-to-text) locally

Simple local speech-to-text using Whisper. One-command install with automatic model download. Supports 99+ languages.

🦀 ClawHub

Music Generation

SenseAudio Music Generation API for creating AI-generated lyrics and songs. Supports lyrics generation, song generation with style/vocal control, and async t...

🦀 ClawHub

tencent-tts-podcast

Convert text to podcast audio using Tencent Cloud TTS. Supports both short and long text processing, generates up to 30-minute long audio with automatic chun...

🦀 ClawHub

MY/SG Invoice & Receipt Parser

Extract structured data from Malaysian & Singaporean invoices/receipts. SST/GST-aware. Supports BM/EN/CN.

🦀 ClawHub

Gemini Assistant

General-purpose AI assistant using Gemini API with voice and text support. Use when you need a smart AI assistant that can answer questions, have conversatio...

🦀 ClawHub

salute speech

Transcribe audio files using Sber Salute Speech async API. Russian-first STT with support for ru-RU, en-US, kk-KZ, ky-KG, uz-UZ.

🦀 ClawHub

Clonev

Clone any voice and generate speech using Coqui XTTS v2. Super simple: provide a voice sample (6-30 sec WAV) and text, and get cloned voice audio. Supports 14+ languages. Use when the user wants to (1) clone their own or someone else's voice, (2) generate speech that sounds like a specific person, (3) create personalized voice messages, or (4) perform multilingual voice cloning (speak any language with the cloned voice).

🦀 ClawHub

Veo Skill

Veo and Veo 3.1 Fast: Google AI video generation models for AI agents. 1080p HD output, reference-image support, and intelligent audio generation.

🦀 ClawHub

Video Generator

Automated text-to-video pipeline with multi-provider TTS/ASR support (OpenAI, Azure, Aliyun, Tencent).

🦀 ClawHub

Quotation Generator

Auto-generate professional PDF proforma invoices with company letterhead, multi-language support, and post-quote tracking.

🦀 ClawHub

Elevenlabs Transcribe

Transcribe audio to text using ElevenLabs Scribe. Supports batch transcription, real-time streaming from URLs, microphone input, and local files.

🦀 ClawHub

when-clock-skill

Control WHEN/WHEN Voice LAN clock devices. Supports voice time announcement, weather broadcast (WHEN Voice only), alarm CRUD, and countdown timer. Use --devi...

🦀 ClawHub

Akashic Doc Analyzer

Parse, analyze, and extract content from documents (PDF, DOCX, PPTX, audio). Supports OCR, table extraction, and semantic chunking.

🦀 ClawHub

Voice Recognition

Local speech-to-text with the OpenAI Whisper CLI. Supports 100+ languages, including Chinese and English, with translation and summarization.

🦀 ClawHub

Step Asr

Transcribe audio files to text via Step ASR streaming API (HTTP SSE). Supports Chinese and English, multiple audio formats (PCM, WAV, MP3, OGG/OPUS), real-ti...

🦀 ClawHub

Podcast Generation from PDF, Text, and Links

Generate AI podcast episodes from PDFs, text, notes, and links using MagicPodcast in OpenClaw. Creates natural two-person dialogue audio, supports custom lan...

🦀 ClawHub

SAM TTS

Generate retro robotic speech audio using SAM (Software Automatic Mouth), the classic C64 text-to-speech synthesizer. Use for /sam command to generate voice messages. Supports /sam on/off toggle mode where all responses are spoken in SAM voice. Supports pitch, speed, mouth, and throat parameters for voice customization.

🦀 ClawHub

Bootleg Link

Download music from YouTube channels/playlists and convert it to 320 kbps MP3. Supports batch processing, resuming interrupted downloads, and concurrent downloading.

🦀 ClawHub

Play Local Music

Control local music playback with play, pause, resume, stop commands; supports listing and playing specified songs from a configured music directory.

🦀 ClawHub

Mac TTS

Text-to-speech using macOS built-in `say` command. Use for voice notifications, audio alerts, reading text aloud, or announcing messages through Mac speakers. Supports multiple languages including Chinese (Mandarin), English, Japanese, etc.

🦀 ClawHub

potplayer

Play local or network audio/video files with PotPlayer, supporting playback control, playlists, fullscreen, subtitles, and device access.

🦀 ClawHub

ComfyUI Video

Automate AI video generation with ComfyUI and LTX-2.3. Supports text-to-video (T2V), image-to-video (I2V), batch scene rendering for music videos, and multi-...

🦀 ClawHub

AI Video Gen CN

End-to-end AI video generation - create videos from text prompts using image generation, video synthesis, voice-over, and editing. Supports OpenAI DALL-E, Re...

🦀 ClawHub

Zhipu Asr

Automatic Speech Recognition (ASR) using Zhipu AI (BigModel) GLM-ASR model. Use when you need to transcribe audio files to text. Supports Chinese audio trans...

🔧 Dify

Spotify (Dify)

**Author**: langgenius | **Version**: 0.1.1 | **Type**: tool
This plugin integrates with Spotify, supporting operations such as searching for music, controlling playback, managing playlists, and retrieving detailed information about tracks, albums, and artists. It enables automated music discovery and playback control in platforms like Dify.

🦀 ClawHub

CosyVoice3 macOS

Local text-to-speech using Alibaba's CosyVoice3 on macOS Apple Silicon. Supports Chinese, English, Japanese, Korean, and 18+ Chinese dialects. Provides zero-...

🔧 Dify

Discord (Dify)

Discord is a communication platform designed for communities. It offers features like text and voice channels, direct messaging, and server-based organization. In Dify, Discord tools let users create a bot with a random username and avatar to send messages. Please follow [this site](https://support.discord.com/hc/en-us/articles/228383668-Intro-to-Webhooks) to create a webhook and get its...

Page 2 of 4 (160 skills)