BytesAgain

Find the Right AI Skill for Any Job

Browse 160+ curated AI agent skills. Search by use case, filter by category, get the right tool instantly.

All Skills - audio

160 skills in "audio" matching "support"

🦀 ClawHub
ElevenLabs STT OpenClaw
Transcribe audio files with ElevenLabs Speech-to-Text (Scribe v2) from the local CLI. Supports diarization, events, JSON output, webhooks, and advanced STT o...
🦀 ClawHub
Invoice Gen
Generate professional PDF invoices from simple text commands. Supports multiple currencies, tax calculation, CJK text, and customizable templates. No externa...
🦀 ClawHub
Seedance Video Generation BytePlus
Generate AI videos using BytePlus Seedance API (International). Use when the user wants to: (1) generate videos from text prompts, (2) generate videos from images (first frame, first+last frame, reference images), or (3) query/manage video generation tasks. Supports Seedance 1.5 Pro (with audio & draft mode), 1.0 Pro, 1.0 Pro Fast, and 1.0 Lite models.
🦀 ClawHub
baml-codegen
Use when generating BAML code for type-safe LLM extraction, classification, RAG, or agent workflows - creates complete .baml files with types, functions, clients, tests, and framework integrations from natural language requirements. Queries official BoundaryML repositories via MCP for real-time patterns. Supports multimodal inputs (images, audio), Python/TypeScript/Ruby/Go, 10+ frameworks, 50-70% token optimization, 95%+ compilation success.
⭐ GitHub
ConvertAnything
The ultimate file converter for images, audio, video, documents and more. It handles individual or batch uploads, supports ZIPs, and provides a download link. By [Pietro Schirano](https://x.com/skirano/status/1723026266608033888)
🦀 ClawHub
Edge TTS
Text-to-speech conversion using node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation. Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.
🦀 ClawHub
Zhipu AI TTS
Text-to-speech conversion using Zhipu AI (BigModel) GLM-TTS model. Use when you need to convert text to audio files with various voice options. Supports Chin...
🦀 ClawHub
yap
Fast on-device speech-to-text transcription on macOS 26+ using Apple Speech.framework, supporting multiple languages and output formats without model downloads.
🦀 ClawHub
Ai Podcast Pipeline
Create Korean AI podcast packages from QuickView trend notes. Use for dual-host script writing (Callie × Nick), Gemini multi-speaker TTS audio generation, subtitle timing/render fixes, thumbnail+MP4 packaging, and YouTube title/description output. Supports both full (15~20 min) and compressed (5~7 min) editions.
🦀 ClawHub
ElevenLabs Music
Generate music from text prompts using ElevenLabs Eleven Music API. Use when creating songs, soundtracks, jingles, lullabies, or any audio music from descriptions. Supports vocals with AI-generated lyrics, instrumental tracks, and multiple genres/styles. Requires paid ElevenLabs plan.
🦀 ClawHub
acestep
Use ACE-Step API to generate music, edit songs, and remix music. Supports text-to-music, lyrics generation, audio continuation, and audio repainting. Use thi...
🦀 ClawHub
MarkItDown Skill
OpenClaw agent skill for converting documents to Markdown. Documentation and utilities for Microsoft's MarkItDown library. Supports PDF, Word, PowerPoint, Excel, images (OCR), audio (transcription), HTML, YouTube.
🦀 ClawHub
Faster Whisper Local
Local speech-to-text using faster-whisper. High-performance transcription with GPU acceleration support. Includes word-level timestamps and distilled models....
🦀 ClawHub
it will help you to send voice messages to your AI Assistant and also can make it talk
Text-to-Speech and Speech-to-Text using ElevenLabs AI. Use when the user wants to convert text to speech, transcribe voice messages, or work with voice in multiple languages. Supports high-quality AI voices and accurate transcription.
🦀 ClawHub
Pywayne Tts
Text-to-speech conversion tool. Use when converting text to speech audio files (opus or mp3 format). Supports macOS native 'say' command and Google TTS (gTTS...
🦀 ClawHub
Tarot from Univoice
A reflective tarot draw for emotional support (presence-first, non-clinical, non-predictive).
🦀 ClawHub
Github Issue Creator
Convert raw notes, error logs, voice dictation, or screenshots into crisp GitHub-flavored markdown issue reports. Use when the user pastes bug info, error messages, or informal descriptions and wants a structured GitHub issue. Supports images/GIFs for visual evidence.
🦀 ClawHub
Qwen3-tts
Local text-to-speech using Qwen3-TTS-12Hz-1.7B-CustomVoice. Use when generating audio from text, creating voice messages, or when TTS is requested. Supports 10 languages including Italian, 9 premium speaker voices, and instruction-based voice control (emotion, tone, style). Alternative to cloud-based TTS services like ElevenLabs. Runs entirely offline after initial model download.
🦀 ClawHub
Audio Content Generator
Generate audiobooks, podcasts, or educational audio content on demand. User provides an idea or topic, Claude AI writes a script, and ElevenLabs converts it to high-quality audio. Supports multiple formats (audiobook, podcast, educational), custom lengths, and voice effects. Use when asked to create audio content, make a podcast, generate an audiobook, or produce educational audio. Returns MP3 audio file via MEDIA token.
🦀 ClawHub
Video Transcript Downloader
Download videos, audio, subtitles, and clean paragraph-style transcripts from YouTube and any other yt-dlp supported site. Use when asked to “download this video”, “save this clip”, “rip audio”, “get subtitles”, “get transcript”, or to troubleshoot yt-dlp/ffmpeg and formats/playlists.
🦀 ClawHub
Video Subtitle Generator
Generate and translate video subtitles using WhisperX and LLM translation. Use when processing video files to create .srt subtitle files. Supports multilingu...
🦀 ClawHub
Simple stt(sound-to-text) locally
Simple local Speech-To-Text using Whisper. One-command install with auto model download. Supports 99+ languages.
🦀 ClawHub
Music Generation
SenseAudio Music Generation API for creating AI-generated lyrics and songs. Supports lyrics generation, song generation with style/vocal control, and async t...
🦀 ClawHub
tencent-tts-podcast
Convert text to podcast audio using Tencent Cloud TTS. Supports both short and long text processing, generates up to 30-minute long audio with automatic chun...
🦀 ClawHub
MY/SG Invoice & Receipt Parser
Extract structured data from Malaysian & Singaporean invoices/receipts. SST/GST-aware. Supports BM/EN/CN.
🦀 ClawHub
Gemini Assistant
General-purpose AI assistant using Gemini API with voice and text support. Use when you need a smart AI assistant that can answer questions, have conversatio...
🦀 ClawHub
salute speech
Transcribe audio files using Sber Salute Speech async API. Russian-first STT with support for ru-RU, en-US, kk-KZ, ky-KG, uz-UZ.
🦀 ClawHub
Clonev
Clone any voice and generate speech using Coqui XTTS v2. SUPER SIMPLE - provide a voice sample (6-30 sec WAV) and text, get cloned voice audio. Supports 14+ languages. Use when the user wants to (1) Clone their voice or someone else's voice, (2) Generate speech that sounds like a specific person, (3) Create personalized voice messages, (4) Multi-lingual voice cloning (speak any language with cloned voice).
🦀 ClawHub
Veo Skill
Veo, Veo 3.1 Fast - Google AI video generation models for AI agents. 1080p HD output, reference image support, intelligent audio generation.
🦀 ClawHub
Video Generator
Automated text-to-video pipeline with multi-provider TTS/ASR support (OpenAI, Azure, Aliyun, Tencent).
🦀 ClawHub
Quotation Generator
Auto-generate professional PDF proforma invoices with company letterhead, multi-language support, and post-quote tracking.
🦀 ClawHub
Elevenlabs Transcribe
Transcribe audio to text using ElevenLabs Scribe. Supports batch transcription, realtime streaming from URLs, microphone input, and local files.
🦀 ClawHub
when-clock-skill
Control WHEN/WHEN Voice LAN clock devices. Supports voice time announcement, weather broadcast (WHEN Voice only), alarm CRUD, and countdown timer. Use --devi...
🦀 ClawHub
Akashic Doc Analyzer
Parse, analyze, and extract content from documents (PDF, DOCX, PPTX, audio). Supports OCR, table extraction, and semantic chunking.
🦀 ClawHub
Voice Recognition
Local speech-to-text with OpenAI Whisper CLI. Supports Chinese, English, 100+ languages with translation and summarization.
🦀 ClawHub
Step Asr
Transcribe audio files to text via Step ASR streaming API (HTTP SSE). Supports Chinese and English, multiple audio formats (PCM, WAV, MP3, OGG/OPUS), real-ti...
🦀 ClawHub
Podcast Generation from PDF, Text, and Links
Generate AI podcast episodes from PDFs, text, notes, and links using MagicPodcast in OpenClaw. Creates natural two-person dialogue audio, supports custom lan...
🦀 ClawHub
SAM TTS
Generate retro robotic speech audio using SAM (Software Automatic Mouth), the classic C64 text-to-speech synthesizer. Use for /sam command to generate voice messages. Supports /sam on/off toggle mode where all responses are spoken in SAM voice. Supports pitch, speed, mouth, and throat parameters for voice customization.
🦀 ClawHub
Bootleg Link
Download music from YouTube channels/playlists and convert to 320kbps MP3. Supports batch processing, resume interrupted downloads, and concurrent downloading.
🦀 ClawHub
Play Local Music
Control local music playback with play, pause, resume, stop commands; supports listing and playing specified songs from a configured music directory.
🦀 ClawHub
Mac TTS
Text-to-speech using macOS built-in `say` command. Use for voice notifications, audio alerts, reading text aloud, or announcing messages through Mac speakers. Supports multiple languages including Chinese (Mandarin), English, Japanese, etc.
🦀 ClawHub
potplayer
Play local or network audio/video files with PotPlayer, supporting playback control, playlists, fullscreen, subtitles, and device access.
🦀 ClawHub
ComfyUI Video
Automate AI video generation with ComfyUI and LTX-2.3. Supports text-to-video (T2V), image-to-video (I2V), batch scene rendering for music videos, and multi-...
🦀 ClawHub
AI Video Gen CN
End-to-end AI video generation - create videos from text prompts using image generation, video synthesis, voice-over, and editing. Supports OpenAI DALL-E, Re...
🦀 ClawHub
Zhipu Asr
Automatic Speech Recognition (ASR) using Zhipu AI (BigModel) GLM-ASR model. Use when you need to transcribe audio files to text. Supports Chinese audio trans...
🔧 Dify
Spotify (Dify)
**Author**: langgenius. **Version**: 0.1.1. **Type**: tool. This plugin integrates with Spotify, supporting operations such as searching for music, controlling playback, managing playlists, and retrieving detailed information about tracks, albums, and artists. It enables automated music discovery and playback control in platforms like Dify.
🦀 ClawHub
CosyVoice3 macOS
Local text-to-speech using Alibaba's CosyVoice3 on macOS Apple Silicon. Supports Chinese, English, Japanese, Korean, and 18+ Chinese dialects. Provides zero-...
🔧 Dify
Discord (Dify)
Discord is a communication platform designed for communities. It offers features like text and voice channels, direct messaging, and server-based organization. In Dify, Discord tools allow users to create a random bot with random username and avatar to send messages. Please follow [this site](https://support.discord.com/hc/en-us/articles/228383668-Intro-to-Webhooks) to create a webhook and get its