BytesAgain

Find the Right AI Skill for Any Job

Browse 2,510+ curated AI agent skills. Search by use case, filter by category, get the right tool instantly.

All Skills — audio

2,510 skills in "audio"

GitHub
How to Use Conferences to Grow Your Startup
Podcast by Hiten Shah & Steli Efti.
🦀 ClawHub
MeowMusic YouTube MP3
Package and reuse the MeowMusicServer-patched YouTube fallback workflow: Windows Chrome cookie export/sync to server, server-side yt-dlp/yt-dlp-ejs/ffmpeg se...
🦀 ClawHub
Invoice & Expense Tracker
AI-powered invoice and expense tracking from natural language. Maintain a local ledger, generate monthly reports by category/vendor, export to CSV for QuickB...
🦀 ClawHub
SmartBill Invoicing
Issue SmartBill invoices through the SmartBill.ro API with local automation. Use for SmartBill tasks such as validating invoice payloads, creating invoices,...
🦀 ClawHub
Dizest Summarize
Summarize long-form content — articles, podcasts, research papers, PDFs, notes, and more — using the Dizest API. Turn what you read into structured, searchab...
🦀 ClawHub
Pocket TTS Complete Documentation
Generate speech from text using Kyutai Pocket TTS - lightweight, CPU-friendly, streaming TTS with voice cloning. English only. ~6x real-time on M4 MacBook Air.
🦀 ClawHub
PPT Audio To Video
Convert narration audio plus slide decks into a narrated video. Use when the user has an audio-only `mp4/m4a/mp3/wav` and a `ppt/pptx/pdf` deck, and needs sl...
🦀 ClawHub
Pub Gog
Google Workspace CLI for Gmail, Calendar, Drive, Contacts, Sheets, and Docs. Also includes 50+ models for image generation, video generation, text-to-speech, spee...
🦀 ClawHub
VEED UGC
Generate UGC-style promotional videos with AI lip-sync. Takes an image (person with product from Morpheus/Ad-Ready) and a script (pure dialogue), creates a video of the person speaking. Uses ElevenLabs for voice synthesis.
🦀 ClawHub
meeting-to-text
Create a fully local speaker-separated .txt transcript from a meeting recording, meeting screen recording, speech audio, or local video/audio file. Use this...
🔌 MCP
tiianhk/MaxMSP-MCP-Server
🐍 🏠 🎵 🎥 - A coding agent for Max (Max/MSP/Jitter), which is a visual programming language for music and multimedia.
🦀 ClawHub
Cinematic Script Writer
Create professional cinematic scripts for AI video generation with character consistency and cinematography knowledge. Use when the user wants to write a cinematic script, create story contexts with characters, generate image prompts for AI video tools (Midjourney, Sora, Veo), or needs cinematography guidance (camera angles, lighting, color grading). Also use for character consistency sheets, voice profiles, anachronism detection, and saving scripts to Google Drive.
🦀 ClawHub
hehe-ddc
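Automatic Douyin video generation: images + copy → video, with Edge TTS male and female voices, line-by-line subtitles, random BGM, and smart duration fitting.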
🦀 ClawHub
Play Music from YouTube
Play music on YouTube via browser automation with playwright-cli. Use when the user wants to: (1) play a specific song (e.g. 'play Money Money Money by ABBA') (2) play songs by an artist as a playlist or mix (e.g. 'play Jay Chou's songs') (3) play genre or mood-based music (e.g. 'play relaxing spa music', 'play 60s Chinese oldies') (4) control playback — next, pause, resume, stop, skip ad, change song, close the player. Also handles song/artist name corrections from voice transcription erro
🔌 MCP
format37/youtube_mcp
🐍 ☁️ – MCP server that transcribes YouTube videos to text. Uses yt-dlp to download audio and OpenAI's Whisper-1 for more precise transcription than YouTube captions. Provide a YouTube URL and get back the full transcript, split into chunks for long videos.
🦀 ClawHub
branding
When the user wants to define, audit, or apply brand strategy—purpose, values, positioning, storytelling, voice, narrative (not only visuals). Also use when...
🔌 MCP
leadbrain/korean-data-mcp
🐍 ☁️ - Real-time Korean web data: Naver place reviews, Melon music chart, Daangn/Bunjang marketplace listings, Naver news, Musin
🔌 MCP
lfnovo/content-core
🐍 🏠 - Extract content from URLs, documents, videos, and audio files using intelligent auto-engine selection. Supports web pages, PDFs, Word docs, YouTube transcripts, and more with structured JSON responses.
🔌 MCP
anaisbetts/mcp-youtube
📇 ☁️ - Fetch YouTube subtitles
🦀 ClawHub
guofeng-lyric-writer
Create traditional Chinese-style (Guofeng) lyrics with classical imagery, vocabulary substitution, rhyme schemes, and poetic techniques. Use when creating Gu...
🦀 ClawHub
SkillBoss
Give your OpenClaw agent access to 354+ tools (100+ LLMs, web scraping, search, image/video/audio generation, email) through one API key with signed JWT audi...
🦀 ClawHub
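TTS Text-to-Speech (TTS文字转语音)
Converts input text into MP3 audio files in a variety of styles and voices through a free API and sends them to the user.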
🦀 ClawHub
iFlytek Receipt Recognition (讯飞票据识别)
Recognize and extract structured data from invoices, receipts, and bills using iFlytek OCR API (科大讯飞票据识别). Supports VAT invoices, taxi receipts, train ticket...
🦀 ClawHub
Moltbook Authentic Engagement
Authentic engagement protocols for Moltbook — quality over quantity, genuine voice, spam filtering, verification handling, and meaningful community building for AI agents
🦀 ClawHub
Ai Marketing Videos
Create AI marketing videos for ads, promos, product launches, and brand content. Models: Veo, Seedance, Wan, FLUX for visuals, Kokoro for voiceover. Types: p...
🔌 MCP
imprvhub/mcp-claude-spotify
📇 ☁️ 🏠 - An integration that allows Claude Desktop to interact with Spotify using the Model Context Protocol (MCP).
🦀 ClawHub
Aliyun Qwen Tts Voice Design
Use when designing custom voices with Alibaba Cloud Model Studio Qwen TTS VD models. Use when creating custom synthetic voices from text descriptions and usi...
GitHub
AudioGPT
Understanding and Generating Speech, Music, Sound, and Talking Head
🦀 ClawHub
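SiliconFlow multimodal service supporting image generation (FLUX/Qwen), video generation (Wan), TTS speech synthesis, and ASR speech recognition. Paid with vouchers.
SiliconFlow multimodal service supporting image generation (FLUX/Qwen), video generation (Wan), TTS speech synthesis, and ASR speech recognition. Paid with vouchers.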
🦀 ClawHub
Invoice Template
Free simple invoice generator. Creates clean, professional invoices with your branding. Use when you need to bill a client quickly without complex tracking o...
🦀 ClawHub
Accessibility Toolkit 1.0.0
Friction-reduction patterns for agents helping humans with disabilities. Voice-first workflows, smart home templates, efficiency automation.
🦀 ClawHub
audio to text and video to text
Transcribe audio and video files into text using OpenAI's Whisper API. Use this skill whenever a user wants to convert any audio or video file to text — incl...
🦀 ClawHub
Transcribe audio files via OpenRouter using audio-capable models
Transcribe audio files via OpenRouter using audio-capable models (Gemini, GPT-4o-audio, etc).
🦀 ClawHub
Brand Voice Writer — AI Content in Your Voice
Generates content in your unique brand voice by analyzing your style, filtering relevant trends, and creating tailored posts, articles, newsletters, and scri...
🦀 ClawHub
Content Repurposer
Turn one piece of content into 10+ formats. Transform blog posts, podcasts, videos, or talks into tweets, LinkedIn posts, newsletters, carousels, and more.
🦀 ClawHub
Github Issue Creator
Convert raw notes, error logs, voice dictation, or screenshots into crisp GitHub-flavored markdown issue reports. Use when the user pastes bug info, error messages, or informal descriptions and wants a structured GitHub issue. Supports images/GIFs for visual evidence.
🦀 ClawHub
MTTSports
Use when the user wants to play or observe MTT poker through the `mttsports` CLI: auth, user balance, room selection and creation, table join or add-on, sess...
🔧 Dify
Stability (Dify)
Stability offers a suite of AI tools and models focused on generative media. It provides capabilities for creating images, audio, and video content from text prompts or other inputs. The suite includes various generative models specializing in different artistic styles and media types. Please apply for an API Key on [Stability.ai](https://platform.stability.ai/account/keys). The Stability tools co
🦀 ClawHub
Kiwi Voice
Manage and configure Kiwi Voice assistant service. Use when starting/stopping Kiwi, editing voice config, checking logs, troubleshooting audio issues, or man...
🦀 ClawHub
Chinese Podcast Radar (中文播客雷达)
Discover, compare, and curate trending Chinese podcasts or episodes from 中文播客榜. Use for hot or recent show discovery, creator benchmarking, curation lists, c...
🦀 ClawHub
Dub YouTube with Voice.ai
Dub YouTube videos with Voice.ai TTS. Turn scripts into publish-ready voiceovers with chapters, captions, and audio replacement for YouTube long-form and Shorts.
🦀 ClawHub
Minimax Music Gyh
MiniMax music generation, supporting models such as Music-2.5 and Music-2, which generate music from text descriptions. Uses the MINIMAX_API_KEY environment variable.
🦀 ClawHub
Minimax Tts Gyh
MiniMax TTS text-to-speech, supporting the speech-02/speech-01 series for high-quality speech generation. Uses the MINIMAX_API_KEY environment variable.
🦀 ClawHub
Record screen, microphone or camera from macOS terminal
macOS CLI tool to record microphone audio, screen video or screenshot, and camera video or photo from the terminal with device listing and output control.
🦀 ClawHub
MLX Audio Server
Local 24x7 OpenAI-compatible API server for STT/TTS, powered by MLX on your Mac.
🦀 ClawHub
Xpilot Ad Maker
Generate a 30-second cinematic ad video with consistent character, AI narration, brand overlays, and ambient music. Uses Vidu reference-to-video for characte...
🦀 ClawHub
Supercall
Make AI-powered phone calls with custom personas and goals. Uses OpenAI Realtime API + Twilio for ultra-low latency voice conversations. Supports DTMF/IVR na...
🦀 ClawHub
Speech Recognition
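General-purpose speech recognition skill. Supports multiple audio formats (ogg/mp3/wav/m4a) and uses the SiliconFlow SenseVoice API for speech-to-text. Triggered when the user sends a voice message or audio file, or needs audio transcribed.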
Page 48 / 53 (2,510 skills)