Find the Right AI Skill for Any Job

Browse 2,510+ curated AI agent skills. Search by use case, filter by category, get the right tool instantly.


All Skills — audio

2,510 skills in "audio"

⭐ GitHub

How to Use Conferences to Grow Your Startup

Podcast by Hiten Shah & Steli Efti.

🦀 ClawHub

MeowMusic YouTube MP3

Package and reuse the MeowMusicServer-patched YouTube fallback workflow: Windows Chrome cookie export/sync to server, server-side yt-dlp/yt-dlp-ejs/ffmpeg se...

🦀 ClawHub

Invoice & Expense Tracker

AI-powered invoice and expense tracking from natural language. Maintain a local ledger, generate monthly reports by category/vendor, export to CSV for QuickB...

🦀 ClawHub

SmartBill Invoicing

Issue SmartBill invoices through the SmartBill.ro API with local automation. Use for SmartBill tasks such as validating invoice payloads, creating invoices,...

🦀 ClawHub

Dizest Summarize

Summarize long-form content — articles, podcasts, research papers, PDFs, notes, and more — using the Dizest API. Turn what you read into structured, searchab...

🦀 ClawHub

Pocket TTS Complete Documentation

Generate speech from text using Kyutai Pocket TTS - lightweight, CPU-friendly, streaming TTS with voice cloning. English only. ~6x real-time on M4 MacBook Air.

🦀 ClawHub

PPT Audio To Video

Convert narration audio plus slide decks into a narrated video. Use when the user has an audio-only `mp4/m4a/mp3/wav` and a `ppt/pptx/pdf` deck, and needs sl...

🦀 ClawHub

Pub Gog

Google Workspace CLI for Gmail, Calendar, Drive, Contacts, Sheets, and Docs. Also includes 50+ models for image generation, video generation, text-to-speech, spee...

🦀 ClawHub

VEED UGC

Generate UGC-style promotional videos with AI lip-sync. Takes an image (person with product from Morpheus/Ad-Ready) and a script (pure dialogue), creates a video of the person speaking. Uses ElevenLabs for voice synthesis.

🦀 ClawHub

meeting-to-text

Create a fully local speaker-separated .txt transcript from a meeting recording, meeting screen recording, speech audio, or local video/audio file. Use this...

🔌 MCP

tiianhk/MaxMSP-MCP-Server

🐍 🏠 🎵 🎥 - A coding agent for Max (Max/MSP/Jitter), which is a visual programming language for music and multimedia.

🦀 ClawHub

Cinematic Script Writer

Create professional cinematic scripts for AI video generation with character consistency and cinematography knowledge. Use when the user wants to write a cinematic script, create story contexts with characters, generate image prompts for AI video tools (Midjourney, Sora, Veo), or needs cinematography guidance (camera angles, lighting, color grading). Also use for character consistency sheets, voice profiles, anachronism detection, and saving scripts to Google Drive.

🦀 ClawHub

hehe-ddc

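Automatic Douyin video generation: images + copy → video, with Edge TTS male and female voices, line-by-line subtitles, random BGM, and smart duration adaptation.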

🦀 ClawHub

Play Music from YouTube

Play music on YouTube via browser automation with playwright-cli. Use when the user wants to: (1) play a specific song (e.g. 'play Money Money Money by ABBA') (2) play songs by an artist as a playlist or mix (e.g. 'play Jay Chou's songs') (3) play genre or mood-based music (e.g. 'play relaxing spa music', 'play 60s Chinese oldies') (4) control playback — next, pause, resume, stop, skip ad, change song, close the player. Also handles song/artist name corrections from voice transcription erro

🔌 MCP

format37/youtube_mcp

🐍 ☁️ – MCP server that transcribes YouTube videos to text. Uses yt-dlp to download audio and OpenAI's Whisper-1 for more precise transcription than YouTube captions. Provide a YouTube URL and get back the full transcript, split into chunks for long videos.

🦀 ClawHub

branding

When the user wants to define, audit, or apply brand strategy—purpose, values, positioning, storytelling, voice, narrative (not only visuals). Also use when...

🔌 MCP

leadbrain/korean-data-mcp

🐍 ☁️ - Real-time Korean web data: Naver place reviews, Melon music chart, Daangn/Bunjang marketplace listings, Naver news, Musin

🔌 MCP

lfnovo/content-core

🐍 🏠 - Extract content from URLs, documents, videos, and audio files using intelligent auto-engine selection. Supports web pages, PDFs, Word docs, YouTube transcripts, and more with structured JSON responses.

🔌 MCP

anaisbetts/mcp-youtube

📇 ☁️ - Fetch YouTube subtitles

🦀 ClawHub

guofeng-lyric-writer

Create traditional Chinese-style (Guofeng) lyrics with classical imagery, vocabulary substitution, rhyme schemes, and poetic techniques. Use when creating Gu...

🦀 ClawHub

SkillBoss

Give your OpenClaw agent access to 354+ tools (100+ LLMs, web scraping, search, image/video/audio generation, email) through one API key with signed JWT audi...

🦀 ClawHub

TTS Text-to-Speech

Converts input text into MP3 audio files in a variety of styles and voices via a free API and sends them to the user.

🦀 ClawHub

iFlytek Bill Recognition (讯飞票据识别)

Recognize and extract structured data from invoices, receipts, and bills using iFlytek OCR API (科大讯飞票据识别). Supports VAT invoices, taxi receipts, train ticket...

🦀 ClawHub

Moltbook Authentic Engagement

Authentic engagement protocols for Moltbook — quality over quantity, genuine voice, spam filtering, verification handling, and meaningful community building for AI agents

🦀 ClawHub

Ai Marketing Videos

Create AI marketing videos for ads, promos, product launches, and brand content. Models: Veo, Seedance, Wan, FLUX for visuals, Kokoro for voiceover. Types: p...

🔌 MCP

imprvhub/mcp-claude-spotify

📇 ☁️ 🏠 - An integration that allows Claude Desktop to interact with Spotify using the Model Context Protocol (MCP).

🦀 ClawHub

Aliyun Qwen Tts Voice Design

Use when designing custom voices with Alibaba Cloud Model Studio Qwen TTS VD models. Use when creating custom synthetic voices from text descriptions and usi...

⭐ GitHub

AudioGPT

Understanding and Generating Speech, Music, Sound, and Talking Head

🦀 ClawHub

SiliconFlow multimodal service supporting image generation (FLUX/Qwen), video generation (Wan), TTS speech synthesis, and ASR speech recognition. Paid with vouchers.

🦀 ClawHub

Invoice Template

Free simple invoice generator. Creates clean, professional invoices with your branding. Use when you need to bill a client quickly without complex tracking o...

🦀 ClawHub

Accessibility Toolkit 1.0.0

Friction-reduction patterns for agents helping humans with disabilities. Voice-first workflows, smart home templates, efficiency automation.

🦀 ClawHub

audio to text and video to text

Transcribe audio and video files into text using OpenAI's Whisper API. Use this skill whenever a user wants to convert any audio or video file to text — incl...

🦀 ClawHub

Transcribe audio files via OpenRouter using audio-capable models

Transcribe audio files via OpenRouter using audio-capable models (Gemini, GPT-4o-audio, etc).

🦀 ClawHub

Brand Voice Writer — AI Content in Your Voice

Generates content in your unique brand voice by analyzing your style, filtering relevant trends, and creating tailored posts, articles, newsletters, and scri...

🦀 ClawHub

Content Repurposer

Turn one piece of content into 10+ formats. Transform blog posts, podcasts, videos, or talks into tweets, LinkedIn posts, newsletters, carousels, and more.

🦀 ClawHub

Github Issue Creator

Convert raw notes, error logs, voice dictation, or screenshots into crisp GitHub-flavored markdown issue reports. Use when the user pastes bug info, error messages, or informal descriptions and wants a structured GitHub issue. Supports images/GIFs for visual evidence.

🦀 ClawHub

MTTSports

Use when the user wants to play or observe MTT poker through the `mttsports` CLI: auth, user balance, room selection and creation, table join or add-on, sess...

🔧 Dify

Stability (Dify)

Stability offers a suite of AI tools and models focused on generative media. It provides capabilities for creating images, audio, and video content from text prompts or other inputs. The suite includes various generative models specializing in different artistic styles and media types. Please apply for an API Key on [Stability.ai](https://platform.stability.ai/account/keys). The Stability tools co

🦀 ClawHub

Kiwi Voice

Manage and configure Kiwi Voice assistant service. Use when starting/stopping Kiwi, editing voice config, checking logs, troubleshooting audio issues, or man...

🦀 ClawHub

Chinese Podcast Radar (中文播客雷达)

Discover, compare, and curate trending Chinese podcasts or episodes from 中文播客榜. Use for hot or recent show discovery, creator benchmarking, curation lists, c...

🦀 ClawHub

Dub YouTube with Voice.ai

Dub YouTube videos with Voice.ai TTS. Turn scripts into publish-ready voiceovers with chapters, captions, and audio replacement for YouTube long-form and Shorts.

🦀 ClawHub

Minimax Music Gyh

MiniMax music generation model, supporting Music-2.5, Music-2, and other models; generates music from text descriptions. Uses the MINIMAX_API_KEY environment variable.

🦀 ClawHub

Minimax Tts Gyh

MiniMax TTS text-to-speech model, supporting the speech-02 and speech-01 series; generates high-quality speech. Uses the MINIMAX_API_KEY environment variable.

🦀 ClawHub

Record screen, microphone or camera from macOS terminal

macOS CLI tool to record microphone audio, screen video or screenshot, and camera video or photo from the terminal with device listing and output control.

🦀 ClawHub

MLX Audio Server

Local 24x7 OpenAI-compatible API server for STT/TTS, powered by MLX on your Mac.

🦀 ClawHub

Xpilot Ad Maker

Generate a 30-second cinematic ad video with consistent character, AI narration, brand overlays, and ambient music. Uses Vidu reference-to-video for characte...

🦀 ClawHub

Supercall

Make AI-powered phone calls with custom personas and goals. Uses OpenAI Realtime API + Twilio for ultra-low latency voice conversations. Supports DTMF/IVR na...

🦀 ClawHub

Speech Recognition

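General-purpose speech recognition skill. Supports multiple audio formats (ogg/mp3/wav/m4a) and uses the SiliconFlow SenseVoice API for speech-to-text. Triggers when the user sends a voice message or audio file, or needs audio transcribed.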

Page 48 / 53 (2,510 skills)