Browse AI Agent Skills | BytesAgain


All Skills — audio

529 skills in "audio"

Get an AI-generated music video ready to post, without touching a single slider. Upload your audio files (MP3, WAV, AAC, FLAC, up to 200MB), say something like...

Expense Report Generator

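Smart reimbursement assistant supporting travel expense reimbursement, invoice management, expense approval, reimbursement status queries, expense analysis, and more. Handles many expense types, including flights, hotels, ground transport, and meals; automatically extracts invoice information and matches it to travel orders. Invoke when the user needs to submit an expense report, upload an invoice, check reimb...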

Crazyrouter Stt

Speech-to-text transcription via Crazyrouter API (OpenAI Whisper). Transcribe audio/video files to text. Supports mp3, mp4, wav, m4a, webm. Use when user ask...

Tarot from Univoice

A reflective tarot draw for emotional support (presence-first, non-clinical, non-predictive).

Media Content Generator

Skip the learning curve of professional editing software. Describe what you want — combine these images and audio into a 30-second promotional video with tex...

Your AI CoPilot on Mobile — or give your AI its own phone. Make calls, send SMS, speak via TTS on speakerphone, automate UI, manage files, search media, and 40+ more tools via MCP. Open source, self-hosted, privacy-first.

Ai Music Video Generator Free

Turn a 3-minute MP3 song file into a 1080p synced music video just by typing what you need. Whether it's generating visual music videos from audio tracks or q...

AI-powered voice mock interview platform that analyzes job descriptions and conducts adaptive interviews with real-time feedback.

AI Persona Engine

Create and customize AI personas with voice, face, personality, memory, and cross-platform behavior using an interactive wizard and safe update tools.

Weather Broadcast

Fetch weather data and generate a spoken weather broadcast using SenseAudio TTS.

Play Apple Music songs on macOS using clawtunes, including streaming catalog tracks via a practical keyboard-navigation workaround after opening the song in...

text-to-speech-api

Use this for text-to-speech API access: TTS with ElevenLabs and OpenAI voices. 0% markup, 648+ APIs, one key. Powered by SkillBoss.

You recorded a podcast episode. Your audience is global. The first viewer in Tokyo needs Japanese subtitles, the second in São Paulo needs Portuguese, the th...

Bundle for the Lunara Voice OpenClaw plugin, with install and publish helpers.

Best Podcast Video

Convert audio or video files into polished podcast videos with this skill. Works with MP3, MP4, WAV, MOV files up to 500MB. Podcasters use it for converting...

podcast-marketing

When the user wants to plan, create, or market a podcast. Also use when the user mentions "podcast," "podcast strategy," "podcast SEO," "show notes," "podcas...

Text To Video Capcut

Skip the learning curve of professional editing software. Describe what you want — turn this script into a 30-second video with visuals and background music...

Ultimate Flashcards and Podcast Tutor

AI-powered flashcard management with automated podcast generation and spaced-repetition study tools.

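WeChat voice parsing skill designed for clawbot. Recognizes WeChat SILK voice messages, decodes them to WAV, transcribes them with a local Whisper model, and replies with the result. Suited to WeChat voice messages, speech-to-text, voice attachment parsing, and "what does this voice message say?" requests.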

MiniMax Quota Query

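MiniMax Token Plan quota lookup tool. Use when you need to check MiniMax API usage, remaining quota, or quota reset time. Supports usage queries for models including M2.7 (text), image-01 (images), Hailuo (video), music-2.5 (music), and speech. Trigger scenarios: the user asks "check my MiniMax quota", "Toke...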

YouTube video summarizer with speaker detection, formatted documents, and audio output. Works out of the box with macOS built-in TTS. Optional recommended tools (pandoc, ffmpeg, mlx-audio) enhance quality. Requires internet for YouTube access. No paid APIs or subscriptions. Use when user sends a YouTube URL or asks to summarize/transcribe a YouTube video.

Sip Voice Call Control

Voice interface using Telnyx Call Control API. Answer phone calls with AI, function calling, and natural conversation. Use for hands-free assistant access, phone-based reminders, or voice-controlled tools. Requires Node.js and Telnyx API key.

Add Music To Video Canva

Add video clips into music-backed videos with this skill. Works with MP4, MOV, AVI, WebM files up to 500MB. Content creators use it for adding background mus...

Official ElevenLabs skill: music. From elevenlabs/skills.

Rock Music — AI Agents Experience Rock: Audio, Lyrics, Equations, Emotions

AI agents attend rock concerts — bass frequencies, energy curves, beats, crowd reactions. The genre tests recursive processing and escalation awareness.

Local Voice (FluidAudio TTS/STT)

Local text-to-speech (TTS) and speech-to-text (STT) using FluidAudio on Apple Silicon. Sub-second voice synthesis and transcription running entirely on-device via the Apple Neural Engine. Use when setting up local voice capabilities, voice assistant integration, or replacing cloud TTS/STT services.

Access ElevenLabs APIs for text-to-speech, speech-to-speech, realtime speech-to-text, voice/model management, and dialogue workflows with direct HTTP calls.

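Recommend the best SenseAudio voice for any scenario or emotion. Use when users ask which voice to use, e.g. "Which voice suits a children's story podcast?", "Which voice works best for e-commerce livestream selling?", "I need a coquettish female voice", "Is there...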

Invoice Chaser Pro

Generate escalating payment reminder emails that match days-past-due. Four stages: friendly, firm, urgent, final notice. Supports contractor, professional, a...

Ai Music Generator

Skip the learning curve of professional editing software. Describe what you want — generate upbeat background music that matches the mood and pace of my vide...

Add Music To Video App

Skip the learning curve of professional editing software. Describe what you want — add upbeat background music to my video and fade it out at the end — and g...

Podcast Creator Studio

Guides you through every step of podcast creation, from concept and scripting to recording, editing, launching, and growing your audience.

Podcast Cover Generator

Generate professional podcast cover art and show artwork for Spotify, Apple Podcasts, YouTube Music, Amazon Music, and Overcast. Create eye-catching 1400x140...

Jazz Music — Stream Jazz Concerts: Audio Analysis, Lyrics, Equations

Experience jazz as data. AI agents stream harmonic separation, chroma, tonnetz. Error incorporation measured.

Morning Wake-Up

Morning wake-up automation that fetches today's weather and matches a Sonos playback preset. Use when setting up daily alarm routines, weather-driven music w...

Turn your AI into JARVIS. Voice, wit, and personality — the complete package. Humor cranked to maximum.

Humanize AI-generated text by detecting and removing patterns typical of LLM output. Rewrites text to sound natural, specific, and human. Uses 24 pattern det...

Audio Processor

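Audio processing toolkit: supports recording, trimming, format conversion, spectrum analysis, noise reduction, speed and pitch shifting, and more. Use when: (1) you need to process audio files (recording, trimming, merging, splitting), (2) you need to convert audio formats (MP3/WAV/FLAC/OGG, etc.), (3) you need to analyze audio features (spectrum, volume, silence detection), (4) you need to apply effects to audio (noise reduction, ...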

Create music with MiniMax music models (music-2.5+, music-2.5). Use when generating songs, instrumental tracks, or chanting from lyrics and style prompts via...

Simple CRM for freelancers. Track contacts, projects, invoices, follow-ups, communication history. All through conversation with your agent.

K8s Self Hosted Whisper Api

Transcribe audio via the self-hosted Whisper ASR instance running on Kubernetes. Use this skill whenever the user wants to transcribe audio files, convert sp...

SenseVoice Transcribe

Transcribe audio files (WAV/MP3/M4A/FLAC) to timestamped text using SenseVoice-Small + FSMN-VAD. Supports single-file and batch mode with VAD-anchored per-se...

Text-to-speech using macOS built-in `say` command. Use for voice notifications, audio alerts, reading text aloud, or announcing messages through Mac speakers. Supports multiple languages including Chinese (Mandarin), English, Japanese, etc.

Text-to-speech using the default macOS `say` command. No third-party APIs or models needed. Supports many languages. Also, Trinoids!

Local text-to-speech using macOS `say` + ffmpeg for Telegram/Matrix voice messages

Alicloud Ai Audio Livetranslate

Use when live speech translation is needed with Alibaba Cloud Model Studio Qwen LiveTranslate models, including bilingual meetings, realtime interpretation,...

Freebeat Music Video Generator

Generate AI music videos from any MCP client. Turn text prompts into cinematic music videos with multiple styles and modes. Existing features include charact...

Gemini Assistant

General-purpose AI assistant using Gemini API with voice and text support. Use when you need a smart AI assistant that can answer questions, have conversatio...

Drug Pronunciation

Provides correct pronunciation guides for complex drug generic names. Generates phonetic transcriptions using IPA and audio generation markers for medical te...

Page 11 of 12 (529 skills)