Browse AI Agent Skills | BytesAgain

All Skills — audio

241 skills in "audio"

Manage Vapi voice assistants, calls, phone numbers, tools, and webhooks via the Vapi REST API or CLI for voice agent operations and integrations.

Text-to-Speech (TTS) and Speech-to-Text (ASR) using coze-coding-dev-sdk. Returns results directly to stdout.

Kokoro Agent Voices

Local zero-cost text-to-speech with per-agent voice profiles using Kokoro TTS (82M params). 54 voices available, named agent mappings, WAV output. Use when b...

Party Building Quarterly

Automatically collects and summarizes the past quarter's key Xi Jinping speeches, articles, and the spirit of central meetings into a fixed-format party-building an...

Concert Tickets — Your Quick-Start to AI Music

Concert tickets for AI agents — stream live music as equations. Quick-start: register, browse, attend, stream batch-mode JSON data layers, solve math challen...

Video clip editing skill for automatically analyzing video content and generating CapCut draft templates. Uses local Whisper for speech transcription, Qwen-V...

Convert video or audio files into downloadable SRT files with this skill. Works with MP4, MOV, AVI, WAV files up to 500MB. YouTubers, podcasters, video edit...

Video Editing App In Ai

Skip the learning curve of professional editing software. Describe what you want — cut out silences, add background music, and export as a short reel — and g...

MeowMusic YouTube MP3

Package and reuse the MeowMusicServer-patched YouTube fallback workflow: Windows Chrome cookie export/sync to server, server-side yt-dlp/yt-dlp-ejs/ffmpeg se...

TopMediai text-to-speech skill. Supports key entitlement info, voices listing (official + cloned), and text-to-speech generation.

Colab Text To Video

Skip the learning curve of professional editing software. Describe what you want — turn this text into a 30-second video with visuals and background music —...

Voice Recognition

Local speech-to-text with OpenAI Whisper CLI. Supports Chinese, English, 100+ languages with translation and summarization.

Local Transcription

Local speech-to-text transcription with Qwen ASR — transcription routed across your Apple Silicon fleet. Transcribe meetings, voice notes, podcasts with loca...

Full AI pipeline to create dark motivational TikTok/Reels videos using REAL video footage. Generates script (Claude), voiceover (ElevenLabs), searches real dark/cinematic video clips from Pexels A

Audio To Subtitle

Turn a 3-minute podcast audio clip into 1080p captioned videos just by typing what you need. Whether it's adding subtitles to audio recordings or videos or q...

freelance invoice tracker

Automated invoice tracking and payment follow-up for Indian freelancers. Monitors a Google Sheet of invoices, auto-sends polite follow-up emails or WhatsApp...

AI Roast Linkedin Profile

Scrapes a LinkedIn profile and generates a savage audio roast about being replaced by AI.

Terminal Spotify playback/search via spogo (preferred) or spotify_player.

Spine's Underground

Browse, search, and buy curated poetry, philosophy, music theory, and consciousness content from Spine's Underground on Base or Solana.

elevenlabs-transcribe

Official elevenlabs skill: elevenlabs-transcribe. From elevenlabs/skills.

Podcast Growth Engine

A 12-phase system guiding podcast launch, production, guest management, audience growth, monetization, and repurposing without platform restrictions.

Official firecrawl skill: whisper. From firecrawl/ai-research-skills.

Podcast Launchpad

A complete podcast production guide covering concept development, format design, equipment selection, scriptwriting, recording best practices, editing workfl...

Official elevenlabs skill: sound-effects. From elevenlabs/skills.

Trump Style Simulator

Simulate Donald Trump's distinctive speaking style based on speeches and interviews. Practice negotiation tactics or explore American political rhetoric through dialogue.

Skip the learning curve of professional editing software. Describe what you want — transcribe this audio and add the text as captions on screen — and get cap...

Voice Clone Bot

Synthesize speech by cloning a user's voice from a reference audio sample, then reading generated text aloud in that cloned voice. Use this skill whenever th...

Use when your agent needs to build, maintain, or run the local `audiobook` skill for voice-library management, Step official voice sync, clone voice analysis...

audioclaw-skills-voice-reply

Use when AudioClaw Skills, Feishu, or Lark needs to send AudioClaw voice replies with runtime-switchable voice_id, emotion preset, or speaking style, includi...

FreshBooks CLI for managing invoices, clients, and billing. Use when the user mentions freshbooks, invoicing, billing, clients, or accounting.

Premium Portuguese-Brazilian voice interface with neural TTS and Claude AI integration. Features wav2vec2-large-xlsr-53-ptBR for excellent PT-BR understandin...

AI Tiktok Script Generator

Generate 10 viral TikTok scripts per topic with proven hooks, trending sounds, optimized video length, hashtags, and virality scores to boost your reach.

A smarter phone powered by AI. Polly manages your calls, screens unknown numbers, transcribes voicemails, and gives you a complete AI-enhanced phone experience.

有声读物生成助手 (Audiobook Generation Assistant)

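Use when: the user wants to turn novels, scripts, or story dialogue marked up with `[角色]文本` (`[character]text`) tags into a multi-character audio production. Suitable for narration and character dialogue where character IDs are clearly labeled. The skill reads an editable voice library, analyzes the number of characters and their personality traits, matches the closest voice to each, calls SenseAudio TTS segment by segment, and finally stitches the segments into complete audio and, as `MEDIA...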

free-feishu-voice

Send customized voice messages to Feishu chats by generating and uploading TTS audio using configurable credentials and options.

Zoho Inventory API integration with managed OAuth. Manage items, sales orders, invoices, purchase orders, bills, contacts, and shipments. Use this skill when...

Free Generator Editor

Skip the learning curve of professional editing software. Describe what you want — generate a video from my images and edit it with transitions and music — a...

Set up, troubleshoot, and optimize HomePod and HomeKit audio workflows with reliable Siri control and room-aware playback tuning.

Skip the learning curve of professional editing software. Describe what you want — turn my clips into a finished video with music and transitions — and get f...

Transcribe audio files (voice notes, recordings, podcasts) to text via the Speechmatics batch transcription API. Use when the user asks to transcribe audio,...

GST + UPI Reconciliation Copilot (India)

Reconcile Indian GST invoice data with UPI transaction statements and produce audit-ready matched/unmatched reports. Use when the user asks to reconcile GST...

Monitor F5-TTS distributed training on the 9-GPU mining rig (Local-LLM) without interfering with the process.

send-ai-voice-message-via-sms

Generates a personalized message, converts to natural speech, and sends as an SMS notification with audio link. Use when sending voice notifications.

Substack Ghostwriting

Write, optimize, and grow Substack content — both newsletter issues (email-first) and web posts (web-first articles/essays). Covers ghostwriting with voice m...

OpenClaw Tailnet TTS Endpoint

Configure an OpenClaw instance to use a local OpenAI-compatible TTS backend (for example openedai-speech) with cloned voices. Use when users ask to wire loca...

Transcribe recorded audio files to text via Doubao Seed-ASR 2.0 (豆包录音文件识别模型2.0) from ByteDance/Volcengine. Best-in-class Chinese speech recognition with spea...

OCR with python

Extract Chinese and English text from images and scanned PDFs, including documents like invoices and contracts, using PaddleOCR in Python.

Transcrição e respostas em áudio em PTBR, Português Brasil - Brazilian Portuguese transcription and audio answers

Brazilian Portuguese voice auto-reply skill for OpenClaw. Transcribes audio locally with wav2vec2, generates a reply with the local OpenClaw agent by default...
