BytesAgain

Find the Right AI Skill for Any Job

Browse 2,510+ curated AI agent skills. Search by use case, filter by category, get the right tool instantly.


All Skills — audio

2,510 skills in "audio"

🦀 ClawHub
Text To Speech
Convert text to natural speech with DIA TTS, Kokoro, Chatterbox, and more via inference.sh CLI. Models: DIA TTS (conversational), Kokoro TTS, Chatterbox, Hig...
🦀 ClawHub
senseaudio-conversation-rehearsal
Use when a user wants to rehearse a high-pressure conversation such as a performance review, reporting meeting, promotion defense, difficult manager conversa...
🦀 ClawHub
Telnyx Tts
Generate speech audio from text using Telnyx Text-to-Speech API. Use when you need to convert text to spoken audio, create voice messages, or generate audio content.
🦀 ClawHub
Video Editor
Edits existing videos using ffmpeg and Python. Use ALWAYS when the user wants to edit a video, cut a video, join videos, add subtitles, add music, remove aud...
🦀 ClawHub
Stripe
Query Stripe customer and billing data from a synced PostgreSQL database. Use when the user asks about Stripe customers, subscriptions, invoices, charges, or any Stripe-related data.
GitHub
🎙️ Open-Source Voice Dictation Agent (like Wispr Flow)
An open-source voice dictation agent similar to Wispr Flow, listed under 🗣️ Voice AI Agents.
🦀 ClawHub
LiveVideoStore
Python-based LiveVideoStore client with voice interaction, GUI, volume control, session management, and encrypted audio transmission for live streaming.
GitHub
Eleven Labs
AI voice generator.
GitHub
Resemble AI
AI voice generator and voice cloning for text to speech.
GitHub
WellSaid
Convert text to voice in real time.
GitHub
Play.ht
AI Voice Generator. Generate realistic Text to Speech voice over online with AI. Convert text to audio.
GitHub
podcast.ai
A podcast that is entirely generated by artificial intelligence, powered by Play.ht text-to-voice AI.
GitHub
VALL-E X
A cross-lingual neural codec language model for cross-lingual speech synthesis.
GitHub
TorToiSe
A multi-voice text-to-speech system trained with an emphasis on quality. #opensource
GitHub
Bark
A transformer-based text-to-audio model. #opensource
GitHub
Wispr Flow
Flow makes writing quick with seamless voice dictation for any application on your computer.
GitHub
Vibe Transcribe
All-in-one solution for effortless audio and video transcription. [#opensource](https://github.com/thewh1teagle/vibe)
GitHub
whisper.cpp
Port of OpenAI's Whisper model in C/C++. #opensource
GitHub
Voice Over Generator
Writes scripts and makes instant voice overs by [Mike Russell](https://x.com/imikerussell)
GitHub
PodGPT
Summarize and ask questions about any podcast episode by [Mikkel Svartveit](https://x.com/mikkelsvartveit)
GitHub
Audiophile Assistant
Specializes in providing expert advice on high-fidelity audio, from equipment selection to sound quality analysis by [@HeyitsRadinn](https://github.com/HeyitsRadinn)
GitHub
Inspirer
A bot that writes inspirational speeches
GitHub
Music Bot
Lyric writing, genre identification, and beat suggestions
GitHub
PlaylistAI: Spotify
Create Spotify music playlists for any prompt by [Brett Bauman](https://x.com/brettunhandled/)
🦀 ClawHub
BookMorph Magic
Orchestrate book-to-content workflows to generate video, audio, cover images, and a manifest for episode or campaign packages.
🦀 ClawHub
Vision Recognition Ocr
Vehicle/animal/plant recognition plus OCR for screenshots, photos, invoices, and tables. Use when users ask to identify a car model, recognize what is in an image, extract text, or run OCR. Supports local path, URL, and...
GitHub
@levelsio
Talk with @levelsio on ChatGPT. Ask any question you want about building your own startup, digital nomading, remote work and whatever else you'd like to ask. Trained on all of my podcasts, interviews, blog posts and tweets! by [levelsio](https://twitter.com/levelsio)
GitHub
Stanford POS Tagger
A Part-Of-Speech Tagger (POS Tagger).
GitHub
CMU Sphinx
Open-source toolkit for speech recognition, built on a pure-Java speech recognition library.
GitHub
wav2letter
A simple and efficient end-to-end Automatic Speech Recognition (ASR) system from Facebook AI Research.
GitHub
python-zpar
Python bindings for [ZPar](https://github.com/frcchang/zpar), a statistical part-of-speech tagger, constituency parser, and dependency parser for English.
GitHub
Rasa
A "machine learning framework to automate text- and voice-based conversations."
🦀 ClawHub
SenseAudio-ASR
Build and troubleshoot SenseAudio speech recognition integrations, including HTTP transcription (`/v1/audio/transcriptions`), realtime WebSocket ASR (`/ws/v1...
🦀 ClawHub
Strider Spotify
Control Spotify playback via Strider Labs MCP connector. Search music, manage playlists, control playback, and discover new artists.
🦀 ClawHub
Love Reply Skill
Love Reply is an AI romantic reply assistant for crush texts, flirty banter, and moments when you want to sound warm, playful, and genuinely attractive. It h...
🦀 ClawHub
MoodMusic Conversation-Based Music Recommendations
Recommend music based on your current mood, activity, or conversation context. Returns a curated track list you can search on Spotify, YouTube, or Apple Music.
🦀 ClawHub
MAI Transcribe
Transcribe audio with Microsoft's MAI-Transcribe-1 model via Azure AI Speech.
🦀 ClawHub
抖音热榜 / Douyin Hot
Douyin Hot List Fetcher. Get Douyin hot list/trending data. Includes popular videos, challenges, music and mo...
🦀 ClawHub
Smart Audio Analyzer
All-in-one audio analysis: transcribe, identify speakers by voiceprint, auto-detect scene (meeting/interview/training/talk), generate structured notes. The O...
🦀 ClawHub
MAI Voice
Synthesize speech with Microsoft's MAI-Voice-1 voices via Azure AI Speech REST API.
🦀 ClawHub
Voice Picker
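Recommend the best SenseAudio voice for any scenario or emotion. Use when users ask which voice to use, e.g. "What voice should I use for a children's story podcast?", "Which voice suits e-commerce livestream selling?", "I need a cute, coquettish female voice", "Is there...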
🦀 ClawHub
Spotify Skill
Control Spotify playback, search music, manage playlists, generate discovery playlists, and analyze listening habits via the Spotify Web API. Use when asked...
🦀 ClawHub
Kid Point Voice Component
SenseAudio Voice - speech synthesis (TTS) + speech recognition (ASR), with automatic language switching
🦀 ClawHub
SenseAudio Voice CN
SenseAudio Voice - speech synthesis (TTS) + speech recognition (ASR), with automatic language switching
🦀 ClawHub
Text to Voice Local
Local text-to-voice generation for OpenClaw workspaces using a canonical txt-to-mp3 pipeline. Use when the user wants to turn any prepared text into voice, a...
🦀 ClawHub
Ai Video Pipeline
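Conversational AI short-video creation tool. The user proposes an idea → the agent drafts the script → a human confirms → the MP4 is produced automatically. Activate when the user mentions: (1) making a video or short video, (2) an AI-narrated video, (3) a self-narrated reflection or podcast-style video, (4) turning a script into a video. Do not activate when the user only mentions vague terms such as "video", "TTS", or "voice" (they may need something else).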
🦀 ClawHub
Invoice Extractor
Extract structured data from invoices and receipts (PDFs and images). Output JSON, CSV, or build a running expense ledger. Use when someone shares an invoice...
GitHub
mp4ff
Library and tools for working with MP4 files containing video, audio, subtitles, or metadata.