Find the Right AI Skill for Any Job

Browse 2,510+ curated AI agent skills. Search by use case, filter by category, get the right tool instantly.

All Skills — audio

2,510 skills in "audio"

Convert text to natural speech with DIA TTS, Kokoro, Chatterbox, and more via inference.sh CLI. Models: DIA TTS (conversational), Kokoro TTS, Chatterbox, Hig...

🦀 ClawHub

senseaudio-conversation-rehearsal

Use when a user wants to rehearse a high-pressure conversation such as a performance review, reporting meeting, promotion defense, difficult manager conversa...

🦀 ClawHub

Telnyx TTS

Generate speech audio from text using Telnyx Text-to-Speech API. Use when you need to convert text to spoken audio, create voice messages, or generate audio content.

🦀 ClawHub

Video Editor

Edits existing videos using ffmpeg and Python. Use ALWAYS when the user wants to edit a video, cut a video, join videos, add subtitles, add music, remove aud...

🦀 ClawHub

Stripe

Query Stripe customer and billing data from a synced PostgreSQL database. Use when the user asks about Stripe customers, subscriptions, invoices, charges, or any Stripe-related data.

⭐ GitHub

🎙️ Open-Source Voice Dictation Agent (like Wispr Flow)

🎙️ Open-source voice dictation agent (like Wispr Flow) - 🗣️ Voice AI Agents

🦀 ClawHub

LiveVideoStore

Python-based LiveVideoStore client with voice interaction, GUI, volume control, session management, and encrypted audio transmission for live streaming.

AI voice generator and voice cloning for text to speech.

⭐ GitHub

WellSaid

Convert text to voice in real time.

⭐ GitHub

Play.ht

AI Voice Generator. Generate realistic Text to Speech voice over online with AI. Convert text to audio.

⭐ GitHub

podcast.ai

A podcast that is entirely generated by artificial intelligence, powered by Play.ht text-to-voice AI.

⭐ GitHub

VALL-E X

A cross-lingual neural codec language model for cross-lingual speech synthesis.

⭐ GitHub

TorToiSe

A multi-voice text-to-speech system trained with an emphasis on quality. #opensource

⭐ GitHub

Bark

A transformer-based text-to-audio model. #opensource

⭐ GitHub

Wispr Flow

Flow makes writing quick with seamless voice dictation for any application on your computer.

⭐ GitHub

Vibe Transcribe

All-in-one solution for effortless audio and video transcription. [#opensource](https://github.com/thewh1teagle/vibe)

⭐ GitHub

whisper.cpp

Port of OpenAI's Whisper model in C/C++. #opensource

⭐ GitHub

Voice Over Generator

Writes scripts and makes instant voice overs by [Mike Russell](https://x.com/imikerussell)

⭐ GitHub

PodGPT

Summarize and ask questions about any podcast episode by [Mikkel Svartveit](https://x.com/mikkelsvartveit)

⭐ GitHub

Audiophile Assistant

Specializes in providing expert advice on high-fidelity audio, from equipment selection to sound quality analysis by [@HeyitsRadinn](https://github.com/HeyitsRadinn)

⭐ GitHub

Inspirer

A bot that writes inspirational speeches

⭐ GitHub

Music Bot

Lyric writing, genre identification, and beat suggestions

⭐ GitHub

PlaylistAI: Spotify

Create Spotify music playlists for any prompt by [Brett Bauman](https://x.com/brettunhandled/)

🦀 ClawHub

BookMorph Magic

Orchestrate book-to-content workflows to generate video, audio, cover images, and a manifest for episode or campaign packages.

🦀 ClawHub

Vision Recognition OCR

Vehicle/animal/plant recognition plus OCR for screenshots, photos, invoices, and tables. Use when users ask to identify a vehicle model, recognize what is in an image, extract text, or run OCR. Supports local path, URL, and...

⭐ GitHub

@levelsio

Talk with @levelsio on ChatGPT. Ask any question you want about building your own startup, digital nomading, remote work and whatever else you'd like to ask. Trained on all of my podcasts, interviews, blog posts and tweets! by [levelsio](https://twitter.com/levelsio)

⭐ GitHub

Stanford POS Tagger

A Part-Of-Speech Tagger (POS Tagger).

⭐ GitHub

CMU Sphinx

Open-source toolkit for speech recognition, based purely on a Java speech recognition library.

⭐ GitHub

wav2letter

A simple and efficient end-to-end Automatic Speech Recognition (ASR) system from Facebook AI Research.

⭐ GitHub

python-zpar

Python bindings for [ZPar](https://github.com/frcchang/zpar), a statistical part-of-speech-tagger, constituency parser, and dependency parser for English.

⭐ GitHub

Rasa

A "machine learning framework to automate text- and voice-based conversations."

🦀 ClawHub

SenseAudio-ASR

Build and troubleshoot SenseAudio speech recognition integrations, including HTTP transcription (`/v1/audio/transcriptions`), realtime WebSocket ASR (`/ws/v1...

🦀 ClawHub

Strider Spotify

Control Spotify playback via Strider Labs MCP connector. Search music, manage playlists, control playback, and discover new artists.

🦀 ClawHub

Love Reply Skill

Love Reply is an AI romantic reply assistant for crush texts, flirty banter, and moments when you want to sound warm, playful, and genuinely attractive. It h...

🦀 ClawHub

MoodMusic Conversation-Based Music Recommendations

Recommend music based on your current mood, activity, or conversation context. Returns a curated track list you can search on Spotify, YouTube, or Apple Music.

🦀 ClawHub

MAI Transcribe

Transcribe audio with Microsoft's MAI-Transcribe-1 model via Azure AI Speech.

🦀 ClawHub

Douyin Hot

Douyin Hot List Fetcher. Get Douyin hot list/trending data. Includes popular videos, challenges, music and mo...

🦀 ClawHub

Smart Audio Analyzer

All-in-one audio analysis: transcribe, identify speakers by voiceprint, auto-detect scene (meeting/interview/training/talk), generate structured notes. The O...

🦀 ClawHub

MAI Voice

Synthesize speech with Microsoft's MAI-Voice-1 voices via Azure AI Speech REST API.

🦀 ClawHub

Voice Picker

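Recommend the best SenseAudio voice for any scenario or emotion. Use when users ask which voice to use, e.g. "What voice should I use for a children's story podcast?", "Which voice suits e-commerce livestream selling?", "I need a cute, coquettish female voice", "Is there...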

🦀 ClawHub

Spotify Skill

Control Spotify playback, search music, manage playlists, generate discovery playlists, and analyze listening habits via the Spotify Web API. Use when asked...

🦀 ClawHub

Kid Point Voice Component

SenseAudio Voice - speech synthesis (TTS) + speech recognition (ASR), with automatic language switching

🦀 ClawHub

SenseAudio Voice CN

SenseAudio Voice - speech synthesis (TTS) + speech recognition (ASR), with automatic language switching

🦀 ClawHub

Text to Voice Local

Local text-to-voice generation for OpenClaw workspaces using a canonical txt-to-mp3 pipeline. Use when the user wants to turn any prepared text into voice, a...

🦀 ClawHub

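AI Video Pipeline

Conversational AI short-video creation tool. The user proposes an idea → the agent designs a script → a human confirms → the MP4 is produced automatically. Use when the user mentions: (1) making a video or short video, (2) an AI-narrated video, (3) a cognitive self-narration or podcast-style video, (4) turning a written draft into a video. Do not activate when the user mentions only vague terms such as "video", "TTS", or "voice" (these may indicate other needs).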

🦀 ClawHub

Invoice Extractor

Extract structured data from invoices and receipts (PDFs and images). Output JSON, CSV, or build a running expense ledger. Use when someone shares an invoice...

⭐ GitHub

mp4ff

Library and tools for working with MP4 files containing video, audio, subtitles, or metadata.

Page 49 / 53 (2,510 skills)