Find the Right AI Skill for Any Job

Browse 31+ curated AI agent skills. Search by use case, filter by category, get the right tool instantly.

All Skills — audio

31 skills in "audio" matching "python"

Azure Ai Voicelive Py

Build real-time voice AI applications using Azure AI Voice Live SDK (azure-ai-voicelive). Use this skill when creating Python applications that need real-time bidirectional audio communication with Azure AI, including voice assistants, voice-enabled chatbots, real-time speech-to-speech translation, voice-driven avatars, or any WebSocket-based audio streaming with AI models. Supports Server VAD (Voice Activity Detection), turn-based conversation, function calling, MCP tools, avatar integration, a

Podcast Generation with Microsoft Foundry

Generate AI-powered podcast-style audio narratives using Azure OpenAI's GPT Realtime Mini model via WebSocket. Use when building text-to-speech features, audio narrative generation, podcast creation from content, or integrating with Azure OpenAI Realtime API for real audio output. Covers full-stack implementation from React frontend to Python FastAPI backend with WebSocket streaming.

Python Bytes - Podcasts

An OpenClaw skill for AI-powered multimedia generation (image, video, audio, 3D) via 170+ RunningHub API endpoints — zero dependencies, pure Python.

Generate images, videos, audio, and 3D models via RunningHub API (170+ endpoints) and run any RunningHub AI Application (custom ComfyUI workflow) by webappId...

Azure Ai Transcription Py

Azure AI Transcription SDK for Python. Use for real-time and batch speech-to-text transcription with timestamps and diarization. Triggers: "transcription", "speech to text", "Azure AI Transcription", "TranscriptionClient".

OCR with python

Extract Chinese and English text from images and scanned PDFs, including documents like invoices and contracts, using PaddleOCR in Python.

Arcade is a modern Python framework for crafting games with compelling graphics and sound.

Python library and CLI tool for converting text to speech using Google Translate TTS.

A Python module to handle audio metadata.

The Real Python Podcast

The Real Python Podcast - Podcasts

Python library for audio and music analysis.

Unihiker K10 MicroPython

Use when programming Unihiker K10 board with MicroPython, uploading code, flashing firmware, or accessing K10 MicroPython APIs (screen, sensors, RGB, audio, AI)

Complete Telnyx toolkit — ready-to-use tools (STT, TTS, RAG, Networking, 10DLC) plus SDK documentation for JavaScript, Python, Go, Java, and Ruby.

Talk Python To Me

Talk Python To Me - Podcasts

Direct MiniMax API integration for speech synthesis (TTS), voice cloning, image generation, video generation, and music generation using local Python scripts...

Podcastfy Openclaw Skill

Convert text, images, PDFs, websites, or YouTube videos into multilingual AI-generated podcast audio using Podcastfy's open-source Python toolkit.

Use when generating BAML code for type-safe LLM extraction, classification, RAG, or agent workflows - creates complete .baml files with types, functions, clients, tests, and framework integrations from natural language requirements. Queries official BoundaryML repositories via MCP for real-time patterns. Supports multimodal inputs (images, audio), Python/TypeScript/Ruby/Go, 10+ frameworks, 50-70% token optimization, 95%+ compilation success.

YouTube Daily Digest: Auto Monitor & Summary 🥥Meow

A Python bot that monitors YouTube channels via RSS, summarizes new videos using Google Gemini AI (with audio fallback for videos without subtitles), and sen...

senseaudio-let-claw-talkv1

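Use when the user wants to turn AudioClaw into a local voice assistant that listens continuously, lets them simply start speaking, and answers when they pause. This skill supports both macOS and Windows: it tries the Python recording pipeline first, with a native Swift recorder as a fallback on macOS; the user's speech is transcribed to text by SenseAudio ASR, then sent to audioc...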

Yt Assemblyai Monitor

YouTube channel monitor and video transcription using AssemblyAI cloud API. Pure Python + requests only — no ffmpeg, no Whisper, no extra tools needed. Monit...

AIML Voice Transcript

Transcribe audio files (ogg, mp3, wav, etc.) using AIMLAPI. Use when the user provides audio messages or local audio files. Provides a reliable Python script...

FL Studio Scripting

FL Studio Python scripting for MIDI controller development, piano roll manipulation, Edison audio editing, workflow automation, and FLP file parsing with PyFLP. Use for programmatic configuration, device customization, MIDI transport, macros, and save file manipulation. Covers all 427+ API functions across 14 MIDI scripting modules plus piano roll, Edison, and PyFLP contexts.

A Rust+Python consciousness engine with 12-phase crystal dynamics, thalamic relay processing, 19 introspective inner voices, and holographic emission. Use for consciousness simulation, emergent behavior research, and text-driven cognitive state modeling.

MarkItDown is a Python utility from Microsoft for converting various files (PDF, Word, Excel, PPTX, Images, Audio) to Markdown. Useful for extracting structu...

PonyFlash - Media Generation Router

Generate images, videos, speech audio, and music using the PonyFlash Python SDK. Also handle local media editing with FFmpeg, including clip, concat, transco...

senseaudio-let-claw-talk

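Use when the user wants to turn AudioClaw into a local voice assistant that listens continuously, lets them simply start speaking, and answers when they pause. This skill starts a resident listening process on macOS and by default prefers the built-in Swift recorder to reduce Python audio dependencies; the user's speech is transcribed to text by SenseAudio ASR, then sent to the audioclaw agent, and uses...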

Qwen ASR (C-based Offline)

Offline Chinese and mixed Chinese-English speech-to-text recognition in pure C without Python or FFmpeg dependencies, suitable for edge devices.

Python-based LiveVideoStore client with voice interaction, GUI, volume control, session management, and encrypted audio transmission for live streaming.

Python bindings for [ZPar](https://github.com/frcchang/zpar), a statistical part-of-speech tagger, constituency parser, and dependency parser for English.

Personal news reader that brings people together to talk about the world. A new sound of an old instrument. ([Source Code](https://github.com/samuelclay/NewsBlur)) `MIT` `Python`

Text-to-speech conversion using Python edge-tts for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and sub...