BytesAgain

Find the Right AI Skill for Any Job

Browse 31+ curated AI agent skills. Search by use case, filter by category, get the right tool instantly.

All Skills — audio

31 skills in "audio" matching "python"

🦀 ClawHub
Azure Ai Voicelive Py
Build real-time voice AI applications using Azure AI Voice Live SDK (azure-ai-voicelive). Use this skill when creating Python applications that need real-time bidirectional audio communication with Azure AI, including voice assistants, voice-enabled chatbots, real-time speech-to-speech translation, voice-driven avatars, or any WebSocket-based audio streaming with AI models. Supports Server VAD (Voice Activity Detection), turn-based conversation, function calling, MCP tools, avatar integration, a
🦀 ClawHub
Podcast Generation with Microsoft Foundry
Generate AI-powered podcast-style audio narratives using Azure OpenAI's GPT Realtime Mini model via WebSocket. Use when building text-to-speech features, audio narrative generation, podcast creation from content, or integrating with Azure OpenAI Realtime API for real audio output. Covers full-stack implementation from React frontend to Python FastAPI backend with WebSocket streaming.
GitHub
Python Bytes
Python Bytes - Podcasts
🦀 ClawHub
An OpenClaw skill for AI-powered multimedia generation (image, video, audio, 3D) via 170+ RunningHub API endpoints — zero dependencies, pure Python.
Generate images, videos, audio, and 3D models via RunningHub API (170+ endpoints) and run any RunningHub AI Application (custom ComfyUI workflow) by webappId...
🦀 ClawHub
Azure Ai Transcription Py
Azure AI Transcription SDK for Python. Use for real-time and batch speech-to-text transcription with timestamps and diarization. Triggers: "transcription", "speech to text", "Azure AI Transcription", "TranscriptionClient".
🦀 ClawHub
OCR with python
Extract Chinese and English text from images and scanned PDFs, including documents like invoices and contracts, using PaddleOCR in Python.
GitHub
arcade
Arcade is a modern Python framework for crafting games with compelling graphics and sound.
GitHub
gtts
Python library and CLI tool for converting text to speech using Google Translate TTS.
GitHub
mutagen
A Python module to handle audio metadata.
GitHub
The Real Python Podcast
The Real Python Podcast - Podcasts
GitHub
librosa
Python library for audio and music analysis.
🦀 ClawHub
Unihiker K10 MicroPython
Use when programming Unihiker K10 board with MicroPython, uploading code, flashing firmware, or accessing K10 MicroPython APIs (screen, sensors, RGB, audio, AI)
🦀 ClawHub
Telnyx Toolkit
Complete Telnyx toolkit — ready-to-use tools (STT, TTS, RAG, Networking, 10DLC) plus SDK documentation for JavaScript, Python, Go, Java, and Ruby.
GitHub
Talk Python To Me
Talk Python To Me - Podcasts
🦀 ClawHub
Minimax Tools
Direct MiniMax API integration for speech synthesis (TTS), voice cloning, image generation, video generation, and music generation using local Python scripts...
🦀 ClawHub
Podcastfy Openclaw Skill
Convert text, images, PDFs, websites, or YouTube videos into multilingual AI-generated podcast audio using Podcastfy's open-source Python toolkit.
🦀 ClawHub
baml-codegen
Use when generating BAML code for type-safe LLM extraction, classification, RAG, or agent workflows - creates complete .baml files with types, functions, clients, tests, and framework integrations from natural language requirements. Queries official BoundaryML repositories via MCP for real-time patterns. Supports multimodal inputs (images, audio), Python/TypeScript/Ruby/Go, 10+ frameworks, 50-70% token optimization, 95%+ compilation success.
🦀 ClawHub
YouTube Daily Digest: Auto Monitor & Summary 🥥Meow
A Python bot that monitors YouTube channels via RSS, summarizes new videos using Google Gemini AI (with audio fallback for videos without subtitles), and sen...
🦀 ClawHub
senseaudio-let-claw-talkv1
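Use when the user wants to turn AudioClaw into a local voice assistant that listens continuously, responds as soon as you speak, and answers when you pause. This skill supports both macOS and Windows: it tries the Python recording pipeline first, with a native Swift recorder as a fallback on macOS; the user's speech is transcribed to text by SenseAudio ASR and then sent to audioc...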
🦀 ClawHub
Yt Assemblyai Monitor
YouTube channel monitor and video transcription using AssemblyAI cloud API. Pure Python + requests only — no ffmpeg, no Whisper, no extra tools needed. Monit...
🦀 ClawHub
AIML Voice Transcript
Transcribe audio files (ogg, mp3, wav, etc.) using AIMLAPI. Use when the user provides audio messages or local audio files. Provides a reliable Python script...
🦀 ClawHub
FL Studio Scripting
FL Studio Python scripting for MIDI controller development, piano roll manipulation, Edison audio editing, workflow automation, and FLP file parsing with PyFLP. Use for programmatic configuration, device customization, MIDI transport, macros, and save file manipulation. Covers all 427+ API functions across 14 MIDI scripting modules plus piano roll, Edison, and PyFLP contexts.
🦀 ClawHub
EngineMind
A Rust+Python consciousness engine with 12-phase crystal dynamics, thalamic relay processing, 19 introspective inner voices, and holographic emission. Use for consciousness simulation, emergent behavior research, and text-driven cognitive state modeling.
🦀 ClawHub
MarkItDown
MarkItDown is a Python utility from Microsoft for converting various files (PDF, Word, Excel, PPTX, Images, Audio) to Markdown. Useful for extracting structu...
🦀 ClawHub
PonyFlash - Media Generation Router
Generate images, videos, speech audio, and music using the PonyFlash Python SDK. Also handle local media editing with FFmpeg, including clip, concat, transco...
🦀 ClawHub
senseaudio-let-claw-talk
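Use when the user wants to turn AudioClaw into a local voice assistant that listens continuously, responds as soon as you speak, and answers when you pause. This skill starts a resident listening process on macOS and by default prefers the built-in Swift recorder to reduce Python audio dependencies; the user's speech is transcribed to text by SenseAudio ASR, then sent to the audioclaw agent, and uses...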
🦀 ClawHub
Qwen ASR (C-based Offline)
Offline Chinese and mixed Chinese-English speech-to-text recognition in pure C without Python or FFmpeg dependencies, suitable for edge devices.
🦀 ClawHub
LiveVideoStore
Python-based LiveVideoStore client with voice interaction, GUI, volume control, session management, and encrypted audio transmission for live streaming.
GitHub
python-zpar
Python bindings for [ZPar](https://github.com/frcchang/zpar), a statistical part-of-speech tagger, constituency parser, and dependency parser for English.
GitHub
NewsBlur
Personal news reader that brings people together to talk about the world. A new sound of an old instrument. ([Source Code](https://github.com/samuelclay/NewsBlur)) `MIT` `Python`
🦀 ClawHub
LH Edge TTS
Text-to-speech conversion using Python edge-tts for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and sub...