Browse AI Agent Skills | BytesAgain

🎁 Get the FREE AI Skills Starter Guide — Subscribe →

All Skills — audio

51 skills in "audio" matching "transcription"

🦀 ClawHub42.3k dl

Markdown Converter

Convert documents and files to Markdown using markitdown. Use when converting PDF, Word (.docx), PowerPoint (.pptx), Excel (.xlsx, .xls), HTML, CSV, JSON, XML, images (with EXIF/OCR), audio (with transcription), ZIP archives, YouTube URLs, or EPubs to Markdown format for LLM processing or text analysis.

🦀 ClawHub12.2k dl

Local speech-to-text using OpenAI Whisper. Runs fully offline after model download. High quality transcription with multiple model sizes.

🦀 ClawHub3.5k dl

Transcribee 🐝

Transcribe YouTube videos and local audio/video files with speaker diarization. Use when user asks to transcribe a YouTube URL, podcast, video, or audio file. Outputs clean speaker-labeled transcripts ready for LLM analysis.

🦀 ClawHub3.0k dl

it will help you to send voice messages to your AI Assistant and also can make it talk

Text-to-Speech and Speech-to-Text using ElevenLabs AI. Use when the user wants to convert text to speech, transcribe voice messages, or work with voice in multiple languages. Supports high-quality AI voices and accurate transcription.

🦀 ClawHub2.6k dl

MarkItDown Skill

OpenClaw agent skill for converting documents to Markdown. Documentation and utilities for Microsoft's MarkItDown library. Supports PDF, Word, PowerPoint, Excel, images (OCR), audio (transcription), HTML, YouTube.

🦀 ClawHub2.3k dl

Pollinations.ai API for AI generation and analysis - text, images, videos, audio, vision, and transcription. Use when user requests AI-powered content (text...

🦀 ClawHub2.0k dl

Content Repurposer

Transform long-form content into platform-optimized snippets. Your agent takes one blog post, video transcript, or podcast notes and generates ready-to-publish Twitter threads, LinkedIn posts, email newsletters, and Instagram captions. Maintains voice consistency while adapting to each platform's format, length, and engagement patterns. Configure tone preferences, platform priorities, and output formats. Use when publishing content across multiple channels, repurposing existing material, or maxi

🦀 ClawHub1.8k dl

Free local speech-to-text transcription using OpenAI Whisper. Transcribe audio files (mp3, wav, m4a, ogg, etc.) to text without API costs. Use when: (1) User...

🦀 ClawHub1.5k dl

Instagram Reels

Download Instagram Reels, transcribe audio, and extract captions. Share a reel URL and get back a full transcript with the original description.

🦀 ClawHub1.1k dl

Speech to Text Transcription

Transcribe audio and video files to text with speaker detection, timestamps, and format conversion.

🦀 ClawHub917 dl

Faster Whisper Gpu

High-performance local speech-to-text transcription using Faster Whisper with NVIDIA GPU acceleration. Transcribe audio files locally without sending data to...

🦀 ClawHub828 dl

Douyin Video Transcribe

Douyin video transcription suite. Extract audio from Douyin/TikTok China videos, transcribe with Whisper, and analyze content. Supports video links, local fi...

🦀 ClawHub728 dl

Timeless.day Meeting Notes

Query and manage Timeless meetings, rooms, transcripts, and AI documents. Capture podcast episodes and YouTube videos into Timeless for transcription. Use wh...

🦀 ClawHub475 dl

Podcast Transcript Mining Authority Positioning

Extract guest appearances, speaking topics, and soundbites from podcast transcripts to build authority portfolios and generate podcast pitch templates. Use w...

🦀 ClawHub298 dl

Lovefromio Jarvis Voice

Metallic AI voice persona with TTS and visual transcript styling. Speak responses aloud with a JARVIS-like robotic voice and display transcripts in purple it...

🦀 ClawHub58 dl

Pre Recorded Transcription

Transcribe pre-recorded audio files or URLs with Gladia. Use when the user needs batch/async transcription, speaker diarization, subtitles (SRT/VTT), PII red...

Voice Transcriber

Voice note transcription and archival for OpenClaw agents. Powered by Deepgram Nova-3. Transcribes audio messages, saves both audio files and text transcript...

🦀 ClawHub25.6k dl

Openai Whisper Api

Transcribe audio via OpenAI Audio Transcriptions API (Whisper).

🦀 ClawHub12.7k dl

Video Transcript Downloader

Download videos, audio, subtitles, and clean paragraph-style transcripts from YouTube and any other yt-dlp supported site. Use when asked to “download this video”, “save this clip”, “rip audio”, “get subtitles”, “get transcript”, or to troubleshoot yt-dlp/ffmpeg and formats/playlists.

🦀 ClawHub9.0k dl

Video Subtitles

Generate SRT subtitles from video/audio with translation support. Transcribes Hebrew (ivrit.ai) and English (whisper), translates between languages, burns subtitles into video. Use for creating captions, transcripts, or hardcoded subtitles for WhatsApp/social media.

🦀 ClawHub7.4k dl

Local speech-to-text using faster-whisper. 4-6x faster than OpenAI Whisper with identical accuracy; GPU acceleration enables ~20x realtime transcription. SRT...

🦀 ClawHub3.5k dl

AssemblyAI advanced speech transcription

Transcribe, diarise, translate, post-process, and structure audio/video with AssemblyAI. Use this skill when the user wants AssemblyAI specifically, needs hi...

🦀 ClawHub3.2k dl

Speech is Cheap Transcribe

Fast, affordable automatic speech-to-text transcription supporting 100 languages, speaker diarization, word timestamps, and customizable output formats.

🦀 ClawHub2.8k dl

Pure Aliyun ASR skill for voice message transcription, supports multiple channels including Feishu

🦀 ClawHub2.8k dl

Zoom Meeting Assistance Rtms Unofficial Community

Zoom RTMS Meeting Assistant — start on-demand to capture meeting audio, video, transcript, screenshare, and chat via Zoom Real-Time Media Streams. Handles meeting.rtms_started and meeting.rtms_stopped webhook events. Provides AI-powered dialog suggestions, sentiment analysis, and live summaries with WhatsApp notifications. Use when a Zoom RTMS webhook fires or the user asks to record/analyze a meeting.

🦀 ClawHub2.3k dl

Process, enhance, and convert audio files with noise removal, normalization, format conversion, transcription, and podcast workflows.

🦀 ClawHub2.3k dl

Play Music from YouTube

Play music on YouTube via browser automation with playwright-cli. Use when the user wants to: (1) play a specific song (e.g. 'play Money Money Money by ABBA') (2) play songs by an artist as a playlist or mix (e.g. 'play Jay Chou's songs') (3) play genre or mood-based music (e.g. 'play relaxing spa music', 'play 60s Chinese oldies') (4) control playback — next, pause, resume, stop, skip ad, change song, close the player. Also handles song/artist name corrections from voice transcription erro

🦀 ClawHub2.3k dl

Pocket AI Transcripts

Read transcripts and summaries from Pocket AI (heypocket.com) recording devices. Use when users want to retrieve, search, or analyze their Pocket recordings, transcripts, summaries, or action items. Triggers on requests involving Pocket device data, conversation transcripts, meeting recordings, or audio note retrieval.

🦀 ClawHub2.2k dl

Plaud Unofficial Skill

Use when accessing Plaud voice recorder data (recordings, transcripts, AI summaries) - guides credential setup and provides patterns for plaud_client.py

🦀 ClawHub2.0k dl

Local Voice (FluidAudio TTS/STT)

Local text-to-speech (TTS) and speech-to-text (STT) using FluidAudio on Apple Silicon. Sub-second voice synthesis and transcription running entirely on-device via the Apple Neural Engine. Use when setting up local voice capabilities, voice assistant integration, or replacing cloud TTS/STT services.

🦀 ClawHub1.9k dl

Meta Video Ad Analyzer

Extract and analyze content from video ads using Gemini Vision AI. Supports frame extraction, OCR text detection, audio transcription, and AI-powered scene analysis. Use when analyzing video creative content, extracting text overlays, or generating scene-by-scene descriptions.

🦀 ClawHub1.6k dl

Podcast Chaptering Highlights

Create chapters, highlights, and show notes from podcast audio or transcripts. Use when a user wants chapter markers, highlight clips, or show-note drafts without publishing or distribution actions.

🦀 ClawHub1.6k dl

Complete Venice AI API toolkit - image generation, video, audio, embeddings, transcription, characters, models, and admin functions. Privacy-focused inferenc...

🦀 ClawHub1.4k dl

acestep-lyrics-transcription

Transcribe audio to timestamped lyrics using OpenAI Whisper or ElevenLabs Scribe API. Outputs LRC, SRT, or JSON with word-level timestamps. Use when users want to transcribe songs, generate LRC files, or extract lyrics with timestamps from audio.

🦀 ClawHub1.3k dl

Analyze Twitter Spaces and voice conversations to extract market intelligence, crypto alpha, sentiment analysis, and speaker-attributed insights. Transforms spoken audio into structured reports, full transcripts, and machine-readable metadata. Use when you need intelligence from Twitter Spaces, podcast discussions, or any long-form voice content — especially for crypto markets, AI trends, and expert commentary that only exists in audio.

🦀 ClawHub1.1k dl

Transcribe audio to text and generate spoken AI responses using Whisper and ElevenLabs via CLI with transcript storage and search.

🦀 ClawHub1.0k dl

Download videos, extract transcripts, capture frames. Analyze YouTube, tutorials, DD videos with yt-dlp + Whisper + ffmpeg.

🦀 ClawHub849 dl

Super-Transcribe — Unified Speech-to-Text

Unified speech-to-text skill. Use when the user asks to transcribe audio or video, generate subtitles, identify speakers, translate speech, search transcript...

🦀 ClawHub844 dl

Video Transcribe

Use when the user wants to transcribe, caption, or get the text content of a video or audio file — e.g. "transcribe this video", "get the transcript", "what...

🦀 ClawHub804 dl

Transcribe audio and video files using OpenAI Whisper API. Use when user wants to transcribe audio/video files, extract speech from media, or get text from r...

🦀 ClawHub802 dl

MH openai-whisper-api

Transcribe audio via OpenAI Audio Transcriptions API (Whisper).

🦀 ClawHub649 dl

Podcast Transcribe

For transcript or subtitle requests involving podcast URLs, public audio URLs/files, or raw transcript cleanup. Generates audio + SRT + TXT artifacts and can...

🦀 ClawHub490 dl

ifly-speed-transcription

Ultra-fast speech transcription using iFLYTEK Speed Transcription API. Transcribe audio files (WAV/PCM/MP3) up to 5 hours in ~20 seconds per hour. Supports C...

🦀 ClawHub419 dl

Local Transcription

Local speech-to-text transcription with Qwen ASR — transcription routed across your Apple Silicon fleet. Transcribe meetings, voice notes, podcasts with loca...

🦀 ClawHub302 dl

PodSips Podcast Search

Search podcast transcripts and retrieve episode data via the PodSips API. Use when user asks to "search podcasts", "find podcast clips", "get transcript", "l...

🦀 ClawHub283 dl

Voice Transcriber Toolkit

Voice-to-Text Transcription Toolkit - 语音识别转文字，支持Whisper/Vosk引擎，批量处理，字幕导出 | Speech recognition & transcription with Whisper/Vosk engines, batch processing, su...

🦀 ClawHub59 dl

Live Transcription

Real-time speech-to-text streaming with Gladia via WebSocket. Use when the user needs live transcription, builds a voice agent, meeting recorder, call center...

format37/youtube_mcp

🐍 ☁️ – MCP server that transcribes YouTube videos to text. Uses yt-dlp to download audio and OpenAI's Whisper-1 for more precise transcription than youtube captions. Provide a YouTube URL and get back the full transcript splitted by chunks for long videos.