Find the Right AI Skill for Any Job

Browse 80+ curated AI agent skills. Search by use case, filter by category, get the right tool instantly.

80 skills in "audio" matching "transcription"

YouTube channel monitor and video transcription using AssemblyAI cloud API. Pure Python + requests only — no ffmpeg, no Whisper, no extra tools needed. Monit...

🦀 ClawHub

Elevenlabs Transcribe

Transcribe audio to text using ElevenLabs Scribe. Supports batch transcription, realtime streaming from URLs, microphone input, and local files.

🦀 ClawHub

subtitle-extractor

Subtitle extractor for Bilibili, YouTube, Xiaohongshu, Douyin, and local files. Extracts native subtitles or Whisper transcription in original format. Agent...

🦀 ClawHub

Local Whisper

Local speech-to-text using OpenAI Whisper. Runs fully offline after model download. High quality transcription with multiple model sizes.

🦀 ClawHub

AssemblyAI advanced speech transcription

Transcribe, diarise, translate, post-process, and structure audio/video with AssemblyAI. Use this skill when the user wants AssemblyAI specifically, needs hi...

🦀 ClawHub

Youtube Transcriber

One-command YouTube video transcription. Automatically downloads audio and transcribes using OpenAI Whisper API — works even when YouTube subtitles are disab...

🦀 ClawHub

clip-editor

Video clip editing skill for automatically analyzing video content and generating CapCut draft templates. Uses local Whisper for speech transcription, Qwen-V...

🦀 ClawHub

mlx-whisper

Set up mlx-whisper as the local audio transcription engine for OpenClaw on Apple Silicon Macs (M1/M2/M3/M4). Automatically transcribes voice notes sent via T...

🔧 Dify

Fal (Dify)

**FAL** is an advanced suite of tools designed for AI-powered image generation and audio transcription. In **Dify**, FAL provides multiple services, including image creation with models like **FLUX.1 [pro]** and **FLUX 1.1 [pro] ultra**, allowing users to generate high-quality visuals with customizable parameters. Additionally, FAL offers **Wizper**, a transcription tool that converts audio files

🦀 ClawHub

Qwen ASR

Local speech-to-text using Qwen3-ASR (CPU-only, no API key, no cloud). Use when: (1) a voice message or audio file needs transcription, (2) user asks to tran...

🦀 ClawHub

Meta Video Ad Analyzer

Extract and analyze content from video ads using Gemini Vision AI. Supports frame extraction, OCR text detection, audio transcription, and AI-powered scene analysis. Use when analyzing video creative content, extracting text overlays, or generating scene-by-scene descriptions.

🦀 ClawHub

Play Music from YouTube

Play music on YouTube via browser automation with playwright-cli. Use when the user wants to: (1) play a specific song (e.g. 'play Money Money Money by ABBA') (2) play songs by an artist as a playlist or mix (e.g. 'play Jay Chou's songs') (3) play genre or mood-based music (e.g. 'play relaxing spa music', 'play 60s Chinese oldies') (4) control playback — next, pause, resume, stop, skip ad, change song, close the player. Also handles song/artist name corrections from voice transcription erro

🦀 ClawHub

case.dev

case.dev — a legal AI platform with encrypted document vaults, OCR, audio transcription, and legal search. This skill installs the casedev CLI and provides s...

🔌 MCP

format37/youtube_mcp

🐍 ☁️ – MCP server that transcribes YouTube videos to text. Uses yt-dlp to download audio and OpenAI's Whisper-1 for more precise transcription than YouTube captions. Provide a YouTube URL and get back the full transcript, split into chunks for long videos.

🦀 ClawHub

ifly-speed-transcription

Ultra-fast speech transcription using iFLYTEK Speed Transcription API. Transcribe audio files (WAV/PCM/MP3) up to 5 hours in ~20 seconds per hour. Supports C...

🦀 ClawHub

Aliyun Asr

Pure Aliyun ASR skill for voice message transcription. Supports multiple channels, including Feishu.

🦀 ClawHub

Telegram Whisper Transcribe

Standalone Telegram bot for voice message transcription via OpenAI Whisper API. No LLM overhead — audio goes directly to Whisper and text comes back in 2-5 s...

⭐ GitHub

Vibe Transcribe

All-in-one solution for effortless audio and video transcription. [#opensource](https://github.com/thewh1teagle/vibe)

🦀 ClawHub

Voice To Protocol Transcriber

Record experimental procedures and observations via voice commands during lab work. Real-time transcription for structured experiment documentation.

🦀 ClawHub

transcription

Transcribe audio and video files using OpenAI Whisper API. Use when user wants to transcribe audio/video files, extract speech from media, or get text from r...

🦀 ClawHub

Douyin Video Transcribe

Douyin video transcription suite. Extract audio from Douyin/TikTok China videos, transcribe with Whisper, and analyze content. Supports video links, local fi...

🦀 ClawHub

Nex Voice

Voice note transcription and intelligent action item extraction for capture and organization of verbal communication. Record and transcribe voice notes, voic...

🦀 ClawHub

Faster Whisper Transcription

Transcribes local voice messages to text using Faster Whisper models for fast, privacy-focused speech recognition on audio files.

🦀 ClawHub

Faster Whisper Gpu

High-performance local speech-to-text transcription using Faster Whisper with NVIDIA GPU acceleration. Transcribe audio files locally without sending data to...

🦀 ClawHub

Venice API Kit

Complete Venice AI API toolkit - image generation, video, audio, embeddings, transcription, characters, models, and admin functions. Privacy-focused inferenc...

🦀 ClawHub

Youtube Transcript Api

Extract, transcribe, and translate YouTube video transcripts using the YouTubeTranscript.dev V2 API. Supports captions, ASR audio transcription, batch proces...

🦀 ClawHub

Voice Transcriber Pro

Voice note transcription and archival for OpenClaw agents. Powered by Deepgram Nova-3. Transcribes audio messages, saves both audio files and text transcript...

🦀 ClawHub

acestep-lyrics-transcription

Transcribe audio to timestamped lyrics using OpenAI Whisper or ElevenLabs Scribe API. Outputs LRC, SRT, or JSON with word-level timestamps. Use when users want to transcribe songs, generate LRC files, or extract lyrics with timestamps from audio.

🦀 ClawHub

Audio

Process, enhance, and convert audio files with noise removal, normalization, format conversion, transcription, and podcast workflows.

🦀 ClawHub

Speech To Text

Transcribe audio to text with Whisper models via inference.sh CLI. Models: Fast Whisper Large V3, Whisper V3 Large. Capabilities: transcription, translation,...

🦀 ClawHub

Ai Video Transcription

Transcribe video speech to text with 98%+ accuracy using AI — convert spoken audio from any video into perfectly timed text transcripts, searchable documents...

🦀 ClawHub

Coze Asr

Automatic Speech Recognition (ASR) using Coze API. Use when you need to transcribe audio files to text. Supports Chinese audio transcription via Coze's speec...
