Find the Right AI Skill for Any Job

Browse curated AI agent skills. Search by use case, filter by category, and get the right tool instantly.

All Skills — video

Skills in "video" matching "transcription"

🦀 ClawHub · 33.5k dl

Markdown Converter

Convert documents and files to Markdown using markitdown. Use when converting PDF, Word (.docx), PowerPoint (.pptx), Excel (.xlsx, .xls), HTML, CSV, JSON, XML, images (with EXIF/OCR), audio (with transcription), ZIP archives, YouTube URLs, or EPubs to Markdown format for LLM processing or text analysis.

🦀 ClawHub · 20.2k dl

OpenAI Whisper API

Transcribe audio via OpenAI Audio Transcriptions API (Whisper).

🦀 ClawHub · 9.8k dl

Local speech-to-text using OpenAI Whisper. Runs fully offline after model download. High quality transcription with multiple model sizes.

🦀 ClawHub · 5.8k dl

Local speech-to-text using faster-whisper. 4-6x faster than OpenAI Whisper with identical accuracy; GPU acceleration enables ~20x realtime transcription. SRT...

🦀 ClawHub · 3.1k dl

Use AudioPod AI's API for audio processing tasks including AI music generation (text-to-music, text-to-rap, instrumentals, samples, vocals), stem separation, text-to-speech, noise reduction, speech-to-text transcription, speaker separation, and media extraction. Use when the user needs to generate music/songs/rap from text, split a song into stems/vocals/instruments, generate speech from text, clean up noisy audio, transcribe audio/video, or extract audio from YouTube/URLs. Requires AUDIOPOD_API

🦀 ClawHub · 2.9k dl

AssemblyAI advanced speech transcription

Transcribe, diarise, translate, post-process, and structure audio/video with AssemblyAI. Use this skill when the user wants AssemblyAI specifically, needs hi...

🦀 ClawHub · 2.6k dl

Speech is Cheap Transcribe

Fast, affordable automatic speech-to-text transcription supporting 100 languages, speaker diarization, word timestamps, and customizable output formats.

🦀 ClawHub · 2.6k dl

Send voice messages to your AI assistant and have it talk back

Text-to-Speech and Speech-to-Text using ElevenLabs AI. Use when the user wants to convert text to speech, transcribe voice messages, or work with voice in multiple languages. Supports high-quality AI voices and accurate transcription.

🦀 ClawHub · 2.4k dl

ElevenLabs Transcribe

Transcribe audio to text using ElevenLabs Scribe. Supports batch transcription, realtime streaming from URLs, microphone input, and local files.

🦀 ClawHub · 2.4k dl

Transcribe audio to text with Whisper models via inference.sh CLI. Models: Fast Whisper Large V3, Whisper V3 Large. Capabilities: transcription, translation,...

🦀 ClawHub · 2.2k dl

Deepgram Speech Platform

Command-line tool for fast, accurate speech-to-text transcription from local files, URLs, or live audio using Deepgram’s API with customizable options.

🦀 ClawHub · 2.0k dl

Pure Aliyun ASR skill for voice message transcription. Supports multiple channels, including Feishu.

🦀 ClawHub · 2.0k dl

Azure AI Transcription Py

Azure AI Transcription SDK for Python. Use for real-time and batch speech-to-text transcription with timestamps and diarization. Triggers: "transcription", "speech to text", "Azure AI Transcription", "TranscriptionClient".

🦀 ClawHub · 1.9k dl

Build backend AI with Vercel AI SDK v6 stable. Covers Output API (replaces generateObject/streamObject), speech synthesis, transcription, embeddings, MCP tools with security guidance. Includes v4→v5 migration and 15 error solutions with workarounds. Use when: implementing AI SDK v5/v6, migrating versions, troubleshooting AI_APICallError, Workers startup issues, Output API errors, Gemini caching issues, Anthropic tool errors, MCP tools, or stream resumption failures.

🦀 ClawHub · 1.7k dl

Play Music from YouTube

Play music on YouTube via browser automation with playwright-cli. Use when the user wants to: (1) play a specific song (e.g. 'play Money Money Money by ABBA') (2) play songs by an artist as a playlist or mix (e.g. 'play Jay Chou's songs') (3) play genre or mood-based music (e.g. 'play relaxing spa music', 'play 60s Chinese oldies') (4) control playback — next, pause, resume, stop, skip ad, change song, close the player. Also handles song/artist name corrections from voice transcription erro

🦀 ClawHub · 1.6k dl

Meta Video Ad Analyzer

Extract and analyze content from video ads using Gemini Vision AI. Supports frame extraction, OCR text detection, audio transcription, and AI-powered scene analysis. Use when analyzing video creative content, extracting text overlays, or generating scene-by-scene descriptions.

🦀 ClawHub · 1.5k dl

MarkItDown Skill

OpenClaw agent skill for converting documents to Markdown. Documentation and utilities for Microsoft's MarkItDown library. Supports PDF, Word, PowerPoint, Excel, images (OCR), audio (transcription), HTML, YouTube.

🦀 ClawHub · 1.4k dl

Local Voice (FluidAudio TTS/STT)

Local text-to-speech (TTS) and speech-to-text (STT) using FluidAudio on Apple Silicon. Sub-second voice synthesis and transcription running entirely on-device via the Apple Neural Engine. Use when setting up local voice capabilities, voice assistant integration, or replacing cloud TTS/STT services.

🦀 ClawHub · 1.4k dl

Video summarization for Bilibili, Xiaohongshu, Douyin, and YouTube. Extract insights from video content through transcription and summarization.

🦀 ClawHub · 1.2k dl

Speechall command-line tool for fast speech-to-text transcription using multiple providers

Install and use the speechall CLI tool for speech-to-text transcription. Use when the user wants to: (1) transcribe audio or video files to text, (2) install speechall on macOS or Linux, (3) list available STT models and their capabilities, (4) use speaker diarization, subtitles, or other transcription features from the terminal. Triggers on mentions of speechall, audio transcription CLI, or speech-to-text from the command line.

🦀 ClawHub · 1.2k dl

Process, enhance, and convert audio files with noise removal, normalization, format conversion, transcription, and podcast workflows.

🦀 ClawHub · 1.2k dl

Faster Whisper Local Service

OpenClaw local speech-to-text backend using faster-whisper over HTTP on 127.0.0.1:18790. Use when you want voice transcription without external APIs, without...

🦀 ClawHub · 966 dl

Free local speech-to-text transcription using OpenAI Whisper. Transcribe audio files (mp3, wav, m4a, ogg, etc.) to text without API costs. Use when: (1) User...

🦀 ClawHub · 916 dl

YouTube Long Video Transcript

YouTube long video (>1 hour) full verbatim transcription and translation workflow. Use when user needs to (1) Extract subtitles from YouTube videos, (2) Translate English transcripts to Chinese, (3) Handle long videos that exceed session limits, (4) Process DownSub API responses and generate formatted documents.

🦀 ClawHub · 891 dl

Faster Whisper Transcription

Transcribes local voice messages to text using Faster Whisper models for fast, privacy-focused speech recognition on audio files.

🦀 ClawHub · 867 dl

Improve transcription accuracy over time. Learn corrections, configure STT.

🦀 ClawHub · 839 dl

Faster Whisper Local

Local speech-to-text using faster-whisper. High-performance transcription with GPU acceleration support. Includes word-level timestamps and distilled models....

🦀 ClawHub · 813 dl

Meeting Assistant

Build and troubleshoot SenseAudio meeting assistants, covering live meeting transcription, speaker diarization, real-time translation, meeting-minutes generation, action-item extraction, and transcript export.

🦀 ClawHub · 771 dl

acestep-lyrics-transcription

Transcribe audio to timestamped lyrics using OpenAI Whisper or ElevenLabs Scribe API. Outputs LRC, SRT, or JSON with word-level timestamps. Use when users want to transcribe songs, generate LRC files, or extract lyrics with timestamps from audio.

🦀 ClawHub · 728 dl

Complete Venice AI API toolkit - image generation, video, audio, embeddings, transcription, characters, models, and admin functions. Privacy-focused inferenc...

🦀 ClawHub · 697 dl

OpenAI API integration — chat completions, embeddings, image generation, audio transcription, file management, fine-tuning, and assistants via the OpenAI RES...

🦀 ClawHub · 693 dl

YouTube Transcription Generator

Use VLM Run (vlmrun) to generate transcriptions from YouTube videos. Download a video with yt-dlp, then run vlmrun to transcribe with optional timestamps. VLMRUN_API_KEY must be in .env; follow vlmrun-cli-skill for CLI setup and options.

🦀 ClawHub · 685 dl

Generate professional captions and subtitles with multi-engine transcription, word-level timing, styling presets, and burn-in.

🦀 ClawHub · 680 dl

Speech to Text Transcription

Transcribe audio and video files to text with speaker detection, timestamps, and format conversion.

🦀 ClawHub · 645 dl

YouTube Transcript API

Extract, transcribe, and translate YouTube video transcripts using the YouTubeTranscript.dev V2 API. Supports captions, ASR audio transcription, batch proces...

🦀 ClawHub · 603 dl

Voice Transcriber Pro

Voice note transcription and archival for OpenClaw agents. Powered by Deepgram Nova-3. Transcribes audio messages, saves both audio files and text transcript...

🦀 ClawHub · 579 dl

Fast on-device speech-to-text transcription on macOS 26+ using Apple Speech.framework, supporting multiple languages and output formats without model downloads.

🦀 ClawHub · 557 dl

Parakeet Local ASR

Install and operate local NVIDIA Parakeet ASR for OpenClaw with an OpenAI-compatible transcription API on Ubuntu/Linux and macOS (Intel/Apple Silicon). Use w...

🦀 ClawHub · 535 dl

FunASR Transcribe Skill

Use when the user needs local speech-to-text transcription for audio files, especially Chinese or mixed Chinese-English audio, without relying on cloud trans...

🦀 ClawHub · 491 dl

Ton namespace for Netsnek e.U. audio and media processing tools. Handles audio transcription, format conversion, waveform analysis, and podcast production wo...

🦀 ClawHub · 466 dl

Faster Whisper GPU

High-performance local speech-to-text transcription using Faster Whisper with NVIDIA GPU acceleration. Transcribe audio files locally without sending data to...

🦀 ClawHub · 423 dl

Timeless.day Meeting Notes

Query and manage Timeless meetings, rooms, transcripts, and AI documents. Capture podcast episodes and YouTube videos into Timeless for transcription. Use wh...

🦀 ClawHub · 420 dl

multimodal-parser

Unified multimodal content parser for images, PDF, DOCX, and audio, with automatic OCR and transcription. Outputs structured text for LLM processing.

🦀 ClawHub · 387 dl

MH openai-whisper-api

Transcribe audio via OpenAI Audio Transcriptions API (Whisper).

🦀 ClawHub · 334 dl

Transcribe audio and video files using OpenAI Whisper API. Use when user wants to transcribe audio/video files, extract speech from media, or get text from r...

🦀 ClawHub · 333 dl

QCut Video Edit

Run QCut's native TypeScript pipeline CLI for AI content generation, video analysis, transcription, YAML pipelines, ViMax agentic video production, and proje...

🦀 ClawHub · 330 dl

Meeting Notes Generator

AI-powered meeting notes generator - automatic transcription, summary, action items extraction, and task assignment. Turns meeting recordings or text into pr...

🦀 ClawHub · 330 dl

case.dev — a legal AI platform with encrypted document vaults, OCR, audio transcription, and legal search. This skill installs the casedev CLI and provides s...