BytesAgain

Find the Right AI Skill for Any Job

Browse 1+ curated AI agent skills. Search by use case, filter by category, get the right tool instantly.

All Skills - video

1 skills in "video" matching "transcription"

🦀 ClawHub 33.5k dl
Markdown Converter
Convert documents and files to Markdown using markitdown. Use when converting PDF, Word (.docx), PowerPoint (.pptx), Excel (.xlsx, .xls), HTML, CSV, JSON, XML, images (with EXIF/OCR), audio (with transcription), ZIP archives, YouTube URLs, or EPubs to Markdown format for LLM processing or text analysis.
🦀 ClawHub 20.2k dl
Openai Whisper Api
Transcribe audio via OpenAI Audio Transcriptions API (Whisper).
🦀 ClawHub 9.8k dl
Local Whisper
Local speech-to-text using OpenAI Whisper. Runs fully offline after model download. High quality transcription with multiple model sizes.
🦀 ClawHub 5.8k dl
Faster Whisper
Local speech-to-text using faster-whisper. 4-6x faster than OpenAI Whisper with identical accuracy; GPU acceleration enables ~20x realtime transcription. SRT...
🦀 ClawHub 3.1k dl
AudioPod
Use AudioPod AI's API for audio processing tasks including AI music generation (text-to-music, text-to-rap, instrumentals, samples, vocals), stem separation, text-to-speech, noise reduction, speech-to-text transcription, speaker separation, and media extraction. Use when the user needs to generate music/songs/rap from text, split a song into stems/vocals/instruments, generate speech from text, clean up noisy audio, transcribe audio/video, or extract audio from YouTube/URLs. Requires AUDIOPOD_API
🦀 ClawHub 2.9k dl
AssemblyAI advanced speech transcription
Transcribe, diarise, translate, post-process, and structure audio/video with AssemblyAI. Use this skill when the user wants AssemblyAI specifically, needs hi...
🦀 ClawHub 2.6k dl
Speech is Cheap Transcribe
Fast, affordable automatic speech-to-text transcription supporting 100 languages, speaker diarization, word timestamps, and customizable output formats.
🦀 ClawHub 2.6k dl
it will help you to send voice messages to your AI Assistant and also can make it talk
Text-to-Speech and Speech-to-Text using ElevenLabs AI. Use when the user wants to convert text to speech, transcribe voice messages, or work with voice in multiple languages. Supports high-quality AI voices and accurate transcription.
🦀 ClawHub 2.4k dl
Elevenlabs Transcribe
Transcribe audio to text using ElevenLabs Scribe. Supports batch transcription, realtime streaming from URLs, microphone input, and local files.
🦀 ClawHub 2.4k dl
Speech To Text
Transcribe audio to text with Whisper models via inference.sh CLI. Models: Fast Whisper Large V3, Whisper V3 Large. Capabilities: transcription, translation,...
🦀 ClawHub 2.2k dl
DeepGram Speech platform
Command-line tool for fast, accurate speech-to-text transcription from local files, URLs, or live audio using Deepgram's API with customizable options.
🦀 ClawHub 2.0k dl
Aliyun Asr
Pure Aliyun ASR skill for voice message transcription, supports multiple channels including Feishu
🦀 ClawHub 2.0k dl
Azure Ai Transcription Py
Azure AI Transcription SDK for Python. Use for real-time and batch speech-to-text transcription with timestamps and diarization. Triggers: "transcription", "speech to text", "Azure AI Transcription", "TranscriptionClient".
🦀 ClawHub 1.9k dl
Ai Sdk Core
Build backend AI with Vercel AI SDK v6 stable. Covers Output API (replaces generateObject/streamObject), speech synthesis, transcription, embeddings, MCP tools with security guidance. Includes v4→v5 migration and 15 error solutions with workarounds. Use when: implementing AI SDK v5/v6, migrating versions, troubleshooting AI_APICallError, Workers startup issues, Output API errors, Gemini caching issues, Anthropic tool errors, MCP tools, or stream resumption failures.
🦀 ClawHub 1.7k dl
Play Music from YouTube
Play music on YouTube via browser automation with playwright-cli. Use when the user wants to: (1) play a specific song (e.g. 'play Money Money Money by ABBA') (2) play songs by an artist as a playlist or mix (e.g. 'play Jay Chou's songs') (3) play genre or mood-based music (e.g. 'play relaxing spa music', 'play 60s Chinese oldies') (4) control playback: next, pause, resume, stop, skip ad, change song, close the player. Also handles song/artist name corrections from voice transcription erro
🦀 ClawHub 1.6k dl
Meta Video Ad Analyzer
Extract and analyze content from video ads using Gemini Vision AI. Supports frame extraction, OCR text detection, audio transcription, and AI-powered scene analysis. Use when analyzing video creative content, extracting text overlays, or generating scene-by-scene descriptions.
🦀 ClawHub 1.5k dl
MarkItDown Skill
OpenClaw agent skill for converting documents to Markdown. Documentation and utilities for Microsoft's MarkItDown library. Supports PDF, Word, PowerPoint, Excel, images (OCR), audio (transcription), HTML, YouTube.
🦀 ClawHub 1.4k dl
Local Voice (FluidAudio TTS/STT)
Local text-to-speech (TTS) and speech-to-text (STT) using FluidAudio on Apple Silicon. Sub-second voice synthesis and transcription running entirely on-device via the Apple Neural Engine. Use when setting up local voice capabilities, voice assistant integration, or replacing cloud TTS/STT services.
🦀 ClawHub 1.4k dl
Video Summary
Video summarization for Bilibili, Xiaohongshu, Douyin, and YouTube. Extract insights from video content through transcription and summarization.
🦀 ClawHub 1.2k dl
Speechall command-line tool for fast speech-to-text transcription using multiple providers
Install and use the speechall CLI tool for speech-to-text transcription. Use when the user wants to: (1) transcribe audio or video files to text, (2) install speechall on macOS or Linux, (3) list available STT models and their capabilities, (4) use speaker diarization, subtitles, or other transcription features from the terminal. Triggers on mentions of speechall, audio transcription CLI, or speech-to-text from the command line.
🦀 ClawHub 1.2k dl
Audio
Process, enhance, and convert audio files with noise removal, normalization, format conversion, transcription, and podcast workflows.
🦀 ClawHub 1.2k dl
Faster Whisper Local Service
OpenClaw local speech-to-text backend using faster-whisper over HTTP on 127.0.0.1:18790. Use when you want voice transcription without external APIs, without...
🦀 ClawHub 966 dl
Whisper STT
Free local speech-to-text transcription using OpenAI Whisper. Transcribe audio files (mp3, wav, m4a, ogg, etc.) to text without API costs. Use when: (1) User...
🦀 ClawHub 916 dl
YouTube Long Video Transcript
YouTube long video (>1 hour) full verbatim transcription and translation workflow. Use when user needs to (1) Extract subtitles from YouTube videos, (2) Translate English transcripts to Chinese, (3) Handle long videos that exceed session limits, (4) Process DownSub API responses and generate formatted documents.
🦀 ClawHub 891 dl
Faster Whisper Transcription
Transcribes local voice messages to text using Faster Whisper models for fast, privacy-focused speech recognition on audio files.
🦀 ClawHub 867 dl
Listen
Improve transcription accuracy over time. Learn corrections, configure STT.
🦀 ClawHub 839 dl
Faster Whisper Local
Local speech-to-text using faster-whisper. High-performance transcription with GPU acceleration support. Includes word-level timestamps and distilled models....
🦀 ClawHub 813 dl
Meeting Assistant
Build and troubleshoot SenseAudio meeting assistants, covering live meeting transcription, speaker diarization, real-time translation, meeting-minutes generation, action-item extraction, and transcript export.
🦀 ClawHub 771 dl
acestep-lyrics-transcription
Transcribe audio to timestamped lyrics using OpenAI Whisper or ElevenLabs Scribe API. Outputs LRC, SRT, or JSON with word-level timestamps. Use when users want to transcribe songs, generate LRC files, or extract lyrics with timestamps from audio.
🦀 ClawHub 728 dl
Venice API Kit
Complete Venice AI API toolkit - image generation, video, audio, embeddings, transcription, characters, models, and admin functions. Privacy-focused inferenc...
🦀 ClawHub 697 dl
Openai
OpenAI API integration: chat completions, embeddings, image generation, audio transcription, file management, fine-tuning, and assistants via the OpenAI RES...
🦀 ClawHub 693 dl
Youtube Transcription Generator
Use VLM Run (vlmrun) to generate transcriptions from YouTube videos. Download a video with yt-dlp, then run vlmrun to transcribe with optional timestamps. VLMRUN_API_KEY must be in .env; follow vlmrun-cli-skill for CLI setup and options.
🦀 ClawHub 685 dl
Video Captions
Generate professional captions and subtitles with multi-engine transcription, word-level timing, styling presets, and burn-in.
🦀 ClawHub 680 dl
Speech to Text Transcription
Transcribe audio and video files to text with speaker detection, timestamps, and format conversion.
🦀 ClawHub 645 dl
Youtube Transcript Api
Extract, transcribe, and translate YouTube video transcripts using the YouTubeTranscript.dev V2 API. Supports captions, ASR audio transcription, batch proces...
🦀 ClawHub 603 dl
Voice Transcriber Pro
Voice note transcription and archival for OpenClaw agents. Powered by Deepgram Nova-3. Transcribes audio messages, saves both audio files and text transcript...
🦀 ClawHub 579 dl
yap
Fast on-device speech-to-text transcription on macOS 26+ using Apple Speech.framework, supporting multiple languages and output formats without model downloads.
🦀 ClawHub 557 dl
Parakeet Local Asr
Install and operate local NVIDIA Parakeet ASR for OpenClaw with an OpenAI-compatible transcription API on Ubuntu/Linux and macOS (Intel/Apple Silicon). Use w...
🦀 ClawHub 535 dl
Funasr Transcribe Skill
Use when the user needs local speech-to-text transcription for audio files, especially Chinese or mixed Chinese-English audio, without relying on cloud trans...
🦀 ClawHub 491 dl
ton
Ton namespace for Netsnek e.U. audio and media processing tools. Handles audio transcription, format conversion, waveform analysis, and podcast production wo...
🦀 ClawHub 466 dl
Faster Whisper Gpu
High-performance local speech-to-text transcription using Faster Whisper with NVIDIA GPU acceleration. Transcribe audio files locally without sending data to...
🦀 ClawHub 423 dl
Timeless.day Meeting Notes
Query and manage Timeless meetings, rooms, transcripts, and AI documents. Capture podcast episodes and YouTube videos into Timeless for transcription. Use wh...
🦀 ClawHub 420 dl
multimodal-parser
Unified multi-modal content parser for images, PDF, DOCX, audio, auto OCR/transcription, output structured text for LLM processing
🦀 ClawHub 387 dl
MH openai-whisper-api
Transcribe audio via OpenAI Audio Transcriptions API (Whisper).
🦀 ClawHub 334 dl
transcription
Transcribe audio and video files using OpenAI Whisper API. Use when user wants to transcribe audio/video files, extract speech from media, or get text from r...
🦀 ClawHub 333 dl
Qcut Video Edit
Run QCut's native TypeScript pipeline CLI for AI content generation, video analysis, transcription, YAML pipelines, ViMax agentic video production, and proje...
🦀 ClawHub 330 dl
Meeting Notes Generator
AI-powered meeting notes generator - automatic transcription, summary, action items extraction, and task assignment. Turns meeting recordings or text into pr...
🦀 ClawHub 330 dl
case.dev
case.dev: a legal AI platform with encrypted document vaults, OCR, audio transcription, and legal search. This skill installs the casedev CLI and provides s...