šŸŽ Get the FREE AI Skills Starter Guide — Subscribe →
BytesAgainBytesAgain
šŸ¦€ ClawHub

Audio Transcribe

by @zxkane

This skill should be used when the user explicitly asks to "transcribe a meeting", "transcribe audio", "transcribe a meeting recording", "convert audio to te...

šŸ’” Examples

1. Environment Setup

AUTO_YES=1 bash $SCRIPTS/setup_env.sh

Or force CPU: AUTO_YES=1 bash $SCRIPTS/setup_env.sh cpu

The setup script patches FunASR's spectral clustering for O(N²·k) performance. Without this, recordings over ~1 hour hang for hours during speaker clustering.
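To see why the patch matters, a rough back-of-the-envelope count helps (illustrative numbers only; real segment counts depend on VAD output): naive spectral clustering builds a full N×N affinity matrix over the speaker-embedding segments, so the work grows quadratically with recording length.

```python
# Rough cost model for the speaker-clustering step.
# Illustrative only: real segment counts depend on VAD output.
def affinity_entries(duration_s: float, segment_s: float = 1.5) -> int:
    """Pairwise affinity-matrix entries for N = duration/segment segments."""
    n = int(duration_s / segment_s)
    return n * n

# A 1-hour recording at ~1.5 s per segment gives N = 2400 segments,
# so the naive affinity matrix has 2400^2 = 5,760,000 entries.
print(affinity_entries(3600))
# Tripling the duration multiplies the work by ~9x.
print(affinity_entries(3 * 3600) / affinity_entries(3600))
```

Hence the hang on multi-hour recordings: the quadratic blow-up dominates long before memory runs out.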

2. Run Transcription

Output files are written to the current working directory.

LLM cleanup (Phase 3) is opt-in. By default, transcription runs locally without contacting any external service. To enable LLM-powered ASR correction and speaker-name refinement, pass --model. Use LLM cleanup when:

  • The raw transcript has many ASR errors (names, technical terms)
  • You need polished, publication-ready output
  • Speaker names need to be refined from context
  • > āš ļø Data Privacy: When LLM cleanup is enabled via --model, transcript > excerpts are sent to external LLM providers (AWS Bedrock, Anthropic, or > OpenAI depending on the model ID). Use --skip-llm or omit --model to > keep all data local. For Bedrock, boto3 uses the standard AWS credential > chain (IAM role, SSO, ~/.aws/credentials, env vars).

    # Chinese meeting with hotwords (local-only, no LLM)
    python3 $SCRIPTS/transcribe.py meeting.wav \
        --lang zh --num-speakers 9 --hotwords hotwords.txt
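For reference, a hotwords file is plain text. The exact format the script expects is not shown here; the snippet below assumes the common convention of one term per line:

```shell
# Create a minimal hotwords file (assumed format: one term per line).
cat > hotwords.txt <<'EOF'
FunASR
Kubernetes
关羽
EOF
```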

    # English meeting with speaker names
    python3 $SCRIPTS/transcribe.py meeting.wav \
        --lang en --speakers "Alice,Bob,Carol,Dave"

    # Auto-detect language (zh/en/ja/ko/yue)
    python3 $SCRIPTS/transcribe.py meeting.wav \
        --lang auto --num-speakers 6

    # Whisper for any language
    python3 $SCRIPTS/transcribe.py meeting.wav \
        --lang whisper --num-speakers 4

    # Enable LLM cleanup for polished output (requires --model)
    # Bedrock (uses AWS credential chain: IAM role, SSO, ~/.aws/credentials)
    python3 $SCRIPTS/transcribe.py meeting.wav \
        --lang zh --num-speakers 9 --hotwords hotwords.txt \
        --provider bedrock --model us.anthropic.claude-sonnet-4-6

    Bedrock "global" cross-region profile (recent AWS deployments)

    python3 $SCRIPTS/transcribe.py meeting.wav \ --provider bedrock --model global.anthropic.claude-sonnet-4-6

    # Bedrock via litellm-style wrapper (supported; prefix is stripped for boto3)
    python3 $SCRIPTS/transcribe.py meeting.wav \
        --provider bedrock --model amazon-bedrock/global.anthropic.claude-sonnet-4-6

    # Anthropic API (requires ANTHROPIC_API_KEY env var)
    python3 $SCRIPTS/transcribe.py meeting.wav \
        --provider anthropic --model claude-sonnet-4-6

    # OpenAI-compatible API (requires OPENAI_API_KEY env var)
    python3 $SCRIPTS/transcribe.py meeting.wav \
        --provider openai --model gpt-4o

    # Full pipeline with all supporting files + LLM (best quality)
    python3 $SCRIPTS/transcribe.py episode.m4a \
        --lang zh --num-speakers 2 \
        --hotwords hotwords.txt \
        --speakers "关羽,张飞" \
        --speaker-context speaker-context.json \
        --reference show-notes.md \
        --model us.anthropic.claude-sonnet-4-6

    # Resume interrupted LLM cleanup
    python3 $SCRIPTS/transcribe.py meeting.wav \
        --skip-transcribe --model us.anthropic.claude-sonnet-4-6

3. Verify Speaker Labels

If the transcript has swapped speaker labels (common with podcasts), the verification script can detect and fix mismatches using LLM analysis:

    # Dry-run: check if host/guest are swapped
    python3 $SCRIPTS/verify_speakers.py podcast_raw_transcript.json \
        --speakers "关羽,张飞" \
        --speaker-context speaker-context.json
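The speaker-context file supplies background about who is on the recording. Its real schema is defined by the skill's scripts and is not documented here; the sketch below writes a plausible version, with the field names ("speakers", "role", "notes") being assumptions for illustration only.

```python
# Hypothetical speaker-context.json. The real schema is defined by the
# skill's scripts; the field names below are illustrative assumptions.
import json

context = {
    "speakers": [
        {"name": "关羽", "role": "host", "notes": "opens each segment, asks questions"},
        {"name": "张飞", "role": "guest", "notes": "gives long technical answers"},
    ]
}

with open("speaker-context.json", "w", encoding="utf-8") as f:
    json.dump(context, f, ensure_ascii=False, indent=2)
```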

    # Apply the fix
    python3 $SCRIPTS/verify_speakers.py podcast_raw_transcript.json \
        --speakers "关羽,张飞" \
        --speaker-context speaker-context.json --fix

    # Multi-speaker meeting: full reassignment
    python3 $SCRIPTS/verify_speakers.py meeting_raw_transcript.json \
        --speakers "Alice,Bob,Carol,Dave" \
        --speaker-context speaker-context.json --fix

    # Then regenerate the markdown with corrected labels
    python3 $SCRIPTS/transcribe.py original.m4a \
        --skip-transcribe --clean-cache

The script analyzes the first 5 minutes (configurable with --minutes) and auto-detects podcast mode (2 speakers, swap detection) vs. meeting mode (N speakers, full reassignment).
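The mode selection described above can be sketched as a simple branch on the declared speaker count (an illustration, not the script's actual code):

```python
# Minimal sketch of the auto-detection branch described above:
# 2 declared speakers -> podcast mode (host/guest swap check);
# 3+ declared speakers -> meeting mode (full label reassignment).
def detect_mode(speakers):
    if len(speakers) == 2:
        return "podcast"  # check whether host/guest labels are swapped
    return "meeting"      # reassign labels across all N speakers

print(detect_mode(["关羽", "张飞"]))                    # podcast
print(detect_mode(["Alice", "Bob", "Carol", "Dave"]))   # meeting
```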

Install via ClawHub:

    clawhub install zxkane-audio-transcriber-funasr