🦀 ClawHub

Oatda Transcribe Audio

by @devcsde

Transcribe audio to text using OATDA's unified audio API. Triggers when the user wants speech-to-text, transcription of meetings, podcasts, voice notes, subtitles, timestamps, or Whisper-style transcription through OATDA.

Version: v1.0.1
Downloads: 276

    clawhub install oatda-transcribe-audio

📖 About This Skill


    name: oatda-transcribe-audio
    description: Transcribe audio to text using OATDA's unified audio API. Triggers when the user wants speech-to-text, transcription of meetings, podcasts, voice notes, subtitles, timestamps, or Whisper-style transcription through OATDA.
    homepage: https://oatda.com
    metadata:
      {
        "openclaw":
          {
            "emoji": "📝",
            "requires":
              {
                "bins": ["curl", "jq"],
                "env": ["OATDA_API_KEY"],
                "config": ["~/.oatda/credentials.json"]
              },
            "primaryEnv": "OATDA_API_KEY",
          },
      }

OATDA Audio Transcription

Transcribe audio files to text through OATDA's unified audio API.

API Key Resolution

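All commands need the OATDA API key. Resolve it inline in each exec call: OATDA_API_KEY from the environment wins; otherwise the key of the default profile in ~/.oatda/credentials.json is used.

    export OATDA_API_KEY="${OATDA_API_KEY:-$(cat ~/.oatda/credentials.json 2>/dev/null | jq -r '.profiles[.defaultProfile].apiKey' 2>/dev/null)}"

If the result is empty or the literal string null (which jq -r prints when the profile has no apiKey), tell the user to get a key at https://oatda.com and configure it.

Security: Never print the full API key. Only confirm that it exists, or show at most its first 8 characters.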
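The resolution and masking rules can be wrapped in two small shell helpers. This is a sketch, not part of OATDA or ClawHub: resolve_oatda_key and mask_key are hypothetical names, and the only assumption about the credentials file is the jq path already used in the one-liner above.

```shell
# Hypothetical helpers: resolve the OATDA key and report it only in masked form.

resolve_oatda_key() {
  local key
  # Environment variable wins; otherwise read the default profile's apiKey.
  key="${OATDA_API_KEY:-$(jq -r '.profiles[.defaultProfile].apiKey' ~/.oatda/credentials.json 2>/dev/null)}"
  # jq -r prints the literal string "null" when the profile has no apiKey.
  if [ -z "$key" ] || [ "$key" = "null" ]; then
    echo "No OATDA API key found; get one at https://oatda.com" >&2
    return 1
  fi
  printf '%s\n' "$key"
}

mask_key() {
  # printf precision truncates the string: at most 8 characters are shown.
  printf '%.8s...\n' "$1"
}
```

For example, key=$(resolve_oatda_key) && echo "Using OATDA key $(mask_key "$key")" confirms a key is present without exposing it.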

Model Mapping

| User says | Provider | Model |
|-----------|----------|-------|
| whisper, whisper-1, openai whisper (default) | openai | whisper-1 |
| transcription, speech to text, stt | openai | whisper-1 |

Default: openai / whisper-1 when no model is specified.

If the user gives the model in provider/model form (for example openai/whisper-1), split it at the / into provider and model.

> ⚠️ Models change over time. If a model ID fails, query oatda-list-models with ?type=audio to find a current one.
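The mapping table and the provider/model rule fit in one case statement. A shell sketch, assuming the caller wants the result in two variables; resolve_model, PROVIDER, and MODEL are hypothetical names, and treating anything outside the table as unknown (rather than guessing) is a design choice of this sketch.

```shell
# Hypothetical helper: set PROVIDER and MODEL from what the user asked for,
# following the model mapping table.
resolve_model() {
  local spec lower
  spec="${1:-}"
  lower=$(printf '%s' "$spec" | tr '[:upper:]' '[:lower:]')
  case "$lower" in
    */*)
      # Explicit provider/model form, e.g. openai/whisper-1: split at the "/".
      PROVIDER="${spec%%/*}"
      MODEL="${spec#*/}"
      ;;
    ""|whisper|whisper-1|"openai whisper"|transcription|"speech to text"|stt)
      # Aliases from the table, plus the default when no model is given.
      PROVIDER="openai"
      MODEL="whisper-1"
      ;;
    *)
      # Not in the table: look the ID up with oatda-list-models (?type=audio).
      echo "Unknown audio model: $spec" >&2
      return 1
      ;;
  esac
}
```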

Input Preparation

The transcription endpoint accepts audio in three forms:

  • multipart/form-data with a local file upload
  • JSON with a base64 data URL in the file field
  • JSON with file_base64, for providers that support direct base64 payloads

The maximum audio file size is 25MB.

For local files, prefer multipart upload: it is simpler and avoids large JSON bodies.
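A pre-flight size check catches the 25MB limit locally instead of waiting for a 413 from the server. This sketch assumes 25MB means 25 × 1024 × 1024 bytes; the source does not say whether OATDA counts binary or decimal megabytes, so MAX_AUDIO_BYTES is a tunable assumption and check_audio_size is a hypothetical helper.

```shell
# Hypothetical pre-flight check for local audio files.
# Assumption: "25MB" = 25 * 1024 * 1024 bytes; lower this if uploads still get 413.
MAX_AUDIO_BYTES=$((25 * 1024 * 1024))

check_audio_size() {
  local file size
  file="$1"
  if [ ! -f "$file" ]; then
    echo "Not a regular file: $file" >&2
    return 2
  fi
  # wc -c may pad its output with spaces; arithmetic expansion normalises it.
  size=$(( $(wc -c < "$file") ))
  if [ "$size" -gt "$MAX_AUDIO_BYTES" ]; then
    echo "$file is $size bytes (limit $MAX_AUDIO_BYTES); split it before uploading" >&2
    return 1
  fi
  return 0
}
```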

Discovering Audio Model Parameters

    export OATDA_API_KEY="${OATDA_API_KEY:-$(cat ~/.oatda/credentials.json 2>/dev/null | jq -r '.profiles[.defaultProfile].apiKey' 2>/dev/null)}" && \
    curl -s -X GET "https://oatda.com/api/v1/llm/models?type=audio" \
      -H "Authorization: Bearer $OATDA_API_KEY" | jq '.audio_models[] | {id, supported_params}'
    

In the output, look for:

  • audio_modes that include transcription
  • the supported response_format values
  • optional timestamp, diarization, or streaming support
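The first check in that list can be automated with a jq filter over the model listing. The exact response shape is not documented here, so this sketch assumes audio_modes is either a top-level field of each entry in .audio_models or nested under supported_params; list_transcription_models is a hypothetical name.

```shell
# Hypothetical filter over the GET /api/v1/llm/models?type=audio response.
# Assumption: audio_modes sits at the top level of each model entry or under
# supported_params; adjust the path to match the real response.
list_transcription_models() {
  jq -r '
    .audio_models[]
    | select((.audio_modes // .supported_params.audio_modes // [])
             | any(. == "transcription"))
    | .id
  '
}
```

Pipe the output of the discovery curl call into it to get one model ID per line.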
API Call (multipart)

    export OATDA_API_KEY="${OATDA_API_KEY:-$(cat ~/.oatda/credentials.json 2>/dev/null | jq -r '.profiles[.defaultProfile].apiKey' 2>/dev/null)}" && \
    curl -s -X POST "https://oatda.com/api/v1/llm/transcriptions" \
      -H "Authorization: Bearer $OATDA_API_KEY" \
      -F "provider=<provider>" \
      -F "model=<model>" \
      -F "file=@<path-to-audio>" \
      -F "response_format=json"
    

Alternative API Call (base64 JSON)

    # base64 | tr -d '\n' is portable (GNU base64 -w 0 does not exist on macOS).
    # The JSON body is streamed on stdin: base64 audio quickly exceeds the
    # operating system's limit on a single command-line argument.
    export OATDA_API_KEY="${OATDA_API_KEY:-$(cat ~/.oatda/credentials.json 2>/dev/null | jq -r '.profiles[.defaultProfile].apiKey' 2>/dev/null)}" && \
    base64 < audio.mp3 | tr -d '\n' | \
    jq -Rs --arg provider "<provider>" --arg model "<model>" \
      '{provider: $provider, model: $model, file: ("data:audio/mpeg;base64," + .), response_format: "json"}' | \
    curl -s -X POST "https://oatda.com/api/v1/llm/transcriptions" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $OATDA_API_KEY" \
      --data-binary @-
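The data URL above hard-codes audio/mpeg, which only fits MP3 input. A small lookup can derive the type from the file extension instead. The extension list and the application/octet-stream fallback are assumptions of this sketch (audio_mime_type is a hypothetical helper); check which formats the selected model accepts before relying on them.

```shell
# Hypothetical helper: pick the MIME type for the data URL from the extension.
audio_mime_type() {
  local ext
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    mp3|mpga|mpeg) echo "audio/mpeg" ;;
    m4a|mp4)       echo "audio/mp4" ;;
    wav)           echo "audio/wav" ;;
    webm)          echo "audio/webm" ;;
    ogg|oga)       echo "audio/ogg" ;;
    flac)          echo "audio/flac" ;;
    *)             echo "application/octet-stream" ;;  # unknown: let the API decide
  esac
}
```

Substitute its output for the hard-coded audio/mpeg in the data URL prefix.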

Common Parameters

Support for each parameter varies by model; check supported_params first.

  • language: ISO-639-1 language code, such as en, de, or fr
  • prompt: context for names, acronyms, or domain-specific terms
  • response_format: json, text, srt, verbose_json, vtt, or diarized_json
  • temperature: 0 to 1
  • timestamp_granularities: word and/or segment
  • chunking_strategy: auto
  • hotwords: provider-specific keyword hints
  • stream: true, if the selected model supports streaming
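Optional parameters can be appended to the multipart call only when they are set. A sketch, assuming these are sent as ordinary multipart form fields with the names listed above; only language, prompt, and temperature are shown, because array-style fields such as timestamp_granularities may need provider-specific encoding. transcribe_file and the LANGUAGE, PROMPT, TEMPERATURE, PROVIDER, MODEL, and RESPONSE_FORMAT variables are hypothetical, and OATDA_API_KEY is assumed to be resolved already.

```shell
# Hypothetical wrapper around the multipart call; optional fields are added
# only when the matching variable is non-empty.
transcribe_file() {
  local file="$1"
  # Reject temperatures outside 0..1 before spending an API call.
  if [ -n "${TEMPERATURE:-}" ] &&
     ! awk -v t="$TEMPERATURE" 'BEGIN { exit !(t >= 0 && t <= 1) }'; then
    echo "temperature must be between 0 and 1" >&2
    return 1
  fi
  # --form-string sends the value literally, so a prompt containing ";" or a
  # leading "@" is not interpreted by curl as a form directive.
  curl -s -X POST "https://oatda.com/api/v1/llm/transcriptions" \
    -H "Authorization: Bearer $OATDA_API_KEY" \
    -F "provider=${PROVIDER:-openai}" \
    -F "model=${MODEL:-whisper-1}" \
    -F "file=@$file" \
    -F "response_format=${RESPONSE_FORMAT:-json}" \
    ${LANGUAGE:+--form-string "language=$LANGUAGE"} \
    ${PROMPT:+--form-string "prompt=$PROMPT"} \
    ${TEMPERATURE:+-F "temperature=$TEMPERATURE"}
}
```

The unquoted ${VAR:+...} expansions disappear entirely when a variable is unset, while the quoted part inside each one stays a single argument even if it contains spaces.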
Response Format

The API returns JSON like this:

    {
      "text": "The transcribed text...",
      "language": "en",
      "duration": 42.5,
      "segments": [],
      "words": [],
      "costs": {
        "inputCost": 0,
        "outputCost": 0.0001,
        "totalCost": 0.0001,
        "currency": "USD"
      }
    }
    

Present the text field to the user. If the requested format includes subtitles, segments, or word-level timestamps, include those as well.
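Reading the result back can be a one-step helper that prints the text field when the body is a JSON object and falls back to the raw body otherwise. The fallback is an assumption: the source shows only the JSON shape and does not say whether srt, vtt, or text responses arrive as JSON or as plain text. print_transcript is a hypothetical name.

```shell
# Hypothetical helper: print the transcript from a saved response body.
# Assumption: subtitle and text formats may arrive as plain text instead of
# JSON, so anything that is not a JSON object with "text" is printed unchanged.
print_transcript() {
  local body="$1"
  if jq -e 'type == "object" and has("text")' "$body" >/dev/null 2>&1; then
    jq -r '.text' "$body"
  else
    cat "$body"
  fi
}
```

For example, add -o response.json to the multipart call, then run print_transcript response.json.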

Error Handling

| HTTP Status | Meaning | Action |
|-------------|---------|--------|
| 401 | Invalid API key | Tell the user to check their key |
| 402 | Insufficient credits | Tell the user to check their balance |
| 400 | Bad request / model not supported | Check the model or file format and query oatda-list-models with type=audio |
| 413 | File too large | Keep audio under 25MB or split it |
| 429 | Rate limited or monthly cap reached | Wait briefly and retry once |
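The error table maps directly onto the status code curl can report with -w '%{http_code}'. A sketch with two hypothetical helpers: oatda_error_action turns a status into the table's action, and post_with_one_retry repeats the request once on 429. The 5-second back-off is an assumption; the table only says to wait briefly.

```shell
# Hypothetical helpers around the transcription call.

oatda_error_action() {
  # Map an HTTP status code to the action from the error table.
  case "$1" in
    2??) echo "ok" ;;
    400) echo "Bad request: check the model or file format, then query oatda-list-models with type=audio" ;;
    401) echo "Invalid API key: ask the user to check their key" ;;
    402) echo "Insufficient credits: ask the user to check their balance" ;;
    413) echo "File too large: keep audio under 25MB or split it" ;;
    429) echo "Rate limited: wait briefly and retry once" ;;
    *)   echo "Unexpected HTTP status: $1" ;;
  esac
}

post_with_one_retry() {
  # $1 receives the response body; remaining arguments go straight to curl.
  # Prints the final HTTP status code (000 means curl could not connect).
  local out status
  out="$1"
  shift
  status=$(curl -s -o "$out" -w '%{http_code}' "$@")
  if [ "$status" = "429" ]; then
    sleep 5  # assumed back-off
    status=$(curl -s -o "$out" -w '%{http_code}' "$@")
  fi
  printf '%s\n' "$status"
}
```

For example, pass response.json followed by the usual -X POST, -H, and -F arguments, then feed the printed status to oatda_error_action.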

Example

    export OATDA_API_KEY="${OATDA_API_KEY:-$(cat ~/.oatda/credentials.json 2>/dev/null | jq -r '.profiles[.defaultProfile].apiKey' 2>/dev/null)}" && \
    curl -s -X POST "https://oatda.com/api/v1/llm/transcriptions" \
      -H "Authorization: Bearer $OATDA_API_KEY" \
      -F "provider=openai" \
      -F "model=whisper-1" \
      -F "file=@meeting.mp3" \
      -F "response_format=json"
    

Notes

  • Endpoint: /api/v1/llm/transcriptions
  • Prefer multipart upload for local files
  • Use response_format=srt or vtt for subtitles
  • Use language to improve recognition when source language is known
  • Equivalent capability name: transcribe_audio
  • Related skills: oatda-generate-speech, oatda-translate-audio, oatda-list-models