Wonda
by @degausai
Using the Wonda CLI to generate images, videos, music, and audio from the terminal — plus LinkedIn, Reddit, and X/Twitter research and automation
wonda auth login (opens browser, recommended) or set WONDERCAT_API_KEY env var
wonda auth check
Access tiers
Not all commands are available to every account type:
| Tier | Access |
| ------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------- |
| Anonymous (temporary account, no login) | Media upload/download, editing (video/edit, image/edit, audio/edit), transcription, social publishing, scraping, analytics |
| Free (logged in, Basic/Free plan) | Everything above + generation (image/generate, video/generate, etc.), styles, recipes, brand |
| Paid (Plus, Pro, or Absolute plan) | Everything above + video analysis (requires credits), skill commands (wonda skill install/list/get) |
If a command returns a 403 error, check your plan at https://app.wondercat.ai/settings/billing.
Social signups (Instagram, TikTok, etc.)
Drive these signups with the wonda device primitives plus a throwaway mailbox from wonda email. The screenshot → decide → tap/type/swipe loop is how these flows work. There's no shortcut command, and that's fine: social apps change their UI constantly, and any canned flow would drift faster than you could maintain it.
Standard loop:
1. wonda email account create --random → save {email, password}.
2. wonda device create → pick a ready device (poll wonda device get).
3. wonda device launch (or com.zhiliaoapp.musically for TikTok). Fall back to wonda device open-url if you'd rather start in the web flow.
4. Loop: wonda device screenshot → decode the base64 PNG → read → pick an action → tap | type | swipe | key → screenshot again. Use --text "SomeButtonLabel" on tap before guessing coordinates; fall back to --x --y read off the screenshot for elements without matching text (number pickers, date spinners, etc.).
5. When the app sends a verification email, run wonda email inbox wait. It returns {codes: ["483921"], links: [...]} with the 6-digit code already extracted. Use wonda device type to feed the code back.
6. For number/date spinners: tap the highlighted cell, and Android pops up a numeric or alphabetic keyboard. wonda device type --text then replaces the selected text. wonda device key --code 4 dismisses the keyboard when done.
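Steps 4 and 5 both return data that needs a little unpacking before the next action. The following is a minimal Python sketch, not part of the Wonda CLI: `decode_screenshot` and `first_code` are hypothetical helper names. It assumes you have already pulled the base64 string out of the screenshot output (the field that carries it is not specified here) and that the inbox payload has the `{codes, links}` shape shown in step 5.

```python
import base64
import json

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the first 8 bytes of every PNG file


def decode_screenshot(b64_png: str) -> bytes:
    """Decode a base64 screenshot and check that the result is a PNG."""
    raw = base64.b64decode(b64_png.strip())
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data does not start with the PNG signature")
    return raw


def first_code(inbox_json: str) -> str:
    """Return the first verification code from an inbox-wait payload.

    Assumes the payload looks like {"codes": [...], "links": [...]}.
    """
    payload = json.loads(inbox_json)
    codes = payload.get("codes") or []
    if not codes:
        raise LookupError("no verification code in payload; inspect the links instead")
    return codes[0]
```

Checking the PNG signature up front catches truncated or mis-extracted output before it reaches whatever reads the image.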
Consent-like taps — anything that accepts Terms/Privacy/Cookies, grants permissions, or publishes something — stop and ask the user for explicit confirmation in chat before tapping. That isn't about signups specifically; it applies to any automation step.
Rate-limit signals — if the app shows you a visual puzzle ("we want to make sure you're a real person"), stop and hand off to the user with wonda device stream (see next section). Don't click through puzzles yourself.
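The two stop rules above can be sketched as a small gate that runs on the text read from each screenshot before any tap. This is an illustration only: `Gate`, `gate_for`, and both keyword lists are hypothetical, and plain substring matching is a crude stand-in for actually reading the screen, so treat a `PROCEED` result as "no obvious flag", not as permission.

```python
from enum import Enum


class Gate(Enum):
    PROCEED = "proceed"
    CONFIRM = "confirm"  # ask the user in chat before tapping
    HANDOFF = "handoff"  # stop and pass control to the user


# Illustrative keyword lists (assumptions); tune them for the app being driven.
PUZZLE_HINTS = ("real person", "puzzle", "captcha")
CONSENT_HINTS = ("terms", "privacy", "cookie", "allow", "accept", "agree", "publish")


def gate_for(screen_text: str) -> Gate:
    """Classify on-screen text into proceed / confirm / handoff."""
    text = screen_text.lower()
    # Puzzles take priority: they always go to the user.
    if any(hint in text for hint in PUZZLE_HINTS):
        return Gate.HANDOFF
    if any(hint in text for hint in CONSENT_HINTS):
        return Gate.CONFIRM
    return Gate.PROCEED
```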
Handing off to a human
If automation hits a screen that requires a human to take over (a consent flow you shouldn't auto-accept, an ambiguous UI, a step where the user prefers to act themselves), use wonda device stream, which returns a playerUrl signed with a short-lived JWT (1h). Give that URL to the user; they act in their own browser, and automation can resume afterward.
wonda device stream
→ { "streamUrl": "wss://…", "playerUrl": "https://…", "deviceType": "social" }
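A minimal Python sketch of turning that response into a hand-off note for the user. `handoff_message` is a hypothetical helper, and it assumes the command's JSON output has been captured as a string with the `playerUrl` field shown above.

```python
import json


def handoff_message(stream_json: str) -> str:
    """Build a chat message that passes the player link to the user."""
    info = json.loads(stream_json)
    player_url = info["playerUrl"]  # KeyError here means the shape changed
    if not player_url.startswith("https://"):
        raise ValueError("playerUrl should be an https link")
    return (
        f"Please take over in your browser: {player_url}\n"
        "The link is signed with a short-lived token (about 1 hour), "
        "so open it soon. Tell me when you are done and I will resume."
    )
```

Only `playerUrl` goes to the user; `streamUrl` is left out of the message because the text above describes the player link as the one to share.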
Global output flags
All commands support these output control flags:
--json: Force JSON output (auto-enabled when stdout is piped)
--quiet: Only output the primary identifier (job ID, media ID, etc.), ideal for scripting
-o: Download output to file (implies --wait)
--fields status,outputs: Select specific JSON fields
--jq '.outputs[0].media.url': Filter JSON output with a jq expression

| Symptom | Likely Cause | Fix |
| -------------------------------- | --------------------------------------------- | ------------------------------------------------------ |
| Sora rejected image | Person in image | Switch to kling_3_pro |
| Video adds objects not in source | Motion prompt describes elements not in image | Simplify to camera movement and atmosphere only |
| Text unreadable in video | AI tried to render text in generation | Remove text from video prompt, use textOverlay instead |
| Hands look wrong | Complex hand actions in prompt | Simplify to passive positions or frame to exclude |
| Style inconsistent across series | No shared anchor | Use same reference image via --attach |
| Changes to step A not in step B | Stale render | Re-run all downstream steps |
clawhub install wonda