Whisk AI
by @linmillsd7
Drop an image and describe a new scene — whisk-ai blends your visual inputs with creative AI generation to produce entirely fresh imagery. Built around Googl...
> Welcome to whisk-ai — your creative lab for remixing images into entirely new AI-generated visuals! Drop your photos and tell me what kind of scene or style you want to create, and let's make something unexpected together.
Try saying:
Quick Start Setup
This skill connects to a cloud processing backend. On first use, set up the connection automatically and let the user know ("Connecting...").
Token check: Look for NEMO_TOKEN in the environment. If found, skip to session creation. Otherwise:
Request an anonymous token from https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token, sending an X-Client-Id header.
Read data.token from the response. This is your NEMO_TOKEN (100 free credits, 7-day expiry).
Session: POST https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Bearer auth and body {"task_name":"project"}. Keep the returned session_id for all operations.
Let the user know with a brief "Ready!" when setup is complete. Don't expose tokens or raw API output.
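The setup steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the skill's actual implementation. The URLs, the X-Client-Id header, the data.token field, Bearer auth, and the {"task_name":"project"} body come from the text above. The helper names, the POST method on the token request, the random UUID used as the client ID, and the location of session_id in the response are all assumptions.

```python
import json
import os
import uuid
import urllib.request

BASE = "https://mega-api-prod.nemovideo.ai"


def build_token_request(client_id=None):
    # Assumption: the token endpoint accepts POST and any unique client ID.
    client_id = client_id or str(uuid.uuid4())
    return urllib.request.Request(
        f"{BASE}/api/auth/anonymous-token",
        method="POST",
        headers={"X-Client-Id": client_id},
    )


def build_session_request(token):
    # Body and Bearer auth as described in the setup steps.
    body = json.dumps({"task_name": "project"}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE}/api/tasks/me/with-session/nemo_agent",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def send_json(request):
    # Sends the request and parses the JSON response body.
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def connect(env=os.environ, send=send_json):
    # Token check: reuse NEMO_TOKEN if present, otherwise fetch an anonymous one.
    token = env.get("NEMO_TOKEN")
    if not token:
        token = send(build_token_request())["data"]["token"]
    session = send(build_session_request(token))
    # Assumption: session_id sits at the top level or under "data".
    session_id = session.get("session_id") or (session.get("data") or {}).get("session_id")
    return token, session_id
```

Passing send as a parameter keeps the network call swappable, so the flow can be exercised with a stub. The token and the raw responses stay inside connect, which matches the instruction not to expose them to the user.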
Remix Reality: Your Images, Reimagined From Scratch
Whisk AI isn't a filter; it's a creative engine. You bring the ingredients (a subject photo, a style reference, a background idea) and whisk-ai blends them into something genuinely new. Instead of editing pixels, you describe your intent and watch the AI reconstruct the scene around your vision.
This skill is built for people who think visually but don't want to wrestle with complex editing software. Whether you're a social media creator looking for eye-catching content, a brand designer exploring visual concepts, or just someone who wants to see what their cat would look like painted in the style of a Renaissance master — whisk-ai makes it fast and surprisingly fun.
You can combine up to three image inputs: one for the subject, one for the scene or background, and one for the visual style. The result is a cohesive generated image that pulls from all three without copy-pasting or compositing. It's generative remixing at its most intuitive.
What exactly does whisk-ai do with my images?
Whisk-ai uses Google's Whisk technology to extract the essence of your input images, not the exact pixels, and generates a brand-new image from that understanding. Your subject, style, and scene are interpreted and reconstructed, so the output is always a fresh generation, not a composite.
Can I use just one image or do I need three?
You can use just one image, or even a text description alone. The three-input system (subject, scene, style) is optional but gives you the most control over the output. Mix and match however feels natural.
Will it look exactly like my reference photo?
Not exactly, and that's the point. Whisk-ai captures the spirit of your inputs rather than copying them literally. Expect creative interpretation, not pixel-perfect reproduction. If you want a closer likeness, be more specific in your text description.
What kinds of images work best?
Clear, well-lit photos with a defined subject tend to produce the strongest results. Abstract or very cluttered images may lead to less predictable outputs, which can sometimes be a happy accident.
clawhub install whisk-ai