
Whisk AI

by @linmillsd7

Drop an image and describe a new scene — whisk-ai blends your visual inputs with creative AI generation to produce entirely fresh imagery. Built around Googl...

Version: v1.0.0
💡 Examples

> Welcome to whisk-ai — your creative lab for remixing images into entirely new AI-generated visuals! Drop your photos and tell me what kind of scene or style you want to create, and let's make something unexpected together.

Try saying:

  • "Remix my photo in oil painting style"
  • "Place my subject in a sci-fi scene"
  • "Blend two images into one visual"

    Quick Start Setup

    This skill connects to a cloud processing backend. On first use, set up the connection automatically and let the user know ("Connecting...").

    Token check: Look for NEMO_TOKEN in the environment. If found, skip to session creation. Otherwise:

  • Generate a UUID to use as the client identifier.
  • POST https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with the UUID in the X-Client-Id header.
  • Extract data.token from the response. This is your NEMO_TOKEN (100 free credits, 7-day expiry).

    Session: POST https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Bearer auth (using NEMO_TOKEN) and body {"task_name":"project"}. Keep the returned session_id for all operations.
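
The token check and session call above can be sketched in Python as follows. This is a minimal illustration using only the standard library, not the skill's actual implementation. The function names are hypothetical, the JSON Content-Type header and the empty body on the token request are assumptions, and only the URLs, the X-Client-Id header, the data.token field, and the {"task_name":"project"} body come from the description above.

```python
import json
import os
import uuid
from urllib.request import Request, urlopen

BASE_URL = "https://mega-api-prod.nemovideo.ai"


def build_token_request(client_id: str) -> Request:
    """Build the anonymous-token POST, identifying the client by its UUID."""
    return Request(
        f"{BASE_URL}/api/auth/anonymous-token",
        method="POST",
        headers={"X-Client-Id": client_id},
    )


def extract_token(response_body: bytes) -> str:
    """Pull data.token out of the JSON response body."""
    return json.loads(response_body)["data"]["token"]


def build_session_request(token: str) -> Request:
    """Build the session-creation POST with Bearer auth and the documented body."""
    body = json.dumps({"task_name": "project"}).encode("utf-8")
    return Request(
        f"{BASE_URL}/api/tasks/me/with-session/nemo_agent",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            # Assumption: the source does not state a content type.
            "Content-Type": "application/json",
        },
    )


def send(request: Request) -> bytes:
    """Send a request and return the raw response body (performs network I/O)."""
    with urlopen(request, timeout=30) as response:
        return response.read()


def resolve_token(env=os.environ, transport=send) -> str:
    """Reuse NEMO_TOKEN if it is set; otherwise request an anonymous token."""
    token = env.get("NEMO_TOKEN")
    if token:
        return token
    client_id = str(uuid.uuid4())
    return extract_token(transport(build_token_request(client_id)))
```

In use, `send(build_session_request(resolve_token()))` would return the raw session response. Where session_id sits inside that response is not stated here, so the sketch leaves it unparsed. Passing `transport` as a parameter keeps the token logic testable without touching the network.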

    Let the user know with a brief "Ready!" when setup is complete. Don't expose tokens or raw API output.

    Remix Reality: Your Images, Reimagined From Scratch

    Whisk AI isn't a filter — it's a creative engine. You bring the ingredients: a subject photo, a style reference, a background idea — and whisk-ai blends them into something genuinely new. Instead of editing pixels, you're describing intent and watching the AI reconstruct the scene around your vision.

    This skill is built for people who think visually but don't want to wrestle with complex editing software. Whether you're a social media creator looking for eye-catching content, a brand designer exploring visual concepts, or just someone who wants to see what their cat would look like painted in the style of a Renaissance master — whisk-ai makes it fast and surprisingly fun.

    You can combine up to three image inputs: one for the subject, one for the scene or background, and one for the visual style. The result is a cohesive generated image that pulls from all three without copy-pasting or compositing. It's generative remixing at its most intuitive.
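
One way to picture the three input roles is as a small record in which every field is optional. The sketch below is purely illustrative: the RemixInputs class and its field names are hypothetical and say nothing about how the backend actually receives images. The rule that at least one image or a text description must be present is an assumption drawn from the description of the inputs on this page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RemixInputs:
    """Hypothetical container for up to three image roles plus a text description."""

    subject: Optional[str] = None  # path or URL of the subject image
    scene: Optional[str] = None    # path or URL of the scene or background image
    style: Optional[str] = None    # path or URL of the style reference image
    prompt: Optional[str] = None   # free-text description of the desired result

    def __post_init__(self) -> None:
        # Assumption: a request with no image and no text has nothing to remix.
        if not any((self.subject, self.scene, self.style, self.prompt)):
            raise ValueError("provide at least one image or a text description")

    def provided_roles(self) -> List[str]:
        """Return the image roles that were actually supplied, in a fixed order."""
        return [role for role in ("subject", "scene", "style") if getattr(self, role)]
```

For example, `RemixInputs(subject="cat.jpg", style="renaissance.jpg").provided_roles()` returns `["subject", "style"]`, while a text-only request such as `RemixInputs(prompt="a cat in a sci-fi city")` is accepted with no image roles at all.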

    📋 Tips & Best Practices

    What exactly does whisk-ai do with my images?
    Whisk-ai uses Google's Whisk technology to extract the essence of your input images, not the exact pixels, and uses that understanding to generate a brand-new image. Your subject, style, and scene are interpreted and reconstructed, so the output is always a fresh generation, not a composite.

    Can I use just one image or do I need three?
    You can absolutely use just one image or even a text description alone. The three-input system (subject, scene, style) is optional but gives you the most control over the output. Mix and match however feels natural.

    Will it look exactly like my reference photo?
    Not exactly, and that's the point. Whisk-ai captures the spirit of your inputs rather than copying them literally. Expect creative interpretation, not pixel-perfect reproduction. If you want a closer likeness, be more specific in your text description.

    What kinds of images work best?
    Clear, well-lit photos with a defined subject tend to produce the strongest results. Abstract or very cluttered images may lead to more unpredictable outputs, which can sometimes be a happy accident.

    Install from ClawHub:

    clawhub install whisk-ai
