Section 1
The Two-Step Framework
Most people try to generate a motion graphic in one shot. They open a video model, paste a prompt like "animated infographic showing AI adoption stats," and get garbage. Text gets garbled. Composition drifts. The "video" looks like soup.
The fix is dumb-simple. Split the job in two.
Step 1: Style a static image first. Use an image model to lock the composition. Title, axes, focal callout, all of it. You're building the "final frame" of the video first.
Step 2: Animate that exact image. Feed the finished image into a video model as a reference frame and tell it to "draw on" or "build up" the elements over 4 seconds. The model has a fixed target to work from, so it has far less room to drift.
That's it. The whole guide is making each of those two prompts as good as they can be.
Static image (locks composition) → Animation pass (locks motion)
One model is great at text and layout. The other is great at motion. Stop asking one model to do both.
Why this works: text-to-video models are bad at text and bad at consistent layout. They're great at motion. Image models are great at text and layout but they make stills. When you compose the still first and use it as the reference frame for the video model, you're playing each model to its strength.
This is the same reason animation studios storyboard before they shoot. You don't improvise the composition while the camera is rolling.
Section 2
Step 1: The Static Image Prompt
Your static image is doing 80% of the work. If the composition, headline, and focal callout are right in the still, the animation pass is almost automatic. Here's the exact structure.
The 6 ingredients every static image prompt needs
Aspect ratio
Always 9:16 for short-form. State it up front. "Generate a 9:16 vertical image (1080x1920)."
Style / aesthetic
The whole vibe in one paragraph. Background, texture, color palette, the "feel" you want. Don't be shy. Pile on the descriptors. (Five plug-and-play styles in Section 4.)
Composition
Where everything goes on the page. "Title at top center, main element in the middle two-thirds, callout in the bottom right." Treat it like an art director brief.
Title / headline (exact text)
Put the actual headline in quotes inside the prompt. Don't say "a title about AI." Say title: "The AI Compounding Curve". The model copies what you give it.
Supporting labels
Every axis label, slice label, annotation. List them out explicitly. The model will not invent them well, so don't make it.
The punchline callout
One bold visual hit that earns the scroll. "THIS IS YOU" with an arrow. "$2.5B" in giant type. "0.3%" with a callout line. Without this, you've made a slide, not a graphic.
Copy-paste prompt template
Generate a 9:16 vertical image (1080x1920).
STYLE: [Paste the style snippet from Section 4. Pick one and lock it in.]
COMPOSITION: [Where things go on the page. Examples: "Title at top center. Main visual element fills the middle two-thirds. The bold focal callout sits in the lower-right with a hand-drawn arrow pointing at the key spot in the visual."]
TITLE at the top: "[YOUR EXACT HEADLINE]"
[OPTIONAL] SUBTITLE just below the title: "[YOUR EXACT SUBHEAD]"
MAIN ELEMENT: [Describe the chart, diagram, illustration, or scene. Be specific. Example: "A simple line graph with x-axis labeled 2026, 2027, 2028, 2029, 2030 from left to right. Two lines on the graph: an exponential curve sweeping upward labeled 'People who use AI', and a flat line near the bottom labeled 'Everyone else'."]
SUPPORTING LABELS: [List every label and annotation that should appear. Example: "Small annotations on the curve at each year: 2026 'starting', 2027 'learning', 2028 'compounding', 2029 'accelerating', 2030 'unrecognizable'."]
FOCAL CALLOUT: [The one bold visual hit. Example: "A large hand-drawn arrow pointing at the steep top of the curve with the bold all-caps label 'THIS IS YOU'."]
FEEL: [The emotional read. Example: "Like a smart friend sketched this in their grid notebook to explain a concept. Authentic, slightly rough, NOT digital-clean."]
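If you reuse the template a lot, it can help to fill it in from plain strings instead of editing brackets by hand. Here's a minimal sketch, assuming you keep each ingredient as a string. The function name build_static_prompt and its parameters are hypothetical (they are not part of any tool's API), and the function only assembles the text you would paste into the image model.

```python
def build_static_prompt(style, composition, title, main_element,
                        labels, callout, feel, subtitle=None):
    """Assemble the static image prompt from the six ingredients.

    Hypothetical helper: it only builds the prompt text and calls no model.
    """
    if not title.strip():
        raise ValueError("Give the exact headline text, not a description of it.")
    parts = [
        "Generate a 9:16 vertical image (1080x1920).",  # aspect ratio, stated up front
        f"STYLE: {style}",
        f"COMPOSITION: {composition}",
        f'TITLE at the top: "{title}"',  # exact headline, in quotes
    ]
    if subtitle:
        parts.append(f'SUBTITLE just below the title: "{subtitle}"')
    parts += [
        f"MAIN ELEMENT: {main_element}",
        f"SUPPORTING LABELS: {' '.join(labels)}",  # every label listed explicitly
        f"FOCAL CALLOUT: {callout}",
        f"FEEL: {feel}",
    ]
    return "\n".join(parts)


prompt = build_static_prompt(
    style="Hand-drawn pencil sketch on light blue engineering grid paper.",
    composition="Title at top center. Main visual fills the middle two-thirds.",
    title="The AI Compounding Curve",
    main_element="A simple line graph with x-axis labeled 2026 through 2030.",
    labels=["2026 'starting'.", "2030 'unrecognizable'."],
    callout="A large hand-drawn arrow with the bold all-caps label 'THIS IS YOU'.",
    feel="Authentic, slightly rough, NOT digital-clean.",
)
print(prompt)
```

The empty-title check is there because the headline is the one ingredient the guide says must be given as exact text.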
Run it in GPT Image 2 first (it's the strongest for legible text and graph composition). Iterate the title and callout position before you move on. The static image is the contract. If anything is off here, the animation will inherit the problem.
Section 3
Step 2: The Animation Prompt
Now you take that static image and feed it into a video model as the reference frame. The job of this prompt is to describe the motion only. The composition is already locked in the image.
The 5 ingredients every animation prompt needs
Animation style
How the elements appear. The most reliable: "draw on / sketched in fast forward / appears letter by letter / fades in sequentially." All of these guide the model toward a build-up rather than full-frame motion.
Sequence (what comes in when)
"First the title appears, then the axes, then the main element, then the focal callout lands last." Order the reveal so the punchline arrives at the end. That's the moment that earns the loop.
Camera motion
"Very slight handheld jitter" almost always wins. It feels human without being distracting. Alternates: "slow zoom toward the center over the duration" (builds tension) or "static frame" (cleanest). Never combine three motions.
Duration + aspect ratio
4 seconds, 9:16. Don't fight it. Most short-form video models have a 4-second floor, and 4 seconds is the right length to draw on a graphic without dragging.
The lock line
End the prompt with: "Keep composition framed exactly like the reference image." This single sentence saves 80% of bad outputs. It tells the model not to invent a new scene.
Copy-paste prompt template
Animate the reference image as a 4-second 9:16 vertical short-form clip.
STYLE: [Paste the animation snippet from Section 4, matched to the style of your static image.]
SEQUENCE: First the title "[YOUR TITLE]" appears letter by letter at the top. Then [describe the structural elements: axes, frames, dividers]. Then [describe the main element building in: the curve drawing across, the slices filling in, the labels appearing]. Finally the focal callout, "[YOUR PUNCHLINE]", lands as the closing visual hit.
CAMERA: Very slight handheld jitter, as if someone is filming with a phone. Minimal natural shake. No zoom, no pan, no rotation.
DURATION: 4 seconds.
ASPECT RATIO: 9:16.
Keep composition framed exactly like the reference image.
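The animation template can be filled in the same way. This is a sketch under the same assumptions as before: build_animation_prompt and its parameter names are made up for illustration, and it only produces prompt text. The reveal order is fixed so the punchline always lands last and the lock line always closes the prompt.

```python
LOCK_LINE = "Keep composition framed exactly like the reference image."


def build_animation_prompt(style, title, structure, main_build, punchline,
                           seconds=4):
    """Assemble the animation prompt; the punchline is always revealed last."""
    sequence = (
        f'First the title "{title}" appears letter by letter at the top. '
        f"Then {structure}. Then {main_build}. "
        f'Finally the focal callout, "{punchline}", lands as the closing visual hit.'
    )
    parts = [
        f"Animate the reference image as a {seconds}-second 9:16 vertical short-form clip.",
        f"STYLE: {style}",
        f"SEQUENCE: {sequence}",
        # One camera motion only, matching the template.
        "CAMERA: Very slight handheld jitter, as if someone is filming with a phone. "
        "Minimal natural shake. No zoom, no pan, no rotation.",
        f"DURATION: {seconds} seconds.",
        "ASPECT RATIO: 9:16.",
        LOCK_LINE,  # the lock line goes last
    ]
    return "\n".join(parts)


anim = build_animation_prompt(
    style="The sketch is drawn onto the page in fast forward.",
    title="The AI Compounding Curve",
    structure="the axes draw on with the year labels",
    main_build="the exponential curve sweeps upward while the flat line draws across",
    punchline="THIS IS YOU",
)
print(anim)
```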
Run it through Seedance 2.0 (best motion quality for this use case) with the static image attached as --start-image. If the draw-on plays in reverse (it sometimes does, because the model is animating "from" the finished frame), see the troubleshooting fix in Section 6.
Section 4
5 Plug-and-Play Styles
Drop these straight into your prompts. Each one is a full style description for the static image plus a matching animation snippet so the motion stays on-brand.
Grid Paper Pencil Sketch
Vibe: Smart-friend explainer. Looks hand-drawn in a notebook. Reads like authority + simplicity.
Best for: Concepts, frameworks, "this vs that" comparisons, mental-model graphics.
Static image snippet
Hand-drawn pencil sketch on light blue engineering grid paper (faint blue square grid background, like an engineering notebook). All drawings, text, and elements done in black graphite pencil with a slightly imperfect, hand-drawn quality. Authentic notebook feel, NOT digital-clean. Hand-lettered text with natural inconsistency.
Animation snippet
The sketch is drawn onto the page in fast forward, like watching someone sketch it in their notebook. Title appears letter by letter first, then the structure, then the focal callout. The paper has a very slight handheld jitter as if someone is filming the notebook with a phone.
Whiteboard Marker Explainer
Vibe: Classroom / boardroom energy. Reads like a clear-eyed teacher walking you through it.
Best for: Process diagrams, before/after comparisons, business frameworks, course promo.
Static image snippet
Glossy white whiteboard background with subtle smudge marks and faint old eraser ghosts. All elements drawn in bold black dry-erase marker with occasional accent colors (red for emphasis, blue for secondary). Slight reflective sheen on the board. Hand-drawn shapes with the natural inconsistency of a marker on a smooth surface.
Animation snippet
Elements appear as if someone is drawing them in real time with a marker squeak feel. The title writes itself first, then arrows and boxes draw on in sequence, then the focal callout lands last. Very subtle camera handheld feel like a lecture being filmed.
Neon Poster / Retro Arcade
Vibe: High-energy, scroll-stopping, 80s arcade aesthetic. Reads like a hook on steroids.
Best for: Viral hooks, big-stat callouts, "shocking number" graphics, product launches.
Static image snippet
Dark navy or black background with a faint grid horizon line (like Tron / Outrun). Bold sans-serif text in bright neon magenta and cyan, with a soft outer glow on every letter. Chrome-style serif accents on the focal headline. Subtle scan-line texture across the whole image. 1980s arcade poster aesthetic.
Animation snippet
Each text block flickers on like a neon sign powering up. Slight buzz at the start, then steady glow. The focal headline arrives last with a stronger glow pulse. A very slow zoom toward the center over the duration builds intensity.
Watercolor / Hand-Painted
Vibe: Warm, soft, premium. Reads like a thoughtful editorial piece, not a hard sell.
Best for: Personal essays, brand storytelling, soft pitches, lifestyle content.
Static image snippet
Cream or off-white paper background with visible paper texture. All elements painted in soft watercolor washes with natural bleed and pooling at edges. Hand-lettered titles in warm earth tones (terracotta, slate, sage). Light pencil sketch lines visible underneath the paint in a few places, like the artist sketched first then painted over.
Animation snippet
Watercolor washes bloom onto the page as if water is being added to dried pigment. Color pools outward from the center of each shape, then settles. Hand-lettered text fades in with a soft ink-bleed effect. Subtle paper sway as if the page is moving slightly in a breeze.
Retro Magazine Print / Risograph
Vibe: Editorial cool. Reads like a vintage zine cover. Punchy without screaming.
Best for: Newsletter promos, podcast covers, "we wrote a piece about this" graphics.
Static image snippet
Off-white textured paper background with visible print grain. Two-color risograph print effect with slight registration mis-alignment between the two color plates (a warm red and a cool blue, slightly out of register so you see hints of color fringing). Bold condensed sans-serif headlines in all caps. Halftone dot texture on any solid shapes. 1970s magazine cover aesthetic.
Animation snippet
The two color plates print on in sequence. First the red plate prints with a satisfying offset, then the blue plate prints over it with a slight misalignment, completing the image. Halftone dots resolve last. Tiny paper-shift feel like the page is being pulled out of a printing press.
Mix-and-match is fair game. Hand-painted background with neon callouts, whiteboard structure with magazine print type. The framework doesn't care about the style. It cares about the split.
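When you do stick with a single style, one way to keep the static and animation snippets from getting mismatched is to store them as a pair. Here's a rough sketch: the STYLES dictionary, its keys, and the snippets_for helper are all hypothetical names, and the snippet text is abbreviated from the full versions above.

```python
# Each style keeps its static snippet and its animation snippet side by side.
# Snippet text is shortened here; paste the full versions from this section.
STYLES = {
    "grid_paper": {
        "static": "Hand-drawn pencil sketch on light blue engineering grid paper. "
                  "Authentic notebook feel, NOT digital-clean.",
        "animation": "The sketch is drawn onto the page in fast forward, "
                     "like watching someone sketch it in their notebook.",
    },
    "neon": {
        "static": "Dark navy or black background with a faint grid horizon line. "
                  "Bold sans-serif text in bright neon magenta and cyan.",
        "animation": "Each text block flickers on like a neon sign powering up.",
    },
}


def snippets_for(style_key):
    """Return (static_snippet, animation_snippet) for one style."""
    try:
        preset = STYLES[style_key]
    except KeyError:
        raise ValueError(
            f"Unknown style {style_key!r}; choose from {sorted(STYLES)}"
        ) from None
    return preset["static"], preset["animation"]


static_snippet, animation_snippet = snippets_for("grid_paper")
```

For a deliberate mix-and-match, you would pull the static snippet from one key and the animation snippet from another.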
Section 5
The Tool Stack
You need one image model and one video model. Here's what actually works in May 2026, ranked by reliability for this specific use case.
Image Models (Step 1)
GPT Image 2 — recommended
Best legible text and graph composition. Wins on anything with headlines or labels.
Nano Banana 2 / Pro
Strongest for character / illustrated styles. Pair with text overlays in a design tool if text rendering matters.
Midjourney v7 / DALL-E
Work but weaker text rendering. Use for non-text-heavy graphics.
Video Models (Step 2)
Seedance 2.0 — recommended
Best image-to-video motion right now. 4-15 second clips, supports start-image and end-image references.
Kling 3.0
Strong alternative. Better at shorter clips (under 4s) and supports start+end frame interpolation if you want pixel-tight control.
Veo 3.1 / Runway Gen-4
Work, but neither is the cleanest fit for "build-up" reveals. Use for naturalistic motion (people, products).
The one-tool shortcut: Higgsfield
Higgsfield is a single interface that exposes 30+ image and video models, including GPT Image 2 and Seedance 2.0. One login, one CLI, one place to run both steps. It cuts the workflow from "two tabs and two billing accounts" to one chat. The static image and animation in the proof clips above were both made on Higgsfield. There's also a Higgsfield MCP that plugs the whole thing into Claude Code if you want to script it.
Section 6
Troubleshooting
Three failure modes account for almost everything that goes wrong. Here's how to fix each one in under a minute.
Problem 1: The animation plays in reverse
What's happening: You passed the finished image as --start-image. The video model is animating "outward from" the finished frame, so the elements fade out instead of in. Looks like the drawing is being erased.
The fix: Re-run with the finished image as --end-image instead, and either pass a blank version of your background (just the grid paper, just the whiteboard, etc.) as the --start-image, OR drop the start-image entirely and let the model interpolate from a blank slate to the end frame. The animation now builds toward your finished image instead of away from it.
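If you script your runs, the two variants of this fix come down to which reference flags you pass. The sketch below only builds the flag list. It assumes your video CLI accepts --start-image and --end-image as named in this guide; the helper name reference_frame_args and the file names are hypothetical, and the command you prepend to the list depends on your tool.

```python
def reference_frame_args(finished_image, blank_background=None):
    """Build reference-frame flags for a draw-on that ends on the finished image.

    Assumption: the CLI takes --start-image / --end-image, as this guide names them.
    """
    args = []
    if blank_background is not None:
        # Variant A: start from an empty version of the background
        # (just the grid paper, just the whiteboard, etc.).
        args += ["--start-image", blank_background]
    # Variant B (no blank given): leave --start-image out entirely.
    # Either way the finished image is the END frame, never the start.
    args += ["--end-image", finished_image]
    return args


print(reference_frame_args("final.png"))
print(reference_frame_args("final.png", blank_background="blank_grid.png"))
```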
Problem 2: The composition drifts
What's happening: The animation starts looking like your image then morphs into something else. Title moves. Callout disappears. The model is "interpreting" instead of preserving.
The fix: Add these exact sentences to the end of the animation prompt: "Keep composition framed exactly like the reference image. Do not invent new elements. Do not change the position of the title, labels, or focal callout." Three sentences. Drift drops to near zero. Also drop your --resolution down to 720p. Higher resolutions give the model more pixels to hallucinate with.
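If you build prompts in code, a small helper can add the lock sentences without doubling up the one-sentence lock line that is already in the template. This is a sketch only, and with_lock_lines is a hypothetical name.

```python
SHORT_LOCK = "Keep composition framed exactly like the reference image."
LOCK_LINES = (
    SHORT_LOCK
    + " Do not invent new elements."
    + " Do not change the position of the title, labels, or focal callout."
)


def with_lock_lines(prompt):
    """Append the three lock sentences unless the prompt already ends with them."""
    base = prompt.rstrip()
    if base.endswith(LOCK_LINES):
        return base
    # The template already ends with the short lock line; strip it first
    # so the sentence is not repeated in the stronger version.
    if base.endswith(SHORT_LOCK):
        base = base[: -len(SHORT_LOCK)].rstrip()
    return f"{base}\n{LOCK_LINES}" if base else LOCK_LINES


locked = with_lock_lines(
    "Animate the reference image as a 4-second 9:16 vertical short-form clip. "
    "Keep composition framed exactly like the reference image."
)
print(locked)
```

Calling it twice is safe: the second call returns the prompt unchanged.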
Problem 3: The text comes out garbled
What's happening: Your title says "The AI Compounding Curve" in the prompt and the image shows "Thr Al Compuning Curvee." Some image models still can't render text reliably.
The fix: Switch to GPT Image 2. Of the image models that work end-to-end without text editing, it's the strongest at this in May 2026. If you're stuck on a model that's bad at text, generate the visual without text, then overlay the text in Figma, Canva, or any design tool. Then animate the composite. The static-then-animate framework still works. You're just inserting a third step in the middle.
Section 7
When to Use This
Not every short-form video needs a motion graphic. But when one fits, this framework is the cheapest way to get one. Six places it earns its keep.
Short-form hook opens
Drop a 4-second motion graphic in as the cold open of a TikTok or Reel. It earns the scroll, sets the topic, then cuts to you talking. Higher retention than a talking-head start.
Newsletter / course promo
A graphic that shows the "before and after" or the "framework" of what you're teaching. Works in the social post promoting the newsletter and in the newsletter itself.
Static-to-motion paid ads
Take a static ad that's already converting and animate it. Motion ads outperform static at the same spend in most accounts. No new copy, no new concept, just movement.
Podcast / YouTube trailers
10-15 seconds of stitched motion graphics with the show's main beats. Use the same style for every episode so the trailer becomes a brand asset.
Data / stat graphics
Any time you'd otherwise just say a number in a video, animate it instead. "0.3% of people use AI agents" hits harder when it's a pie chart drawing on with a "this is you" arrow.
Concept / framework reveals
Explaining a mental model, framework, or system? Build it as a diagram-style motion graphic. The build-up over 4 seconds gives the viewer the structure before you start narrating it.