How to Prompt PixAI v4.0 Preview for Anime Video — From Image to 15 Seconds

How to prompt PixAI v4.0 Preview for anime image-to-video: reference setup, iteration loop, voice line format, and the Mio.2 pre-production workflow we use to ship 15-second anime shorts.

PIXAI ▸ FILM LAB
VOL. 04 ▸ NO. 1
PROMPT TUTORIAL
FIELD-TESTED

— THE FIELD-TESTED GUIDE —

How to Prompt PixAI v4.0 Preview
— for anime image-to-video, from reference setup to 15-second short


Most prompt guides for video models read like the writer never ran a generation. They hand you the syntax — subject, action, camera, audio — and call it done. Then you try to animate the character art you’ve been saving for weeks. The video comes back static. The eyepatch slides off mid-shot. The camera refuses to move no matter how many times you type “cinematic.”

This guide is the field-tested version. Three things matter, in this order: how you stack your references, what you put in the text prompt, and how you iterate when the first try fails. The text prompt is the easy part. It’s also the part most guides obsess over.

New to v4.0 Preview and want the capability overview first — what it can actually do, how it compares to PixAI’s older models, what audio and reference video unlock? Start with our companion piece: Meet PixAI v4.0 Preview →

If you want the broader model lineup and i2v panel walkthrough — every video model PixAI offers, when to use which, panel settings explained — read our PixAI Image-to-Video Tutorial: Model Guide + Prompt Writing → Otherwise, read on.

TL;DR

How to prompt PixAI v4.0 for anime video, the short version

01. Use one clean character image as your reference, not as the first frame.
02. Don’t describe the character in the prompt; let the reference do that.
03. Write camera moves with real verbs (push in, tilt, orbit), not “zoom in.”
04. Test on v4.0 Lite Preview first, finish on Preview.
05. Use Mio.2 to handle pre-production: pitches, scripts, shot lists, prompts.

PART ONE

Working with references
— images, video, and audio

v4.0 Preview takes three kinds of reference. You call each by tag in your prompt:

| Type | Tag | Max | Used for |
|---|---|---|---|
| Image | @image1–6 | 6 | Character identity, outfit, scene, art style anchor |
| Video | @video1–3 | 3 | Camera motion, pacing, framing feel; combined length under 15s |
| Audio | @audio1–3 | 3 | Voice character or musical mood anchor |
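
The limits in the table are easy to check before you upload anything. Here is a minimal sketch in Python. It is not a PixAI tool: `LIMITS` and `assign_tags` are made-up names, and numbering tags by upload order is our assumption about how the panel assigns them.

```python
# Hypothetical helper: the limits come from the table above,
# the tag-numbering rule (upload order) is an assumption.
LIMITS = {"image": 6, "video": 3, "audio": 3}


def assign_tags(images=(), videos=(), audios=()):
    """Map each planned reference file to the tag you would type in the prompt."""
    tags = {}
    for kind, files in (("image", images), ("video", videos), ("audio", audios)):
        if len(files) > LIMITS[kind]:
            raise ValueError(
                f"{len(files)} {kind} references, but the limit is {LIMITS[kind]}"
            )
        for index, name in enumerate(files, start=1):
            tags[f"@{kind}{index}"] = name
    return tags
```

Calling `assign_tags(images=["oc.png", "stage.png"], videos=["ref.mp4"])` gives you a small lookup of tag to file, which is handy to keep next to the prompt while you write.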

Most prompts don’t fail on bad text. They fail on bad reference setup. Spend your effort here.

— HOW TO ATTACH REFERENCES —

A short walkthrough of the reference upload flow.

▸ 01

Reference images are not first frames

A semantic anchor, not a starting position.

Older video models — including PixAI’s own v3.x line — treat the reference image as the first frame. Whatever pose your character is in, whatever’s behind her, all of it gets locked into the opening shot. The model then has to animate forward from there. Reference shows her sitting, prompt says “she leaps into the air” — good luck.

v4.0 Preview doesn’t work that way. It reads the image as a semantic reference for who the character is, not as the starting frame. The model separates identity (hair, eyes, outfit, eyepatch placement, the cat ears, the heart-shaped ahoge) from everything else. Then your prompt drives the action.

What this means in practice:

  • One clean character portrait is enough. You don’t need multiple angles of the same character. We ran side-by-sides with 4-image and 6-image setups of the same OC — the 4-image version was consistently sharper. The model wasn’t trying to reconcile six slightly different jaw shapes.
▸ 4 REFERENCE IMAGES

Sharper character consistency.

▸ 6 REFERENCE IMAGES

More references, more averaging — output softens.

  • The pose in your reference doesn’t constrain the action. If your reference shows her sitting on a swing, you can prompt “she runs across a stage” and v4.0 Preview will execute the run without dragging the swing along. We tested this exact case on v3.0 High Consistency. It kept her seated.
▸ V4.0 PREVIEW

v4.0 Preview. Reads the reference as identity, not pose — the character leaps as prompted.

▸ V3.0 HIGH CONSISTENCY

v3.0 HC. Treats the reference as a starting frame — character stays seated.

  • For scene-specific work, give it two images and tell it how to combine them. Character on @image1, empty scene on @image2, then: “She stands in the scene from @image2, wearing the outfit from @image1.” v4.0 Preview composites cleanly without confusing whose face goes where.
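
When a prompt combines several references, a quick sanity check is whether every tag it mentions has something attached. The sketch below is a hypothetical checker, not PixAI functionality. It lowercases tags and tolerates one space before the number only so that it can compare them; whether the model itself accepts those variants is not something we verified.

```python
import re

# Matches @image1, @video2, @audio3 (case-insensitive, optional single space).
TAG_PATTERN = re.compile(r"@(image|video|audio)\s?(\d+)", re.IGNORECASE)


def find_tags(prompt):
    """Return the set of normalized tags (for example '@image2') used in a prompt."""
    return {f"@{kind.lower()}{number}" for kind, number in TAG_PATTERN.findall(prompt)}


def missing_references(prompt, attached):
    """Return tags the prompt mentions that are not in the attached list."""
    return sorted(find_tags(prompt) - set(attached))
```

For the two-image prompt above, `missing_references(prompt, ["@image1"])` returns `["@image2"]`, which tells you the scene image was never attached.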

— EXAMPLE —

Same character, multiple scenes, outfit swap

▸ @IMAGE1 CHARACTER
PixAI character reference for v4.0 anime video
▸ @IMAGE2 SCENE
PixAI scene reference dressing room
▸ @IMAGE3 SCENE
PixAI scene reference stage
▸ @IMAGE4 OUTFIT
PixAI outfit reference
▸ OUTPUT ▸ V4.0 PREVIEW
▸ VIEW PROMPT
@image1 character is the subject throughout this video, maintaining her multicolored white-pink gradient hair, heart ahoge, purple eyes, white eyepatch, fang, cat ears, and cat tail at all times.

[0-4s] She sits in front of the vintage vanity mirror from @image2 in the warmly lit dressing room, her back partially to the camera so her reflection is visible in the mirror. She looks contemplatively at her reflection, slowly reaching up to touch a strand of hair. Soft pink light bathes the intimate space, the curtains gently moving in the background.

[4-6s] The reflection in the mirror begins to shimmer with dreamlike light, pastel ribbons and cherry petals starting to drift through the air, the dressing room slowly dissolving into a glowing pastel haze.

[6-10s] She now stands at center stage from @image3, under the bright spotlight beam, holding a vintage keytar with both hands, looking forward with quiet confidence. Pink and red ribbons float around her, cherry petals drift through the spotlight beams, soft mist rises gently from the wooden stage floor. She lifts her right hand slightly as if ready to play.

Continuous seamless dreamlike transition through the mirror as a portal between her private self and her stage self.

You can attach up to 6 images. You usually shouldn’t. Two or three well-chosen references beat five or six competing for the model’s attention — every extra image is another vote on what the character should look like, and the model averages them.

▸ 02

Reference video transfers feel, not content

Borrow the camera, not the character.

You’re not telling v4.0 Preview to copy a video. You’re telling it to borrow the camera motion, pacing, and overall framing from that video and apply them to your scene.

Say you’ve saved a gacha-card animation you love — slow push-in, soft parallax, a dreamy reveal at the end. Upload it as @video1 and write:

Animate this character following the slow camera motion and pacing of @video1.

v4.0 Preview studies how the camera moved, how each beat held, how the particles flowed. It then applies that pacing to your character. Your character doesn’t get rendered in the reference video’s art style. The reference video’s character doesn’t appear in yours. What transfers is the feel.

— EXAMPLE 01 ▸ GACHA CARD REVEAL —
▸ @IMAGE1 BASE
PixAI base character art for gacha reveal
▸ OUTPUT ▸ V4.0 PREVIEW
— EXAMPLE 02 ▸ GACHA CARD REVEAL —
▸ @IMAGE1 BASE
PixAI base character art second example
▸ OUTPUT ▸ V4.0 PREVIEW
▸ VIEW PROMPT (used for both examples)
Transform this static character art from @image1 into a gacha-style animated character card, following the same camera motion and pacing style as @video1.

▸ TECH LIMITS

Reference videos must be mp4, under 50MB, and under 15 seconds each. If you use multiple video references, their combined length still has to fit under 15s.
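
These limits can be checked in a few lines before uploading. This is a sketch under assumptions: `VideoRef` and `check_video_refs` are hypothetical names, you supply the size and duration yourself, and "50MB" is read as 50 × 1024 × 1024 bytes, which may not be exactly how the upload panel counts.

```python
from dataclasses import dataclass

MAX_BYTES = 50 * 1024 * 1024  # assumption: "50MB" interpreted as 50 MiB
MAX_SECONDS = 15.0            # per clip, and for all clips combined


@dataclass
class VideoRef:
    name: str
    size_bytes: int
    seconds: float


def check_video_refs(refs):
    """Return a list of problems; an empty list means the clips fit the limits."""
    problems = []
    for ref in refs:
        if not ref.name.lower().endswith(".mp4"):
            problems.append(f"{ref.name}: not an mp4")
        if ref.size_bytes >= MAX_BYTES:
            problems.append(f"{ref.name}: not under 50MB")
        if ref.seconds >= MAX_SECONDS:
            problems.append(f"{ref.name}: not under 15 seconds")
    total = sum(ref.seconds for ref in refs)
    if len(refs) > 1 and total >= MAX_SECONDS:
        problems.append(f"combined length {total:.1f}s is not under 15s")
    return problems
```

Two 8-second clips each pass on their own but fail together, which is the case people trip over most.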

▸ 03

Reference audio — for voice character or musical mood

For the sonic qualities text can’t capture.

You can attach an audio file when you want v4.0 Preview to anchor to a specific sound quality — a voice timbre the character should speak in, or a piece of music whose mood and instrumentation you want the soundtrack to match.

In your prompt, name what you’re borrowing — “use the voice character of @audio1” or “match the mood and instrumentation of @audio1.” Same logic as video references.

For dialogue, ambient sound, and SFX, you usually don’t need an audio reference at all. Those go directly into the prompt itself (covered in the next section). Audio references are for the harder-to-describe sonic qualities.

— EXAMPLE ▸ SAME BASE, THREE AUDIO MOODS —

Same character, same prompt, same motion. Only the audio reference changes.

▸ @IMAGE1 BASE
PixAI audio mood base image

▸ @AUDIO1 ▸ SORROW

Sorrow. Melancholy strings, slow tempo, somber atmosphere.

▸ @AUDIO1 ▸ DREAMY

Dreamy. Ethereal pads, floating texture, soft reverb.

▸ @AUDIO1 ▸ CHEERFUL

Cheerful. Bright tempo, upbeat instrumentation, warm mood.

▸ VIEW PROMPT
@Image1 — hair strands drift and flutter gently in the breeze. The flowers behind her sway softly with the wind. The camera slowly moves in to a close-up on her face as she reaches out her hand to touch a bubble floating in the air. Reference the rhythm and atmosphere of @Audio1.

PART TWO

Writing the text prompt
— for anime video

Most prompt guides give you a formula and call it done. Real prompt-writing splits into two situations, and the technique differs.

SITUATION ▸ 01

You know what you want

You have a clear shot in your head — camera pushes in toward her face, petals drift left to right, she turns her head on the second beat. Just write it down.

The trap here is vagueness disguised as cinematography. “Zoom in” tells v4.0 Preview less than “the camera slowly pushes in during the turn.” Use real camera verbs — push in, pull back, tilt up, drift, orbit, pan, dolly — instead of “the camera moves.” Use connectives like “then” and “and” to sequence actions: “She looks down, then slowly lifts her eyes to the camera and gives a small smile.” Without those connectives, the model often collapses everything into a single static moment.
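
If you want a quick self-check for this habit, a rough linter is enough. The sketch below is our own heuristic, not anything v4.0 Preview exposes: the word lists come from this section, and `lint_camera` is a hypothetical name. It cannot tell whether "drift" describes the camera or the petals, so treat a pass as a hint, not proof.

```python
import re

# Vague phrases this section warns about.
VAGUE_PHRASES = ("zoom in", "the camera moves", "cinematic")

# Real camera verbs from this section, with simple verb endings allowed.
CAMERA_VERB_PATTERNS = (
    r"\bpush(?:es|ing)? in\b",
    r"\bpull(?:s|ing)? back\b",
    r"\btilt(?:s|ing)?\b",
    r"\bdrift(?:s|ing)?\b",
    r"\borbit(?:s|ing)?\b",
    r"\bpan(?:s|ning)?\b",
    r"\bdoll(?:y|ies|ying)\b",
)


def lint_camera(prompt):
    """Return (vague phrases found, whether any real camera verb appears)."""
    text = prompt.lower()
    vague = [phrase for phrase in VAGUE_PHRASES if phrase in text]
    has_verb = any(re.search(pattern, text) for pattern in CAMERA_VERB_PATTERNS)
    return vague, has_verb
```

A prompt that returns a non-empty first value and `False` for the second is the classic "typed cinematic, got a static shot" case.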

This is the easy mode. If you already know your shot, skip to the writing-habits section below.

SITUATION ▸ 02

You have an image but no clear vision

This is where most generations actually start. You find art you love, you want it to move, you have no idea what should happen. Most guides fail you here. They assume you’ve already decided.

Here’s the iteration loop that actually works:

— THE ITERATION LOOP —
01

Draft on an LLM.

Drop your image into Claude or ChatGPT and ask for a short video prompt. Treat the result as raw material, not a final answer.

02

Test on a cheap model.

Run the draft on v4.0 Lite Preview or v2.7 High Dynamics — fast, cheap, structurally similar to Preview. See our Meet PixAI v4.0 Preview post for the full model comparison.

03

Read failure, not prompt.

Watch what the video did wrong. The fix lives in the gap between what you wrote and what the model actually produced.

04

Run the final on v4.0 Preview.

Same prompt, more capacity behind it. Smoother motion, sharper character consistency, more nuanced lighting.

Draft, cheap test, read the failure, run the final. That loop is what separates videos made by someone who knows the tool from videos that look like a first try.
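
For readers who think in code, the loop can be sketched as a small function. Everything here is hypothetical: `generate`, `find_failures`, and `revise` are callables you would supply yourself (in practice, "find failures" is you watching the test video), and the `"cheap"` and `"final"` labels stand in for whichever models you pick. No PixAI API is implied.

```python
def run_iteration_loop(draft, generate, find_failures, revise, max_rounds=3):
    """Draft -> cheap test -> read the failure -> final run.

    generate(tier, prompt) returns a video (any object you like).
    find_failures(video) returns a list of notes; empty means the test passed.
    revise(prompt, failures) returns the edited prompt.
    """
    prompt = draft
    for _ in range(max_rounds):
        test_video = generate("cheap", prompt)
        failures = find_failures(test_video)
        if not failures:
            break  # the prompt holds up on the cheap model
        prompt = revise(prompt, failures)
    # Same prompt, run once on the final model.
    return prompt, generate("final", prompt)
```

The point of the shape is that the expensive model is called exactly once, after the cheap runs stop surfacing problems or you run out of rounds.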

▸ COMMON MISTAKES

Common AI video prompt mistakes to avoid

Five failure patterns. Five fixes.

The loop above only works if you can spot what went wrong. Here are the failure patterns we hit most often, paired with the fixes that worked.

01

Letting the LLM hallucinate features that aren’t in your reference image

LLMs read “cat girl” and assume she has a tail. They read “sailor uniform” and add a tie. Your reference might not show either. The video then awkwardly generates the invented feature mid-sequence — we hit this 4 times across our test runs. (This part still annoys me.)

▸ FIX

Read the draft prompt next to your image. Delete every feature it added that isn’t visible.
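
The side-by-side read can be backed up with a small word check. This is a sketch, assuming you keep two lists by hand: `watch_words` (features LLMs tend to invent) and `visible` (features you confirmed in the image). `invented_features` is a hypothetical helper, and it only matches exact words, so it will miss paraphrases.

```python
import re


def invented_features(draft_prompt, watch_words, visible):
    """Return watch words the draft mentions that you did not confirm in the image."""
    text = draft_prompt.lower()
    confirmed = {word.lower() for word in visible}
    found = []
    for word in watch_words:
        key = word.lower()
        if key in confirmed:
            continue  # the feature really is in the reference
        if re.search(rf"\b{re.escape(key)}\b", text):
            found.append(word)
    return found
```

Run on the failed prompt in the example here, with "tail" on the watch list and not on the visible list, it flags the tail before you spend a generation on it.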

▸ SHOW EXAMPLE WITH FAILURE VIDEO
▸ BASE IMAGE
PixAI base image hallucination test
▸ FAILED OUTPUT

▸ PROMPT (the hallucinated phrase is “tail flicking gently”, in the third paragraph)

A cute young anime cat girl in close-up, white short hair, cat ears, heart-shaped ahoge, large purple eyes with small pink hearts in the pupils, wearing a navy sailor uniform with a blue ribbon scarf and a small black beret on her head, soft pink blush on her cheeks.

She is resting her chin on both hands cupped under her face, gazing softly at the camera with a shy smile. Small pink heart particles and sparkle stars float gently around her in the air. Soft cyan and pink pastel light surrounds her like a dreamy glow.

The video starts with her looking down shyly, then she slowly lifts her eyes to meet the camera and gives a tiny knowing smile, her tail flicking gently behind her, one heart particle drifting up past her face. The camera slowly pushes in toward her face during this moment.

Anime illustration style, soft painterly textures, kawaii aesthetic, dreamy pastel lighting, intimate close-up framing, gentle slow motion energy.

02

Letting the LLM describe instead of direct

The first-draft prompt is often a beautiful description of what’s already in the static art — every ribbon, every accessory, every fold of fabric — with no actual motion. The video comes out static.

▸ FIX

Cut the descriptive sentences entirely. Keep only verbs that describe what changes.

▸ SHOW EXAMPLE WITH FAILURE VIDEO
▸ FAILED OUTPUT — STATIC SCENE

▸ PROMPT

Two cat girls appear together in this video, facing each other as opposites — the angel and the devil.

Character from @image1 and character from @image2 face each other in an empty void of soft pastel light, no environment, no background — only the two characters and a dreamy gradient atmosphere of soft pink and lavender mist around them. They stand close together, the angel on one side and the devil on the other, eyes meeting in mutual recognition.

The angel slowly raises her hands to form a heart shape near her chest with a gentle smile. The devil mirrors the gesture but with a playful smirk, sticking out her tongue. They lean slightly toward each other, sharing a quiet moment of acknowledgment as if they understand they are two halves of the same person. Soft sparkle particles drift between them in the void. The camera holds steady at medium shot, framing both characters in the same frame throughout.

03

Stacking too many elements in one frame

Ask for a Valentine’s scene and the LLM gives you “she stands in the rose garden, holding roses, while petals fall and balloons float and chocolates drift past”: five competing elements stacked in one shot. The result is visual noise.

▸ FIX

Limit each beat to one subject + one motion + one ambient element.
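
One way to hold yourself to that budget is to list each beat's parts before writing the sentence. The sketch below is a hypothetical planning aid: `Beat` and `over_budget` are our own names, and you fill in the lists by hand.

```python
from dataclasses import dataclass, field


@dataclass
class Beat:
    subjects: list
    motions: list
    ambient: list = field(default_factory=list)


def over_budget(beat):
    """Return the names of the slots that hold more than one item."""
    return [
        slot
        for slot in ("subjects", "motions", "ambient")
        if len(getattr(beat, slot)) > 1
    ]
```

The Valentine's beat above, entered as one subject, one motion, and three ambient elements (petals, balloons, chocolates), comes back as `["ambient"]`, so that is the list to trim.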

04

Letting the model invent physics to justify your prompt

We once wrote “goldfish drift around her” and v4.0 generated a water pool out of nowhere to make the goldfish make sense. (Yes — we laughed.) The model defaults to physical realism unless you tell it not to.

▸ FIX

Name the surreal element directly — “goldfish drift through the air, this is a dream, no physical realism needed.”

▸ SHOW FIRST PASS vs FIXED PASS
▸ FIRST PASS — INVENTED WATER
▸ ADJUSTED — DREAM, NOT WATER

▸ ADJUSTED PROMPT (key fix: explicit dream framing + timed beats)

Transform this static character art from @image1 into a gacha-style animated character card, following the same camera motion and pacing style as @video1.

CAMERA MOTION (mirroring the path of @video1):
[0-1.2s] Open on a close-up of her left hand holding the small red-stringed charm pouch (omamori) at her waist. Soft warm light on the red fabric, goldfish drifting slowly past in the soft background blur.
[1.2-2.5s] The camera slowly drifts upward along her body — past her green striped yukata, settling onto her face. More goldfish become visible, gently swimming through the air around her, pink petals drifting across the frame.
[2.5-3.5s] At her face, the camera gently lingers — her purple eye meeting the viewer softly, the goldfish continuing their slow parallax behind her. The camera makes a subtle floating motion as if caught in the same dream.
[3.5-4.2s] The camera slowly pulls back to reveal the full composition — her standing beneath the temple roof eaves, goldfish suspended in the air around her, petals drifting.
[4.2-5s] CARD REVEAL CLIMAX — multiple goldfish swim into view from the edges joining the dream, soft bloom intensifies around the frame.

05

Calling for a transition without giving the model a bridge

“Dissolve into haze and she appears on stage” gets interpreted as a hard cut every time.

▸ FIX

Give the transition a physical object the model can animate — a mirror that becomes a portal, a curtain that opens, a falling petal that crosses the frame.


▸ WRITING HABITS

Writing habits that produce clean output

After enough generations, patterns emerge.

01

Write like a director, not a novelist

Short declarative sentences. One fact per sentence. “Hair flutters slightly. Fixed shot. Eyes blink slowly.” v4.0 Preview reads this as instructions, not prose.

02

Know which timestamps you’re writing

[Scene 1: 0-3s] framing tells v4.0 Preview to cut between shots. [0-1.2s] beat structure tells it to flow within one continuous take. They look almost identical on the page, but the model treats them differently. Mixing them up is the #1 reason multi-shot prompts produce slideshows.
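
Because the two styles are so easy to mix, a regex check on the finished prompt is worth the ten seconds. The sketch below assumes the bracket formats are written exactly as in this guide; `timestamp_style` and the labels it returns are hypothetical names of our own.

```python
import re

# "[Scene 1: 0-3s]" style, which asks for cuts between shots.
SCENE = re.compile(r"\[Scene\s+\d+:\s*[\d.]+\s*-\s*[\d.]+s\]", re.IGNORECASE)
# "[0-1.2s]" style, which asks for beats inside one continuous take.
BEAT = re.compile(r"\[[\d.]+\s*-\s*[\d.]+s\]")


def timestamp_style(prompt):
    """Classify a prompt as 'cuts', 'continuous', 'mixed', or 'none'."""
    has_scene = bool(SCENE.search(prompt))
    has_beat = bool(BEAT.search(prompt))
    if has_scene and has_beat:
        return "mixed"
    if has_scene:
        return "cuts"
    if has_beat:
        return "continuous"
    return "none"
```

A result of `"mixed"` is the signal to go back and pick one style for the whole prompt.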

03

Voice lines have a format

Three parts: voice description — emotional tone — line in native language.

Voice line — shy prince vocal type, warm and slightly trembling: 「あの……お昼、一緒に行かない?」

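Never romanize. v4.0 Preview handles native characters far better than transliteration. (The example line means “Um... want to go to lunch together?”; in the prompt it stays in Japanese.)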
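
The three-part format is mechanical enough to template. This is a minimal sketch, not an official syntax definition: `voice_line` is a hypothetical helper, the layout copies the example line in this section, and the ASCII test is only a crude stand-in for "looks romanized". Turn it off with `non_latin=False` for English dialogue.

```python
def voice_line(voice, tone, line, quotes=("\u300c", "\u300d"), non_latin=True):
    """Build 'Voice line, voice description, tone: quoted line' in this guide's layout.

    quotes defaults to Japanese corner brackets; pass other marks for other languages.
    non_latin=True rejects pure-ASCII text, a rough check for romanized dialogue.
    """
    if non_latin and line.isascii():
        raise ValueError("dialogue looks romanized; write it in native characters")
    opening, closing = quotes
    # \u2014 is the em dash used after "Voice line" in the example.
    return f"Voice line \u2014 {voice}, {tone}: {opening}{line}{closing}"
```

Passing "Konnichiwa" raises an error, while the Japanese line from the example goes through unchanged.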

04

SFX cues read like film sound design

Em dashes between sound elements:

SFX: heavy rain — thunder rumbling — silence — a single heartbeat

A paragraph describing the soundscape works far worse.
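
If you assemble prompts in a script, the cue format is a one-line join. A small sketch, with `sfx_cue` as a hypothetical helper name; the only rule it encodes is the em-dash separator shown above.

```python
def sfx_cue(*sounds):
    """Join sound elements into one 'SFX:' cue separated by em dashes."""
    cleaned = [sound.strip() for sound in sounds if sound.strip()]
    if not cleaned:
        raise ValueError("need at least one sound element")
    # \u2014 is the em dash that separates sound elements.
    return "SFX: " + " \u2014 ".join(cleaned)
```

Keeping each element to a few words is what makes the cue read like a sound-design note instead of a paragraph.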

05

Comic-strip mode is a real output mode

Ask for it explicitly:

From left to right, top to bottom, present this as a comic strip. Add special sound effects for scene transitions.

This switches v4.0 Preview into multi-panel manga output — animated panels with transitions, used for animated manga pages and doujin promo clips.

▸ SYNTHESIS

Putting it all together

A complete v4.0 Preview generation might look like this:

▸ REFERENCES
  • @image1 — character reference (clean portrait, any pose)
  • @image2 — empty stage scene
  • @video1 — animation reference for camera style
▸ PROMPT

“@image1 character stands at center stage on @image2, looking forward with quiet confidence. The camera slowly pushes in toward her face, then a few cherry petals drift up around her. Follow the slow dreamy camera pacing of @video1.”

Three references, three sentences of prompt, one cinematic video.

The references do most of the heavy lifting. The prompt directs the motion. The iteration loop catches the failures.

That’s the v4.0 Preview workflow when it clicks.

PART THREE

Production workflows
— Mio.2 pre-production + Edit Pro manga

When to use Mio.2 vs write the prompt yourself

▸ USE MIO.2 WHEN

You have a character and a vague idea. The Mio.2 AI agent handles the 6-stage pre-production work — pitches, script, shot list, reference image generation — and outputs a finished v4.0 Preview prompt.

▸ WRITE YOURSELF WHEN

You already have a clear shot in your head (Situation 1 above). Skip pre-production and go straight to a clean v4.0 Preview prompt with 1–2 reference images.

Most production work falls into the first bucket. That’s why Workflow 1 exists — Mio.2 is the difference between “I have a character I love” and “I have a 15-second anime video to publish.”

WORKFLOW ▸ 01

AI video pre-production with Mio.2 — 6 stages

From character + vague idea to a 15-second short.

Use this when you have a character, an idea, and 15 seconds to fill. Six stages, all in one Mio.2 conversation so the agent keeps context across them.

The running example below is real — a yuri-tragedy short called “The Heart She Can’t Spend.” Character: a silver-haired catgirl gambler with an eyepatch.

— THE FINAL DELIVERABLE —

“The Heart She Can’t Spend” — the 15-second short the 6-stage workflow produces.

STAGE 01

Story pitches

Upload your character image and run:

▸ USER PROMPT
Based on the character in this image, suggest 5 short-form anime story pitches in the [genre/vibe] direction. For each pitch, include a one-line logline, the opening shot or first line, and what emotional reaction it's going for.

Why this works: Replace [genre/vibe] with what you want — “jirai-yuri tragedy”, “isekai villainess comedy”, “casino gambling thriller”, “slice-of-life with one twist”. Leaving it blank produces generic pitches. Naming a direction produces ones you can actually use.
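
The stage prompts in this workflow all use square-bracket placeholders, so a tiny filler that refuses to leave one blank can save a wasted turn. This is a sketch of our own, not a Mio.2 feature: `fill` is a hypothetical helper and it treats every `[...]` in the template as a placeholder, including longer ones like a bracketed "paste here" note.

```python
import re

# Any text between square brackets counts as a placeholder.
PLACEHOLDER = re.compile(r"\[([^\[\]]+)\]")


def fill(template, values):
    """Replace each [placeholder] from the values dict; fail if one is blank."""

    def swap(match):
        key = match.group(1)
        if key not in values or not values[key].strip():
            raise KeyError(f"placeholder [{key}] was left blank")
        return values[key]

    return PLACEHOLDER.sub(swap, template)
```

For Stage 1 that looks like `fill(stage_1_prompt, {"genre/vibe": "casino gambling thriller"})`, and forgetting the direction fails loudly instead of producing generic pitches.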

▸ MIO.2 OUTPUT ▸ FIVE PITCHES
PixAI Mio.2 five story pitches output

STAGE 02

Tighten the pitch

Pick one, then push back specifically:

▸ USER PROMPT
Develop pitch #[N]. Change: [what to adjust]. Keep: [what to preserve]. Rewrite the pitch with these changes, keeping the same emotional core.

Why this works: The change/keep structure stops the AI from rewriting everything. Without it, the AI sometimes silently overwrites the previous pitch in your conversation memory.

▸ MIO.2 OUTPUT ▸ TIGHTENED PITCH
PixAI Mio.2 tightened pitch output

STAGE 03

Outline, then cut

First get the full version:

▸ USER PROMPT
Outline this pitch as a [15-second] anime short with a hook (first 2s), build (story beats with rough timing), and final image. Include the actual dialogue lines. The premise should be conveyed indirectly through visuals and subtext — never stated out loud.

Why this works: Replace [15-second] with your target length. The “never stated out loud” line is the most important constraint. AI defaults to over-explaining when compressing, and this reinforces that dialogue carries subtext, not exposition.

▸ OUTLINE — PART 1
PixAI Mio.2 outline part 1
▸ OUTLINE — PART 2
PixAI Mio.2 outline part 2

STAGE 04

Build the shot list

▸ USER PROMPT
Break this script into a shot list with a maximum of [number] shots. For each shot, specify timing, frame composition, what's in the frame, camera movement, key visual detail, and dialogue or on-screen text. If you produce more than [number] shots, compress to exactly [number] — each shot must do irreplaceable work.

Why this works: Replace [number] with your reference image budget. v4.0 Preview accepts up to 6 reference images, but 3–4 produce stronger results than 6 competing for attention. Giving the AI room to over-produce and then cutting yields a better selection than asking for 4 upfront.

▸ MIO.2 OUTPUT ▸ SHOT LIST
PixAI Mio.2 shot list output

STAGE 05

Generate reference images

Send this to Mio.2 (or your image agent):

▸ USER PROMPT
Based on the shot list below, generate [N] storyboard images for video generation. Use the attached character reference image(s) to keep the character consistent across all shots. Do not include any text in the images. Match the art style of the reference.

[Paste your Stage 4 shot list here.]

Why this works: Attach your character reference image(s) — Mio.2 uses them as identity anchors. Match-art-style and no-text-in-images are the constraints that keep storyboard images usable as v4.0 reference inputs.

▸ FIXING FIRST-PASS ERRORS

The first pass will rarely be perfect. Common issues: wrong hair length, drifted eye color, an extra unwanted character in a solo shot, signature accessories on the wrong character. Each is fixable in one turn — name the problem precisely:

“Regenerate Shot 3 with only the gambler in frame, no second character. Confirm her eyes are vivid purple. Also fix Shot 2 — the sailor catgirl should not have an eyepatch.”

▸ REFERENCE IMAGE — SHOT A
PixAI reference image shot A
▸ REFERENCE IMAGE — SHOT B
PixAI reference image shot B

STAGE 06

Have Mio.2 write the v4.0 prompt

This is the payoff:

▸ USER PROMPT
Based on the script and shot list from earlier in this conversation, write a v4.0 Preview video prompt for the [N] reference images you just generated. For each timestamped beat, include: time range, which @image to reference, action and camera description, dialogue (with the exact lines from the script), and SFX cues. Use em dashes for layered SFX. Keep total prompt length under 2000 characters.

Example output format:
0s–Xs: @Image 1 [Action description with camera movement]. [Character]: "[Exact line from script.]" SFX: [sound 1] — [sound 2] — [sound 3]
Xs–Ys: @Image 2 [Action description]. [Character]: "[Exact line.]" SFX: [layered sounds with em dashes]
[Continue for each beat. End with fade-to-black instruction if applicable.]

Why this works: The “exact lines from the script” instruction matters. AI sometimes paraphrases dialogue when generating prompts, which drops the carefully-tuned subtext from Stage 3.
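
Both constraints in the Stage 6 request, exact dialogue and a length under 2000 characters, can be verified on the output before you paste it. A minimal sketch, assuming you keep the script lines as plain strings; `check_final_prompt` is a hypothetical name, and the 2000-character figure is simply the number used in the request above, not a documented model limit.

```python
MAX_PROMPT_CHARS = 2000  # taken from the Stage 6 request; not a confirmed hard limit


def check_final_prompt(prompt, script_lines):
    """Return a list of problems: prompt too long, or dialogue not verbatim."""
    problems = []
    if len(prompt) >= MAX_PROMPT_CHARS:
        problems.append(
            f"prompt is {len(prompt)} characters; keep it under {MAX_PROMPT_CHARS}"
        )
    for line in script_lines:
        if line not in prompt:
            problems.append(f"dialogue missing or paraphrased: {line!r}")
    return problems
```

A paraphrased line shows up immediately, which is much cheaper than hearing it in the finished audio.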

Mio.2 outputs the prompt. Copy it, paste into v4.0 Preview with your 4 reference images attached, generate. The final prompt for our test piece:

▸ THE FINAL V4.0 PREVIEW PROMPT
0s–2s: The white-haired catgirl gambler spins a gold coin between her fingers, smirking confidently. Gold coins float around her in dramatic lighting. SFX: metallic spin — tense atmosphere

2s–4s: Cut to sailor catgirl across the table, her hand trembling, heart pendant catching light. She whispers: "You said you'd find me again. Even if it took everything." SFX: soft tearful voice

4s–6s: Back to gambler, leaning in with delighted grin: "Cute line. Did you rehearse it?" Cards slap down. SFX: playful tone — card impact

6s–10s: Match cut — the coin falls through memory space. Two catgirls on a cliff, heart pendant being clasped, promise whispered. Everything dissolves into golden dust as coin lands. SFX: warm wind — ethereal chime — distant promise

10s–13s: Gambler blinks, something behind her eyes going quiet. She looks up, studying the crying girl with genuine curious smile: "...sorry, kitten. Do I know you?" SFX: silence — single heartbeat

13s–15s: Sailor catgirl's hand closes around her heart pendant. She smiles through tears, soft: "Not yet. Deal again?" Fade to black. Coins fall. SFX: emotional catch — coin clatter — silence

Four images, six beats, four lines of dialogue, layered SFX. 15 seconds of actual story, built across six stages, not in one prompt.
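
Before spending a Preview generation, it is worth confirming the beats tile the full 15 seconds with no gaps. The sketch below parses the `0s–2s:` style used in the final prompt (with an en dash between the times). `beat_ranges` and `timeline_problems` are hypothetical helpers, and the contiguity rule is our own convention, not a requirement we have seen documented.

```python
import re

# Matches a line that starts like "6s\u201310s:" (en dash between the two times).
BEAT = re.compile(r"^(\d+(?:\.\d+)?)s\u2013(\d+(?:\.\d+)?)s:", re.MULTILINE)


def beat_ranges(prompt):
    """Return the (start, end) seconds of every timestamped beat, in order."""
    return [(float(start), float(end)) for start, end in BEAT.findall(prompt)]


def timeline_problems(prompt, total_seconds=15.0):
    """Return a list of timeline problems; empty means the beats line up."""
    ranges = beat_ranges(prompt)
    if not ranges:
        return ["no timestamped beats found"]
    problems = []
    if ranges[0][0] != 0.0:
        problems.append("first beat does not start at 0s")
    for (_, end), (start, _) in zip(ranges, ranges[1:]):
        if start != end:
            problems.append(f"gap or overlap between {end}s and {start}s")
    if ranges[-1][1] > total_seconds:
        problems.append("timeline runs past the clip length")
    return problems
```

On the final prompt above, `beat_ranges` finds six beats and `timeline_problems` comes back empty.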

▸ FINAL SCENE REFERENCE
PixAI v4.0 final scene reference

WORKFLOW ▸ 02

Animated manga with Edit Pro + v4.0 Preview

For doujin promo clips and animated manga pages.

Use this when you want a multi-panel manga page that comes to life — animated reactions, comedic timing, the doujin-promo-clip feel. Edit Pro handles the manga layout, v4.0 Preview brings the panels to life.

▸ BASE IMAGE — MANGA PAGE
PixAI Edit Pro manga base image
▸ OUTPUT ▸ V4.0 PREVIEW

Edit Pro handles the layout: arrange panels, frame each beat, control reading order. Export the manga page as a single image. Hand it to v4.0 Preview with this prompt:

▸ V4.0 PREVIEW PROMPT
Present the comic story from top to bottom in a [tone] style, with smooth storytelling and expressive character reactions. Add adorable anime-style sound effects throughout the scenes, such as "boing," "bam," "wah," and "sparkle," to enhance the atmosphere and make the comic feel lively and dynamic.

Swap [tone] for “cute and humorous”, “elegant”, or “dramatic” depending on your story. We’ve tested it from 2-panel gag strips up to 6-panel sequences. The same template scales.

▸ FAQ

Common questions

How many reference images can I attach to a single v4.0 Preview prompt?

Up to 6 images. In our testing, 2–3 well-chosen references consistently outperformed 6 references competing for the model’s attention. Every extra image adds another vote on what the character should look like, and the model averages them.

What’s the difference between a reference image and a start frame in v4.0 Preview?

Reference images are semantic — v4.0 uses them to learn who the character is, then lets your prompt drive the pose and action. Start frame (in Keyframe mode) locks the first frame of the video. Keyframe mode and Multi-Reference mode are mutually exclusive — pick the one that fits your workflow. For the full breakdown of every model and panel setting, see our PixAI Image-to-Video Tutorial.

Can v4.0 Preview generate dialogue in Japanese or Korean?

Yes. Write the dialogue in the actual target language using native characters — Japanese in 「」, Korean in “” — never romanize. Romanized lines like “Konnichiwa” produce unintelligible audio in most generations.

How long can a v4.0 Preview video be?

Up to 15 seconds per generation. Reference video uploads have the same 15-second total ceiling — if you attach multiple video references, their combined length still has to fit under 15s.

Should I use v4.0 Preview or v4.0 Lite Preview for prompt iteration?

Use v4.0 Lite Preview or v2.7 High Dynamics for drafts and structural testing. They’re fast and good enough to expose problems in your prompt — vague action, bad transitions, missing references. Save v4.0 Preview for the final run once your prompt is working.

— READY? —

Make your first v4.0 Preview video

The hardest part isn’t the prompt syntax — it’s knowing what to set up before you write a single word. Pick a clean character reference. Decide whether you’re in Situation 1 or Situation 2. Run the iteration loop. Let Mio.2 handle the pre-production work where it makes sense.

Start with the simplest version of your idea. The capability tour lives in Meet PixAI v4.0 Preview, and the model-by-model breakdown lives in the PixAI Image-to-Video Tutorial.
