DiT Model Prompt Writing Guide
This guide shares practical tips for writing better prompts on PixAI, focusing on SDXL and DiT-based models. It covers prompt structures, useful tags, model recommendations, LoRA usage, and hands-on model reviews. It is a helpful resource for creators who want to improve image quality, style control, and generation results.
Editor’s note (PixAI)
This guide was written by one of our community’s standout creators, 阿童 (ATone), and is republished here with credit to the original author.
PixAI’s DiT family — Tsubaki / Serin / Tsubaki Flash — has a very different prompt style from the SDXL line. This chapter is for users who already know SDXL prompting and are picking up DiT.
Core Principle
DiT models only accept English prompts and strongly favor natural English descriptions: the more your prompt reads like a story told to a professional illustrator, the better.
Why not use Danbooru tags?
- SDXL-family models (Illustrious, NoobAI, etc.) use a CLIP text encoder and were fine-tuned on Danbooru / e621 tag-style captions, so they expect tag-style input.
- DiT models use a text encoder closer to a modern LLM. They understand natural-language descriptions much better and adapt less well to a flat tag list.
- The upshot: SDXL’s tag-existence rule (“young man” is not a valid tag, so you must write “1boy”) does not apply in DiT. Just write normal English.
Empirical comparison: Model × Prompt style
The same prompt sent to different models gives very different results. The 2×2 grid below uses the PixAI mascot Mio LoRA (DiT and SDXL versions of the same character, Spring Echoes / Emerald Melody variant) in a controlled comparison: the scene is identical, and only the model and prompt style are swapped.
|  | Natural-language prompt | Tag-stack prompt |
|---|---|---|
| Tsubaki.2 (DiT) | A | B |
| Illustrious-XL (SDXL) | C | D |
The diagonal cells (A and D) pair each model with its preferred prompt style; the off-diagonal cells (B and C) produce off-target outputs. Even with the same LoRA and the same scene, a mismatched model / prompt-style pairing can throw the result off.
SDXL → DiT: Common Migration Pitfalls
When jumping from SDXL to DiT, drop these habits:
| ❌ SDXL habit | Why it fails in DiT / What to do |
|---|---|
| 1boy, solo, masterpiece, best quality | DiT doesn’t lean on quality tags. Rewrite as a natural sentence: “A young man standing alone in a cinematic scene.” |
| Heavy quality stacks (8k, ultra-detailed, extremely detailed) | DiT image quality is already strong; piling on quality tags can produce results that don’t match what you intended (sometimes diluted, sometimes overshot). Keep at most one style word. |
| Underscore tokens (black_hair, looking_at_viewer) | DiT reads natural English. Drop the underscores. |
| Bracket weighting, e.g. (black hair:1.2) | DiT doesn’t recognize this syntax. To emphasize an element, rewrite the sentence so that element appears earlier. |
| right: ... left: ... blocks or BREAK for multi-character isolation | These still work on DiT, but the effect isn’t pronounced. Switching to described relationships and interactions usually gives a livelier composition (see the multi-character section below). |
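If you are porting a batch of old SDXL prompts, the purely mechanical part of the table (underscores, bracket weights, BREAK, quality stacks) can be stripped with a small script. The sketch below is a hypothetical helper, not a PixAI tool; the quality-tag set is just the examples from the table and is meant to be extended. It only removes syntax, so you still have to turn the surviving phrases into a natural sentence yourself.

```python
import re

# Quality tags taken from the table above; an illustrative starting set, not exhaustive.
QUALITY_TAGS = {"masterpiece", "best quality", "8k", "ultra-detailed", "extremely detailed"}

# Matches SDXL weighting such as "(black hair:1.2)" and captures the text part.
WEIGHT_RE = re.compile(r"\(([^():]+):\s*[\d.]+\)")


def clean_sdxl_tags(prompt: str) -> list[str]:
    """Strip SDXL-only syntax from a tag prompt and return the remaining phrases.

    This only removes syntax DiT does not use; rewriting the phrases as a
    natural sentence is still up to you.
    """
    prompt = WEIGHT_RE.sub(r"\1", prompt)       # "(black_hair:1.2)" -> "black_hair"
    prompt = re.sub(r"\bBREAK\b", ",", prompt)  # BREAK separators become commas
    phrases = []
    for raw in prompt.split(","):
        phrase = raw.strip().replace("_", " ")  # "looking_at_viewer" -> "looking at viewer"
        if phrase and phrase.lower() not in QUALITY_TAGS:
            phrases.append(phrase)
    return phrases


print(clean_sdxl_tags("1boy, solo, masterpiece, best quality, (black_hair:1.2), looking_at_viewer"))
# ['1boy', 'solo', 'black hair', 'looking at viewer']
```

The leftover “1boy” and “solo” are exactly what the first table row tells you to fold into a sentence such as “A young man with black hair standing alone, looking at the viewer.”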
Generation Parameters: What’s Different
Beyond the prompt itself, Tsubaki.2 also exposes a different parameter panel from SDXL’s:
- No CFG Scale and no step count. The two knobs you tune most on SDXL simply aren’t on the Tsubaki.2 panel.
- Use the “Mode” selector instead to balance quality against speed. The options are Lite / Standard / Pro / Ultra (labeled 輕量 / 標準 / 專業 / 極致 in the Chinese UI). The mechanism works much like a step count: higher tiers give finer detail at a higher credit cost.
- “Standard” is already a strong default; reserve “Pro” for cases that genuinely need extreme detail.
Scenario 1: Single Character
Recommended writing order:
| Order | Content | Why this order |
|---|---|---|
| 1 | Style / overall mood / camera language | Sets the global tone first; everything below aligns with it |
| 2 | Subject + action / pose | Establishes the focal point |
| 3 | Outfit & accessories | Detail the subject after positioning |
| 4 | Foreground props | Round out the focal area |
| 5 | Background environment | From near to far |
| 6 | Lighting & effects | Final pass that locks in atmosphere |
Example:
A cinematic medium shot of a young Taiwanese girl with long silver hair and purple eyes, gently smiling, wearing an elegant white lolita dress with intricate lace, standing in a cherry blossom garden, soft pink petals floating in the air, warm golden hour sunlight filtering through the trees, highly detailed, beautiful anime style
💡 Notice the phrase “young Taiwanese girl”: it isn’t a valid Danbooru tag, so the CLIP encoder in an SDXL model would likely mishandle it, but it’s perfectly fine natural English for DiT. DiT doesn’t need your wording to match an existing tag.
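To make the six-step order concrete, here is a minimal sketch that treats each row of the ordering table as a slot and joins the filled slots in order. The class and field names are hypothetical (this is plain string assembly, not a PixAI API), and the usage reassembles a variant of the example prompt above rather than reproducing it word for word.

```python
from dataclasses import dataclass


@dataclass
class SingleCharacterPrompt:
    """One field per row of the single-character ordering table (names are illustrative)."""
    style: str            # 1. style / overall mood / camera language
    subject: str          # 2. subject + action / pose
    outfit: str = ""      # 3. outfit & accessories
    props: str = ""       # 4. foreground props
    background: str = ""  # 5. background environment
    lighting: str = ""    # 6. lighting & effects

    def render(self) -> str:
        # Keep the table's order and skip any slot left empty.
        parts = [self.style, self.subject, self.outfit,
                 self.props, self.background, self.lighting]
        return ", ".join(p.strip() for p in parts if p.strip())


prompt = SingleCharacterPrompt(
    style="A cinematic medium shot in a beautiful anime style",
    subject="a young Taiwanese girl with long silver hair and purple eyes, gently smiling",
    outfit="wearing an elegant white lolita dress with intricate lace",
    background="standing in a cherry blossom garden",
    lighting="soft pink petals floating in the air, warm golden hour sunlight filtering through the trees",
)
print(prompt.render())
```

Keeping the slots separate makes it easy to swap one layer (say, the background) while leaving the rest of the order intact.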
Scenario 2: Multiple Characters
The biggest change in DiT for multi-character scenes: describe relationships instead of isolating characters with tags.
Recommended writing order:
| Order | Content | Why this order |
|---|---|---|
| 1 | Overall composition / camera / mood | Same as single character — set the tone |
| 2 | Relationships and interactions between characters (most important!) | This is how DiT figures out who is who and who is doing what to whom |
| 3 | Each character’s appearance, action, expression (primary → secondary) | Introduce them one by one in priority order |
| 4 | Outfits and details | After the cast is clear |
| 5 | Background, lighting, effects | Final pass, same as above |
Example:
A romantic wide shot under cherry blossoms at sunset, a silver-haired catgirl with purple eyes is tiptoeing to kiss a tall black-haired boy, the boy gently holding her waist, they are looking at each other affectionately, detailed intricate clothing, soft pink petals floating around them, warm golden sunlight, cinematic lighting, emotional atmosphere, beautiful detailed anime style
⚠️ SDXL multi-character tricks are not necessary; a relationship description such as “she is tiptoeing to kiss him while he holds her waist” usually works better.
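The multi-character order can be sketched the same way. In this hypothetical helper (names are illustrative, not a PixAI API), the relationship clause always comes right after the composition, and characters are sorted by a priority number so the primary one is introduced first, matching steps 1 to 5 of the table.

```python
def build_multi_character_prompt(
    composition: str,
    relationship: str,
    characters: list[tuple[int, str]],
    outfits: str = "",
    atmosphere: str = "",
) -> str:
    """Assemble a multi-character prompt in the recommended order.

    characters holds (priority, description) pairs; priority 1 is the primary character.
    """
    # Step 3: introduce characters primary first, then secondary.
    ordered = [desc for _, desc in sorted(characters, key=lambda c: c[0])]
    # Steps 1-5: composition, relationship, characters, outfits, background/lighting.
    parts = [composition, relationship, *ordered, outfits, atmosphere]
    return ", ".join(p.strip() for p in parts if p.strip())


print(build_multi_character_prompt(
    composition="A romantic wide shot under cherry blossoms at sunset",
    relationship="a catgirl is tiptoeing to kiss a tall boy while he gently holds her waist",
    characters=[(2, "the boy has short black hair"),
                (1, "the catgirl has silver hair and purple eyes")],
    outfits="detailed intricate clothing",
    atmosphere="soft pink petals floating around them, warm golden sunlight",
))
```

Putting the relationship before any appearance detail is the point of the design: by the time the model reads “the catgirl” or “the boy”, it already knows who is doing what to whom.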
General Tips
Embedding LoRA triggers (suggestion, not yet fully validated)
A common community conjecture: writing the LoRA trigger as part of the natural-language description may be more stable than a tag-style prefix, because it makes the relationship between the trigger and the described subject more explicit to the model. This isn’t fully validated, and behavior can vary by LoRA / scenario — try both styles and see which works better in your case.
Worth noting: some PixAI official DiT LoRAs (such as the mascot Mio LoRA) ship a trigger that is itself a full descriptive sentence, designed to be folded directly into your prompt. For example, the [PixAI Mio/ミオ] Spring Echoes LoRA trigger:
A girl with white-to-pink gradient hair, heart ahoge, purple eyes, eyepatch, cat ears, fang, jirai kei style. Open dark grey glossy leather hoodie over a black bandeau, slight cleavage, cinched waist, pink drawstrings. Black distressed low-rise denim short
Letting it flow directly into the scene action reads more naturally than dropping it as a prefix and starting a separate sentence:
| Style | Example |
|---|---|
| Whole trigger as a prefix, then a separate scene sentence | <full trigger>. She is walking through neon-lit Shibuya at night. |
| Naturally combined (recommended) | A girl with white-to-pink gradient hair, heart ahoge, purple eyes, eyepatch, cat ears, fang, jirai kei style, walking through neon-lit Shibuya at night, ... |
If you can’t fit it naturally, drop it as a single sentence at the start or end.
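Folding a sentence-style trigger into the scene can be approximated with a simple heuristic: continue the trigger’s first (subject) sentence with the scene action, then keep any remaining trigger sentences after it. This is a hypothetical sketch of one reasonable reading of the “naturally combined” row, not the LoRA author’s prescribed method; splitting on “. ” is a rough stand-in for real sentence segmentation.

```python
def fold_trigger(trigger: str, action: str) -> str:
    """Continue a sentence-style LoRA trigger with the scene action.

    Heuristic: split on ". ", attach the action to the first (subject) sentence,
    then keep the remaining trigger sentences unchanged after it.
    """
    parts = [s.strip().rstrip(".").strip() for s in trigger.split(". ")]
    sentences = [s for s in parts if s]
    if not sentences:
        return action
    first = f"{sentences[0]}, {action.strip().rstrip('.')}."
    rest = [s + "." for s in sentences[1:]]
    return " ".join([first, *rest])


trigger = ("A girl with white-to-pink gradient hair, heart ahoge, purple eyes, eyepatch, "
           "cat ears, fang, jirai kei style. Open dark grey glossy leather hoodie over a black bandeau")
print(fold_trigger(trigger, "walking through neon-lit Shibuya at night"))
```

If the result still reads awkwardly for a particular trigger, fall back to the advice above and keep the trigger as a single sentence at the start or end.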
Negative prompt (shared baseline)
blurry, low quality, deformed hands, extra fingers, bad anatomy, watermark, text, logo, ugly, deformed, mutated
DiT honors negative prompts the same way SDXL does. This baseline list works for both.
Put style descriptions in Customize Style
⚠️ Customize Style is Tsubaki.2-only. Other DiT models (Tsubaki v1, Serin, Tsubaki Flash) don’t have this field. On Tsubaki.2, peeling style words out into Customize Style keeps your main prompt clean. On other DiT models, fold the style words into the tail of the main prompt.
Customize Style examples
| Scene | Customize Style content |
|---|---|
| Single-character portrait | delicate anime style, soft lighting, studio ghibli influence |
| Romantic multi-character | romantic anime style, cinematic, soft bokeh |
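The model-dependent rule above (Customize Style on Tsubaki.2, tail of the main prompt elsewhere) fits in a few lines. This is a hypothetical local helper for deciding where text goes; the dictionary keys are illustrative labels, not PixAI API field names.

```python
def route_style(model: str, main_prompt: str, style: str) -> dict[str, str]:
    """Decide where style words go for a given DiT model.

    Only Tsubaki.2 has a Customize Style field; for other DiT models the style
    words are appended to the tail of the main prompt. The dict keys are
    illustrative labels, not PixAI API field names.
    """
    if model == "Tsubaki.2":
        return {"prompt": main_prompt, "customize_style": style}
    # Trim any trailing comma/space before appending the style words.
    return {"prompt": f"{main_prompt.rstrip(', ')}, {style}"}


print(route_style("Tsubaki.2", "a silver-haired girl in a garden", "delicate anime style, soft lighting"))
print(route_style("Serin", "a silver-haired girl in a garden", "delicate anime style, soft lighting"))
```

Keeping the style string separate until the last step means the same main prompt can be reused across Tsubaki.2, Tsubaki v1, Serin, and Tsubaki Flash without manual editing.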
