DiT Model Prompt Writing Guide

This guide shares practical tips for writing better prompts on PixAI, focusing on SDXL and DiT-based models. It covers prompt structures, useful tags, model recommendations, LoRA usage, and hands-on model reviews. It is a helpful resource for creators who want to improve image quality, style control, and generation results.

Editor’s note (PixAI)
This guide was written by one of our community’s standout creators, 阿童 (ATone), and is republished here with credit to the original author.


PixAI’s DiT family — Tsubaki / Serin / Tsubaki Flash — has a very different prompt style from the SDXL line. This chapter is for users who already know SDXL prompting and are picking up DiT.


Core Principle

DiT models only accept English prompts and strongly favor natural English descriptions — the closer it sounds to telling a story to a professional illustrator, the better.

Why not use Danbooru tags?

  • SDXL-family models (Illustrious, NoobAI, etc.) use a CLIP text encoder and were trained on Danbooru / e621 tag-style captions, so they expect tag-style input.
  • DiT models use a text encoder closer to a modern LLM. They understand natural-language descriptions much better and adapt less well to a flat tag list.
  • The upshot: SDXL’s tag-existence rule (young man is invalid; you must write 1boy) does not apply in DiT. Just write normal English.

Empirical comparison: Model × Prompt style

The same prompt sent to different models gives very different results. The 2×2 below uses the PixAI mascot Mio LoRA (DiT + SDXL versions of the same character, Spring Echoes / Emerald Melody variant) under a strict controlled comparison — same scene, only the model / prompt style swapped:


Model | Natural-language prompt | Tag-stack prompt
Tsubaki.2 (DiT) | A: Model strength + matching prompt style ✓ | B: DiT adapts poorly to tag-style prompts
Illustrious-XL (SDXL) | C: SDXL adapts poorly to natural language | D: Model strength + matching prompt style ✓

Cells A and D (the diagonal) pair each model with its preferred prompt style; cells B and C (off the diagonal) produce off-target outputs. Even with the same LoRA and the same scene, a mismatched model / prompt-style pairing can throw the result off.


SDXL → DiT: Common Migration Pitfalls

When jumping from SDXL to DiT, drop these habits:

❌ SDXL habit | Why it fails in DiT / What to do
1boy, solo, masterpiece, best quality | DiT doesn’t lean on quality tags. Rewrite as a natural sentence: “A young man standing alone in a cinematic scene.”
Heavy quality stacks (8k, ultra-detailed, extremely detailed) | DiT image quality is already strong; piling on quality tags can produce results that don’t match what you intended (sometimes diluted, sometimes overshot). Keep at most one style word.
Underscore tokens (black_hair, looking_at_viewer) | DiT reads natural English. Drop the underscores.
Bracket weighting (black hair:1.2) | DiT doesn’t recognize this syntax. To emphasize an element, rewrite the sentence and put it earlier.
right: ... left: ... blocks or BREAK for multi-character isolation | These still work on DiT, but the effect isn’t pronounced. Switching to described relationships and interactions usually gives a livelier composition (see the multi-character section below).
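
The mechanical parts of this cleanup (quality tags, underscores, bracket weights) can be sketched in a few lines of Python. This is only an illustration: the function name and the quality-tag list are assumptions, not part of any PixAI tool, and count tags such as 1boy or solo are left in place because turning them into a sentence still has to be done by hand.

```python
import re

# Assumed list of quality tags to drop; extend it to match your own habits.
QUALITY_TAGS = {"masterpiece", "best quality", "8k", "ultra-detailed", "extremely detailed"}

# Matches SDXL-style weighting such as "(black hair:1.2)".
WEIGHT_PATTERN = re.compile(r"\(([^():]+):\s*\d+(?:\.\d+)?\)")


def clean_sdxl_prompt(prompt: str) -> list[str]:
    """Return the phrases of an SDXL tag prompt with DiT-unfriendly syntax removed."""
    # Unwrap weights first so the comma split below sees plain phrases.
    prompt = WEIGHT_PATTERN.sub(r"\1", prompt)
    cleaned = []
    for raw in prompt.split(","):
        tag = raw.strip().replace("_", " ")
        # Skip empty fragments and anything on the quality-tag list.
        if not tag or tag.lower() in QUALITY_TAGS:
            continue
        cleaned.append(tag)
    return cleaned


phrases = clean_sdxl_prompt(
    "1boy, solo, masterpiece, best quality, (black_hair:1.2), looking_at_viewer"
)
print(phrases)
```

The remaining phrases are raw material for a sentence such as “A young man with black hair standing alone, looking at the viewer”, not a finished DiT prompt.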

Generation Parameters: What’s Different

Beyond the prompt itself, Tsubaki.2 also exposes a different parameter panel from SDXL:

  • No CFG Scale and no step count. The two knobs you tune most on SDXL simply aren’t on the Tsubaki.2 panel.
  • Use the “Mode” selector instead to balance quality vs. speed. The options are Lite / Standard / Pro / Ultra (Chinese: 輕量 / 標準 / 專業 / 極致). The underlying mechanism is close to step count — higher tiers give finer detail at higher credit cost.
  • “Standard” is already a strong default; reserve “Pro” for cases that genuinely need extreme detail.
[Figure: Mode selector UI]

Scenario 1: Single Character

Recommended writing order:

Order | Content | Why this order
1 | Style / overall mood / camera language | Sets the global tone first; everything below aligns with it
2 | Subject + action / pose | Establishes the focal point
3 | Outfit & accessories | Detail the subject after positioning
4 | Foreground props | Round out the focal area
5 | Background environment | From near to far
6 | Lighting & effects | Final pass that locks in atmosphere

Example:

A cinematic medium shot of a young Taiwanese girl with long silver hair and purple eyes, gently smiling, wearing an elegant white lolita dress with intricate lace, standing in a cherry blossom garden, soft pink petals floating in the air, warm golden hour sunlight filtering through the trees, highly detailed, beautiful anime style

💡 Notice the phrase young Taiwanese girl — that’s an invalid Danbooru tag in SDXL and CLIP would mishandle it, but it’s perfectly fine natural English in DiT. DiT does not require tag database lookups.
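
If you assemble prompts programmatically, the writing order above can be enforced with a small helper. The sketch below is a minimal illustration: the slot names are hypothetical labels for the six steps, and the joined result is still just a string you paste into the prompt box.

```python
# Slot names are hypothetical labels for the six steps of the writing order.
SLOT_ORDER = ["style", "subject", "outfit", "props", "background", "lighting"]


def build_prompt(parts: dict[str, str]) -> str:
    """Join the filled-in slots in the recommended order, skipping empty ones."""
    unknown = set(parts) - set(SLOT_ORDER)
    if unknown:
        raise ValueError(f"unknown slots: {sorted(unknown)}")
    return ", ".join(parts[slot] for slot in SLOT_ORDER if parts.get(slot))


# The slots can be filled in any order; the helper sorts them on output.
prompt = build_prompt({
    "lighting": "warm golden hour sunlight filtering through the trees",
    "subject": "a young girl with long silver hair and purple eyes, gently smiling",
    "style": "A cinematic medium shot in a beautiful anime style",
    "outfit": "wearing an elegant white lolita dress with intricate lace",
    "background": "standing in a cherry blossom garden",
})
print(prompt)
```

Keeping the order in one list means you can draft the pieces in whatever sequence you think of them and still end up with tone first and lighting last.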


Scenario 2: Multiple Characters

The biggest change in DiT for multi-character scenes: describe relationships instead of isolating characters with tags.

Recommended writing order:

Order | Content | Why this order
1 | Overall composition / camera / mood | Same as single character: set the tone
2 | Relationships and interactions between characters (most important!) | This is how DiT figures out who is who and who is doing what to whom
3 | Each character’s appearance, action, expression (primary → secondary) | Introduce them one by one in priority order
4 | Outfits and details | After the cast is clear
5 | Background, lighting, effects | Final pass, same as above

Example:

A romantic wide shot under cherry blossoms at sunset, a silver-haired catgirl with purple eyes is tiptoeing to kiss a tall black-haired boy, the boy gently holding her waist, they are looking at each other affectionately, detailed intricate clothing, soft pink petals floating around them, warm golden sunlight, cinematic lighting, emotional atmosphere, beautiful detailed anime style

⚠️ SDXL multi-character tricks are not necessary. A relationship description such as “she is tiptoeing to kiss him while he holds her waist” usually works better.
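
The relationship-first order can also be written down as a small data structure. This is a sketch under assumptions: the Character class and build_scene function are hypothetical names invented for illustration, and the cast list is expected to arrive in priority order, primary character first.

```python
from dataclasses import dataclass


@dataclass
class Character:
    appearance: str  # who they are, e.g. "the silver-haired catgirl with purple eyes"
    action: str      # what they are doing, expression included


def build_scene(composition: str, interaction: str, cast: list[Character],
                outfits: str = "", atmosphere: str = "") -> str:
    """Assemble a multi-character prompt: tone, then relationships, then the cast."""
    segments = [composition, interaction]
    # Cast is expected in priority order, primary character first.
    segments += [f"{c.appearance} {c.action}" for c in cast]
    segments += [outfits, atmosphere]
    return ", ".join(s for s in segments if s)


scene = build_scene(
    composition="A romantic wide shot under cherry blossoms at sunset",
    interaction="a catgirl and a tall boy about to kiss, looking at each other affectionately",
    cast=[
        Character("the silver-haired catgirl with purple eyes", "tiptoeing toward him"),
        Character("the black-haired boy", "gently holding her waist"),
    ],
    outfits="detailed intricate clothing",
    atmosphere="soft pink petals floating around them, warm golden sunlight",
)
print(scene)
```

Making the interaction a required argument is deliberate: it is the one slot the order table marks as most important, so the helper refuses to build a scene without it.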

General Tips

Embedding LoRA triggers (suggestion, not yet fully validated)

A common community conjecture: writing the LoRA trigger as part of the natural-language description may be more stable than a tag-style prefix, because it makes the relationship between the trigger and the described subject more explicit to the model. This isn’t fully validated, and behavior can vary by LoRA / scenario — try both styles and see which works better in your case.

Worth noting: some PixAI official DiT LoRAs (such as the mascot Mio LoRA) ship a trigger that is itself a full descriptive sentence, designed to be folded directly into your prompt. For example, the [PixAI Mio/ミオ] Spring Echoes LoRA trigger:

A girl with white-to-pink gradient hair, heart ahoge, purple eyes, eyepatch, cat ears, fang, jirai kei style. Open dark grey glossy leather hoodie over a black bandeau, slight cleavage, cinched waist, pink drawstrings. Black distressed low-rise denim short

Letting it flow directly into the scene action reads more naturally than dropping it as a prefix and starting a separate sentence:

Style | Example
Whole trigger as a prefix, then a separate scene sentence | <full trigger>. She is walking through neon-lit Shibuya at night.
Naturally combined (recommended) | A girl with white-to-pink gradient hair, heart ahoge, purple eyes, eyepatch, cat ears, fang, jirai kei style, walking through neon-lit Shibuya at night, ...

If you can’t fit it naturally, drop it as a single sentence at the start or end.
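
For batch work, the two styles can be expressed as string helpers. This is a minimal sketch with hypothetical function names; it assumes you pass only the subject sentence of the trigger to the folding helper, as the combined example above does, since a multi-sentence trigger cannot simply be joined with a comma.

```python
def fold_trigger(subject_sentence: str, action: str) -> str:
    """Merge a descriptive trigger sentence and a scene action into one sentence."""
    subject = subject_sentence.strip().rstrip(".")
    action = action.strip().rstrip(".")
    return f"{subject}, {action}."


def prefix_trigger(trigger: str, scene_sentence: str) -> str:
    """Fallback: keep the trigger as its own sentence ahead of the scene."""
    return f"{trigger.strip().rstrip('.')}. {scene_sentence.strip()}"


subject = ("A girl with white-to-pink gradient hair, heart ahoge, purple eyes, "
           "eyepatch, cat ears, fang, jirai kei style.")
combined = fold_trigger(subject, "walking through neon-lit Shibuya at night")
prefixed = prefix_trigger(subject, "She is walking through neon-lit Shibuya at night.")
print(combined)
print(prefixed)
```

Generating both variants for the same seed is an easy way to run the comparison suggested above and see which style your LoRA responds to.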

Negative prompt (shared baseline)

blurry, low quality, deformed hands, extra fingers, bad anatomy, watermark, text, logo, ugly, deformed, mutated

DiT honors negative prompts the same way SDXL does. This baseline list works for both.

Put style descriptions in Customize Style

⚠️ Customize Style is Tsubaki.2-only. Other DiT models (Tsubaki v1, Serin, Tsubaki Flash) don’t have this field. On Tsubaki.2, peeling style words out into Customize Style keeps your main prompt clean. On other DiT models, fold the style words into the tail of the main prompt.

Customize Style examples

Scene | Customize Style content
Single-character portrait | delicate anime style, soft lighting, studio ghibli influence
Romantic multi-character | romantic anime style, cinematic, soft bokeh
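
The Tsubaki.2-only rule can be captured in a tiny routing helper. The sketch below is an assumption-laden illustration: the model names are plain strings chosen for this example, not PixAI API identifiers, and the function only decides where the style words go.

```python
# Assumption: model names are plain strings chosen for this sketch,
# not identifiers from any PixAI API.
MODELS_WITH_CUSTOMIZE_STYLE = {"Tsubaki.2"}


def split_style(model: str, main_prompt: str, style: str) -> tuple[str, str]:
    """Return (main prompt, Customize Style text) for the given model."""
    if model in MODELS_WITH_CUSTOMIZE_STYLE:
        # Keep the main prompt clean and send the style to its own field.
        return main_prompt, style
    # No Customize Style field: fold the style words into the tail of the prompt.
    return f"{main_prompt.rstrip(' ,.')}, {style}", ""


style = "delicate anime style, soft lighting, studio ghibli influence"
print(split_style("Tsubaki.2", "A girl reading by a window", style))
print(split_style("Serin", "A girl reading by a window.", style))
```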