You’re Talking to the Machine in the Wrong Place
Imagine taping a note to the outside of a convenience-store photo kiosk: “Please print portrait.” The machine ignores it and prints landscape anyway. The format has to be chosen in the touchscreen settings menu; the note is just paper stuck to the outside, not a command.
The API parameter problem I ran into last week was exactly the same.
Writing it clearly doesn’t mean it works
To get a portrait image, I explicitly wrote “9:16 vertical” near the start of the prompt. The model returned a 4:3 landscape image. There was no error message and the generation succeeded; the ratio was simply wrong. After a round of debugging I found the cause: in the AI image-generation API I was using, the output format is not determined by the prompt text at all, but by a separate configuration object in the API call.
The prompt tells the model what to draw; the API settings tell the system what format to output. They are two parallel command channels that never cross. Once I moved the ratio out of the prompt and into the proper generationConfig field, the output came back correct immediately.
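To make the two channels concrete, here is a minimal Python sketch with a mock backend standing in for the real service. Everything in it is hypothetical: the mock_generate function, the request layout, the aspectRatio key inside generationConfig, and the 4:3 fallback, which only mirrors what I saw. Real APIs name and nest this setting differently, so check the reference for the one you use.

```python
# Hypothetical mock of an image API that reads the output size ONLY from
# the config object; the prompt text never influences this decision.
DEFAULT_ASPECT_RATIO = "4:3"  # assumed fallback, mirroring the incident


def mock_generate(request: dict) -> str:
    """Return the aspect ratio the mock backend would render at."""
    # request["prompt"] would be sent to the model as a description of the
    # scene; it is never parsed for output settings.
    config = request.get("generationConfig", {})
    return config.get("aspectRatio", DEFAULT_ASPECT_RATIO)


# Before: the ratio is only in the prompt, so the default wins silently.
wrong = {"prompt": "9:16 vertical, a lighthouse at dusk"}

# After: the ratio lives in the config object the backend actually reads.
right = {
    "prompt": "a lighthouse at dusk",
    "generationConfig": {"aspectRatio": "9:16"},
}

print(mock_generate(wrong))  # 4:3
print(mock_generate(right))  # 9:16
```

Note that neither call fails; the only visible difference is the shape of the result, which is exactly why the mistake is easy to miss.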
The counterintuitive part: the more clearly you write, the more easily you assume you have already been understood. But writing something clearly and writing it in the right place are two entirely different things.
Two channels that never intersect
Most AI model APIs share this dual-track design. The prompt is a content-level instruction that tells the model what to generate. API parameters are system-level settings that control output format, token limits, temperature, and image resolution. Both are written as natural language or JSON and look alike, but they belong to different processing pipelines.
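One way to keep the two layers from blurring in your own code is to give them separate types. The sketch below is an illustration, not any vendor's SDK: GenerationRequest, GenerationConfig, their field names, and the "prompt"/"config" payload keys are all invented for the example and would need mapping to whatever your API actually expects.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class GenerationConfig:
    """System layer: how to output. All field names are hypothetical."""
    aspect_ratio: Optional[str] = None       # e.g. "9:16"
    resolution: Optional[str] = None         # e.g. "1024x1792"
    temperature: Optional[float] = None
    max_output_tokens: Optional[int] = None


@dataclass
class GenerationRequest:
    """Content layer (prompt) and system layer (config) in separate fields."""
    prompt: str  # what to generate
    config: GenerationConfig = field(default_factory=GenerationConfig)

    def to_payload(self) -> dict:
        # Each layer gets its own top-level key. Unset options are dropped
        # so that the server's defaults apply to them.
        settings = {k: v for k, v in asdict(self.config).items() if v is not None}
        return {"prompt": self.prompt, "config": settings}


req = GenerationRequest(
    prompt="a lighthouse at dusk",
    config=GenerationConfig(aspect_ratio="9:16", temperature=0.4),
)
print(req.to_payload())
# {'prompt': 'a lighthouse at dusk', 'config': {'aspect_ratio': '9:16', 'temperature': 0.4}}
```

The types don't stop anyone from typing 9:16 into the prompt string, but they make the correct slot visible at every call site.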
The problem is that human intuition doesn’t work this way. When I write “9:16 vertical image” in the prompt, my mental model says “I’ve told the system I want a portrait image.” The system reads it differently: it treats the string as a semantic hint about scene composition, not as a technical output specification. The real output specification has to go in a different field of the API call, usually named config, settings, or parameters.
This isn’t a documentation problem; the docs usually cover it. The problem is that once you’ve written it in the prompt, you don’t think to ask whether there is another place you also need to write it. You believe you’ve already made yourself clear.
Interface design blind spot
There are sound technical reasons for the dual-track design: keeping content generation separate from system configuration makes the architecture cleaner and easier to extend. From the user’s side, though, it splits one request into two entry points. You have to remember which parameters go in the prompt and which go in the config. Get it wrong and the system won’t report an error; it silently falls back to its default values.
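The silent fallback is easy to reproduce. This sketch assumes a hypothetical set of server defaults and a server that, like many APIs, drops keys it doesn’t recognize. The resolve_config helper is invented for illustration; it additionally reports which settings fell back and which were ignored, so the silence becomes something you can log or assert on.

```python
# Hypothetical server-side defaults; a real API documents its own.
DEFAULTS = {"aspectRatio": "4:3", "temperature": 1.0, "maxOutputTokens": 1024}


def resolve_config(user_config: dict, defaults: dict = DEFAULTS):
    """Mimic a server that fills gaps with defaults and drops unknown keys.

    Returns (merged, fell_back, ignored) so the silent parts become visible.
    """
    # Only recognized keys survive; anything else is dropped without an error.
    merged = {key: user_config.get(key, value) for key, value in defaults.items()}
    # Recognized keys the caller never set: these quietly took the default.
    fell_back = [key for key in defaults if key not in user_config]
    # Keys the caller set that the server does not recognize.
    ignored = [key for key in user_config if key not in defaults]
    return merged, fell_back, ignored


# The ratio is sent as "aspect_ratio" instead of the assumed "aspectRatio",
# so, like a ratio written in the prompt, it never reaches the setting.
merged, fell_back, ignored = resolve_config({"aspect_ratio": "9:16", "temperature": 0.4})
print(merged["aspectRatio"])  # 4:3
print(fell_back)              # ['aspectRatio', 'maxOutputTokens']
print(ignored)                # ['aspect_ratio']
```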
Worse, different APIs draw the boundary in different places. Some models honor image dimensions written in the prompt; others require them in the config. Some treat temperature as part of content generation, others as a system parameter. There is no common standard, so you have to test each API to find out.
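Since there is no shared standard, it can help to write down what you learn about each API instead of rediscovering it every time. A minimal sketch: a lookup table keyed by provider and setting. The provider names and placements are placeholders, not claims about any real service, and where_does_it_go is a hypothetical helper.

```python
# Hypothetical notes on where each API expects each setting. Provider names
# and placements are placeholders: fill them in from the docs or by testing.
PLACEMENT = {
    "provider_a": {"image_size": "prompt", "temperature": "config"},
    "provider_b": {"image_size": "config", "temperature": "config"},
}


def where_does_it_go(provider: str, setting: str) -> str:
    """Return "prompt" or "config"; raise if nobody has checked this pair yet."""
    try:
        return PLACEMENT[provider][setting]
    except KeyError:
        raise LookupError(
            f"no recorded placement for {setting!r} on {provider!r}; check the docs"
        ) from None


print(where_does_it_go("provider_a", "image_size"))  # prompt
print(where_does_it_go("provider_b", "image_size"))  # config
```

Raising instead of guessing is deliberate: a silent default here would recreate the same failure one level up.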
I’ve since made a habit of it: whenever I call a new API, I assume the prompt and the config are two completely separate worlds. The prompt describes only what to draw; the config specifies only how to output it. Even when the documentation says the two can be mixed, I don’t mix them. That removes one possible point of failure.
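Part of this habit can be automated with a check that flags output-format wording left in a prompt. This is a heuristic sketch, not a library: find_format_instructions and its regular expressions for ratios, resolutions, and orientation words are my own assumptions. They will also catch legitimate content such as “vertical stripes” or “a portrait of”, so treat hits as something to review, not something to rewrite automatically.

```python
import re

# Heuristic patterns for output-format wording that belongs in the config.
FORMAT_PATTERNS = [
    re.compile(r"\b\d{1,2}\s*:\s*\d{1,2}\b"),                     # ratios like 9:16
    re.compile(r"\b\d{3,4}\s*[x×]\s*\d{3,4}\b", re.IGNORECASE),   # sizes like 1024x1792
    re.compile(r"\b(?:vertical|horizontal|portrait|landscape)\b", re.IGNORECASE),
]


def find_format_instructions(prompt: str) -> list[str]:
    """Return format-like phrases found in the prompt, in order of appearance."""
    hits = []
    for pattern in FORMAT_PATTERNS:
        hits.extend((m.start(), m.group(0)) for m in pattern.finditer(prompt))
    return [text for _, text in sorted(hits)]


print(find_format_instructions("9:16 vertical, a lighthouse at dusk"))
# ['9:16', 'vertical']
```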
Clear doesn’t equal correct
This incident points to a more fundamental issue: expressing something clearly and delivering it correctly are two different things. You can state your intent very clearly, but if you put it somewhere the system doesn’t read, the clarity is wasted. The system won’t go looking for the right field just because you wrote carefully. It reads only where it was designed to read.
The convenience-store kiosk doesn’t read notes taped to its outside. The API doesn’t read format instructions in the prompt. Machines don’t care about your intent, only about where you put your instructions.
— 邱柏宇