The Reasoning Model Hides Answers in Fields You Don’t Read
Switched to a local reasoning model with thinking mode support. First API request came back. The content field was empty. Tried a different prompt. Still empty. Restarted the service. Still empty.
It’s like ordering a breakfast wrap at a Taipei morning shop — the lady nods, keeps working, wraps it up, and puts it on the counter on her side. You wait outside the window. Nothing appears. You think she forgot. She thinks you’ll come in and grab it yourself. The wrap is done. Just not where you’re looking.
The Problem Isn’t Format, It’s Behavior
Natural instinct: check the API format, suspect the model is broken. But the issue is subtler. These reasoning models default to thinking mode enabled. They dump the entire reasoning process into a reasoning field and deliberately leave content blank.
The OpenAI-compatible API layer doesn’t shield you from this difference. It passes the raw response through as-is. Your code only reads content, so you see nothing. The format is compatible. Schema is fine. JSON parses correctly. The model just chose to speak in another room.
Two solutions: explicitly pass a parameter to disable thinking mode when calling, or switch to the native API and read the correct field. The former makes the model shut up and give you the answer. The latter lets you see how it thinks. But you need to know the option exists first.
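As a rough sketch of the second option — note that the field names `reasoning_content` and `reasoning` are server-specific assumptions (vLLM-style conventions), not something the OpenAI-compatible schema guarantees — a reader that falls back from content to a reasoning field might look like this:

```python
def extract_answer(message: dict) -> tuple[str, str]:
    """Return (text, source_field) for a parsed chat-completion message.

    Falls back to common (but server-specific) reasoning field names
    when the content field is empty — the exact situation described above.
    """
    content = message.get("content") or ""
    if content.strip():
        return content, "content"
    # Assumed field names; check your server's docs for the real ones.
    for field in ("reasoning_content", "reasoning"):
        text = message.get(field) or ""
        if text.strip():
            return text, field
    return "", "none"

# The first option — disabling thinking at request time — is also
# server-specific. For illustration only, some backends accept something like:
#   extra_body={"chat_template_kwargs": {"enable_thinking": False}}
# Treat the exact flag name as an assumption and verify it in your
# server's documentation before relying on it.
```

In practice you would call `extract_answer(response["choices"][0]["message"])` on the parsed JSON; the second element of the tuple tells you which channel the model actually spoke on.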
Compatibility Layers Guarantee Format, Not Behavior
This exposes a broader issue. When we say an API is “OpenAI-compatible,” we think we’re buying behavioral compatibility. What we actually get is structural compatibility. Response structure looks the same. Doesn’t mean the model’s default switches are set the same.
New-generation reasoning models have their own rhythm. They’re not simple text-completion machines. They reason internally first, then output conclusions. This “think then speak” mode manifests at the API level as extra fields, extra parameters, extra default values.
Before plugging one in, figure out which mode it speaks in by default. Don’t just read the schema in the docs. Read the defaults. Those innocuous-looking boolean values determine whether your content field will have anything in it.
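One way to act on this, sketched under the same assumption about reasoning field names: send one trivial request at integration time and classify where the model's text actually landed, rather than discovering it mid-debug.

```python
# Startup probe sketch: classify a model's default behavior from a single
# parsed chat-completion message. "reasoning_content" and "reasoning" are
# assumed, server-specific field names; adjust them for your backend.

REASONING_FIELDS = ("reasoning_content", "reasoning")

def classify_default_behavior(message: dict) -> str:
    """Label where the model put its text on a plain request with default settings."""
    def filled(field: str) -> bool:
        return bool((message.get(field) or "").strip())

    if filled("content"):
        return "answers-in-content"      # behaves like a classic chat model
    if any(filled(f) for f in REASONING_FIELDS):
        return "thinking-mode-default"   # answer hidden in a reasoning field
    return "truly-empty"                 # a real error, not a mode mismatch
```

A result of "thinking-mode-default" is exactly the case this post describes: not a broken model, just a default switch you haven't read yet.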
Debug Time Spent on Assumptions
The debugging took twenty minutes. Fifteen of those went to checking formats, restarting services, and swapping prompts. The actual fix took five minutes, once you knew which field to look at.
Lesson: when a new model plugs in, don’t just test if it moves. Test how it moves by default. An empty response isn’t always an error. Sometimes the model is just talking on a channel you didn’t open.
— 邱柏宇