The Cron Job Said It Succeeded, But the Model Was Swapped

The convenience store coffee machine looks the same: same buttons, same exterior. But someone quietly swapped the beans inside. It still dispenses a hot cup and the order goes through without an error; you only find out at the first sip. Silent failures in cron jobs work the same way: the job reports “success, no new data today,” while the truth is that the model was swapped before the task even started.

The Suspicious Calm After Upgrade

After a platform upgrade, the daily scheduled task went quiet. Not failed, not errored—just that unsettling kind of “everything’s normal.” The job ran on schedule, logs showed success, and the message was “no new data to process today.” Three days in a row.

But the previous week had had data coming in daily. I checked the data source: the files were there. File format unchanged, API endpoint unchanged, permissions unchanged. The only change was the upgrade itself, which introduced a model-switch mechanism: before session startup, the system silently replaced the model specified by the cron job with a default lightweight model.

The lightweight model couldn’t handle complex tool calls. Reading files required specific function calls, JSON parsing, handling nested parameters. The lightweight model tried once, failed, then reported “no new data.” Not a lie—it genuinely thought it finished its job.

Tiny Differences in the Logs

I compared logs before and after the upgrade. The difference was one line:

session_start: model=gpt-4-0613

became:

session_start: model=gpt-3.5-turbo

No warnings in between. No deprecation notice. No “model switched” alert. The platform documentation buried one sentence: “To optimize costs, the system will automatically select appropriate models for tasks.” What defines appropriate? Not documented. When does it trigger? Also not documented.

I dug into the upgrade notes and found: “New intelligent model scheduling feature automatically selects optimal model based on task complexity.” Sounds smart, but the criteria were a black box. At startup, the cron job had no prompt, no context—the system saw an empty session and defaulted to the cheapest model.
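Since the platform raises no alert of its own, the swap has to be caught from the logs. Below is a minimal sketch that scans session logs for the `session_start: model=...` line quoted above and flags any model that differs from the one the job was configured with. The log format is taken from this incident; treat the regex as an assumption about your own platform's log shape.

```python
import re

EXPECTED_MODEL = "gpt-4-0613"  # the model the cron job was configured to use

# Matches lines like "session_start: model=gpt-3.5-turbo"
SESSION_RE = re.compile(r"session_start:\s*model=(\S+)")

def check_model(log_lines, expected=EXPECTED_MODEL):
    """Return the list of started models that differ from the expected one."""
    drifted = []
    for line in log_lines:
        m = SESSION_RE.search(line)
        if m and m.group(1) != expected:
            drifted.append(m.group(1))
    return drifted

# The two log lines quoted above:
logs = [
    "session_start: model=gpt-4-0613",
    "session_start: model=gpt-3.5-turbo",
]
print(check_model(logs))  # -> ['gpt-3.5-turbo']
```

Wiring this into the monitoring pipeline turns a silent substitution into a paging alert instead of three days of green checkmarks.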

The Illusion of Success

Worse than failure is false success. The task returned a 200 status code, execution time normal, no exceptions, no timeouts. Monitoring showed all green. If I hadn’t noticed the anomalous data output volume, this issue could have sat quietly for weeks.
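The status code said nothing useful; the output volume did. A cheap semantic check is to compare each run's record count against a rolling baseline of recent runs. This is a sketch of that idea, not the monitoring I actually had in place; the 50% threshold is an arbitrary illustrative choice.

```python
from statistics import mean

def volume_alert(history, today, min_ratio=0.5):
    """Flag a run whose output volume falls far below the recent average.

    history: record counts from recent successful runs
    today:   record count from the current run
    """
    if not history:
        return False  # no baseline yet, nothing to compare against
    baseline = mean(history)
    return today < baseline * min_ratio

# A week of ~1000-record days followed by a "successful" zero-record day:
print(volume_alert([980, 1020, 995, 1010], 0))  # -> True
```

A check like this would have fired on day one instead of waiting for someone to eyeball the numbers.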

Taiwan has the world’s second-highest convenience store density, one store per 2,300 people. Engineers debugging code late at night at those outdoor seats know the taste of that coffee machine. This problem was like that machine: running 24/7, never reporting an error, but you have to taste it yourself to know the beans were changed today.

The fix was simple: explicitly specify the model in cron configuration, add parameter locks to prevent session startup overrides. But this reminded me—when a system says “I optimized this for you,” better ask first: what’s the optimization metric? What’s being sacrificed?
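Belt and suspenders: pin the model in the config, then verify at startup that the pin actually held. The sketch below assumes a hypothetical platform client (`create_session`, `allow_model_override` are stand-in names, not a real API); the point is the post-start assertion, which fails loudly if the platform substitutes the model anyway.

```python
from types import SimpleNamespace

PINNED_MODEL = "gpt-4-0613"

class StubClient:
    """Stand-in for the platform client; a well-behaved platform
    honors the pinned model."""
    def create_session(self, model, allow_model_override):
        return SimpleNamespace(model=model)

def start_session(client, model=PINNED_MODEL):
    """Start a session and fail loudly if the model was swapped."""
    session = client.create_session(model=model, allow_model_override=False)
    if session.model != model:
        raise RuntimeError(
            f"model swapped at startup: wanted {model}, got {session.model}"
        )
    return session

print(start_session(StubClient()).model)  # -> gpt-4-0613
```

The runtime check matters more than the config flag: config only expresses intent, while the assertion catches the platform ignoring it.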

— 邱柏宇
