The Successful Upgrade Log That Was Actually a Rollback

Picture ordering food on a delivery app and seeing “Order Complete” on the final screen — not because it arrived, but because the restaurant quietly cancelled. Without digging through the message log, you’d just assume everything worked out. That’s nearly identical to what one upgrade log looked like that morning.

The Log Said Success. Nothing Actually Happened.

An automated upgrade script ran overnight. The package registry went briefly offline, the installation hit a timeout, and npm quietly rolled back to the original version before exiting. The script then wrote the result to its log: old version number → same version number.

That entry looked identical to an “already up to date, no action needed” log: no ERROR, no warning, correct format throughout. The specific line read 2026.4.22 -> 2026.4.22. npm had fallen back after an EIDLETIMEOUT, and the script recorded it as “same → same”, indistinguishable from a clean no-op.

The problem surfaced the next day during a manual retry. The installation succeeded immediately. That log had been recording a rollback that never admitted what it was.

Where the Boundary Is

The script compared version numbers. It never captured the installation’s exit code.

That’s the critical dividing line. “Version unchanged” means two entirely different things depending on context: either “it was already the same, nothing to do” or “it tried, failed, and retreated.” Both produce identical output in a version-comparison check. One is an expected state. The other is a silent failure dressed as one.
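To make the ambiguity concrete, here is a minimal sketch of a version-only log. The function name and the log format are assumptions for illustration, not the actual script.

```python
def version_only_log(before: str, after: str) -> str:
    """Build a log line from the two version strings alone (hypothetical format)."""
    return f"{before} -> {after}"


# Case 1: nothing to do, the installed version was already current.
no_op = version_only_log("2026.4.22", "2026.4.22")

# Case 2: an install was attempted, failed, and the old version was restored.
rolled_back = version_only_log("2026.4.22", "2026.4.22")

# Both cases produce the same string, so the log cannot tell them apart.
print(no_op == rolled_back)
```

Nothing in the function's inputs carries information about the attempt itself, so no amount of formatting can separate the two cases.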

Why It Wasn’t Caught Immediately

A correctly formatted log entry is its own camouflage. When scanning logs, attention goes to ERROR tags and anomalous structure. A clean-looking record gets skipped. Nobody interrogates a log that looks like success.

npm’s rollback behavior also doesn’t announce itself. EIDLETIMEOUT occurs, the installation fails, the version retreats, and npm’s output frames all of this as “handled,” not “failed.” The script read the version number after npm exited, not the exit code of the install itself, so it had no way to detect what happened in between.
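As a rough sketch of what keeping the exit code could look like, the helper below runs a command and returns its status instead of discarding it. The helper name is made up, and a Python one-liner stands in for the real install so the example runs without npm or a registry.

```python
import subprocess
import sys


def run_and_capture(cmd: list[str]) -> tuple[int, str]:
    """Run a command and return (exit code, stderr) rather than dropping them."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stderr


# Stand-in for a failed install: a child process that exits with status 1.
# In a real script, cmd would be the npm install invocation instead.
failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
code, _ = run_and_capture(failing)

# Stand-in for a clean run: a child process that exits with status 0.
ok_code, _ = run_and_capture([sys.executable, "-c", "pass"])

print(code, ok_code)
```

This assumes, as the post implies, that a failed install exits non-zero. That is worth verifying against the actual command before relying on it.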

The One Check Worth Remembering

The manual retry the next day succeeded on the first attempt: the registry was back and no timeout occurred. That confirmed the problem wasn’t the target version. It was the network state during that run, combined with a script that never asked whether the installation actually completed.

The concrete fix: alongside any version comparison, log the exit code of the install command itself. When a “same → same” result appears, distinguish between two causes — was it already the same before the attempt, or did it end up the same after a failed one? The exit codes differ. The log should reflect that difference.
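One possible shape for that check, as a sketch only: the outcome names, the log format, and the rule that any non-zero exit code means a failed install are all assumptions, not the original script.

```python
from enum import Enum


class Outcome(Enum):
    UPGRADED = "upgraded"
    ALREADY_CURRENT = "already-current"
    INSTALL_FAILED = "install-failed"


def classify(before: str, after: str, exit_code: int) -> Outcome:
    """Decide what happened from the two versions plus the install's exit code."""
    if exit_code != 0:
        # Assumption: a failed install exits non-zero, whatever version is left.
        return Outcome.INSTALL_FAILED
    if before == after:
        return Outcome.ALREADY_CURRENT
    return Outcome.UPGRADED


def log_line(before: str, after: str, exit_code: int) -> str:
    """Hypothetical log format: versions, exit code, and the classified outcome."""
    outcome = classify(before, after, exit_code)
    return f"{before} -> {after} exit={exit_code} outcome={outcome.value}"


print(log_line("2026.4.22", "2026.4.22", 0))
print(log_line("2026.4.22", "2026.4.22", 1))
```

With the exit code in the line, the two “same → same” cases print as outcome=already-current and outcome=install-failed, so a scan for anything other than the expected outcomes would have flagged the overnight run.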

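One thing worth carrying forward: the most dangerous silent failure isn’t a system that breaks. It’s a system that looks like it didn’t. The next time a log is correctly formatted but nothing changed, ask one more question: is this entry here because nothing needed doing, or because something was tried and didn’t succeed?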

— 邱柏宇
