把廚房煙霧偵測器的電池拔掉,因為每次煮泡麵它都叫太煩了。哪天真的有問題,廚房安靜得一點聲音都沒有。這是一個維護腳本的故事,結構幾乎一模一樣。
現象
一個每天自動執行的維護腳本,流程是這樣的:先清掉本地的殘留修改,再拉取上游最新版本,最後切換到指定的 tag。清理那一行加了 2>/dev/null,理由合理——正常情況下沒有任何殘留,沒必要每次都輸出「沒有變更」。
那天有東西在。清理指令執行了,出了錯,/dev/null 把錯誤吃掉。腳本繼續跑,以為清理完成。接著切換版本的那行拒絕了:本地有未提交的修改,無法覆蓋。
困惑的地方在於,清理步驟確實沒有任何輸出。沒有抱怨,沒有警告,一片安靜。從輸出看不出任何異常。
分界點
同一次失敗裡有第二個問題同時發生:上游把一個分支從單層命名改成了命名空間結構,導致本地 fetch 時 refs 鎖定衝突。手動修復的路徑是 git update-ref -d 刪掉衝突的 ref,再清 packed-refs 裡的對應條目,之後才能重新 fetch 成功。這個錯誤同樣被 2>/dev/null 吞掉了。
兩個獨立的錯誤,同一個靜音機制,最後組合出一個難以看清來源的失敗。任何一個錯誤單獨發生,可能都更容易定位。兩個疊在一起,加上完全沒有輸出,排查的起點就變得很奇怪——最後一步失敗,但問題不在最後一步。
容易誤判的原因
腳本設計上的靜音是刻意的,且在過去一直是對的。錯誤不出現在「加了 2>/dev/null」這個決定的當下,而是出現在「假設這個指令永遠不會失敗」這個隱性前提上。加靜音的人和後來維護腳本的人,腦子裡的模型是一樣的:清理步驟是例行的、邊界情況很少。沒有人預期清理本身會失敗。
所以排查的直覺會先落在切換版本那一步——它是唯一發出聲音的地方。但那只是最後被卡住的地方,不是出錯的地方。
確認方式
把 2>/dev/null 暫時拿掉,單獨執行清理那行,看它吐出什麼。如果指令成功,輸出是空的;如果失敗,這時候才看得到。這個 check 很直接,但在腳本自動化之後,幾乎沒有人會主動做這件事——因為它一直以來都安靜地跑完了。
refs 衝突那側的確認方式是 git fetch --dry-run 或直接看 .git/packed-refs,確認有沒有命名衝突的條目。上游改了命名結構,本地沒有對應清理,這種衝突不會自己消失。
留給未來的話
靜音一個指令,等於靜音這個指令未來所有可能的失敗模式。在寫的當下,腦子裡的失敗模式通常只有一種:沒有東西要清,輸出「沒有變更」這種雜訊。但指令的失敗模式不只這一種。
加靜音之前值得問的問題不是「這個指令正常情況下會不會輸出雜訊」,而是「這個指令失敗的時候,我需不需要知道」。如果答案是需要,那至少要讓錯誤碼還能往外傳——2>/dev/null 吃掉的是 stderr,但指令的退出碼如果沒有被檢查,同樣是消失了。
— 邱柏宇
延伸閱讀
The Script Never Complained Because It Never Could
Imagine pulling the battery out of a kitchen smoke detector because it keeps going off while making instant noodles. The day something actually burns, the kitchen stays completely silent. This maintenance script story has almost the same structure.
What Happened
A daily automated script: discard local changes, fetch upstream, checkout the target tag. The cleanup step used 2>/dev/null — reasonable, since there’s normally nothing to clean, no point printing “no changes” every run.
That day, there was something. The cleanup command ran, failed, and /dev/null swallowed the error. The script continued, assuming cleanup succeeded. Then the checkout step refused: uncommitted local changes, cannot overwrite.
The disorienting part: the cleanup step produced zero output. No complaint, no warning, total silence. Nothing in the log pointed anywhere.
Two Errors, One Suppressor
A second problem was happening in parallel. Upstream had renamed a branch from a flat naming scheme to a namespace structure, causing a refs lock conflict during fetch. The manual fix involved git update-ref -d to remove the conflicting ref, then cleaning the corresponding entry in packed-refs before fetch could succeed. That error was also swallowed by 2>/dev/null.
Two independent failures, one suppression mechanism, one failure that was nearly impossible to read. Either error alone would have been easier to locate. Combined, with no output from either, the starting point for debugging became strange — the last step failed, but the problem wasn’t in the last step.
Why It Looked Wrong
The suppression was intentional and had always been correct before. The error didn’t live in the decision to add 2>/dev/null — it lived in the implicit assumption that the cleanup command would never itself fail. The person who added the silence and whoever maintained the script later shared the same mental model: cleanup is routine, edge cases are rare.
So the instinct during debugging was to look at the checkout step — the only place that made noise. That was just where things got stuck, not where things went wrong.
The Specific Check
Remove 2>/dev/null temporarily, run the cleanup line alone, see what it produces. If it succeeds, output is empty. If it fails, now you can see it. Direct and obvious — but after years of the script running quietly, nobody thinks to test this. It always finished silently before.
One Thing Worth Noting
Silencing a command means silencing every failure mode that command might ever have. When you write the suppression, the failure mode in your head is usually just one: “no changes” as noise. But that’s not the only way a command can fail.
The right question before adding silence isn’t “will this command produce noise under normal conditions” — it’s “do I need to know when this command fails.” If the answer is yes, at minimum let the exit code propagate. 2>/dev/null eats stderr, but if nobody checks the exit code either, the failure disappears just as completely.
— 邱柏宇
Related Posts
https://justfly.idv.tw/s/N31GAKw