一個 Middleware 讓 47 個腳本同時掛掉

一個 Middleware 讓 47 個腳本同時掛掉

監控面板突然湧現數百條紅色警報。一個 API 端點加入了認證層,結果 47 個自動化腳本、12 個定時任務、6 個第三方服務同時失效。工程師面對的不是修復單一問題,而是要在最短時間內批量更新數十個分散在不同儲存庫的呼叫點。

就像你把家裡大門換了新鎖,但忘記通知所有拿舊鑰匙的人。

依賴關係永遠是隱形的

為既有 API 端點加入認證機制,本該是提升安全性的正常演進。問題是當這個端點已經被數十個服務串接時,沒有人能準確說出誰在用它。依賴關係的可見性永遠落後於實際部署速度。你以為只是改一個 middleware,其實是在拆一座用膠帶黏起來的大樓。

批量腳本不是選項,而是唯一可行的救援方案。但更根本的問題是:為什麼我們從來不知道這個端點被這麼多服務依賴?

記憶體會健忘

用記憶體快取實作去重機制,配合 5 分鐘 TTL,在開發環境運作良好。生產環境的現實是:進程會重啟、容器會漂移、負載會重新分配。一次計畫性維護後,去重機制完全失效,同一批資料被重複處理三次。

任何需要跨越進程生命週期的狀態,都必須持久化。file-based 或 DB-backed 的去重實作,配合至少 24 小時的 TTL,才能真正應對生產環境的波動性。記憶體的優勢是速度,但它的劣勢是健忘——在自動化系統中這是致命的。

這兩個問題看似分屬不同領域,但它們指向同一個核心:當系統跨越單一邊界運作時,我們對「正常運作」的假設會系統性地失效。真正成熟的自動化系統,不是消除這些邊界,而是明確承認它們的存在。

— 邱柏宇

Hundreds of red alerts flooded the monitoring dashboard. One API endpoint gained an authentication layer. Result: 47 automation scripts, 12 scheduled jobs, and 6 third-party integrations failed simultaneously. Engineers faced not a single fix, but a race to batch-update dozens of call sites scattered across different repositories.

Like changing your front door lock but forgetting to tell everyone who has the old key.

Dependencies Are Always Invisible

Adding authentication to existing API endpoints should be straightforward security enhancement. The problem is when that endpoint is already consumed by dozens of services, nobody can accurately say who’s using it. Dependency visibility always lags behind actual deployment velocity. You think you’re just changing one middleware, but you’re actually dismantling a building held together with duct tape.

Batch scripting isn’t an option—it’s the only viable rescue strategy. But the deeper question is: why didn’t we know this endpoint had so many dependents?

Memory Forgets

In-memory caching for deduplication with 5-minute TTL works beautifully in development. Production reality differs: processes restart, containers migrate, load rebalances. After one planned maintenance, the deduplication mechanism completely failed. The same batch of data got processed three times.

Any state that must survive process lifecycles requires persistence. File-based or DB-backed deduplication with at least 24-hour TTL can truly handle production volatility. Memory’s advantage is speed, but its weakness is forgetfulness—fatal in automation systems.

These two problems span different domains, yet they point to one core issue: when systems operate across boundaries, our assumptions about “normal operation” systematically fail. Mature automation systems don’t eliminate these boundaries—they explicitly acknowledge their existence.

— 邱柏宇