
就像你把家裡貼好壁紙、裝好層架,房東說幫你重新粉刷——你回來後牆是白的,層架不見了,壁紙也沒了。房間還是你的房間,但裡面什麼都要重來。
台北的超商招牌一樣、地址一樣,但每天清早進去,前一天剩的東西全部不在了,架上是全新一批。Docker rebuild 就是這種感覺。外觀沒變,裡面已經是另一個世界。
Gateway 啟動,Slack 沉默
Gateway 服務順利起來了。port 沒變,container 名字沒變,volume 掛載點也沒變。Log 裡安靜地跑著啟動序列,沒有 error,沒有 warning。然後 Slack 頻道就一直沒有動靜。
不是超時,不是網路問題。去翻 gateway log,才看到那行字:plugin not installed: slack。
幾天前明明裝過。
分界點在 image rebuild 那一刻
Docker 的 container 是從 image 啟動的。image 重建之後,container 從那個新 image 跑起來——所有在上一個 container 裡手動裝進去的東西,包括 plugin 的 symlink、native module,全部不在了。那些改動從來沒有進入 image,只活在舊 container 的記憶裡。
這不是 bug,是設計。問題在於它不說話。Container 名字沒換,port 沒換,行為上跟之前一模一樣——直到你等一個 Slack 回應卻等不到。
分界點很精確:upgrade 腳本跑完、docker compose up -d --force-recreate 執行的那一刻。之前的狀態結束,之後的狀態開始,中間沒有警告。
為什麼沒立刻看出來
第一眼看到 Slack 無回應,直覺會往連線問題走:socket 斷了、token 過期、網路設定改了。這幾個方向都查不到東西。
Gateway 本身健康的。Health endpoint 回 200,進程在跑,服務沒掛。表面上一切正常,只是有一個 plugin 不存在。而「plugin 不存在」這件事,只有在 plugin 真的被呼叫到的時候才會浮出來——平常它靜靜地不在那裡,不報錯,不提醒。
Log 裡 plugin not installed: slack 是唯一的線索,而且它不是 error level,很容易略過。
確認方式
確認路徑很短:看 gateway log 裡有沒有 plugin not installed,然後手動重裝,看 Slack socket mode 有沒有連上。驗證結果:重裝之後,socket mode connected,9 個 plugins 全部載入。
修法也不複雜:在升級腳本的最後,自動重裝一次 plugins。每次 rebuild 之後都跑,不管上次有沒有裝過。
留給下次的話
Container rebuild 之後,所有「在 container 裡面裝的東西」都要重新來一次。這不是新知識,但它只有在踩到之後才會變成習慣。
下次碰到「服務起來了但某個功能沉默」,第一個問的問題應該是:這次有沒有 rebuild?如果有,什麼東西是跑在 container 裡、沒有進 image 的?
超商每天換一批貨,因為它知道前一天的東西要清掉。Docker rebuild 清掉的東西,不會主動告訴你。
— 邱柏宇
延伸閱讀
Rebuild Wiped Everything and Said Nothing
It’s like repainting a room. The landlord says he’ll freshen things up — you come back, the walls are white, the shelves are gone, the wallpaper too. Same room. Everything inside, reset to zero.
The gateway came back up cleanly. Same container name, same port, same volume mounts. The startup log ran through without a single error. Then the Slack channel just stayed quiet.
One Line in the Log
Not a timeout. Not a network issue. Pulling the gateway log revealed a single line: plugin not installed: slack.
The plugin had been installed days earlier. It worked. Then a Docker image rebuild happened — and the new container started fresh from that new image. Everything installed inside the previous container — symlinks, native modules, plugins — never made it into the image. It only ever lived in the old container’s runtime state.
The container came back looking identical. Same name, same port, same behavior on the surface. It just didn’t have the plugins anymore, and it didn’t say so.
The Exact Moment It Breaks
The split happens at docker compose up -d --force-recreate. Before that call: old container, old state, plugins present. After: new container from new image, clean slate. No warning between the two.
That’s not a bug in Docker. That’s how it works. The problem is the silence — there’s no diff, no “hey, these things that were installed before are now missing.” The service looks healthy. The health endpoint returns 200. Everything checks out until something actually tries to call the plugin.
Why It Wasn’t Obvious
Slack not responding first points toward connectivity: broken socket, expired token, changed network config. None of those led anywhere.
The gateway process was running fine. The plugin absence was invisible until the plugin got called. And plugin not installed in the log wasn’t thrown at error level — easy to miss in a clean-looking startup sequence.
The confirmation path was short: check the log for plugin not installed, reinstall, watch for Slack socket mode connected. After reinstall: socket mode connected, 9 plugins loaded.
One Thing for Next Time
The fix isn’t complicated — add an automatic plugin reinstall at the end of the upgrade script. Every rebuild triggers it, regardless of what was there before.
But the more useful reflex: when a service starts clean and one feature goes silent, the first question should be — was there a rebuild? And if so, what was installed inside the container that never made it into the image?
Convenience stores in Taiwan refresh their shelves completely every morning — yesterday’s stock cleared, today’s batch in. The store looks the same from outside. Docker works the same way, except it doesn’t put up a sign saying the shelves are empty.
— 邱柏宇
Related Posts
技術環境
- Container Runtime:Docker + Docker Compose
- 服務:OpenClaw Gateway
- 問題觸發:
docker compose up -d --force-recreate(image rebuild 後重建 container) - Plugin 安裝位置:container 內部 runtime state(非 image layer)
- 驗證工具:gateway log、Slack socket mode 連線狀態
- 修復結果:手動重裝後 socket mode connected,9 個 plugins 全部載入
錯誤傳染鏈時序圖
Upgrade Script 執行
│
▼
docker build → 產生新 image
│
▼
docker compose up -d --force-recreate
│
├─ [舊 container 停止] ← container-level plugins 在此消失
│
▼
新 container 從 new image 啟動
│
├─ port: 同 ✓
├─ volume mounts: 同 ✓
└─ plugins: ✗ (從未寫入 image)
│
▼
Gateway health endpoint 回 200(表面健康)
│
▼
Slack 頻道觸發 plugin 呼叫
│
▼
"plugin not installed: slack"(非 error level,容易略過)
│
▼
手動重裝 → socket mode connected → 9 plugins loaded ✓
修法前後 Code 對照
修法前(每次 rebuild 後需手動補裝,容易忘)
# upgrade.sh(舊) docker compose pull docker compose up -d --force-recreate # 結束——plugins 不見了,沉默失效
修法後(自動重裝,rebuild 後必跑)
# upgrade.sh(新) docker compose pull docker compose up -d --force-recreate echo "Reinstalling plugins after rebuild..." openclaw plugins install slack openclaw plugins install all echo "Plugin reinstall complete."
該被隔離的側效應類型
- 可觀測性盲點:health endpoint 回 200,無法從外部判斷 plugins 是否存在
- 靜默失效擴散:只有 plugin 被呼叫時才浮出,窗口期可能長達數小時
- 連線誤診風險:Slack 無回應優先指向連線問題,延誤根因定位
- Log level 遮蔽:
plugin not installed非 error level,常規掃描易略過 - Upgrade 腳本隱含假設失效:假設「上次裝過的東西還在」——rebuild 後此假設永遠為假
- 多 plugin 連鎖缺口:只確認 Slack,不代表其他 9 個 plugins 都在,需全量重裝
- Dockerfile 設計缺口:需持久化的 plugins 應寫進 Dockerfile,而非依賴 container 內手動操作
Technical Environment
- Container Runtime: Docker + Docker Compose
- Service: OpenClaw Gateway
- Trigger:
docker compose up -d --force-recreateafter image rebuild - Plugin install location: Container runtime state (not baked into image layer)
- Verification tools: Gateway log, Slack socket mode connection status
- Fix result: Socket mode connected post-reinstall; 9 plugins loaded
Error Propagation Timeline
Upgrade Script runs
│
▼
docker build → new image created
│
▼
docker compose up -d --force-recreate
│
├─ [Old container removed] ← all container-state plugins lost here
│
▼
New container starts from new image
│
├─ port: same ✓
├─ volume mounts: same ✓
└─ plugins: ✗ (never written into image)
│
▼
Gateway health endpoint returns 200 (looks healthy)
│
▼
Slack channel triggered → plugin call made
│
▼
"plugin not installed: slack" (non-error log, easy to miss)
│
▼
Manual reinstall → socket mode connected → 9 plugins loaded ✓
Before / After Code
Before (manual reinstall required after every rebuild — easy to forget)
# upgrade.sh (old) docker compose pull docker compose up -d --force-recreate # Done — plugins gone, silent failure
After (auto-reinstall baked into upgrade script)
# upgrade.sh (new) docker compose pull docker compose up -d --force-recreate echo "Reinstalling plugins after rebuild..." openclaw plugins install slack openclaw plugins install all echo "Plugin reinstall complete."
Side Effects to Isolate
- Observability blindspot: Health endpoint returns 200 — no external signal that plugins are missing
- Silent failure window: Plugin absence only surfaces when the plugin is called; gap can span hours
- Misdiagnosis vector: Slack silence triggers connectivity hypothesis first, delaying root cause identification
- Log level suppression:
plugin not installedis not at error level — easily overlooked in clean startup sequences - Upgrade script false assumption: Assumes “what was installed before is still there” — after rebuild, this is never true
- Cascading multi-plugin gap: Reinstalling Slack doesn’t guarantee the other 8 plugins survived; full reinstall required
- Dockerfile design gap: Persistent plugins should be installed in the Dockerfile, not applied manually inside a running container
https://justfly.idv.tw/s/Z2WHk6v