The .env Root Wrote, Ubuntu Can Never Read

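docker compose restart failed with permission denied, pointing at a .env that had clearly been prepared. The path was right, the content was right, and the file existed. ls -la showed the problem at a glance: root:root 600.

It is a bit like a manager uploading training handouts to a shared drive with access set to "only me", then telling a new hire to download them. The new hire has a valid company account, but the document's access list never included them. The account is fine; the document was simply never set up to be read by them.

Most people in Taiwan have run into a version of this: your National Health Insurance card logs you into one government system, but the same records opened in another window demand a Citizen Digital Certificate instead. In system A you are you; in system B you are nobody. The document exists and you are entitled to it, but that resource was never issued to this identity.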

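Technical Environment

A maintenance shell script running with root privileges on a protected directory created the .env along the way. The runtime user that executes docker compose is ubuntu: not root, and not a member of any group with read access. One multi-user Linux host, two identities that never intersect.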

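Writer and Reader Were Never the Same Identity

The .env was created in passing while the maintenance script was running with superuser privileges, and setting it to 600 fully follows security practice: it holds credentials, and nobody should be able to read it casually. 600 is not the mistake. The problem is that the owner is fixed as root while docker compose runs as ubuntu, and the two share no group.

The file is in the right place. But from the moment it was created, "can write" and "can read" belonged to two different identities, and nobody thought to line them up at that moment.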
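To check the "no shared group" condition directly, a small sketch can intersect the group lists of the file's owner and the runtime user. It assumes only the standard id utility; shared_groups is a hypothetical helper name, and on the incident's host the call would be shared_groups root ubuntu:

```sh
#!/bin/sh
# shared_groups USER_A USER_B
# Print every group that both users belong to, one per line.
# Empty output means no group bridges the two identities.
shared_groups() {
  groups_a=$(id -Gn "$1") || return 2   # id fails for unknown users
  groups_b=$(id -Gn "$2") || return 2
  for g in $groups_a; do
    for h in $groups_b; do
      if [ "$g" = "$h" ]; then
        echo "$g"
      fi
    done
  done
}

# Usage (file owner vs the user docker compose runs as):
#   shared_groups root ubuntu
```

An empty result means the only way ubuntu could read a root:root file is through the "other" bits, and 600 leaves those empty.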

Error Propagation Chain (Timeline)

MaintenanceScript (root)
  │
  ├─► creates .env (owner: root, mode: 600)
  │        OS → FileSystem: write succeeds
  │
docker compose restart (runtime user: ubuntu)
  │
  └─► attempts to open /path/onwatch-repo/.env
           OS → FileSystem: permission check
                  └─► ubuntu ≠ root, not in group root (other bits: ---)
                           └─► FAILED: permission denied

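ubuntu cannot read the file, not because it is missing, but because its permission bits (owner root, group root, mode 600) left ubuntu outside the readable set from the moment it was created.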
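The permission check in the chain above follows a fixed order, and that order is why 600 plus the wrong owner locks ubuntu out completely. As a minimal model, assuming plain mode bits with no POSIX ACLs, the decision fits in a small shell function; can_read is a hypothetical name, and the numeric IDs below are illustrative, not taken from the incident host:

```sh
#!/bin/sh
# can_read UID "GID..." FILE_UID FILE_GID MODE
# Model of the classic Unix read check. Exactly one class is consulted:
# owner if the UIDs match, else group if any GID matches, else other.
# POSIX ACLs are ignored; root (UID 0) bypasses read checks.
can_read() {
  uid=$1; gids=$2; fuid=$3; fgid=$4
  mode=$((0$5))                          # "600" -> octal 0600
  if [ "$uid" -eq 0 ]; then
    return 0
  fi
  if [ "$uid" -eq "$fuid" ]; then
    [ $((mode & 0400)) -ne 0 ]; return   # owner bits only
  fi
  for g in $gids; do
    if [ "$g" -eq "$fgid" ]; then
      [ $((mode & 0040)) -ne 0 ]; return # group bits only
    fi
  done
  [ $((mode & 0004)) -ne 0 ]             # everyone else: other bits
}

# ubuntu (UID/GID 1000, illustrative) vs root:root 600 -> denied
can_read 1000 "1000" 0 0 600 && echo readable || echo denied
# ubuntu vs root:ubuntu 640 -> readable through the group bits
can_read 1000 "1000" 0 1000 640 && echo readable || echo denied
```

One consequence of "exactly one class": an owner whose own bits are empty is denied even when the group or other bits would allow reading, which can_read 1000 "1000" 1000 1000 066 reproduces.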

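600 Is Fine. The Owner Is the Pivot.

The easy misread here is that 600 is the problem and switching to 644 fixes it. But 644 lets every user on the host read the file, which is too permissive for a .env holding credentials. The real pivot is the owner: align it with the runtime user that runs the service and 600 can stay, with no compromise on the security baseline.

This time, chmod o+r fixed the immediate problem, but it is not the cleanest fix.
The more precise approach is to align the owner: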

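Code Comparison: Before and After

# Before (problem state)
# -rw------- root root .env   ← ubuntu has no access at all

# After (aligned with the runtime user)
sudo chown ubuntu:ubuntu /path/to/.env
# keep 600: only ubuntu (and root, which bypasses mode bits) can read it
# Or, if root should keep ownership while ubuntu reads:
sudo chown root:ubuntu /path/to/.env
sudo chmod 640 /path/to/.env
# owner root: rw, group ubuntu: r, others: nothing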

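chmod o+r adds read permission for the "other" class, turning 600 into 604 (-rw----r--), so every user on the host can read the file. For a .env holding credentials, that exposure is unnecessary. chown is the right tool.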
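To see the exposure concretely, the sketch below replays the quick fix on a throwaway file owned by the current user, so no sudo is needed. It assumes GNU coreutils stat on Linux (the -c format flag differs on BSD and macOS):

```sh
#!/bin/sh
# Replay the quick fix on a throwaway file owned by the current user.
f=$(mktemp)
chmod 600 "$f"
before=$(stat -c '%a %A' "$f")
chmod o+r "$f"                       # the quick fix used in the incident
after_quick_fix=$(stat -c '%a %A' "$f")
chmod 640 "$f"                       # mode of the root:ubuntu option (chown needs sudo)
group_option=$(stat -c '%a %A' "$f")
rm -f "$f"

printf 'before:    %s\n' "$before"           # 600 -rw-------
printf 'chmod o+r: %s\n' "$after_quick_fix"  # 604 -rw----r--
printf 'chmod 640: %s\n' "$group_option"     # 640 -rw-r-----
```

The 604 line is the state the incident's fix left behind: the group class still has nothing, yet every other account on the host can read the credentials.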

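Silent Failures Are Harder to Find Than Crashes

When a .env is unreadable, not every service blows up with an error. Several categories fail silently and deserve their own list:

Side Effects Worth Isolating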

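env_file services (docker compose): compose reads env_file when it loads the project to create or start containers. If that read hits permission denied, the command fails, the container is not (re)started, and the error points straight at the file path. That is what happened here, and it is the most visible category.
Already-running containers (docker restart): environment variables are fixed into the container's configuration when the container is created, and docker restart never re-reads env_file, so even if the .env content changes later, the running container still sees the old values. Because the file is never read, a permission problem raises no error here; the service just quietly runs with stale settings. Picking up new values requires recreating the container (for example, docker compose up -d --force-recreate).
Health check / probe scripts: a service whose healthcheck is a shell script run as an unprivileged user. If the script reads .env for connection details, it silently ends up with empty values, the health endpoint returns 500, and the monitoring dashboard turns red, even though the cause is not in the service itself.
Cron jobs / scheduled tasks: scheduled jobs run as a specific user. If that user cannot read .env, the job silently skips or runs with empty values. No obvious crash, only odd behavior.
CI/CD pipelines (deploy step): if a deploy script reads .env to validate settings but runs as a restricted CI runner user, it fails, yet sometimes only as a warning rather than an error. The warning gets ignored and the deployment proceeds in the wrong state.
Logging agents / sidecars: a sidecar running as its own user that needs an API key from .env to initialize silently skips authentication at startup. Logs stop shipping, but the main service reports no error.
Volume-mounted config loading: some services mount an entire config directory. If one .env inside it has the wrong permissions, the service may fall back to defaults without saying which configuration it actually loaded.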

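Rule of thumb: for any process that runs as a non-root user and reads an external config file at startup or runtime, confirm the file's owner and mode when the file is created, not after something starts misbehaving.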
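For the script-driven cases above (cron jobs, health checks, deploy steps), one way to make the failure loud is to check readability before sourcing and stop with a clear message. A minimal sketch, assuming the .env is plain KEY=value lines that are also valid shell; load_env is a hypothetical helper, not part of Docker or Compose:

```sh
#!/bin/sh
# load_env FILE
# Export every KEY=value in FILE, or fail loudly instead of silently
# continuing with empty variables. Assumes FILE is also valid shell
# (plain KEY=value lines, values quoted if they contain spaces).
load_env() {
  env_file=$1
  case $env_file in
    */*) ;;                      # has a slash: "." will not search PATH
    *) env_file=./$env_file ;;
  esac
  if [ ! -e "$env_file" ]; then
    echo "load_env: $env_file does not exist" >&2
    return 1
  fi
  if [ ! -r "$env_file" ]; then
    # The root:root 600 case: the file is there, this user just can't read it.
    echo "load_env: $env_file exists but is not readable by $(id -un)" >&2
    return 1
  fi
  set -a                         # auto-export variables assigned while sourcing
  . "$env_file"
  set +a
}

# In a cron job or healthcheck script:
#   load_env /path/to/.env || exit 1
```

The point of the two separate checks is that "missing" and "present but unreadable" produce different messages, so a log line names the permission problem instead of an empty variable surfacing somewhere downstream.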

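The Moment the Next .env Gets Created

Checking takes one step: run ls -la to see the owner and mode, then compare them against the runtime user. Wrong owner: fix it with chown. If root has to stay the owner, set the group to the runtime user's group and use 640; with 600 the group bits are empty, so changing the group alone grants nothing.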
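The one-step check can also be scripted so it runs before a service starts. A sketch, assuming GNU stat and id; audit_env is a hypothetical helper that ignores POSIX ACLs and root's override, and only looks at the owner, group and mode that ls -la shows:

```sh
#!/bin/sh
# audit_env FILE USER
# Compare FILE's owner, group and mode against USER.
# Returns 0 only if USER can read FILE via the owner or group bits and
# "other" has no access. POSIX ACLs and root's override are ignored.
audit_env() {
  file=$1; user=$2
  info=$(stat -c '%U %G %a' "$file") || return 2
  set -- $info                          # e.g. "root root 600"
  owner=$1; group=$2; mode_str=$3; mode=$((0$3))
  verdict=denied
  if [ "$owner" = "$user" ]; then
    if [ $((mode & 0400)) -ne 0 ]; then verdict=owner; fi
  elif id -Gn "$user" 2>/dev/null | tr ' ' '\n' | grep -qx "$group"; then
    if [ $((mode & 0040)) -ne 0 ]; then verdict=group; fi
  elif [ $((mode & 0004)) -ne 0 ]; then
    verdict=other
  fi
  echo "$file: $owner:$group $mode_str, $user -> $verdict"
  if [ $((mode & 0007)) -ne 0 ]; then
    echo "warning: other users can access $file" >&2
    return 1
  fi
  [ "$verdict" = owner ] || [ "$verdict" = group ]
}
```

Assuming ubuntu is not in group root, audit_env /path/to/.env ubuntu on the incident's file would print "/path/to/.env: root:root 600, ubuntu -> denied" and return non-zero; after chmod o+r it would report other and still fail, because the fix itself opened the file too wide.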

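This problem has a specific moment of origin: creating a config file "in passing" while working in a protected directory as root or under sudo. The cost of doing it in passing is never asking who will read this file later. Any config file that a service process needs to read should have its owner and mode set explicitly when it is created: not after the content is written, but at the moment of creation.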
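One way to make owner and mode part of the creation step itself is coreutils install, which creates the file and applies -o, -g and -m in a single command. A sketch, assuming GNU coreutils; create_env is a hypothetical wrapper, and the usage lines reuse the incident's names (ubuntu, /path/to/.env) plus a hypothetical DB_PASSWORD variable for illustration:

```sh
#!/bin/sh
# create_env FILE OWNER GROUP MODE
# Create an empty FILE that already has its final owner, group and mode,
# so content is written into a file that is correct from the start.
# Setting an owner other than yourself requires root.
create_env() {
  file=$1; owner=$2; group=$3; mode=$4
  if [ -e "$file" ]; then
    echo "create_env: $file already exists; inspect it with ls -la" >&2
    return 1
  fi
  install -o "$owner" -g "$group" -m "$mode" /dev/null "$file"
}

# Hypothetical use inside the root maintenance script:
#   create_env /path/to/.env ubuntu ubuntu 600
#   printf 'DB_PASSWORD=%s\n' "$db_password" >> /path/to/.env  # appending keeps owner and mode
```

Appending with >> writes into the existing file, so its owner and mode stay as created; a tool that instead writes a new file and renames it over the old one may end up with the writer's ownership again.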

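An open question remains: when the same .env must be maintained by root and read by ubuntu, is the cleaner boundary 640 with a shared group, or POSIX ACLs? In an environment with several maintainers, there is no automatic answer.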

— 邱柏宇

