伺服器錯誤解析 | 台灣超商機台與 API 500 問題剖析

資料寫進去了，但 500 自己爬上來了

去超商機台列印文件，紙從出紙口滑出來了，文件在手上，結果螢幕彈出「列印失敗，請重試」。台灣的超商機台幾乎承擔一切服務——繳費、取票、列印——這種場景不是假設，是很多人真的愣在那裡不知道要不要相信那個紅色畫面的瞬間。後端的 500 就是這件事，只是發生在 API 層。

技術環境

Node.js + Express 後端，MySQL 資料庫。event_logs 資料表用 ENUM 欄位限制事件類型；事件通知為 await 同步呼叫，與主操作在同一個 request handler 內執行。問題模式與框架無關——任何在主操作後同步執行側效應、且側效應例外未被攔截的設計，都會複現相同行為。

現象

前端送出一個編輯操作，後端回了 500，前端顯示失敗。但打開資料庫——紀錄確實更新了，時間戳正確，欄位值也對。主操作完全正常，500 是從別處傳染過來的。

追進去發現，寫入完成之後，系統會自動觸發一段事件通知邏輯。那段 INSERT 用了一個新的事件類型值：group_expense_updated。問題在於這個值不在資料表的 ENUM 列表裡。一丟進去就炸，而且整段通知邏輯沒有被 try/catch 包住，例外一路往上傳，把整個請求的 HTTP 回應拖成了 500。

錯誤傳染鏈（時序）

Client              API Server                MySQL
  |                      |                      |
  |── PUT /expense ──────>|                      |
  |                      |── UPDATE expenses ───>|
  |                      |<── OK ────────────────|  ← 資料已寫入 ✓
  |                      |                      |
  |                      |── INSERT event_log ──>|  ← 'group_expense_updated'
  |                      |                      |   不在 ENUM 列表 → SQL Error
  |                      |<── Error ─────────────|
  |                      | (uncaught exception)  |
  |<── 500 ──────────────|                      |
  |
  DB state:  row updated ✓
  Frontend:  operation failed ✗

主操作在第三步就已完成。500 來自第六步：一個不在 ENUM 列表裡的值觸發 SQL 錯誤，例外沒有被攔截，一路往上傳，接管了整個 HTTP 回應。

容易誤判的原因

第一時間看起來是寫入失敗。500 就是伺服器錯誤，前端收到 500 就判定操作沒完成——這是合理的推論，幾乎任何框架的預設行為都是這樣。沒有人會先去資料庫確認主操作是否成功，再回來看是哪個側效應炸了。

更深的問題是：這個事件通知是個「沒人在意的側效應」。它不是主流程，不回傳任何對前端有意義的資料，理論上失敗了也不應該影響主操作的結果。但它沒有被隔離。側效應的例外直接接管了主請求的回應碼，喧賓奪主。

分界點

問題出現的精確位置：新增了一個事件類型，但沒有同步更新 schema 的 ENUM 列表。ENUM 是資料庫層的強型別約束，插入不在列表裡的值，直接報錯，不商量。

這個設計選擇在系統早期是合理的——ENUM 確保類型值乾淨可控。但系統在成長，事件類型會增加，每次新增都需要一次 schema migration，而 migration 很容易在部署流程中被遺漏。這次就是被遺漏了。

修法兩步，都很直接：把 ENUM 欄位改成 VARCHAR，以後新增事件類型不再需要 schema migration；把所有事件通知邏輯包進獨立的 try/catch，讓側效應的失敗不再決定主操作的回應碼。前者是 migration 010，後者是錯誤邊界的補課。

Code 對照：修法前後

修法前：ENUM 約束 + 無 try/catch

-- event_logs schema（修法前）
ALTER TABLE event_logs
  ADD COLUMN event_type ENUM(
    'user_login',
    'expense_added',
    'group_created'
    -- ← 'group_expense_updated' 不在這裡
  ) NOT NULL;

// Request handler（修法前）
app.put('/groups/:id/expenses/:expId', async (req, res) => {
  await db.query('UPDATE expenses SET amount = ? WHERE id = ?', [...]);
  await notifyExpenseUpdated(req.params.id, req.user.id);
  // ↑ ENUM 裡沒有 'group_expense_updated'，INSERT 炸掉
  // ↑ 例外未攔截，500 直接回給 client
  res.json({ success: true });
});

修法後：VARCHAR + 獨立 try/catch

-- Migration 010：ENUM 改 VARCHAR
ALTER TABLE event_logs
  MODIFY COLUMN event_type VARCHAR(100) NOT NULL;
-- 之後新增事件類型不再需要 schema migration

// 側效應函式（修法後）
async function notifyExpenseUpdated(groupId, userId) {
  try {
    await db.query(
      'INSERT INTO event_logs (event_type, group_id, user_id) VALUES (?, ?, ?)',
      ['group_expense_updated', groupId, userId]
    );
  } catch (err) {
    logger.error('Event notification failed', { groupId, userId, err: err.message });
    // 記 log，不往上丟
  }
}

// Request handler（修法後）
app.put('/groups/:id/expenses/:expId', async (req, res) => {
  await db.query('UPDATE expenses SET amount = ? WHERE id = ?', [...]);
  notifyExpenseUpdated(req.params.id, req.user.id); // fire-and-forget
  res.json({ success: true }); // 主操作結果不被側效應劫持
});

該被隔離的側效應類型

任何在主操作完成後才執行的邏輯，失敗不應影響主操作的回應碼。常見需要隔離的模式：

Event log / 稽核日誌：寫入 event_logs、audit_logs 等資料表
推播通知：發 push notification、Email、SMS
Webhook 觸發：呼叫外部系統的 outbound webhook
快取失效：清除 Redis / Memcached 快取項目
搜尋索引更新：同步 Elasticsearch / Algolia 索引
Analytics 事件：傳送 Mixpanel、GA 等追蹤事件
非同步任務派發：丟 job 到 queue（BullMQ、SQS 等）

判斷標準：如果這段邏輯失敗不應讓使用者看到「操作失敗」，它就需要邊界隔離。

留給下次的一件事

任何在主操作完成後才執行的邏輯——通知、稽核日誌、事件廣播——都應該預設被隔離。它們失敗，主操作不應該跟著死。這不是防禦性編程的風格問題，是邊界設計。側效應拿到了整個請求的控制權，這件事從一開始就不該發生。

下次看到「操作失敗」的時候，值得多問一句：資料庫裡長什麼樣子？

— 邱柏宇

The Write Succeeded. The 500 Didn’t Care.

You print a document at a convenience store kiosk. The paper slides out. You hold it in your hand. The screen flashes red: “Print failed. Please try again.” The write succeeded. The error came from somewhere else entirely.

That’s exactly what happened here — just at the API layer.

Technical Environment

Node.js + Express backend, MySQL database. The event_logs table used an ENUM column to constrain event type values. The event notification ran as an await-ed synchronous call inside the same request handler as the primary write. The pattern isn't framework-specific — any design that runs side effects synchronously after a primary operation, without catching their exceptions, will reproduce the same failure.

What Was Observed

A client submitted an edit operation. The server returned a 500. The frontend reported failure. But opening the database showed the record had been correctly updated — right timestamp, right values. The primary write completed without issue. The 500 came from a side effect.

Digging in: after the write, the system automatically triggered an event notification. That notification attempted an INSERT with a new event type value, group_expense_updated, which wasn’t in the table’s ENUM list. The insert blew up. The notification logic had no try/catch wrapping it. The exception propagated all the way up and hijacked the HTTP response code for the entire request.

Error Propagation Sequence

Client              API Server                MySQL
  |                      |                      |
  |── PUT /expense ──────>|                      |
  |                      |── UPDATE expenses ───>|
  |                      |<── OK ────────────────|  ← row written ✓
  |                      |                      |
  |                      |── INSERT event_log ──>|  ← 'group_expense_updated'
  |                      |                      |   not in ENUM → SQL Error
  |                      |<── Error ─────────────|
  |                      | (uncaught exception)  |
  |<── 500 ──────────────|                      |
  |
  DB state:  row updated ✓
  Frontend:  operation failed ✗

The primary operation completed at step three. The 500 came from step six: a value outside the ENUM triggered a SQL error, the exception went uncaught, propagated upward, and claimed the HTTP response for the entire request.

Why It Was Misread

A 500 means server error. The frontend received a 500 and concluded the operation had failed — that’s the default assumption for any sane client. Nobody’s first instinct is to check the database directly, then trace back to figure out which side effect detonated.

The deeper issue: this event notification was an afterthought — no meaningful return value, no bearing on the primary result. It should have been irrelevant to the outcome. But it wasn’t isolated. An unhandled exception in a side effect claimed ownership of the entire request’s response code.

The Exact Boundary

The fault point: a new event type was added without a corresponding update to the schema’s ENUM list. ENUM is a hard constraint at the database level — insert a value outside the list and it errors immediately, unconditionally.

ENUM was a reasonable choice early on, when the set of event types was small and stable. But event types grow. Every addition requires a schema migration, and migrations are easy to miss in deployment. This one was missed.

The fix was two steps. Replace the ENUM column with VARCHAR — new event types no longer require a migration. Wrap all event notification logic in its own try/catch — side-effect failures no longer decide the primary operation’s response code.

Code Diff: Before and After

Before: ENUM constraint + no try/catch

-- event_logs schema (before)
ALTER TABLE event_logs
  ADD COLUMN event_type ENUM(
    'user_login',
    'expense_added',
    'group_created'
    -- ← 'group_expense_updated' missing
  ) NOT NULL;

// Request handler (before)
app.put('/groups/:id/expenses/:expId', async (req, res) => {
  await db.query('UPDATE expenses SET amount = ? WHERE id = ?', [...]);
  await notifyExpenseUpdated(req.params.id, req.user.id);
  // ↑ 'group_expense_updated' not in ENUM — INSERT throws
  // ↑ Exception uncaught — 500 returned to client
  res.json({ success: true });
});

After: VARCHAR + isolated try/catch

-- Migration 010: ENUM → VARCHAR
ALTER TABLE event_logs
  MODIFY COLUMN event_type VARCHAR(100) NOT NULL;
-- New event types no longer require a schema migration

// Side-effect function (after)
async function notifyExpenseUpdated(groupId, userId) {
  try {
    await db.query(
      'INSERT INTO event_logs (event_type, group_id, user_id) VALUES (?, ?, ?)',
      ['group_expense_updated', groupId, userId]
    );
  } catch (err) {
    logger.error('Event notification failed', { groupId, userId, err: err.message });
    // Log it. Don't rethrow.
  }
}

// Request handler (after)
app.put('/groups/:id/expenses/:expId', async (req, res) => {
  await db.query('UPDATE expenses SET amount = ? WHERE id = ?', [...]);
  notifyExpenseUpdated(req.params.id, req.user.id); // fire-and-forget
  res.json({ success: true }); // primary result no longer hijacked
});

Side Effects That Should Be Isolated

Any logic that runs after the primary operation completes should be treated as a side effect. If its failure shouldn't surface as a user-facing error, it needs a boundary. Common patterns that need isolation:

Event logs / audit trails: writes to event_logs, audit_logs tables
Push notifications: push, email, SMS delivery
Outbound webhooks: calls to external systems
Cache invalidation: Redis / Memcached evictions
Search index updates: Elasticsearch / Algolia sync
Analytics events: Mixpanel, GA tracking calls
Async job dispatch: enqueuing work into BullMQ, SQS, etc.

The test: if this logic failing shouldn't cause the user to see "operation failed," it needs an isolation boundary.

One Thing Worth Remembering

Any logic that runs after the primary operation completes — notifications, audit logs, event broadcasts — should be isolated by default. If it fails, the primary operation shouldn’t go down with it. This isn’t a style preference. It’s a boundary decision. A side effect having full control over the request’s response code was always the wrong design.

Next time “operation failed” shows up: it’s worth checking what the database actually says first.