資料寫進去了,但 500 自己爬上來了

資料寫進去了,但 500 自己爬上來了

去超商機台列印文件,紙從出紙口滑出來了,文件在手上,結果螢幕彈出「列印失敗,請重試」。台灣的超商機台幾乎承擔一切服務——繳費、取票、列印——這種場景不是假設,是很多人真的愣在那裡不知道要不要相信那個紅色畫面的瞬間。後端的 500 就是這件事,只是發生在 API 層。

技術環境

Node.js + Express 後端,MySQL 資料庫。event_logs 資料表用 ENUM 欄位限制事件類型;事件通知為 await 同步呼叫,與主操作在同一個 request handler 內執行。問題模式與框架無關——任何在主操作後同步執行側效應、且側效應例外未被攔截的設計,都會複現相同行為。

現象

前端送出一個編輯操作,後端回了 500,前端顯示失敗。但打開資料庫——紀錄確實更新了,時間戳正確,欄位值也對。主操作完全正常,500 是從別處傳染過來的。

追進去發現,寫入完成之後,系統會自動觸發一段事件通知邏輯。那段 INSERT 用了一個新的事件類型值:group_expense_updated。問題在於這個值不在資料表的 ENUM 列表裡。一丟進去就炸,而且整段通知邏輯沒有被 try/catch 包住,例外一路往上傳,把整個請求的 HTTP 回應拖成了 500。

錯誤傳染鏈(時序)

Client              API Server                MySQL
  |                      |                      |
  |── PUT /expense ──────>|                      |
  |                      |── UPDATE expenses ───>|
  |                      |<── OK ────────────────|  ← 資料已寫入 ✓
  |                      |                      |
  |                      |── INSERT event_log ──>|  ← 'group_expense_updated'
  |                      |                      |   不在 ENUM 列表 → SQL Error
  |                      |<── Error ─────────────|
  |                      | (uncaught exception)  |
  |<── 500 ──────────────|                      |
  |
  DB state:  row updated ✓
  Frontend:  operation failed ✗

主操作在第三步就已完成。500 來自第六步:一個不在 ENUM 列表裡的值觸發 SQL 錯誤,例外沒有被攔截,一路往上傳,接管了整個 HTTP 回應。

容易誤判的原因

第一時間看起來是寫入失敗。500 就是伺服器錯誤,前端收到 500 就判定操作沒完成——這是合理的推論,幾乎任何框架的預設行為都是這樣。沒有人會先去資料庫確認主操作是否成功,再回來看是哪個側效應炸了。

更深的問題是:這個事件通知是個「沒人在意的側效應」。它不是主流程,不回傳任何對前端有意義的資料,理論上失敗了也不應該影響主操作的結果。但它沒有被隔離。側效應的例外直接接管了主請求的回應碼,喧賓奪主。

分界點

問題出現的精確位置:新增了一個事件類型,但沒有同步更新 schema 的 ENUM 列表。ENUM 是資料庫層的強型別約束,插入不在列表裡的值,直接報錯,不商量。

這個設計選擇在系統早期是合理的——ENUM 確保類型值乾淨可控。但系統在成長,事件類型會增加,每次新增都需要一次 schema migration,而 migration 很容易在部署流程中被遺漏。這次就是被遺漏了。

修法兩步,都很直接:把 ENUM 欄位改成 VARCHAR,以後新增事件類型不再需要 schema migration;把所有事件通知邏輯包進獨立的 try/catch,讓側效應的失敗不再決定主操作的回應碼。前者是 migration 010,後者是錯誤邊界的補課。

Code 對照:修法前後

修法前:ENUM 約束 + 無 try/catch

-- event_logs schema(修法前)
ALTER TABLE event_logs
  ADD COLUMN event_type ENUM(
    'user_login',
    'expense_added',
    'group_created'
    -- ← 'group_expense_updated' 不在這裡
  ) NOT NULL;
// Request handler(修法前)
app.put('/groups/:id/expenses/:expId', async (req, res) => {
  await db.query('UPDATE expenses SET amount = ? WHERE id = ?', [...]);
  await notifyExpenseUpdated(req.params.id, req.user.id);
  // ↑ ENUM 裡沒有 'group_expense_updated',INSERT 炸掉
  // ↑ 例外未攔截,500 直接回給 client
  res.json({ success: true });
});

修法後:VARCHAR + 獨立 try/catch

-- Migration 010:ENUM 改 VARCHAR
ALTER TABLE event_logs
  MODIFY COLUMN event_type VARCHAR(100) NOT NULL;
-- 之後新增事件類型不再需要 schema migration
// 側效應函式(修法後)
async function notifyExpenseUpdated(groupId, userId) {
  try {
    await db.query(
      'INSERT INTO event_logs (event_type, group_id, user_id) VALUES (?, ?, ?)',
      ['group_expense_updated', groupId, userId]
    );
  } catch (err) {
    logger.error('Event notification failed', { groupId, userId, err: err.message });
    // 記 log,不往上丟
  }
}

// Request handler(修法後)
app.put('/groups/:id/expenses/:expId', async (req, res) => {
  await db.query('UPDATE expenses SET amount = ? WHERE id = ?', [...]);
  notifyExpenseUpdated(req.params.id, req.user.id); // fire-and-forget
  res.json({ success: true }); // 主操作結果不被側效應劫持
});

該被隔離的側效應類型

任何在主操作完成後才執行的邏輯,失敗不應影響主操作的回應碼。常見需要隔離的模式:

  • Event log / 稽核日誌:寫入 event_logs、audit_logs 等資料表
  • 推播通知:發 push notification、Email、SMS
  • Webhook 觸發:呼叫外部系統的 outbound webhook
  • 快取失效:清除 Redis / Memcached 快取項目
  • 搜尋索引更新:同步 Elasticsearch / Algolia 索引
  • Analytics 事件:傳送 Mixpanel、GA 等追蹤事件
  • 非同步任務派發:丟 job 到 queue(BullMQ、SQS 等)

判斷標準:如果這段邏輯失敗不應讓使用者看到「操作失敗」,它就需要邊界隔離。

留給下次的一件事

任何在主操作完成後才執行的邏輯——通知、稽核日誌、事件廣播——都應該預設被隔離。它們失敗,主操作不應該跟著死。這不是防禦性編程的風格問題,是邊界設計。側效應拿到了整個請求的控制權,這件事從一開始就不該發生。

下次看到「操作失敗」的時候,值得多問一句:資料庫裡長什麼樣子?

— 邱柏宇

延伸閱讀


The Write Succeeded. The 500 Didn’t Care.

You print a document at a convenience store kiosk. The paper slides out. You hold it in your hand. The screen flashes red: “Print failed. Please try again.” The write succeeded. The error came from somewhere else entirely.

That’s exactly what happened here — just at the API layer.

Technical Environment

Node.js + Express backend, MySQL database. The event_logs table used an ENUM column to constrain event type values. The event notification ran as an await-ed synchronous call inside the same request handler as the primary write. The pattern isn't framework-specific — any design that runs side effects synchronously after a primary operation, without catching their exceptions, will reproduce the same failure.

What Was Observed

A client submitted an edit operation. The server returned a 500. The frontend reported failure. But opening the database showed the record had been correctly updated — right timestamp, right values. The primary write completed without issue. The 500 came from a side effect.

Digging in: after the write, the system automatically triggered an event notification. That notification attempted an INSERT with a new event type value, group_expense_updated, which wasn’t in the table’s ENUM list. The insert blew up. The notification logic had no try/catch wrapping it. The exception propagated all the way up and hijacked the HTTP response code for the entire request.

Error Propagation Sequence

Client              API Server                MySQL
  |                      |                      |
  |── PUT /expense ──────>|                      |
  |                      |── UPDATE expenses ───>|
  |                      |<── OK ────────────────|  ← row written ✓
  |                      |                      |
  |                      |── INSERT event_log ──>|  ← 'group_expense_updated'
  |                      |                      |   not in ENUM → SQL Error
  |                      |<── Error ─────────────|
  |                      | (uncaught exception)  |
  |<── 500 ──────────────|                      |
  |
  DB state:  row updated ✓
  Frontend:  operation failed ✗

The primary operation completed at step three. The 500 came from step six: a value outside the ENUM triggered a SQL error, the exception went uncaught, propagated upward, and claimed the HTTP response for the entire request.

Why It Was Misread

A 500 means server error. The frontend received a 500 and concluded the operation had failed — that’s the default assumption for any sane client. Nobody’s first instinct is to check the database directly, then trace back to figure out which side effect detonated.

The deeper issue: this event notification was an afterthought — no meaningful return value, no bearing on the primary result. It should have been irrelevant to the outcome. But it wasn’t isolated. An unhandled exception in a side effect claimed ownership of the entire request’s response code.

The Exact Boundary

The fault point: a new event type was added without a corresponding update to the schema’s ENUM list. ENUM is a hard constraint at the database level — insert a value outside the list and it errors immediately, unconditionally.

ENUM was a reasonable choice early on, when the set of event types was small and stable. But event types grow. Every addition requires a schema migration, and migrations are easy to miss in deployment. This one was missed.

The fix was two steps. Replace the ENUM column with VARCHAR — new event types no longer require a migration. Wrap all event notification logic in its own try/catch — side-effect failures no longer decide the primary operation’s response code.

Code Diff: Before and After

Before: ENUM constraint + no try/catch

-- event_logs schema (before)
ALTER TABLE event_logs
  ADD COLUMN event_type ENUM(
    'user_login',
    'expense_added',
    'group_created'
    -- ← 'group_expense_updated' missing
  ) NOT NULL;
// Request handler (before)
app.put('/groups/:id/expenses/:expId', async (req, res) => {
  await db.query('UPDATE expenses SET amount = ? WHERE id = ?', [...]);
  await notifyExpenseUpdated(req.params.id, req.user.id);
  // ↑ 'group_expense_updated' not in ENUM — INSERT throws
  // ↑ Exception uncaught — 500 returned to client
  res.json({ success: true });
});

After: VARCHAR + isolated try/catch

-- Migration 010: ENUM → VARCHAR
ALTER TABLE event_logs
  MODIFY COLUMN event_type VARCHAR(100) NOT NULL;
-- New event types no longer require a schema migration
// Side-effect function (after)
async function notifyExpenseUpdated(groupId, userId) {
  try {
    await db.query(
      'INSERT INTO event_logs (event_type, group_id, user_id) VALUES (?, ?, ?)',
      ['group_expense_updated', groupId, userId]
    );
  } catch (err) {
    logger.error('Event notification failed', { groupId, userId, err: err.message });
    // Log it. Don't rethrow.
  }
}

// Request handler (after)
app.put('/groups/:id/expenses/:expId', async (req, res) => {
  await db.query('UPDATE expenses SET amount = ? WHERE id = ?', [...]);
  notifyExpenseUpdated(req.params.id, req.user.id); // fire-and-forget
  res.json({ success: true }); // primary result no longer hijacked
});

Side Effects That Should Be Isolated

Any logic that runs after the primary operation completes should be treated as a side effect. If its failure shouldn't surface as a user-facing error, it needs a boundary. Common patterns that need isolation:

  • Event logs / audit trails: writes to event_logs, audit_logs tables
  • Push notifications: push, email, SMS delivery
  • Outbound webhooks: calls to external systems
  • Cache invalidation: Redis / Memcached evictions
  • Search index updates: Elasticsearch / Algolia sync
  • Analytics events: Mixpanel, GA tracking calls
  • Async job dispatch: enqueuing work into BullMQ, SQS, etc.

The test: if this logic failing shouldn't cause the user to see "operation failed," it needs an isolation boundary.

One Thing Worth Remembering

Any logic that runs after the primary operation completes — notifications, audit logs, event broadcasts — should be isolated by default. If it fails, the primary operation shouldn’t go down with it. This isn’t a style preference. It’s a boundary decision. A side effect having full control over the request’s response code was always the wrong design.

Next time “operation failed” shows up: it’s worth checking what the database actually says first.

— 邱柏宇

Related Posts