The error message is an API
An error message is the part of your system that only ever shows up when someone is already having a bad day.
Most error messages are written for the person who shipped the bug, not the person who hit it. They leak the internals of the moment they were thrown — a stack frame, a null reference, a status code with no subject — and they leave the reader to reverse-engineer a system they can't see. That's a failure of design, but it's also a failure of architecture, because an error message is not decoration. It is a return value. It is part of your interface whether you treat it that way or not.
I've started telling teams to design errors the way they design endpoints. The happy path gets schemas, versioning, reviews, deprecation policies. The error path gets a throw new Error("something went wrong") and a shrug. Then we wonder why integration takes three weeks and the support queue is full of people quoting our own messages back to us. The error is the most-read documentation we ship. We just never wrote it on purpose.
An error has a reader, and the reader is not you
Every error has an audience, and the audience changes who you're writing for. A human staring at a form needs a next action: what they did, what the system expected, what to try. A developer integrating your API needs a stable identifier they can branch on and a human-readable string they will never parse. An on-call engineer at 3am needs the one fact that locates the fault — which service, which dependency, which input — not a sanitized "internal error" that protects no one and helps no one.
The mistake is collapsing these readers into one string. The message that satisfies all of them satisfies none. So the first design decision isn't wording. It's separating the parts of an error by who reads them.
- A stable code: machine-readable, never localized, safe to branch on. It is an API surface, so you can't change it casually.
- A human message: specific, actionable, written in the reader's language and the reader's frame.
- Context: the fields that locate the fault, logged for you, never leaked to the caller.
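One way to keep these three parts from collapsing back into one string is to give them separate fields and separate exits. The TypeScript below is a minimal sketch, not a prescribed design. `AppError`, `toResponse`, and `toLogRecord` are hypothetical names, and the codes and context fields are invented for the example. The caller only ever sees the code and the message, and the context goes to your logs.

```typescript
// Hypothetical names (AppError, toResponse, toLogRecord); not a library API.
type ErrorCode = "rate_limited" | "invalid_field" | "payment_provider_unavailable";

class AppError extends Error {
  constructor(
    readonly code: ErrorCode, // stable and never localized: safe to branch on
    readonly userMessage: string, // written for the person reading it
    readonly context: Record<string, unknown> = {}, // locates the fault; logged, never returned
  ) {
    super(userMessage);
    this.name = "AppError";
  }
}

interface LogRecord {
  code: ErrorCode;
  message: string;
  context: Record<string, unknown>;
}

// What the caller receives: the code and the human message, nothing else.
function toResponse(err: AppError): { code: ErrorCode; message: string } {
  return { code: err.code, message: err.userMessage };
}

// What your logs receive: everything, including the context fields.
function toLogRecord(err: AppError): LogRecord {
  return { code: err.code, message: err.userMessage, context: err.context };
}

const checkoutError = new AppError(
  "payment_provider_unavailable",
  "We couldn't reach the payment provider. Your card was not charged.",
  { service: "checkout", dependency: "payments-gateway", timeoutMs: 5000 },
);

console.log(JSON.stringify(toResponse(checkoutError)));
console.log(JSON.stringify(toLogRecord(checkoutError)));
```

Because the two serializers are separate functions, the leak boundary is easy to review. Nothing added to `context` can reach a caller unless someone edits `toResponse`.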
A code is a promise
The moment a client writes if (err.code === "rate_limited"), that string is load-bearing. Rename it to too_many_requests in a "cleanup" PR and you've shipped a breaking change with no version bump and no changelog. Codes are an API the same way a route is an API, and they deserve the same discipline: a closed set, documented, deprecated on a schedule, never reused for a different meaning.
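The type system can enforce a closed, documented set of codes, and CI can check it. This is a minimal sketch under stated assumptions. The registry layout, the rename from `quota_exceeded` to `plan_limit_reached`, the removal date, and the retired `payment_failed` code are all hypothetical. Typing the registry as `Record<ErrorCode, CodeEntry>` means the compiler rejects any code that lacks a documentation entry. A rename then has to happen as a deprecation with a named replacement, not as a silent edit.

```typescript
// Hypothetical registry; every name and date here is illustrative.
type ErrorCode = "rate_limited" | "invalid_field" | "quota_exceeded" | "plan_limit_reached";

interface CodeEntry {
  description: string;
  status: "active" | "deprecated";
  replacedBy?: ErrorCode; // where clients should migrate
  removeAfter?: string; // ISO date from the deprecation schedule
}

// Record<ErrorCode, CodeEntry> makes the compiler reject a code with no entry.
const ERROR_CODES: Record<ErrorCode, CodeEntry> = {
  rate_limited: { description: "Too many requests in the current window.", status: "active" },
  invalid_field: { description: "A request field failed validation.", status: "active" },
  quota_exceeded: {
    description: "Former name for plan-limit errors.",
    status: "deprecated",
    replacedBy: "plan_limit_reached",
    removeAfter: "2026-06-30",
  },
  plan_limit_reached: { description: "The account's plan limit was reached.", status: "active" },
};

// Codes removed in the past stay reserved so they are never reused with a new meaning.
const RETIRED_CODES: readonly string[] = ["payment_failed"];

// Narrows an arbitrary string (e.g. from a response body) to a known code.
function isKnownCode(value: string): value is ErrorCode {
  return Object.prototype.hasOwnProperty.call(ERROR_CODES, value);
}

// A CI-style check: returns any active code that collides with a retired one.
function findReusedCodes(): string[] {
  return Object.keys(ERROR_CODES).filter((code) => RETIRED_CODES.indexOf(code) !== -1);
}

const incoming = "rate_limited";
if (isKnownCode(incoming) && incoming === "rate_limited") {
  console.log("back off and retry");
}
```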
This is also where the human string earns its freedom. Because the code carries the contract, the message can be rewritten, translated, and A/B tested without breaking a single integration. Separating them isn't extra work — it's what lets both halves be good at their job.
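Because the code carries the contract, the wording can live in a separate catalog keyed by code. Here is a minimal sketch that assumes a hypothetical `MESSAGES` table with only two locales. A production system would more likely use a translation library, which this example does not try to model.

```typescript
// Hypothetical catalog: wording lives here, keyed by the stable code.
type ErrorCode = "rate_limited" | "invalid_field";
type Locale = "en" | "de";

const MESSAGES: Record<Locale, Record<ErrorCode, string>> = {
  en: {
    rate_limited: "You're sending requests too quickly. Wait a moment and try again.",
    invalid_field: "One of the fields in your request isn't valid.",
  },
  de: {
    rate_limited: "Sie senden Anfragen zu schnell. Bitte warten Sie einen Moment und versuchen Sie es erneut.",
    invalid_field: "Eines der Felder in Ihrer Anfrage ist ungültig.",
  },
};

// The code passes through untouched; only the message depends on the locale.
function renderError(code: ErrorCode, locale: Locale): { code: ErrorCode; message: string } {
  return { code, message: MESSAGES[locale][code] };
}

const english = renderError("rate_limited", "en");
const german = renderError("rate_limited", "de");
console.log(english.code === german.code); // true: clients branching on the code see no difference
```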
The code is the contract; the message is free to change.
Every error code you ship is a promise you have to keep, so ship as few as you can defend.
Errors are where the trust is decided
Anyone can make the success path feel good. The product reveals what it actually thinks of you when something breaks. A 500 that says "internal error" tells the user, at most, that the problem wasn't on their end, and it implies that nothing will be done about it. A 422 that says exactly which field is wrong and why tells them the system was paying attention.
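To contrast with a bare "internal error", here is a sketch of a 422 that names each bad field and the reason it failed. The `SignupInput` shape, the validation rules (including the age threshold), and the `problems` array layout are assumptions made for illustration. This is not a standard response format.

```typescript
// Hypothetical types and rules; the body layout is illustrative, not a standard.
interface FieldProblem {
  field: string; // which input was wrong
  reason: string; // why, phrased so the caller can act on it
}

interface ValidationErrorBody {
  code: "invalid_field";
  message: string;
  problems: FieldProblem[];
}

interface SignupInput {
  email: string;
  age: number;
}

// Returns null when the input is valid, otherwise a 422 with one entry per bad field.
function validateSignup(
  input: SignupInput,
): { httpStatus: 422; body: ValidationErrorBody } | null {
  const problems: FieldProblem[] = [];
  if (!/^[^@\s]+@[^@\s]+$/.test(input.email)) {
    problems.push({ field: "email", reason: "must be an address like name@example.com" });
  }
  if (input.age < 13) {
    problems.push({ field: "age", reason: "must be 13 or older" });
  }
  if (problems.length === 0) {
    return null;
  }
  return {
    httpStatus: 422,
    body: {
      code: "invalid_field",
      message: `Please check these fields: ${problems.map((p) => p.field).join(", ")}.`,
      problems,
    },
  };
}

console.log(JSON.stringify(validateSignup({ email: "not-an-email", age: 12 }), null, 2));
```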
Two rules carry most of the weight here. First, never blame the user for a failure that was yours; "we couldn't reach the payment provider, your card was not charged" keeps a customer who would otherwise assume the worst. Second, every error should imply a next step, even if that step is "wait and retry" or "contact support with this id." An error that closes every door is just an apology, and users can tell the difference between an apology and a system that has thought about them.
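The second rule can be made structural. If the type requires a next step, no error can be built without one. This is a minimal sketch that assumes a hypothetical `NextStep` union with three kinds: retry, fix an input, or contact support with a reference id. The payment example reuses the message quoted above.

```typescript
// Hypothetical types: nextStep is required, so no error can close every door.
type NextStep =
  | { kind: "retry"; afterSeconds: number }
  | { kind: "fix_input"; field: string }
  | { kind: "contact_support"; referenceId: string };

interface UserFacingError {
  code: string;
  message: string; // what happened, without blaming the user for our failure
  nextStep: NextStep; // what the reader can do about it
}

function describeNextStep(step: NextStep): string {
  switch (step.kind) {
    case "retry":
      return `Please try again in ${step.afterSeconds} seconds.`;
    case "fix_input":
      return `Please correct the "${step.field}" field and resubmit.`;
    case "contact_support":
      return `If this keeps happening, contact support and quote reference ${step.referenceId}.`;
    default: {
      // Compile-time exhaustiveness check: adding a new kind without a case fails here.
      const unreachable: never = step;
      return unreachable;
    }
  }
}

// Our dependency failed, not the user: say so, and say what did not happen.
const providerDown: UserFacingError = {
  code: "payment_provider_unavailable",
  message: "We couldn't reach the payment provider. Your card was not charged.",
  nextStep: { kind: "retry", afterSeconds: 30 },
};

console.log(`${providerDown.message} ${describeNextStep(providerDown.nextStep)}`);
```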
The dangerous cases are the ones that try to be helpful and lie. "User not found" on a login form leaks which emails have accounts. A retry suggestion on an error that is not transient sends people in circles. Helpfulness without honesty is worse than silence, because it spends trust you'll want later.
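Code can guard against both dangerous cases. This minimal sketch uses hypothetical names. `login` returns the same failure for an unknown email and for a wrong password, and `isRetryable` only approves codes assumed to be transient. The in-memory user table compares plaintext passwords only to keep the example short; a real system would verify password hashes. Identical messages also do not, on their own, hide differences in response time.

```typescript
// Hypothetical login check; both failure paths return the same code and message,
// so the response does not reveal whether an account exists for the email.
interface LoginFailure {
  code: "invalid_credentials";
  message: string;
}

const INVALID_CREDENTIALS: LoginFailure = {
  code: "invalid_credentials",
  message: "That email and password combination didn't match our records.",
};

// Stand-in user store for the sketch only; real systems store and compare password hashes.
const users: Record<string, string> = { "ada@example.com": "correct horse" };

function login(email: string, password: string): LoginFailure | null {
  const known = Object.prototype.hasOwnProperty.call(users, email);
  if (!known || users[email] !== password) {
    return INVALID_CREDENTIALS; // same answer for "no such user" and "wrong password"
  }
  return null; // success
}

// Only transient failures advertise a retry; retrying a permanent error goes nowhere.
const TRANSIENT_CODES: readonly string[] = ["rate_limited", "payment_provider_unavailable"];

function isRetryable(code: string): boolean {
  return TRANSIENT_CODES.indexOf(code) !== -1;
}

console.log(login("nobody@example.com", "x"), isRetryable("invalid_credentials"));
```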
Write the failure first
The practical habit is to design the error before the feature. When you spec an endpoint, enumerate the ways it fails, give each one a code and a message, and review those with the same eyes you bring to the response body. It costs an hour and it changes the design: you discover states you hadn't handled, inputs you can't validate, dependencies you can't trust — all while it's still cheap to fix.
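You can write a failure spec as data before the handler exists, then use it as a check once the handler does. This is a minimal sketch under stated assumptions. The `EndpointSpec` format, the `POST /invoices` route, its three failures, and the list of codes the handler raises are all invented for illustration. The check flags any code the handler uses that nobody has reviewed.

```typescript
// Hypothetical spec format: failures are written down before the handler is built.
interface FailureSpec {
  code: string;
  httpStatus: number;
  message: string;
}

interface EndpointSpec {
  route: string;
  failures: FailureSpec[];
}

const createInvoiceSpec: EndpointSpec = {
  route: "POST /invoices",
  failures: [
    { code: "invalid_field", httpStatus: 422, message: "One of the invoice fields isn't valid." },
    { code: "customer_not_found", httpStatus: 404, message: "We couldn't find that customer." },
    {
      code: "payment_provider_unavailable",
      httpStatus: 503,
      message: "We couldn't reach the payment provider. Nothing was charged.",
    },
  ],
};

// Returns codes the handler can raise that the spec does not document.
function undocumentedCodes(spec: EndpointSpec, codesUsedByHandler: string[]): string[] {
  const documented = spec.failures.map((f) => f.code);
  return codesUsedByHandler.filter((code) => documented.indexOf(code) === -1);
}

// Hypothetical list of codes the handler actually raises, e.g. gathered by a test harness.
const raised = ["invalid_field", "customer_not_found", "currency_unsupported"];
const missing = undocumentedCodes(createInvoiceSpec, raised); // ["currency_unsupported"]
console.log(missing);
```

In this example, `currency_unsupported` is the kind of state the exercise is meant to surface. The handler can produce it, but no one has written or reviewed its message.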
Do this and the support queue gets quieter, integrations land faster, and the 3am page gets shorter because the message already says where to look. The error message was always part of the contract. The only question is whether you wrote that part of the contract, or let it write itself.
