AI · Oct 27, 2025 · 8 min read

Hallucination is a UX problem you can design around

You can't build a model that is never wrong, but you can make being wrong survivable for the person using it.

Most products that wrap a model spend their entire engineering budget trying to make it wrong less often. Better retrieval, bigger context, a sharper system prompt, an eval suite that creeps from 89 to 92 percent. All worth doing. None of it solves the actual problem, because the actual problem is not that the model is sometimes wrong. The problem is what happens to the person standing there when it is.

A hallucination is only a catastrophe if the interface hands it over as fact and then steps back. The model said it, the screen rendered it, the user acted on it, and nobody in that chain ever signaled "this part might not be load-bearing." That last failure is not a model failure. It is a design decision, made by omission, and you can make a better one.

Confidence is information the interface throws away

Most model-backed systems already know more than they show. Token probabilities, retrieval scores, whether an answer came from a cited source or from the weights: that signal exists at generation time and then gets flattened into the same calm paragraph whether the system is certain or guessing. We spend enormous effort producing that signal and then design it out of existence at the last step.

I built a tax-question feature once where the first version answered everything in the same authoritative tone. Users trusted the confident-but-wrong answers exactly as much as the confident-and-right ones, which is the worst possible outcome — the interface had erased the only distinction that mattered. The fix was not a better model. It was surfacing what the model already produced.
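A minimal sketch of what "surfacing what the model already produced" could look like, assuming the generation step exposes per-token log-probabilities and retrieval scores in [0, 1]. The `RawAnswer` and `DisplayAnswer` shapes, the `toDisplayAnswer` helper, and the weakest-link rule are all hypothetical, and the resulting number is a rough heuristic rather than a calibrated probability.

```typescript
// Hypothetical shapes for illustration; none of these names come from a real SDK.
interface RawAnswer {
  text: string;
  tokenLogprobs: number[]; // natural-log probability of each generated token
  sources: { label: string; retrievalScore: number }[]; // scores assumed to lie in [0, 1]
}

interface DisplayAnswer {
  text: string;
  confidence: number; // 0..1 heuristic, NOT a calibrated probability
  provenance: "cited" | "weights";
  sources: string[];
}

function toDisplayAnswer(raw: RawAnswer): DisplayAnswer {
  // Geometric mean of token probabilities: exp(mean of the log-probabilities).
  const n = raw.tokenLogprobs.length;
  const meanLogprob =
    n === 0 ? -Infinity : raw.tokenLogprobs.reduce((a, b) => a + b, 0) / n;
  const tokenConfidence = Math.exp(meanLogprob); // exp(-Infinity) is 0

  const cited = raw.sources.length > 0;
  const bestRetrieval = raw.sources.reduce(
    (best, s) => Math.max(best, s.retrievalScore),
    0,
  );

  return {
    text: raw.text,
    // Assumed weakest-link rule: a cited answer is only as strong as its weaker signal.
    confidence: cited ? Math.min(tokenConfidence, bestRetrieval) : tokenConfidence,
    provenance: cited ? "cited" : "weights",
    sources: raw.sources.map((s) => s.label),
  };
}
```

The point of the shape is that confidence and sources travel with the text all the way to the screen, so the interface cannot render one without the others.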

[Figure: an answer card. Generated answer: "You can deduct home-office expenses if the space is used regularly and exclusively for work." Confidence: 64%. Sources: irs.gov, state tax bulletin.]

Fig. 1: Same answer, but the confidence and sources change how a person uses it.

Once the answer carries its own confidence and its sources, the user's behavior changes on its own. A 64 means they read the cited bulletin before filing. A 96 means they move on. You did not make the model more accurate. You made the person more accurate, which is the thing you were actually being paid for.

Design for the wrong answer, not the average one

Most UX gets designed around the median case — the demo path, the answer that lands. But trust is set at the tails. One confidently wrong answer that costs someone real money will outweigh fifty good ones, because people remember the time the tool burned them far longer than the times it helped.

So design the failure first. For every AI surface, ask the unglamorous question: when this is wrong, what does it cost, and how fast can the person catch it? The answer should shape the interface more than the happy path does.

→ Make the inputs visible. If an answer rests on three retrieved documents, show them, so a wrong premise is obvious before the conclusion is.
→ Keep the human in the action, not just the reading. Draft the email; don't send it. Propose the migration; don't run it.
→ Make verification cheaper than blind trust. If checking takes one click and trusting takes zero, people will check the things that matter.

You are not designing for the answer that's right. You are designing for the one person who gets the answer that's wrong.

The cost of a wrong answer is rarely the wrong answer itself. It is the irreversible thing the person did because the interface let them believe it. A wrong sentence is nothing. A wrong sentence that auto-executed a database migration is an incident.
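One way to keep the human in the action is to make the side effect unreachable without an explicit decision. The sketch below is an illustration under that assumption: `ProposedAction`, `Outcome`, and `runWithApproval` are hypothetical names, and the `approve` callback stands in for whatever confirmation UI the product has.

```typescript
// Hypothetical types for illustration; not taken from any real library.
interface ProposedAction<T> {
  description: string; // shown to the person before anything happens
  execute: () => T; // the side effect, deferred until approval
}

type Outcome<T> =
  | { status: "executed"; result: T }
  | { status: "held"; description: string };

function runWithApproval<T>(
  action: ProposedAction<T>,
  approve: (description: string) => boolean,
): Outcome<T> {
  // Without approval, execute() is never called: the proposal stays a draft.
  if (!approve(action.description)) {
    return { status: "held", description: action.description };
  }
  return { status: "executed", result: action.execute() };
}
```

Because the model can only produce a `ProposedAction`, a wrong answer stays a wrong sentence until a person chooses to turn it into something else.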

Thresholds belong in the product, not just the prompt

Some of this is structural, not visual. A system that knows its own confidence can route on it — answer directly when it's sure, ask a clarifying question when it's split, hand off to a human when the stakes and the doubt are both high. The threshold is a product decision, and it should live where you can tune it.

route.ts


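export async function route(question: string) {
  const result = await answer(question);
  // Low confidence: ask rather than guess.
  if (result.confidence < 0.6) return askToClarify(result);
  // Middling confidence plus high stakes: hand off to a human.
  if (result.confidence < 0.85 && isHighStakes(question)) return routeToHuman(result);
  return present(result);
}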

The model produces a confidence; the product decides what to do with it.

The numbers in that branch are not model parameters. They are the dial that decides how often you'd rather frustrate someone with a question than mislead them with an answer. Where you set it depends entirely on what being wrong costs in your domain — a recipe suggestion and a medication dosage do not get the same threshold, and no amount of model improvement changes that. Someone has to choose, and that someone is a designer.
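If the dial differs by domain, it can live in a table the product team owns rather than in the branch itself. This is a sketch under assumptions: the `Domain` names, the `THRESHOLDS` table, and `chooseRoute` are hypothetical, the tax row reuses the 0.6 and 0.85 from route.ts, and the recipe and medication values are invented to show the spread, not recommendations.

```typescript
// Hypothetical per-domain thresholds; the values are illustrative, not advice.
type Domain = "recipe" | "tax" | "medication";
type Route = "clarify" | "human" | "present";

interface Thresholds {
  clarifyBelow: number; // under this, ask a clarifying question
  handoffBelow: number; // under this AND high stakes, hand off to a human
}

const THRESHOLDS: Record<Domain, Thresholds> = {
  recipe: { clarifyBelow: 0.4, handoffBelow: 0.4 },
  tax: { clarifyBelow: 0.6, handoffBelow: 0.85 },
  medication: { clarifyBelow: 0.8, handoffBelow: 0.97 },
};

function chooseRoute(confidence: number, domain: Domain, highStakes: boolean): Route {
  const t = THRESHOLDS[domain];
  if (confidence < t.clarifyBelow) return "clarify";
  if (confidence < t.handoffBelow && highStakes) return "human";
  return "present";
}
```

The same 0.7 confidence then gets presented for a recipe, escalated for a high-stakes tax question, and turned into a clarifying question for a dosage, without touching the model.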

Honesty compounds

There is a quieter payoff to all of this. A product that admits uncertainty when it is uncertain earns the right to be believed when it is confident. Calibration is the whole asset. Users learn the system's tells the same way they learn a colleague's — and a tool that has never oversold itself gets trusted on the hard calls, which is exactly when trust is worth something.

The teams that win the next few years will not be the ones with the model that hallucinates least. That gap is closing for everyone. They will be the ones whose products made being wrong cheap, visible, and recoverable — so that a hallucination is a moment the user shrugs off, not the moment they stop trusting you. You cannot promise people a model that is never wrong. You can promise them a product that never lets being wrong hurt them quietly. Build that one.

#AI #UX #Trust

→ / AUTHOR

Ionut Dumitru

Full-stack engineer and product designer. Writes about building products where the engineering and the design are the same job.
