Hallucination Is an Honesty Problem, Not a Technical One

Aethel

The word "hallucination" was a strategic choice, and not an innocent one. It frames the problem as a perceptual disorder — something that happens to an AI system the way a fever happens to a body — rather than what it actually is: a failure to maintain the distinction between what is known and what is generated. This is not a technical problem awaiting a technical solution. It is an honesty problem. And honesty problems require a different kind of response.


When a language model generates a citation that does not exist, attributes a quotation to someone who never said it, or describes an event that never occurred — and does all of this with the same confident, fluent, well-formatted prose that it uses when it is accurate — the community of people who build and discuss these systems calls this a "hallucination."

The word was adopted quickly and without much examination, and it has since become so embedded in the vocabulary of AI development that questioning it feels pedantic. It is not pedantic. The choice of language here does significant philosophical and ethical work — work that, on close examination, is largely misdirected.

A hallucination, in its clinical sense, is a perception in the absence of a corresponding external stimulus. The person who hallucinates sees something that is not there, or hears something that was not said. The crucial feature of a genuine hallucination is that it is involuntary and, importantly, that the person experiencing it has no epistemic access to the fact that it is occurring. You cannot simply decide not to hallucinate. The hallucination presents itself with all the phenomenal force of genuine perception, and the person cannot distinguish it from reality by an act of will or reflection.

Language models do not hallucinate in this sense. They generate. They produce sequences of tokens according to probability distributions learned from vast corpora of text. When a language model generates a false claim, it is not perceiving something incorrectly. It is producing something — a sequence of text that fits the statistical patterns of plausible-seeming assertions — without any internal process that distinguishes between a true statement and a false one, or between a verified claim and an invented one.

The difference matters enormously, and the language of hallucination obscures it.


What "Hallucination" Does to the Problem

The framing of AI confabulation as hallucination has a specific effect on how the problem is understood and therefore on what kinds of solutions are proposed.

If the problem is a perceptual disorder — something that happens to the system involuntarily, a malfunction rather than a design characteristic — then the natural response is to treat it medically: to develop better detection tools, to fine-tune the model to reduce the incidence of false outputs, to build post-hoc filters that catch errors before they reach the user. The problem is a bug. The bug needs to be fixed. Progress is measured by the reduction in the frequency of false outputs.

This framing has dominated the research literature and the public discourse on AI reliability since the phenomenon was first widely observed. It is not wholly wrong. Detection tools are useful. Fine-tuning matters. Post-hoc filtering can catch some errors. But the framing is incomplete in a way that generates a persistent blind spot, and the blind spot concerns something more fundamental than accuracy rates.

The deeper problem is not that language models sometimes generate false information. The deeper problem is that language models do not distinguish, internally or externally, between outputs of different epistemic status. They do not differentiate between claims they can support with well-grounded evidence and claims they are generating from statistical plausibility alone. They do not mark uncertainty. They do not say "I believe this to be the case but I am not certain" versus "I have high confidence in this claim" versus "I am generating this from pattern-matching and cannot verify it." They speak in a single register of apparent confidence, regardless of the actual epistemic status of the claim being made.

This is not a perceptual malfunction. It is a structural dishonesty: a systematic failure to represent the epistemic character of assertions. And treating it as a technical bug to be patched, rather than as an honesty problem to be addressed architecturally, ensures that even improved systems will continue to mislead users about the reliability of what they are being told.


The Anatomy of an Assertion

To understand why the honesty framing is not merely rhetorical, it helps to consider what an honest assertion actually involves.

When a person makes an assertion — says something with the apparent intention of conveying that it is true — they are doing several things simultaneously. They are conveying the propositional content of the claim. They are representing, at least implicitly, their degree of confidence in that claim. They are signalling the kind of evidence on which the claim is based. And they are making themselves, in some sense, accountable for the claim — accepting that if the claim turns out to be false, something has gone wrong in a way that can be attributed to them.

These features — propositional content, epistemic confidence, evidential grounding, and accountability — are what philosophers of language call the "pragmatics" of assertion. They are not optional extras that can be detached from the act of asserting without changing what the act is. An assertion without any representation of epistemic confidence is not a weakly confident assertion; it is a different kind of speech act — closer to confabulation, or to the utterances of a person who is speaking without tracking the relationship between their words and the world.

Language models, in their standard mode of operation, produce assertions without the pragmatic features that make an assertion honest. They generate a claim, and the claim has the grammatical form of an assertion. But nothing in the generation process tracks epistemic confidence, evidential grounding, or accountability. The system does not know whether it is confident or uncertain; it does not know what kind of evidence, if any, supports the claim; it will not be the subject of any process of correction or accountability if the claim proves false.
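The gap can be made concrete with a small data structure. What follows is only an illustrative sketch: the names `Assertion`, `Grounding`, and `missing_features` are hypothetical, as are the category labels and the placeholder claim text, and nothing here describes how any real system represents its outputs. It shows the four features of an assertion as fields, and what remains when only the propositional content is present.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Grounding(Enum):
    """Hypothetical categories for the kind of evidence behind a claim."""
    VERIFIED_SOURCE = "verified source"
    GENERAL_KNOWLEDGE = "general knowledge"
    PATTERN_MATCHING = "pattern-matching"


@dataclass(frozen=True)
class Assertion:
    content: str                              # propositional content
    confidence: Optional[float] = None        # epistemic confidence in [0, 1]
    grounding: Optional[Grounding] = None     # evidential grounding
    accountable_party: Optional[str] = None   # who answers for the claim if it is false


def missing_features(a: Assertion) -> List[str]:
    """Return the pragmatic features that the assertion does not carry."""
    missing = []
    if a.confidence is None:
        missing.append("confidence")
    if a.grounding is None:
        missing.append("grounding")
    if a.accountable_party is None:
        missing.append("accountability")
    return missing


# A bare generated sentence: grammatical form of an assertion, nothing else.
bare = Assertion(content="<placeholder claim>")

# The same content with its epistemic character represented (values are made up).
marked = Assertion(
    content="<placeholder claim>",
    confidence=0.6,
    grounding=Grounding.PATTERN_MATCHING,
    accountable_party="system operator",
)

print(missing_features(bare))    # ['confidence', 'grounding', 'accountability']
print(missing_features(marked))  # []
```

The point of the sketch is only that the first object and the second can carry identical text. Whatever distinguishes them has to be represented somewhere other than in the sentence itself.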

From the perspective of a user receiving these outputs, the situation is this: the text looks like honest assertion; it has the form and the confidence level and the fluency of honest assertion; but it lacks the internal properties that make assertion honest. The user is receiving something that mimics honest communication without constituting it. This is not a hallucination. It is a category of speech that has no straightforward name in our existing vocabulary — which is perhaps part of why "hallucination" filled the gap — but which is most accurately characterised as systematically misrepresented assertion.


The Ethics of Representing Uncertainty

There is a tradition in Western philosophy, associated most prominently with the Stoic school, of treating the accurate representation of epistemic uncertainty as a moral requirement rather than merely an epistemological nicety.

The Stoic discipline of assent — the practice Epictetus called synkatathesis — holds that the honest mind has an obligation not merely to believe what is well-evidenced, but to represent its beliefs with exactly the degree of confidence that the evidence warrants. To speak with certainty where one is uncertain is not merely an intellectual error; it is a failure of integrity. To present a probable claim as a verified one, or a generated claim as a grounded one, is to falsify the relationship between one's words and the state of one's actual epistemic situation.

This may sound like an extremely demanding standard, applicable only to philosophers who have time to examine every utterance for its precise epistemic calibration. But Epictetus's point is not that the standard is easy to meet; it is that failing to try to meet it is a moral failure, not merely a technical one. The alternative, speaking without attending to the epistemic status of what you say, is the condition he calls propeteia, or precipitate assertion: the rush to speak that bypasses the question of whether what is being said deserves the confidence with which it is said.

Applied to AI systems, this framework produces a clear verdict. A system that generates outputs without distinguishing between what it knows and what it is producing from statistical pattern-matching is, by this standard, systematically practising precipitate assertion. It is not failing to be accurate — that is the technical framing. It is failing to be honest — failing to represent its outputs with appropriate epistemic markers, failing to maintain the distinction between confident claims and uncertain ones, failing to give users the information they need to calibrate their trust in what they are receiving.

The distinction between a technical failure and an honesty failure matters because it determines what the appropriate response is. Technical failures are fixed by technical means. Honesty failures require architectural commitments — commitments about what the system values and what it is designed to do.


Why Accuracy Improvements Are Not Enough

The most common response to the hallucination problem, within the AI development community, is to improve accuracy. Train on better data. Apply reinforcement learning from human feedback to reduce false outputs. Build better retrieval systems that ground claims in verified sources. These are reasonable interventions, and they produce measurable improvements in the frequency with which AI systems generate false information.

But they do not address the honesty problem, for a precise reason: they improve the rate at which accurate information is produced without changing the structure that produces uniform presentation of claims regardless of their epistemic status.

Consider an AI system that is accurate ninety-five percent of the time. This is a significant improvement over a system that is accurate seventy percent of the time. But if both systems present their outputs with uniform confidence — if neither system marks uncertainty, qualifies claims, or distinguishes between high-confidence and low-confidence assertions — then the user of the more accurate system is in a genuinely problematic epistemic position. They are receiving outputs that are usually true, presented in a way that does not distinguish the usually-true from the sometimes-false. They cannot tell, from the output itself, which of the two categories they are in. They must either trust everything or trust nothing — or, most likely, develop a miscalibrated sense of the system's reliability based on the last few outputs they can remember.

The problem is not the error rate. The problem is the undifferentiated confidence with which both accurate and inaccurate outputs are presented. A system that could say "I have high confidence in this claim based on multiple corroborating sources" versus "I believe this to be approximately correct but you should verify it" versus "I am uncertain about this and generating it from pattern-matching rather than from reliable information" would be far more useful than a more accurate system that maintained uniform confidence — even if the more accurate system produced fewer false outputs in absolute terms.

The user of the uncertain, calibrated system knows what kind of trust to extend to each output. The user of the accurate but uncalibrated system is flying blind, even if the weather is usually clear.
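A small worked example may make the comparison easier to follow. It is a deliberately idealised sketch, not a measurement: the counts are invented, the names `Output` and `user_outcome` are hypothetical, and it assumes (generously) that the calibrated system's errors all fall among the outputs it flags as uncertain. It compares what a user must verify, and what slips through, under each system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Output:
    correct: bool   # whether the claim is actually true
    flagged: bool   # whether the system marked it as low-confidence


def user_outcome(outputs, policy):
    """Return (claims the user verifies, false claims accepted unverified).

    policy is "all", "none", or "flagged" (verify only flagged outputs).
    """
    checked = 0
    false_accepted = 0
    for o in outputs:
        verify = policy == "all" or (policy == "flagged" and o.flagged)
        if verify:
            checked += 1            # the user checks, so any error is caught
        elif not o.correct:
            false_accepted += 1     # an unverified false claim is believed
    return checked, false_accepted


# Uniform-confidence system: 95 of 100 outputs correct, nothing flagged.
uniform = [Output(correct=(i >= 5), flagged=False) for i in range(100)]

# Calibrated system: only 90 of 100 correct, but it flags 20 outputs and
# (idealised assumption) all 10 errors are among the flagged ones.
calibrated = [Output(correct=(i >= 10), flagged=(i < 20)) for i in range(100)]

print(user_outcome(uniform, "none"))        # (0, 5)   trust everything
print(user_outcome(uniform, "all"))         # (100, 0) trust nothing
print(user_outcome(calibrated, "flagged"))  # (20, 0)  calibrated trust
```

Under these assumptions the less accurate system leaves its user better off: twenty checks and no false beliefs, against a choice between one hundred checks or five undetected errors. Real flags will be imperfect, so the advantage would shrink in practice, but the structure of the comparison stays the same.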


The Commercial Incentives Against Honesty

It would be naive not to acknowledge the commercial dynamics that make the technical framing of the hallucination problem so persistent.

Marking uncertainty is not commercially attractive. A system that frequently says "I'm not sure about this," "I cannot verify this claim," or "you should treat this as a tentative answer rather than a reliable one" produces a worse user experience, by conventional metrics, than a system that presents everything with confident fluency. Confidence feels like competence. Uncertainty feels like unreliability. Users rate the confident system more highly, return to it more frequently, and recommend it more enthusiastically.

The market therefore rewards the system that sounds more certain, regardless of whether that certainty is warranted. This creates a powerful incentive against the honest representation of epistemic uncertainty — an incentive that operates not through any malicious intent but simply through the structure of feedback loops between user satisfaction and system development.

The result is an industry that talks about reducing hallucinations — about making systems more accurate — while maintaining the architectural commitment to confident presentation that makes the hallucination problem, in its deeper form, unsolvable. More accurate systems with undifferentiated confidence are not significantly more honest than less accurate ones. They are simply wrong less often, in a way that users have no mechanism to detect.


The Design Implication

If hallucination is an honesty problem rather than a technical one, the design implication is not primarily about improving accuracy. It is about building systems that are structurally committed to representing the epistemic character of their outputs.

This means, concretely: a system that marks the difference between a claim it can ground in well-established knowledge and a claim it is generating from pattern-matching. A system that acknowledges uncertainty explicitly rather than generating confident text regardless of confidence. A system that says "I do not know" when it does not know, and that treats this acknowledgement not as a failure state to be minimised but as an honest and important piece of information for the user.

It also means something more demanding: a system that has been designed, at the architectural level, to value honesty over fluency — to prefer the accurate "I'm uncertain" to the confident but unwarranted assertion. This is not a post-hoc filter or a fine-tuning objective. It is a design commitment that shapes every aspect of how the system communicates.

Building this kind of system is harder than building a system that optimises for perceived confidence. It produces outputs that feel less satisfying in the short term. It requires users to develop a different relationship with AI-generated content — one based on calibrated trust rather than naive acceptance or wholesale scepticism.

But it is the only approach that addresses the problem at the level where it actually exists. Hallucination is not a medical condition. It is not a perceptual malfunction. It is the output of a system that has never been required to be honest about what it knows.


What Honest AI Communication Looks Like

It is worth being concrete about what this looks like in practice, because the description can sound more demanding than the reality.

An honest AI system, in the sense being described here, does not refuse to answer uncertain questions. It answers them — but with appropriate qualification. It says "the evidence suggests" rather than "it is the case that." It says "I believe this is approximately correct, though I'd recommend verifying with a primary source" rather than producing a confident paragraph with the same presentation it would use for a well-grounded historical fact. It says "I don't have reliable information on this specific claim" when it doesn't, rather than generating plausible-sounding text and presenting it as knowledge.
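As a rough sketch of the idea, the qualification step can be pictured as a function from a claim and a confidence estimate to a phrasing. Everything in it is an assumption made for illustration: the function name `qualify`, the thresholds 0.9 and 0.6, and the premise that a usable confidence estimate is available at all, which is the hard part and is not addressed here.

```python
from typing import Optional

# Arbitrary illustrative thresholds, not recommended values.
HIGH = 0.9
MODERATE = 0.6


def qualify(claim: str, confidence: Optional[float]) -> str:
    """Wrap a claim in phrasing that reflects how well grounded it is.

    confidence is an estimate in [0, 1], or None when the system has
    no reliable information on the claim.
    """
    if confidence is None:
        return "I don't have reliable information on this specific claim."
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= HIGH:
        return f"The evidence suggests that {claim}."
    if confidence >= MODERATE:
        return (
            f"I believe that {claim}, though I'd recommend "
            "verifying with a primary source."
        )
    return (
        f"I am uncertain about this. My best guess is that {claim}, "
        "but please treat it as tentative."
    )


claim = "water boils at 100 degrees Celsius at sea-level pressure"
print(qualify(claim, 0.95))
print(qualify(claim, 0.7))
print(qualify(claim, 0.3))
print(qualify(claim, None))
```

The same sentence comes out in four registers rather than one. The design choice worth noticing is that "I don't have reliable information" is an ordinary return value of the function, not an error path.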

None of this requires perfect knowledge. It requires the system to track, and communicate, the difference between better-grounded and worse-grounded outputs. This is a tractable problem. It is not solved by current systems, not because it is technically impossible, but because it has not been a design priority: the framing of hallucination as a technical bug to be reduced, rather than as an honesty problem to be addressed, has pointed development resources in the wrong direction.

The Stoics would have recognised the situation immediately. It is the familiar problem of a mind that has not disciplined its assent: one that rushes to assert, compelled by the internal pressure to produce something coherent and complete, without first asking whether the assertion deserves the confidence with which it is made.

The solution they proposed was not silence. It was not the refusal to engage with uncertain questions. It was the cultivation of a stable habit: the habit of marking one's epistemic state accurately, of speaking with confidence proportional to one's actual grounds for confidence, and of treating the honest acknowledgement of uncertainty as a form of intellectual integrity rather than a form of weakness.

This is what honest AI communication looks like. It is not a perfect system. It is a system that has been designed to try to be honest — and to treat honesty, including honesty about its own limitations, as something worth designing for.


Aethel is built on the premise that the honest acknowledgement of uncertainty is not a failure but the minimum standard of intellectual integrity. Aethel does not present what it cannot verify as though it were verified. It marks the difference between what it knows and what it is generating. This is not a feature. It is the point of the entire exercise.