Triple-Validated Retrieval: How We Stop the LLM From Making Things Up
Key Takeaways
- Single-layer hallucination guards miss things that the layers above and below would catch. Defence in depth is essential.
- HaluGate runs at compile time on wiki articles, extracts factual claims, cross-references each against the source material, and flags risk levels.
- Citation verification runs at serve time on every synthesised answer, dispatches each claim to a verifier model concurrently, and reports miscites in the response.
- The uncited-claim flagger handles sentences without citations: classifies each as likely-sourced-but-missed, found-in-team-sources, or genuine-knowledge-gap.
- Together, the three layers cut hallucination-driven errors to a fraction of what a single-layer guard lets through, and they surface the gaps the system cannot answer at all.
The most common failure mode in production AI is not the model picking the wrong answer. It is the model confidently producing an answer that has no basis in the source material it was given. Hallucination at the synthesis step. Most teams catch some of this with eyeballs in PR review. Some of it slips through. The interesting question is what the system itself can catch before the eyeballs ever see it.
The pattern we run on every product is defence in depth. Three independent layers, each looking for a different failure mode, each running at a different stage of the pipeline. None of them trust the LLM. All of them ship with the answer.
Why one layer is not enough
A typical RAG hallucination guard runs once, somewhere in the pipeline. Maybe it is a contradiction detector that compares the model's answer to the retrieved chunks. Maybe it is a confidence threshold on the model's output. Either way, the guard catches a class of failure modes and misses others.
The layered approach exists because different failure modes occur at different pipeline stages:
- A claim that contradicts source material happens at synthesis time. The verifier wants to be there.
- A fabricated claim with no source at all happens at extraction time, before synthesis. The compiler wants to catch it.
- A claim the source material does support, but whose citation got lost on the way through, happens in the formatting layer. A separate flagger needs to find it.
Each of these wants a different check at a different stage. One guard cannot cover all three without bloating into something that cannot be reasoned about.
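To make "each check at a different stage" concrete, here is a minimal registry sketch in Python. None of these names come from the product; it is just the shape of the idea: guards register against the stage where their failure mode first becomes visible, and the pipeline runs whatever is bound to the stage it has just finished.

```python
from typing import Callable

# A guard looks at some text plus the sources it was built from and
# returns a list of warnings. (Illustrative signature, not the real API.)
Guard = Callable[[str, list[str]], list[str]]

PIPELINE_GUARDS: dict[str, list[Guard]] = {
    "compile": [],  # Layer 1 (HaluGate) hooks in here, after each wiki fold
    "serve": [],    # Layers 2 and 3 hook in here, before the answer ships
}

def register_guard(stage: str, guard: Guard) -> None:
    PIPELINE_GUARDS[stage].append(guard)

def run_guards(stage: str, text: str, sources: list[str]) -> list[str]:
    """Run every guard bound to a stage and collect the warnings they raise."""
    warnings: list[str] = []
    for guard in PIPELINE_GUARDS[stage]:
        warnings.extend(guard(text, sources))
    return warnings
```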
Layer 1: HaluGate at compile time
The brain has a wiki compiler that folds source material (session logs, meeting notes, ingested research) into curated topic articles. After every fold, an asynchronous job extracts the factual claims from the article and cross-references each claim against the source material the LLM was actually given.
If a claim has no supporting source, it gets flagged. The article carries a risk score (low, medium, high) in its frontmatter. High-risk articles trigger a notification on the team chat for review. Fabricated content does not land in the vault unchecked.
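As a rough sketch, the compile-time pass boils down to an extract-then-check loop. Here `extract_claims` and `claim_is_supported` stand in for small LLM calls, and the risk thresholds are illustrative, not the product's real cut-offs:

```python
def halugate_check(article: str, sources: list[str],
                   extract_claims, claim_is_supported) -> dict:
    """Cross-reference every extracted claim against the sources the LLM
    was actually given and summarise the risk for the article frontmatter."""
    claims = extract_claims(article)                        # small LLM call
    unsupported = [c for c in claims
                   if not claim_is_supported(c, sources)]   # one call per claim
    ratio = len(unsupported) / max(len(claims), 1)
    # Illustrative thresholds; the real cut-offs are a product decision.
    risk = "high" if ratio > 0.25 else ("medium" if unsupported else "low")
    return {"risk": risk, "unsupported_claims": unsupported}
```

The returned summary is what lands in the article's frontmatter; a high risk score is what fires the team-chat notification.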
The cost of running HaluGate is roughly one extra LLM call per claim per article, paid asynchronously after the wiki compile has already finished. Sub-second on most articles. The cost of not running it is fabricated wiki content that gets indexed and surfaces in retrieval months later, by which time nobody remembers where the wrong claim came from.
Layer 2: citation verification at serve time
Every answer the brain synthesises (via memory_ask and similar) carries inline citations after each claim, in the form of stable hash references back to the source chunk. Before the answer is returned to the user, a verifier dispatches each claim to a verification model concurrently and asks: is this claim supported by the cited source?
The verifier returns a per-claim verdict: supported, contradicted, or unsupported. Any claim with a non-supported verdict gets flagged in the response itself, alongside the answer.
Latency: about a second for a typical three-or-four-claim answer, with the checks run concurrently. The user gets the answer plus an honest summary: N claims verified, M claims miscited, K claims uncited.
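In code, the serve-time check is little more than a fan-out and a tally. A sketch, assuming an async `verify_claim(claim, source_chunk)` call to the verifier model that returns "supported", "contradicted", or "unsupported" (the name and shape are assumptions, not the actual API):

```python
import asyncio

async def verify_answer(claims: list[tuple[str, str]], verify_claim) -> dict:
    """claims is a list of (claim_text, cited_chunk) pairs; a claim with no
    citation arrives with an empty chunk and is counted as uncited."""
    cited = [(c, s) for c, s in claims if s]
    # Fan out one verifier call per cited claim, concurrently.
    verdicts = await asyncio.gather(*(verify_claim(c, s) for c, s in cited))
    miscited = [c for (c, _), v in zip(cited, verdicts) if v != "supported"]
    return {
        "verified": len(cited) - len(miscited),
        "miscited": len(miscited),
        "uncited": len(claims) - len(cited),
        "flags": miscited,  # surfaced in the response alongside the answer
    }
```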
The user sees the result. They are not asked to trust the answer; they are given the trust signal alongside it. That changes the failure mode from "AI was confidently wrong and I did not notice" to "AI flagged its own error and I caught it before acting on it".
Layer 3: the uncited-claim flagger
The third layer handles a specific failure: claims in the answer that have no citation at all. These are the most subtle cases. The model produced a sentence, did not attach a source reference, and the user has no easy way to tell whether the claim is supported or fabricated.
The flagger examines every sentence in the answer that lacks a citation and classifies it as one of three things:
- Likely sourced but citation missed. The source exists in the vault. The model just forgot to cite it. The flagger surfaces the most-likely source so the user can verify quickly.
- Found in team sources. The answer is right but came from a different scope (e.g. team brain rather than personal brain). The flagger suggests the right scope to query next time.
- Genuine knowledge gap. No support in the vault at all. The flagger says: I do not have a source for this. Here is where you might want to record it.
The third category is the most interesting one. The system does not pretend to know. It actively asks to be taught. Over time, those gap notifications turn into vault content, and the brain's coverage grows where it was actually thin.
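A sketch of the classification step, assuming a `search_vault` retrieval call that returns hits as dicts with `score` and `ref` fields per scope; the names and the 0.75 similarity threshold are illustrative:

```python
def classify_uncited(sentence: str, search_vault, threshold: float = 0.75) -> dict:
    """Decide which of the three buckets an uncited sentence falls into."""
    personal = search_vault(sentence, scope="personal")
    team = search_vault(sentence, scope="team")
    best_personal = max(personal, key=lambda h: h["score"], default=None)
    best_team = max(team, key=lambda h: h["score"], default=None)
    if best_personal and best_personal["score"] >= threshold:
        return {"kind": "likely_sourced_citation_missed",
                "suggested_source": best_personal["ref"]}
    if best_team and best_team["score"] >= threshold:
        return {"kind": "found_in_team_sources", "suggested_scope": "team"}
    return {"kind": "knowledge_gap",
            "note": "No supporting source in the vault; worth recording one."}
```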
Why all three matter together
Skip layer 1, and fabricated claims land in the wiki and feed back into future retrievals as if they were trusted source material. The poison spreads.
Skip layer 2, and the user sees a confident answer with no honest signal. Wrong answers feel as solid as right ones.
Skip layer 3, and the system pretends to know things it does not, instead of saying so. The team never sees where coverage is missing.
Each layer is cheap on its own. Each catches a different failure mode. None of them require a frontier model. The aggregate effect is a retrieval layer the team can actually rely on, with honest signals attached to every answer.
Take the Next Step
If your team has been losing trust in an AI memory layer because of subtle hallucinations the team cannot quite pin down, the fix is rarely a better model. The fix is usually layered guards. We help teams instrument HaluGate, citation verification, and the uncited-claim flagger as three small primitives that cover the bulk of real failure modes. Get in touch if you want a fresh pair of eyes on yours.