
When AI Cites What Doesn't Exist

A Five-Stage Investigation into a Hallucination That Refused to Die

A real paper, a real framework name, a real method — assembled into a claim that was plausible, internally consistent, and false. Four out of five checks passed.

We were reviewing prior art for a patent filing. The AI cited a paper: "GoodRec — a heat kernel diffusion framework for knowledge graph representation" by Chen et al.

The claim mattered. If a published paper already described heat kernel diffusion on graphs for knowledge representation, our patent filing was at risk. We had to verify it.

This is the story of a hallucination that passed every simple check we threw at it.

Stage 1: The paper exists

We searched for "GoodRec Chen." Google Scholar returned a hit. A real paper, by a real author named Chen, at a real conference. Our first instinct — call it verified, move on — would have been wrong.

The paper was: "Graph-Oriented Cross-Modality Diffusion for Multimedia Recommendation" (Chen et al., ADMA 2025). Published. Peer-reviewed. DOI: 10.1007/978-981-95-3453-1_17.

A binary "does this paper exist?" check returns yes.

Stage 2: The framework name exists

Inside that paper, the authors call their recommendation framework "GoodRec." The name appears throughout: in their system description, their experiments, and their ablation studies.

A search for "GoodRec" inside academic databases returns a match.

Stage 3: The content doesn't match

Here is where a human reader catches what automated checks miss.

The paper is about multimedia recommendation — combining user interaction data with visual and textual features to suggest products. It does use a diffusion mechanism. But it has nothing to do with heat kernel theory on knowledge graphs. The phrase "knowledge graph" does not appear in the abstract. The mathematical framework is cross-modal attention, not Riemannian geometry.

The AI took three real ingredients — a paper, a framework name, and a diffusion method — and recombined them into a claim that was plausible, internally consistent, and false.

Stage 4: The hallucination is harder to find than a lie

A fabricated paper is easy to detect. Search for the title, get zero results, flag it. Takes seconds.

A recombination hallucination is the opposite. Every component checks out individually:

  • Paper exists? Yes.
  • Author named Chen? Yes.
  • Framework called GoodRec? Yes.
  • Uses diffusion? Yes.
  • About heat kernels on knowledge graphs? No.

Four out of five checks pass. The fifth requires actually reading the paper and understanding the mathematical difference between cross-modal diffusion and heat kernel diffusion on manifolds. No citation verification tool catches this. No API can check it. It requires domain knowledge.
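The failure mode above can be made concrete. Here is a minimal sketch — all data and check names are hypothetical, not a real verification API — showing how every component-level check passes while the one check that matters, whether the source supports the specific claim, fails:

```python
# Hypothetical sketch: component-level checks vs. claim-level verification.
# The claim as the AI stated it, broken into its ingredients.
claim = {
    "framework": "GoodRec",
    "author": "Chen",
    "method": "diffusion",
    "topic": "heat kernel diffusion on knowledge graphs",
}

# What the real paper actually contains (known only by reading it).
paper = {
    "title": "Graph-Oriented Cross-Modality Diffusion for Multimedia Recommendation",
    "framework": "GoodRec",
    "authors": ["Chen"],
    "methods": {"diffusion", "cross-modal attention"},
    "topics": {"multimedia recommendation"},
}

checks = [
    ("paper exists", paper["title"] != ""),
    ("author matches", claim["author"] in paper["authors"]),
    ("framework name matches", claim["framework"] == paper["framework"]),
    ("method mentioned", claim["method"] in paper["methods"]),
    # The only check that matters -- and the only one that fails:
    ("claim topic supported", claim["topic"] in paper["topics"]),
]

for name, passed in checks:
    print(f"{name}: {'PASS' if passed else 'FAIL'}")
```

The first four checks are string and set lookups any tool can run. The fifth is only expressible here because a human has already read the paper and populated `topics`; automating that step is the hard problem.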

Stage 5: Why this matters for persistent memory

This was a one-time verification task. We caught it because a human read the paper. But what happens when the system stores thousands of claims per day, and no human reads every source?

In a persistent memory system, a false claim that enters the store stays there. It gets retrieved when relevant queries arrive. Other memories link to it through co-activation. Over time, the false claim becomes embedded in the knowledge graph — not as an isolated error, but as a node connected to legitimate knowledge.

The cost of catching a hallucination grows with how long it has been in memory:

| When caught | What it costs |
| --- | --- |
| Before storage | One correction, minutes |
| After storage, before retrieval | Delete one record |
| After repeated retrieval | Audit every decision that used it |
| After graph propagation | Trace contamination across connected memories |
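The last row is the expensive one. Once a false claim is linked to other memories through co-activation, cleanup means walking everything reachable from it. A minimal sketch, with a hypothetical co-activation graph (the memory IDs and link structure are invented for illustration):

```python
from collections import deque

# memory_id -> ids of memories co-activated with it (hypothetical data)
links = {
    "false_claim": ["summary_1", "summary_2"],
    "summary_1": ["report_a"],
    "summary_2": ["report_a", "report_b"],
    "report_a": [],
    "report_b": ["decision_x"],
    "decision_x": [],
}

def contaminated(root, links):
    """Breadth-first walk of everything reachable from the false claim."""
    seen, queue = {root}, deque([root])
    while queue:
        node = queue.popleft()
        for nxt in links.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(sorted(contaminated("false_claim", links)))
# every node in this toy graph is reachable from the false claim
```

Two hops are enough for the toy claim to reach a decision record. In a real store with thousands of claims per day, the reachable set grows with every retrieval that links the false node to something new.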

This is not a theoretical concern. During our development, we found that hallucinated citations could survive for weeks in a working knowledge base before anyone noticed — and by then, downstream analyses had incorporated them.

What we learned

Three lessons came out of this investigation.

First: binary verification is insufficient. "Does this exist?" is the wrong question. The right question is: "Does the cited source support the specific claim being made?" That requires understanding both the claim and the source, which is a fundamentally harder problem.

Second: recombination hallucinations are the hard case. As language models improve, they will produce fewer outright fabrications and more plausible recombinations. The ingredients will be real. The assembly will be false. Detection difficulty goes up, not down, as models get better.

Third: persistent memory needs verification at the gate. If you store everything and verify later, you are already behind. The most effective intervention point is before a claim enters memory — not after it has been retrieved fifty times and reinforced through co-activation.
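The gate pattern itself is simple; the verifier inside it is not. A minimal sketch of the shape, assuming a toy store and a deliberately naive keyword verifier (all names here are hypothetical):

```python
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    """Toy memory store that verifies claims before admitting them."""
    records: list = field(default_factory=list)

    def admit(self, claim, source_text, supports):
        """Store the claim only if the verifier says the source backs it."""
        if supports(claim, source_text):
            self.records.append(claim)
            return True
        return False

# Stand-in verifier: does the source even mention the claim's key terms?
# A real verifier must understand both the claim and the source.
def naive_supports(claim, source_text):
    key_terms = ("heat kernel", "knowledge graph")
    return any(term in source_text.lower() for term in key_terms)

store = MemoryStore()
accepted = store.admit(
    "GoodRec is a heat kernel diffusion framework for knowledge graphs",
    "Graph-oriented cross-modality diffusion for multimedia recommendation",
    naive_supports,
)
print(accepted)  # the hallucinated claim is rejected at the gate
```

Even this crude verifier stops the GoodRec claim, because the source text never mentions the claim's key terms. The recombination cases that motivated this post need something far stronger than keyword overlap, but the architectural point stands: the check runs before the write, not after the fiftieth retrieval.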

The broader problem

This case study is one example. The underlying challenge applies to every system that stores AI-generated knowledge for later use:

  • Research assistants that accumulate findings across sessions
  • Enterprise knowledge bases populated by LLM agents
  • Personal AI assistants that remember past conversations
  • Autonomous agents that build world models from observation

Each of these systems faces the same question: how do you know what you know is true?

We don't have a complete answer. Citation verification handles the easy cases. For the hard cases — recombination, context drift, subtle misattribution — the field needs new approaches. We are working on several, and publishing results as they mature.

The one thing we are certain of: the problem gets worse with scale, not better. Every memory system needs a verification strategy, and "trust the LLM" is not one.


The GoodRec paper is real: Chen, J. et al., "Graph-Oriented Cross-Modality Diffusion for Multimedia Recommendation," ADMA 2025, Lecture Notes in Artificial Intelligence (Springer). DOI: 10.1007/978-981-95-3453-1_17. It is a solid piece of work on recommendation systems. It has nothing to do with heat kernels on knowledge graphs.


Eduard Izgorodin, April 2026 — LinkedIn