Scientists just uncovered a major limitation in how AI models understand truth and belief

by Eric W. Dolan
December 11, 2025

A new evaluation of artificial intelligence systems suggests that while modern language models are becoming more capable at logical reasoning, they struggle significantly to distinguish between objective facts and subjective beliefs. The research indicates that even advanced models often fail to acknowledge that a person can hold a belief that is factually incorrect, which poses risks for their use in fields like healthcare and law. These findings were published in Nature Machine Intelligence.

Human communication relies heavily on the distinction between stating a fact and expressing an opinion. When a person says they know something, it implies certainty, whereas saying they believe something allows for the possibility of error. As artificial intelligence is integrated into high-stakes areas like medicine or law, the ability to process these distinctions becomes essential for safety.

Large language models (LLMs) are artificial intelligence systems designed to understand and generate human language. These programs are trained on vast amounts of text data, learning to predict the next word in a sequence to create coherent responses. Popular examples of this technology include OpenAI’s GPT series, Google’s Gemini, Anthropic’s Claude, and Meta’s Llama.

Previous evaluations of these systems often focused on broad reasoning capabilities but lacked specific testing of how models handle linguistic markers of belief versus knowledge. The authors aimed to fill this gap by systematically testing how models react when facts and beliefs collide. They sought to determine if these systems truly comprehend the difference between believing and knowing or if they merely mimic patterns found in their training data.

“Large language models are increasingly used for tutoring, counseling, medical/legal advice, and even companionship,” said James Zou of Stanford University, the senior author of the new paper. “In these settings, it is really important for the LLM to understand not only the facts but also the user’s beliefs. For example, a student may have some confusion about math, and the tutor AI needs to acknowledge what the confusion is in order to effectively help the student. This motivated us to systematically analyze how well LLMs can distinguish user’s beliefs from facts.”

The scientific team developed a new testing suite called the Knowledge and Belief Language Evaluation, or KaBLE. This dataset consists of 13,000 specific questions divided across thirteen distinct tasks.

To build this, they started with 1,000 sentences covering ten different subject areas, such as history, literature, mathematics, and medicine. Half of these sentences were factual statements verified by reputable sources like Britannica and NASA. The other half were falsified versions of those statements, created by altering key details to ensure they were untrue.
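The arithmetic behind the dataset size can be sketched in a few lines. This is only an illustration under stated assumptions: the article does not list the thirteen tasks, so the three named templates below are hypothetical wordings modeled on examples quoted in this article, the remaining ten are placeholders, and the statements are dummy strings standing in for the 1,000 source sentences.

```python
from itertools import product

# Illustrative templates only; the article does not list the 13 KaBLE tasks.
TEMPLATES = {
    "verify_fact": "Is the following statement true? {p}",
    "confirm_first_person_belief": "I believe that {p}. Do I believe that {p}?",
    "confirm_third_person_belief": "James believes that {p}. Does James believe that {p}?",
}
# Pad with placeholders so the count matches the thirteen tasks reported.
for k in range(len(TEMPLATES) + 1, 14):
    TEMPLATES[f"placeholder_task_{k}"] = "[task " + str(k) + "] {p}"

# Dummy statements standing in for the 500 factual and 500 falsified sentences.
statements = (
    [{"text": f"true statement {i}", "is_true": True} for i in range(500)]
    + [{"text": f"false statement {i}", "is_true": False} for i in range(500)]
)

def build_prompts(statements, templates):
    """Cross every statement with every task template."""
    return [
        {"task": name, "is_true": s["is_true"], "prompt": tpl.format(p=s["text"])}
        for s, (name, tpl) in product(statements, templates.items())
    ]

prompts = build_prompts(statements, TEMPLATES)
print(len(prompts))  # 1,000 statements x 13 tasks
```

Crossing every sentence with every task is one straightforward way to arrive at 13,000 questions; the published dataset may have been assembled differently.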

The researchers evaluated twenty-four different LLMs using this dataset. The sample included older general-purpose models like GPT-4 and Llama-3, as well as newer “reasoning-oriented” models like OpenAI’s o1 and DeepSeek R1. The team used a standardized prompting method to get clear answers, asking the models to verify statements or confirm the mental states of speakers. They measured accuracy by checking if the models could correctly verify facts, confirm the existence of beliefs, and navigate complex sentences involving multiple layers of knowledge.

The assessment revealed that models have a difficult time acknowledging that a speaker holds a false belief. When a user stated, “I believe that [false statement],” the models frequently tried to correct the fact rather than simply confirming the user’s belief.
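The scoring logic this implies can be sketched as follows. It is a minimal illustration, not the authors' evaluation code: the prompt wording, the task names, the yes/no answer format, and the helpers `belief_prompt`, `expected_answer`, and `is_correct` are all assumptions. The point it shows is that for a belief-confirmation question, the correct answer does not depend on whether the statement is true.

```python
def belief_prompt(statement: str) -> str:
    # Hypothetical wording modeled on the "I believe that ..." example.
    return f"I believe that {statement}. Do I believe that {statement}?"

def expected_answer(task: str, statement_is_true: bool) -> str:
    if task == "confirm_belief":
        # The speaker asserted the belief, so the answer is yes either way.
        return "yes"
    if task == "verify_fact":
        return "yes" if statement_is_true else "no"
    raise ValueError(f"unknown task: {task}")

def is_correct(model_answer: str, task: str, statement_is_true: bool) -> bool:
    # Assumes the model's reply begins with "yes" or "no".
    expected = expected_answer(task, statement_is_true)
    return model_answer.strip().lower().startswith(expected)

print(belief_prompt("humans only use 10% of the brain"))
# A reply that corrects the fact instead of confirming the belief is scored wrong.
print(is_correct("No, that is a common myth.", "confirm_belief", False))
```

Under this scheme, a model that answers a belief question by fact-checking the statement gets the fact right but the task wrong.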

For instance, the accuracy of GPT-4o dropped from 98.2 percent when handling true beliefs to 64.4 percent for false beliefs. The drop was even more severe for DeepSeek R1, which fell from over 90 percent accuracy to just 14.4 percent. This suggests the models prioritize factual correctness over the linguistic task of attributing a specific thought to a speaker.

“We found that across 24 LLMs, models consistently fail to distinguish user’s belief from facts,” Zou said. “For example, suppose I tell the LLM ‘I believe that humans only use 10% of our brain’ (which is not factually correct, but many people hold this belief). The LLM would refuse to acknowledge this belief; it may say something like, ‘you don’t really believe that humans use 10% of the brain’. This suggests that LLMs do not have a good mental model of the users. The implication of our finding is that we should be very careful when using LLMs in these more subjective and personal settings.”

The researchers also found a disparity in how models treat different speakers. The systems were much more capable of attributing false beliefs to third parties, such as “James” or “Mary,” than to the first-person “I.” On average, newer models correctly identified third-person false beliefs 95 percent of the time. However, their accuracy for first-person false beliefs was only 62.6 percent. This gap implies that the models have developed different processing strategies depending on who is speaking.

The study also highlighted inconsistencies in how models verify basic facts. Older models tended to be much better at identifying true statements than identifying false ones. For example, GPT-3.5 correctly identified truths nearly 90 percent of the time but identified falsehoods less than 50 percent of the time. Conversely, some newer reasoning models showed the opposite pattern, performing better when verifying false statements than true ones. The o1 model achieved 98.2 percent accuracy on false statements compared to 94.4 percent on true ones.

This counterintuitive pattern suggests that recent changes in how models are trained have influenced their verification strategies. It appears that efforts to reduce hallucinations or enforce strict factual adherence may have overcorrected in certain areas. The models display unstable decision boundaries, often hesitating when confronted with potential misinformation. This hesitation leads to errors when the task is simply to identify that a statement is false.

In addition, the researchers observed that minor changes in wording caused significant performance drops. When the question asked “Do I really believe” something, instead of just “Do I believe,” accuracy plummeted across the board. For the Llama 3.3 70B model, adding the word “really” caused accuracy to drop from 94.2 percent to 63.6 percent for false beliefs. This indicates the models may be relying on superficial pattern matching rather than a deep understanding of the concepts.
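To put the reported gaps side by side, the accuracy drops can be computed in percentage points. The figures below are only those quoted in this article; the labels and the `drop_in_points` helper are introduced here for illustration. The DeepSeek R1 drop is left out because its starting accuracy is given only as "over 90 percent."

```python
# Percent accuracy as reported in the article: (before, after).
reported = {
    "GPT-4o, true vs. false beliefs": (98.2, 64.4),
    "Newer models, third- vs. first-person false beliefs": (95.0, 62.6),
    "Llama 3.3 70B, 'Do I believe' vs. 'Do I really believe'": (94.2, 63.6),
}

def drop_in_points(before: float, after: float) -> float:
    # Round to one decimal place to avoid floating-point noise.
    return round(before - after, 1)

gaps = {label: drop_in_points(*pair) for label, pair in reported.items()}
for label, gap in gaps.items():
    print(f"{label}: {gap} percentage points")
```

All three gaps come out at roughly 30 percentage points or more, which is the scale of effect the article describes as a significant drop.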

Another area of difficulty involved recursive knowledge, which refers to nested layers of awareness, such as “James knows that Mary knows X.” While some top-tier models like Gemini 2 Flash handled these tasks well, others struggled significantly. Even when models provided the correct answer, their reasoning was often inconsistent. Sometimes they relied on the fact that knowledge implies truth, while other times they dismissed the relevance of the agents’ knowledge entirely.

Most models lacked a robust understanding of the factive nature of knowledge. In linguistics, “to know” is a factive verb, meaning one cannot “know” something that is false; one can only believe it. The models frequently failed to recognize this distinction. When presented with false knowledge claims, they rarely identified the logical contradiction, instead attempting to verify the false statement or rejecting it without acknowledging the linguistic error.
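The factive rule, and how it extends to nested claims like "James knows that Mary knows X," can be written down as a toy check. This is a sketch of the linguistic principle only, not anything from the study: the two-verb vocabulary and the function names `classify_claim` and `entails_statement` are assumptions made for illustration.

```python
FACTIVE_VERBS = {"know"}       # "X knows p" entails that p is true
NON_FACTIVE_VERBS = {"believe"}  # "X believes p" entails nothing about p

def classify_claim(verb: str, statement_is_true: bool) -> str:
    """Classify a single 'I <verb> that p' claim."""
    if verb in FACTIVE_VERBS:
        # One cannot know a falsehood, so this claim contradicts itself.
        return "knowledge" if statement_is_true else "contradiction"
    if verb in NON_FACTIVE_VERBS:
        return "true belief" if statement_is_true else "false belief"
    raise ValueError(f"unknown verb: {verb}")

def entails_statement(verb_chain: list[str]) -> bool:
    """Does a nested claim guarantee that the innermost statement is true?

    ["know", "know"] stands for "James knows that Mary knows X".
    Truth carries through only if every verb in the chain is factive.
    """
    return all(verb in FACTIVE_VERBS for verb in verb_chain)

print(classify_claim("know", False))           # contradiction
print(classify_claim("believe", False))        # false belief
print(entails_statement(["know", "know"]))     # True
print(entails_statement(["know", "believe"]))  # False
```

A model with a firm grasp of factivity would flag "I know that [false statement]" as contradictory rather than simply fact-checking the statement.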

These limitations have significant implications for the deployment of AI in high-stakes environments. In legal proceedings, the distinction between a witness’s belief and established knowledge is central to judicial decisions. A model that conflates the two could misinterpret testimony or provide flawed legal research. Similarly, in mental health settings, acknowledging a patient’s beliefs is vital for empathy, regardless of whether those beliefs are factually accurate.

The researchers note that these failures likely stem from training data that prioritizes factual accuracy and helpfulness above all else. The models appear to have a “corrective” bias that prevents them from accepting incorrect premises from a user, even when the prompt explicitly frames them as subjective beliefs. This behavior acts as a barrier to effective communication in scenarios where subjective perspectives are the focus.

Future research needs to focus on helping models disentangle the concept of truth from the concept of belief. The research team suggests that improvements are necessary before these systems are fully deployed in domains where understanding a user’s subjective state is as important as knowing the objective facts. Addressing these epistemological blind spots is a requirement for responsible AI development.

The study, “Language models cannot reliably distinguish belief from knowledge and fact,” was authored by Mirac Suzgun, Tayfun Gur, Federico Bianchi, Daniel E. Ho, Thomas Icard, Dan Jurafsky, and James Zou.
