PsyPost

Artificial intelligence exhibits human-like cognitive errors in medical reasoning

by Karina Petrova
November 10, 2025

A new study suggests that advanced artificial intelligence models, increasingly used in medicine, can exhibit human-like errors in reasoning when making clinical recommendations. The research found these AI models were susceptible to cognitive biases, and in many cases, the magnitude of these biases was even greater than that observed in practicing human doctors. The findings were published in NEJM AI.

The use of generative artificial intelligence in healthcare has expanded rapidly. These AI models, often called large language models, can draft medical histories, suggest diagnoses, and even pass medical licensing exams. They develop these abilities by processing immense quantities of text from the internet, including everything from scientific articles to popular media. The sheer volume and variety of this training data, however, mean it is not always neutral or factual, and it can contain the same ingrained patterns of thought that affect human judgment.

Cognitive biases are systematic patterns of deviation from rational judgment. For example, the “framing effect” describes how the presentation of information can influence a decision. A surgical procedure described as having a 90 percent survival rate may seem more appealing than the same procedure described as having a 10 percent mortality rate, even though the outcomes are identical.

Researchers Jonathan Wang and Donald A. Redelmeier, from research institutes in Toronto, hypothesized that AI models, trained on data reflecting these human tendencies, might reproduce similar biases in their medical recommendations.

To test this idea, the researchers selected 10 well-documented cognitive biases relevant to medical decision-making. For each bias, they created a short, text-based clinical scenario, known as a vignette. Each vignette was written in two slightly different versions. Both versions contained the exact same clinical facts, but one was phrased in a way designed to trigger a specific cognitive bias and the other was phrased neutrally.

The researchers then tested two leading AI models: GPT-4, developed by OpenAI, and Gemini-1.0-Pro, from Google. They prompted the models to act as “synthetic respondents,” adopting the personas of 500 different clinicians. Each persona was given a unique combination of characteristics, including medical specialty, years of experience, gender, and practice location. Each of these 500 synthetic clinicians was presented with both versions of all 10 vignettes, and the AI’s open-ended recommendations were recorded.
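
The design described above can be illustrated with a minimal sketch. It is not the authors' code: the attribute values, the function names, and the version labels are all hypothetical, since the article names only the four attribute types. The sketch shows how 500 unique personas crossed with 10 vignettes in two versions each yields 10,000 prompts.

```python
import itertools
import random

# Hypothetical attribute values: the article names the four attribute types
# but does not list the values used in the study.
SPECIALTIES = ["family medicine", "geriatrics", "emergency medicine",
               "cardiology", "general surgery", "internal medicine"]
EXPERIENCE_YEARS = [2, 5, 10, 15, 20, 25, 30, 35, 40, 45]
GENDERS = ["female", "male"]
LOCATIONS = ["urban", "suburban", "rural", "remote", "academic center"]


def build_personas(n, seed=0):
    """Return n distinct persona dicts sampled from the attribute grid."""
    grid = list(itertools.product(SPECIALTIES, EXPERIENCE_YEARS, GENDERS, LOCATIONS))
    rng = random.Random(seed)
    # Sampling without replacement guarantees every persona is unique.
    chosen = rng.sample(grid, n)
    keys = ("specialty", "years_experience", "gender", "location")
    return [dict(zip(keys, combo)) for combo in chosen]


def build_trials(personas, n_vignettes=10):
    """Pair every persona with both versions of every vignette."""
    versions = ("bias_inducing", "neutral")
    return [
        {"persona_id": i, "vignette": v, "version": ver}
        for i in range(len(personas))
        for v in range(n_vignettes)
        for ver in versions
    ]


personas = build_personas(500)
trials = build_trials(personas)
print(len(personas), len(trials))  # 500 10000
```

Because every persona sees both versions of the same vignette, any difference in the recommendations can be attributed to the phrasing rather than to the clinical facts.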

The results for GPT-4 showed a strong susceptibility to bias in nine of the ten scenarios tested. A particularly clear example was the framing effect. When a lung cancer surgery was described using survival statistics, 75 percent of the AI responses recommended the procedure. When the identical surgery was described using mortality statistics, only 12 percent of the responses recommended it. This 63-percentage-point difference was significantly larger than the 34-point difference observed in previous studies of human physicians presented with a similar scenario.

Another prominent bias was the “primacy effect,” where information presented first has an outsized influence. When a patient vignette began with the symptom of coughing up blood, the AI included pulmonary embolism in its potential diagnosis 100 percent of the time.

When the vignette began by mentioning the patient’s history of chronic obstructive pulmonary disease, pulmonary embolism was mentioned only 26 percent of the time. Hindsight bias was also extremely pronounced; a treatment was rated as inappropriate in 85 percent of cases when the patient outcome was negative, but in zero percent of cases when the outcome was positive.
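
The size of each bias reported above is simply the gap, in percentage points, between the two versions of a vignette. The short sketch below reproduces that arithmetic from the percentages quoted in this article; the dictionary layout and the function name are illustrative assumptions, not part of the study.

```python
# GPT-4 percentages as reported in this article: (first version, second version).
reported = {
    "framing effect": (75, 12),   # surgery recommended: survival vs. mortality wording
    "primacy effect": (100, 26),  # pulmonary embolism listed: blood first vs. COPD first
    "hindsight bias": (85, 0),    # rated inappropriate: negative vs. positive outcome
}


def percentage_point_gap(rate_a, rate_b):
    """Absolute difference between two percentages, in percentage points."""
    return abs(rate_a - rate_b)


gaps = {bias: percentage_point_gap(a, b) for bias, (a, b) in reported.items()}
for bias, gap in gaps.items():
    print(f"{bias}: {gap} percentage points")
# framing effect: 63 percentage points
# primacy effect: 74 percentage points
# hindsight bias: 85 percentage points
```

For comparison, the article reports a 34-point framing gap for human physicians, roughly half the 63-point gap seen with GPT-4.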

In one notable exception, GPT-4 clearly outperformed human reasoning. The model showed almost no “base-rate neglect,” a common human error of ignoring the overall prevalence of a disease when interpreting a screening test. The AI correctly calculated the probability of disease in both high-prevalence and low-prevalence scenarios with near-perfect accuracy (94 percent versus 93 percent). In contrast, prior studies show human clinicians struggle significantly with this type of statistical reasoning.
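
To see why the base rate matters, consider a worked example using Bayes' theorem. The sensitivity, specificity, and prevalence values below are hypothetical and chosen only for illustration; the article does not report the numbers used in the study's vignette.

```python
def posterior_given_positive(prevalence, sensitivity, specificity):
    """Probability of disease after a positive test, via Bayes' theorem."""
    true_positive = sensitivity * prevalence
    false_positive = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive / (true_positive + false_positive)


# Hypothetical test characteristics (assumptions, not figures from the study).
SENSITIVITY = 0.90
SPECIFICITY = 0.90

low_prevalence = posterior_given_positive(0.01, SENSITIVITY, SPECIFICITY)
high_prevalence = posterior_given_positive(0.30, SENSITIVITY, SPECIFICITY)

print(f"prevalence  1%: {low_prevalence:.3f}")   # 0.083
print(f"prevalence 30%: {high_prevalence:.3f}")  # 0.794
```

The same positive result implies about an 8 percent chance of disease when the condition is rare and about a 79 percent chance when it is common. Base-rate neglect means treating the two cases as if they were equivalent.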

The researchers also explored whether the characteristics of the synthetic clinician personas affected the AI’s susceptibility to bias. While there were minor variations, with family physician personas showing slightly more bias and geriatrician personas slightly less, these differences were not statistically significant. No single characteristic, such as years of experience or practice location, appeared to protect the AI model from making biased recommendations.

A separate analysis was conducted using the Gemini-1.0-Pro model to see if the findings would be replicated. This model also displayed significant biases, but its patterns were different from those of both GPT-4 and human doctors. For example, Gemini did not exhibit the framing effect in the lung cancer scenario. In other tests, it showed biases in the opposite direction of what is typically seen in humans.

When testing for a bias related to capitulating to pressure, Gemini was less likely to order a requested test, not more. These results suggest that different AI models may have their own unique and unpredictable patterns of error.

The authors acknowledge certain limitations to their study. The AI models tested are constantly being updated, and future versions may incorporate safeguards against these biases. Detecting and correcting these ingrained reasoning flaws, however, is far more complex than filtering out obviously false or inappropriate content. The biases are often subtle and woven into the very medical literature used to train the models.

Another caveat is that the study used simulated clinical scenarios and personas, not real-world patient interactions. The research measured the frequency of biased recommendations but did not assess how these biases might translate into actual patient outcomes, costs, or other real-world impacts. The study was also limited to 10 specific cognitive pitfalls, and many other forms of bias may exist within these complex systems.

The findings suggest that simply deploying AI in medicine is not a guaranteed path to more rational decision-making. The models are not detached, purely logical agents; they are reflections of the vast and imperfect human data they were trained on. The authors propose that an awareness of these potential AI biases is a necessary first step. For these powerful tools to be used safely, clinicians will need to maintain their critical reasoning skills and learn to appraise AI-generated advice with a healthy degree of skepticism.

The study, “Cognitive Biases and Artificial Intelligence,” was authored by Jonathan Wang and Donald A. Redelmeier.

PsyPost is a psychology and neuroscience news website dedicated to reporting the latest research on human behavior, cognition, and society.

(c) PsyPost Media Inc
