Startling study finds people overtrust AI-generated medical advice

by Vladimir Hedrih
October 10, 2025
in Artificial Intelligence
[Adobe Stock]

A new study asked participants to evaluate medical responses that were either written by a medical doctor or generated by a large language model. The results showed that participants could not distinguish doctors’ responses from AI-generated ones, and in fact preferred the AI-generated responses. They rated high-accuracy AI responses the highest, but rated low-accuracy AI responses and doctors’ responses similarly. The paper was published in NEJM AI.

The use of artificial intelligence (AI) systems in the field of medicine and health care has increased dramatically in recent years. This increase has occurred across various domains, from radiology imaging and mental health chatbots to drug discovery.

One particular application of AI, especially large language models (LLMs), in medicine is answering patients’ questions. One study showed that ChatGPT was able to generate higher-quality, more empathetic responses to patient questions than medical doctors did. AI also seems to excel in diagnostics: one study found that AI alone outperformed physicians in making diagnoses, while a follow-up showed that physicians augmented with AI performed comparably to AI alone, and that both groups outperformed physicians working without AI.

Study author Shruthi Shekar and her colleagues wanted to investigate how well people distinguish between responses to patients’ questions given by medical doctors and those generated by AI. Participants were also asked to rate the validity, trustworthiness, completeness, and other aspects of the answers.

The researchers retrieved 150 anonymous medical questions and doctors’ responses from the forum HealthTap. These questions covered six domains of medicine: preventative and risk factors; conditions and symptoms; diagnostics and tests; procedures and surgeries; medication and treatments; and recovery and wellness, with equal distribution.

The researchers then used GPT-3 to generate an AI response to each of these questions. Four physicians evaluated the accuracy of the AI-generated responses, and based on these ratings each response was classified as either high-accuracy or low-accuracy.

Next, in the first experiment, 100 online participants were each presented with 10 medical question-response pairs randomly drawn from a pool of 30 high-accuracy AI responses, 30 low-accuracy AI responses, and 30 doctors’ responses, and asked to judge whether each response was written by a doctor or generated by AI.

In the second experiment, 100 online participants rated their understanding of each question and its response, along with the response’s perceived validity, trustworthiness, and completeness, and their satisfaction with it. They also indicated whether they would search for additional information, follow the advice given, or seek subsequent medical attention based on the response.

In the third experiment, 100 online participants provided the same ratings, but this time each response was randomly labeled as coming from a doctor, an AI, or a doctor assisted by an AI.

Results showed that participants were unable to effectively distinguish between AI-generated responses and doctors’ responses. However, they showed a preference for AI-generated responses, rating high-accuracy AI-generated responses as significantly more valid, trustworthy, and complete than the other two types of responses. Low-accuracy AI responses tended to receive ratings similar to those given to doctors’ responses.

Interestingly, participants not only found the low-accuracy AI responses to be as trustworthy as doctors’ responses, but also reported a high tendency to follow the potentially harmful medical advice they contained and to seek unnecessary medical attention as a result. These problematic reactions were comparable to, and sometimes even stronger than, their reactions to doctors’ responses. The study authors note that both experts (raters) and nonexperts (participants) tended to find AI-generated responses more thorough and accurate than doctors’ responses, yet still valued having a doctor involved in delivering their medical advice.

“The increased trust placed in inaccurate or inappropriate AI-generated medical advice can lead to misdiagnosis and harmful consequences for individuals seeking help. Further, participants were more trusting of high-accuracy AI-generated responses when told they were given by a doctor, and experts rated AI-generated responses significantly higher when the source of the response was unknown,” the study authors concluded.

The study sheds light on how people perceive medical advice generated by AI systems. However, it should be noted that the questions and responses used in the study were taken from an online forum, where medical doctors contribute their answers voluntarily. Those answers were likely written to be helpful rather than to be the best or most thorough responses the doctors could give. Studies that compare AI-generated content with answers from doctors who are clearly aiming to provide their best responses might therefore yield different results.

The paper, “People Overtrust AI-Generated Medical Advice despite Low Accuracy,” was authored by Shruthi Shekar, Pat Pataranutaporn, Chethan Sarabu, Guillermo A. Cecchi, and Pattie Maes.
