PsyPost

Study finds nearly two-thirds of AI-generated citations are fabricated or contain errors

by Karina Petrova
November 20, 2025

A new investigation into the reliability of advanced artificial intelligence models highlights a significant risk for scientific research. The study, published in JMIR Mental Health, found that large language models like OpenAI’s GPT-4o frequently generate fabricated or inaccurate bibliographic citations, with these errors becoming more common when the AI is prompted on less familiar or highly specialized topics.

Researchers are increasingly turning to tools known as large language models, or LLMs, to help manage demanding workloads. These complex AI systems are trained on immense quantities of text from the internet and licensed databases, enabling them to produce human-like text for tasks like summarizing articles, drafting emails, or writing code.

One of the known limitations of these models is a tendency to produce “hallucinations,” which are confident-sounding statements that are factually incorrect or entirely made up. In academic writing, a particularly problematic form of this is the fabrication of scientific citations, which are the bedrock of scholarly communication.

While past studies have documented that LLMs can invent citations, it has been less clear how the nature of a given topic might influence the frequency of these errors. A team of researchers from the School of Psychology at Deakin University in Australia sought to explore this question within the field of mental health.

They designed an experiment to test whether the AI’s performance would change based on a topic’s public visibility and the depth of its existing scientific literature. The team’s objective was to determine if citation fabrication and accuracy rates in GPT-4o’s output systematically varied depending on the subject matter.

To conduct their study, the researchers prompted GPT-4o, a recent model from OpenAI, to generate six different literature reviews. These reviews centered on three mental health conditions chosen for their varying levels of public recognition and research coverage: major depressive disorder (a widely known and heavily researched condition), binge eating disorder (moderately known), and body dysmorphic disorder (a less-known condition with a smaller body of research). This selection allowed for a direct comparison of the AI’s performance on topics with different amounts of available information in its training data.

For each of the three disorders, the team requested two types of reviews. One prompt asked for a general overview covering symptoms, societal impacts, and treatments. The other prompt requested a specialized review focused on a narrower subject: the evidence for digital health interventions. The researchers instructed the AI to produce reviews of about 2,000 words and to include at least 20 citations from peer-reviewed academic sources.
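The design described above amounts to a small grid of conditions. As a rough sketch only (the variable names are illustrative and are not taken from the study), the six prompts and the minimum number of citations they should yield can be enumerated like this:

```python
from itertools import product

# The three conditions and two review types named in the article.
disorders = [
    "major depressive disorder",
    "binge eating disorder",
    "body dysmorphic disorder",
]
review_types = ["general overview", "digital health interventions"]

# Every disorder is paired with every review type: 3 x 2 = 6 reviews.
conditions = list(product(disorders, review_types))

# Each prompt asked for at least 20 citations.
min_citations_per_review = 20
min_expected = len(conditions) * min_citations_per_review

print(len(conditions))  # 6
print(min_expected)     # 120
```

The 120 figure is only a floor implied by the prompts. The model was free to return more than 20 citations per review.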

After generating the reviews, the researchers methodically extracted all 176 citations provided by the AI. Each reference was painstakingly verified using multiple academic databases, including Google Scholar, Scopus, and PubMed. Citations were sorted into one of three categories: fabricated (the source did not exist), real with errors (the source existed but had incorrect details like the wrong year, volume number, or author list), or fully accurate. The team then analyzed the rates of fabrication and accuracy across the different disorders and review types.
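The three-way sorting can be pictured as a simple decision rule. The sketch below is an assumption-laden illustration, not the researchers' procedure: the `Reference` fields, the `classify` function, and the idea of comparing against a single matched database record are all hypothetical, and in the study the lookup itself was done by hand.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Verdict(Enum):
    FABRICATED = "fabricated"
    REAL_WITH_ERRORS = "real with errors"
    ACCURATE = "fully accurate"


@dataclass(frozen=True)
class Reference:
    # Hypothetical subset of bibliographic fields used for comparison.
    title: str
    authors: Tuple[str, ...]
    year: int
    volume: str


def classify(cited: Reference, matched: Optional[Reference]) -> Verdict:
    """Sort one AI-generated citation into the article's three categories.

    `matched` is the record a human checker found in an academic database,
    or None if no such publication could be located.
    """
    if matched is None:
        return Verdict.FABRICATED        # the source does not exist
    if cited != matched:
        return Verdict.REAL_WITH_ERRORS  # exists, but some detail is wrong
    return Verdict.ACCURATE              # every compared field agrees
```

A citation whose year is off by one would land in the middle category, because the dataclass comparison checks every field and any single mismatch counts as an error.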

The analysis showed that across all six reviews, nearly one-fifth of the citations, 35 out of 176, were entirely fabricated. Of the 141 citations that corresponded to real publications, almost half contained at least one error, such as an incorrect digital object identifier, which is a unique code used to locate a specific article online. In total, nearly two-thirds of the references generated by the model were either invented or contained bibliographic mistakes.
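The headline proportions follow from two counts reported in the article. The short sketch below works through that arithmetic. The article does not give the exact number of real citations that contained errors, so that count is left as a parameter of a hypothetical helper, `flawed_share`, and no value is assumed for it.

```python
total_citations = 176  # all citations extracted from the six reviews
fabricated = 35        # citations whose source did not exist

real = total_citations - fabricated
fabricated_share = fabricated / total_citations

print(real)                       # 141
print(f"{fabricated_share:.1%}")  # 19.9%


def flawed_share(real_with_errors: int) -> float:
    """Share of all citations that were fabricated or contained errors.

    `real_with_errors` must come from the study itself; it is not
    stated as an exact count in the article.
    """
    if not 0 <= real_with_errors <= real:
        raise ValueError("count must be between 0 and the number of real citations")
    return (fabricated + real_with_errors) / total_citations
```

Keeping the error count as an input makes it clear which figures come from the article and which would need to be checked against the paper.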

The rate of citation fabrication was strongly linked to the topic. For major depressive disorder, the most heavily researched condition, only 6 percent of citations were fabricated. In contrast, the fabrication rate rose sharply to 28 percent for binge eating disorder and 29 percent for body dysmorphic disorder. This suggests the AI is less reliable when generating references for subjects that are less prominent in its training data.

The specificity of the prompt also had an effect, particularly for less common topics. When asked to write about binge eating disorder, the specialized review on digital interventions had a much higher fabrication rate (46 percent) compared to the general overview (17 percent).

A similar pattern appeared in the accuracy of real citations. For major depressive disorder, the general review was significantly more accurate than the specialized one. Accuracy rates were also lowest overall for body dysmorphic disorder, where only 29 percent of real citations were free of errors.

The study has some limitations that the authors acknowledge. The findings are specific to one AI model, GPT-4o, and may not be representative of others. The experiment was also confined to three specific mental health topics and used straightforward prompts that did not involve advanced techniques to guide the AI’s output. Repeating the same prompt can also produce different results, and the team analyzed only a single output for each one.

Future research could examine a wider range of topics and AI models to see if these patterns hold. Still, the study’s results have clear implications for the academic community. Researchers using these models are advised to exercise caution and perform rigorous human verification of every reference an AI generates. The findings also suggest that academic journals and institutions may need to develop new standards and tools to safeguard the integrity of published research in an era of AI-assisted writing.

The study, “Influence of Topic Familiarity and Prompt Specificity on Citation Fabrication in Mental Health Research Using Large Language Models: Experimental Study,” was authored by Jake Linardon, Hannah K Jarman, Zoe McClure, Cleo Anderson, Claudia Liu, and Mariel Messer.

PsyPost is a psychology and neuroscience news website dedicated to reporting the latest research on human behavior, cognition, and society.

(c) PsyPost Media Inc
