ChatGPT-created letters of recommendation are nearly indistinguishable from human-authored letters, study finds

by Eric W. Dolan
December 10, 2023
in Artificial Intelligence
(Photo credit: Adobe Stock)


In a new study published in the journal AEM Education and Training, researchers discovered that academic physicians could differentiate between recommendation letters written by humans and those generated by artificial intelligence (AI) only slightly better than chance. The study raises critical questions about the future role of AI in academic assessments, the need for ethical considerations in its use, and the potential reevaluation of current practices surrounding recommendation letters.

Letters of recommendation are a staple in the academic world, particularly in medicine. They play a critical role in various decisions, from student admissions to faculty promotions. However, writing these letters is often a burdensome task for busy academics. With the rise of AI technologies like ChatGPT, a tool adept at generating human-like text, the possibility emerged: Could AI assist in this labor-intensive process?

“This topic interested us as we recognized the essential yet time-consuming role of letters of recommendation (LORs) in academic medicine,” explained study author Carl Preiksaitis, a clinical instructor at the Department of Emergency Medicine at Stanford University School of Medicine. “These letters are written for a variety of different scenarios, from application to medical school and residency to faculty promotion. We had heard anecdotal evidence that generative AI models, such as ChatGPT, were being used to aid in authoring LORs and we wanted to explore this possibility in a more rigorous way.”

To conduct the study, the researchers selected four hypothetical candidates for academic promotion. They prepared detailed profiles for these candidates, covering their educational background, employment history, and accolades, but without any gender identification to avoid bias.

Next, the team crafted letters of recommendation. Two experienced team members wrote letters as they usually would, serving as the "human" authors. Meanwhile, two junior team members, with no prior experience in such letter writing, used ChatGPT to create AI-authored letters. The AI-generated letters were based on prompts derived from the candidates' achievements. To maintain consistency, all letters were formatted similarly, so that only their content differed.

The researchers then designed a survey, which was administered to 32 participants, primarily full professors in the fields of emergency medicine, internal medicine, and family medicine. These participants were randomly given eight out of 16 letters (half AI-authored, half human-authored) to review. They were asked to guess the authorship of each letter, rate its quality, and assess its persuasiveness regarding the candidate’s promotion.

On average, participants correctly identified the authorship only 59.4% of the time, barely above a random guess. Interestingly, even those with extensive experience in reviewing letters did not fare much better. When it came to the perceived quality and persuasiveness of the letters, there was a bias: reviewers rated letters they believed were human-written higher than those they thought were AI-generated. However, when the actual source of the letters was considered, this difference in perception disappeared.

“One surprising element was the overall difficulty participants had in distinguishing between human- and AI-authored LORs, with accuracy only slightly better than chance,” Preiksaitis said. “Additionally, the study revealed a discrepancy in the perceived quality and persuasiveness of LORs based on the suspected authorship, with human-suspected LORs rated more favorably, despite the actual authorship.”


The study also examined gender bias in the letters. Results showed human-written letters contained more female-associated words, while AI-generated letters tended to have more male-associated words. Additionally, AI detection tools like GPTZero and OpenAI’s Text Classifier showed mixed effectiveness, each correctly identifying the authorship of the letters only half of the time.

The findings are in line with a previous study published in Research Methods in Applied Linguistics. In that study, 72 linguistics experts were tested to see if they could differentiate between research abstracts written by AI and humans. Despite the experts’ efforts to use linguistic and stylistic analyses, their success rate was only 38.9%, indicating a significant challenge in distinguishing AI writing from human writing.

“The average person should understand that AI technologies like ChatGPT have reached a level of sophistication where they can generate text, such as LORs, that is nearly indistinguishable from human-authored content,” Preiksaitis told PsyPost. “This suggests that AI might be a viable tool to reduce the administrative workload in academic settings. However, it also raises questions about the integrity and personalization of such important documents. The study highlights the potential for AI to assist in academic writing while also signaling the need for careful consideration of its implications.”

Despite these intriguing results, the study is not without its limitations. The standardized format of the data used in letter creation might not reflect the more personalized and nuanced letters in real-world scenarios. Also, the recruitment strategy could lead to biased results, with an overrepresentation of male participants and those in emergency medicine. Moreover, the study did not delve deeply into why and how reviewers made their distinctions between human- and AI-authored letters.

Future research could explore these areas further, perhaps focusing on how to enhance AI’s ability to write more personalized and unbiased letters. Additionally, as AI continues to advance, it’s essential to consider the ethical implications and the need for transparency in its usage, especially in critical areas like academic evaluations.

“A key caveat is the standardized approach used to generate the LORs, which might not reflect the personalized and nuanced understanding a human writer has of the candidate,” Preiksaitis noted. “The overrepresentation of certain demographics in the participant pool and the potential bias in their responses also could limit the generalizability of our findings. Future research should explore how AI-generated LORs might be optimized for authenticity and how biases, both human and AI, can be mitigated. Additionally, the ethical implications of AI assistance in such tasks need thorough exploration.”

“Perhaps most provocatively, this research and the increasing ability of generative AI cause us to question the utility of practices from a pre-AI era, like LORs,” the researcher added. “Perhaps we can use this crossroads as an opportunity to develop a different way of recommending candidates that is more equitable and transparent.”

The study, “Brain versus bot: Distinguishing letters of recommendation authored by humans compared with artificial intelligence,” was authored by Carl Preiksaitis, Christopher Nash, Michael Gottlieb, Teresa M. Chan, Al’ai Alvarez, and Adaira Landry.
