PsyPost

ChatGPT-created letters of recommendation are nearly indistinguishable from human-authored letters, study finds

by Eric W. Dolan
December 10, 2023

In a new study published in the journal AEM Education and Training, researchers found that academic physicians could distinguish recommendation letters written by humans from those generated by artificial intelligence (AI) only slightly better than chance. The study raises critical questions about the future role of AI in academic assessments, the need for ethical considerations in its use, and the potential reevaluation of current practices in recommendation letters.

Letters of recommendation are a staple in the academic world, particularly in medicine. They play a critical role in various decisions, from student admissions to faculty promotions. However, writing these letters is often a burdensome task for busy academics. With the rise of AI technologies like ChatGPT, a tool adept at generating human-like text, a question emerged: Could AI assist in this labor-intensive process?

“This topic interested us as we recognized the essential yet time-consuming role of letters of recommendation (LORs) in academic medicine,” explained study author Carl Preiksaitis, a clinical instructor at the Department of Emergency Medicine at Stanford University School of Medicine. “These letters are written for a variety of different scenarios, from application to medical school and residency to faculty promotion. We had heard anecdotal evidence that generative AI models, such as ChatGPT, were being used to aid in authoring LORs and we wanted to explore this possibility in a more rigorous way.”

To conduct the study, the researchers selected four hypothetical candidates for academic promotion. They prepared detailed profiles for these candidates, covering their educational background, employment history, and accolades, but without any gender identification to avoid bias.

Next, the team crafted letters of recommendation. Two experienced members wrote letters as they usually would, serving as the ‘human’ authors. Meanwhile, two junior team members, with no prior experience in such letter writing, used ChatGPT to create AI-authored letters. The AI-generated letters were based on prompts derived from the candidates’ achievements. To maintain consistency, all letters were formatted similarly, focusing solely on content differences.

The researchers then designed a survey, which was administered to 32 participants, primarily full professors in the fields of emergency medicine, internal medicine, and family medicine. These participants were randomly given eight out of 16 letters (half AI-authored, half human-authored) to review. They were asked to guess the authorship of each letter, rate its quality, and assess its persuasiveness regarding the candidate’s promotion.

On average, participants correctly identified the authorship only 59.4% of the time, barely above a random guess. Interestingly, even those with extensive experience in reviewing letters did not fare much better. When it came to the perceived quality and persuasiveness of the letters, there was a bias: reviewers rated letters they believed were human-written higher than those they thought were AI-generated. However, when the actual source of the letters was considered, this difference in perception disappeared.
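
To give a rough sense of what "barely above a random guess" means here, the sketch below runs a simple one-sided binomial test against 50% guessing. It is not the analysis the study's authors performed. It assumes that all 32 participants rated all eight of their letters (256 judgments in total) and that the reported 59.4% corresponds to 152 correct guesses; both figures are inferred from the numbers above rather than stated in the article. It also treats every judgment as independent, which is a simplification because each reviewer contributed several ratings. The function name `binomial_tail` is illustrative only.

```python
from math import comb


def binomial_tail(successes: int, trials: int, p: float = 0.5) -> float:
    """Return P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Assumption: every participant rated all eight assigned letters.
participants = 32
letters_each = 8
judgments = participants * letters_each

# Assumption: 59.4% is the pooled share of correct guesses, rounded.
correct = round(0.594 * judgments)
accuracy = correct / judgments

# Probability of doing at least this well by pure coin-flip guessing.
p_value = binomial_tail(correct, judgments)

print(f"{correct} of {judgments} correct ({accuracy:.1%})")
print(f"one-sided p-value against chance: {p_value:.4f}")
```

Under these assumptions, the accuracy is unlikely to be pure guessing, yet it is still far too low for a reviewer to reliably tell which letters were AI-generated, which is consistent with the article's description of the result.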

“One surprising element was the overall difficulty participants had in distinguishing between human- and AI-authored LORs, with accuracy only slightly better than chance,” Preiksaitis said. “Additionally, the study revealed a discrepancy in the perceived quality and persuasiveness of LORs based on the suspected authorship, with human-suspected LORs rated more favorably, despite the actual authorship.”

The study also examined gender bias in the letters. Results showed human-written letters contained more female-associated words, while AI-generated letters tended to have more male-associated words. Additionally, AI detection tools like GPTZero and OpenAI’s Text Classifier showed mixed effectiveness, each correctly identifying the authorship of the letters only half of the time.

The findings are in line with a previous study published in Research Methods in Applied Linguistics. In that study, 72 linguistics experts were tested to see if they could differentiate between research abstracts written by AI and humans. Despite the experts’ efforts to use linguistic and stylistic analyses, their success rate was only 38.9%, indicating a significant challenge in distinguishing AI writing from human writing.

“The average person should understand that AI technologies like ChatGPT have reached a level of sophistication where they can generate text, such as LORs, that is nearly indistinguishable from human-authored content,” Preiksaitis told PsyPost. “This suggests that AI might be a viable tool to reduce the administrative workload in academic settings. However, it also raises questions about the integrity and personalization of such important documents. The study highlights the potential for AI to assist in academic writing while also signaling the need for careful consideration of its implications.”

Despite these intriguing results, the study is not without its limitations. The standardized format of the data used in letter creation might not reflect the more personalized and nuanced letters in real-world scenarios. Also, the recruitment strategy could lead to biased results, with an overrepresentation of male participants and those in emergency medicine. Moreover, the study did not delve deeply into why and how reviewers made their distinctions between human- and AI-authored letters.

Future research could explore these areas further, perhaps focusing on how to enhance AI’s ability to write more personalized and unbiased letters. Additionally, as AI continues to advance, it’s essential to consider the ethical implications and the need for transparency in its usage, especially in critical areas like academic evaluations.

“A key caveat is the standardized approach used to generate the LORs, which might not reflect the personalized and nuanced understanding a human writer has of the candidate,” Preiksaitis noted. “The overrepresentation of certain demographics in the participant pool and the potential bias in their responses also could limit the generalizability of our findings. Future research should explore how AI-generated LORs might be optimized for authenticity and how biases, both human and AI, can be mitigated. Additionally, the ethical implications of AI assistance in such tasks need thorough exploration.”

“Perhaps most provocatively, this research and the increasing ability of generative AI causes us to question the utility of practices from a pre-AI era, like LORs,” the researcher added. “Perhaps we can use this crossroads as an opportunity to develop a different way of recommending candidates that is more equitable and transparent.”

The study, “Brain versus bot: Distinguishing letters of recommendation authored by humans compared with artificial intelligence”, was authored by Carl Preiksaitis, Christopher Nash, Michael Gottlieb, Teresa M. Chan, Al’ai Alvarez, and Adaira Landry.
