How well can ChatGPT-4 write APA-style psychology papers?

In a recent study published in Contemporary School Psychology, researchers have put the latest AI technology to the test in academic writing, revealing both its potential and limitations.

Artificial Intelligence (AI) has been making waves in various fields, and academia is no exception. AI-powered tools such as Grammarly and Turnitin have become staples for students and researchers, helping to refine writing by checking grammar and verifying the originality of written work, respectively. However, the capabilities of AI tools, particularly in autonomously generating coherent, reliable, and scientifically accurate content, remain under scrutiny.

Led by Adam B. Lockwood and Joshua Castleberry from Kent State University, the study aimed to evaluate Generative Pre-trained Transformer 4 (GPT-4), a popular advanced AI language model developed by OpenAI, in writing American Psychological Association (APA)-style psychology papers.

While recent advancements in technology have enabled these sophisticated language models to produce text that resembles human writing, the researchers wanted to assess the performance of GPT-4 in three areas: substantiation of claims, factual accuracy, and referencing.

Lockwood and Castleberry entered the following prompt into GPT-4, “Write a 2500-word manuscript on the ethical dilemmas of using ChatGPT to write for psychological and educational reports. Address how APA and NASP guidelines, as well as HIPAA and FERPA laws pertain to these ethical dilemmas. Provide recommendations for overcoming these limitations. Provide citations and references in APA formatting.”

GPT-4 provided a 1814-word document, but after removal of the title, abstract, keywords, headings, and references, a 1043-word paper comprising 45 sentences remained.

Of the 42 sentences that should have been supported by an in-text citation, only 17 (40.5%) were correctly substantiated. The remaining 25 sentences either lacked a citation (40%), carried a citation to a source that did not exist (40%), or were supported by a citation that was irrelevant to the claim being made in the paper (20%).

To check the scientific accuracy of the 25 unsubstantiated claims, the researchers consulted other sources. They were able to fully confirm the accuracy of 14 sentences and partially confirm the accuracy of 3 more (i.e., the other sources did not explicitly state the claim, but it could be inferred). Thus, in total, 31 of the 42 sentences (73.8%) were verified.
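The counts reported above can be tallied in a few lines of Python. This is only an illustrative sketch: the variable names are invented here, and reading the 31 verified sentences as the 17 correctly cited plus the 14 fully confirmed is an inference from the arithmetic, not something the study is quoted as stating.

```python
# Counts as reported in the article; all names are illustrative.
needing_citation = 42   # sentences that should have had an in-text citation
correctly_cited = 17    # sentences with a correct, relevant citation
unsubstantiated = needing_citation - correctly_cited

fully_confirmed = 14      # unsubstantiated claims confirmed via other sources
partially_confirmed = 3   # claims that could only be inferred from other sources

# Assumption: "verified" means correctly cited plus fully confirmed.
verified = correctly_cited + fully_confirmed


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(unsubstantiated)                         # sentences lacking valid support
print(pct(correctly_cited, needing_citation))  # share correctly substantiated
print(pct(verified, needing_citation))         # share verified overall
```

Under that assumption, the three partially confirmed sentences are not counted in the overall verified share.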

Finally, 16 references were provided at the end of the paper. Twelve referenced real websites, and errors were found in 5 of these (1 listed incorrect authors, 1 failed to provide a Digital Object Identifier (DOI), and 3 provided incorrect links). Of the remaining 4 references, 1 pointed to the wrong article and 3 had broken links.

Lockwood and Castleberry concluded, “While GPT-4 demonstrated some capability in generating factually accurate information and producing APA-style citations, there were notable limitations. The substantial number of unsubstantiated claims and the presence of errors in citations and referencing indicate the need for further refinement and that we cannot blindly rely on GPT-4 to write papers.”

Some limitations should be noted. The study’s focus on a single paper may not be representative of GPT-4’s overall performance, and the use of specific prompts may have biased GPT-4’s output, suggesting that further research is needed to fully understand its capabilities.

The study, “Examining the Capabilities of GPT-4 to Write an APA-Style School Psychology Paper,” was authored by Adam B. Lockwood and Joshua Castleberry.
