PsyPost

Early AI models exhibit human-like errors but ChatGPT-4 outperforms humans in cognitive reflection tests

by Eric W. Dolan
May 19, 2024
in Artificial Intelligence
(Photo credit: Adobe Stock)

Researchers have found that OpenAI’s generative pre-trained transformer (GPT) models, the family that powers ChatGPT, have progressed to the point where the latest versions can outperform humans on reasoning tasks. Published in Nature Computational Science, the study found that early versions of these models give intuitive but incorrect responses, much as humans do, whereas ChatGPT-3.5 and ChatGPT-4 are markedly more accurate.

The primary aim of the study was to explore whether artificial intelligence models could mimic human cognitive processes, specifically the quick, intuitive decisions known as System 1 thinking, and the slower, more deliberate decisions known as System 2 thinking.

System 1 processes are often prone to errors because they rely on heuristics, or mental shortcuts, whereas System 2 processes involve a more analytical approach, reducing the likelihood of mistakes. By applying psychological methodologies traditionally used to study human reasoning, the researchers hoped to uncover new insights into how these models operate and evolve.

To investigate this, the researchers gave both humans and artificial intelligence systems a series of tasks designed to elicit intuitive yet erroneous responses. These tasks included semantic illusions and various types of cognitive reflection tests. Semantic illusions involve questions that contain misleading information, prompting intuitive but incorrect answers. Cognitive reflection tests require participants to override their initial, intuitive responses to arrive at the correct answer through more deliberate reasoning.

The tasks included problems like:

A potato and a camera together cost $1.40. The potato costs $1 more than the camera. How much does the camera cost? (The correct answer is 20 cents, but an intuitive answer might be 40 cents.)

Where on their bodies do whales have their gills? (The correct answer is that whales do not have gills, but those who fail to reflect on the question often answer “on the sides of their heads.”)
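The potato-and-camera item has the same structure as the classic bat-and-ball cognitive reflection problem. The short Python sketch below, an illustration added here rather than part of the study’s materials, works through the algebra and shows why the intuitive 40-cent answer breaks the problem’s own constraints. The function name `solve` is hypothetical.

```python
from fractions import Fraction


def solve(total, difference):
    """Return (camera, potato) prices for the two-item puzzle.

    camera + potato = total and potato = camera + difference,
    so 2 * camera + difference = total.
    """
    camera = (total - difference) / 2
    return camera, camera + difference


# Exact fractions avoid floating-point rounding in money arithmetic.
total = Fraction("1.40")
difference = Fraction("1.00")

camera, potato = solve(total, difference)
print(f"Correct: camera ${float(camera):.2f}, potato ${float(potato):.2f}")

# The System 1 shortcut subtracts $1 from $1.40 and stops there.
intuitive_camera = total - difference
intuitive_potato = intuitive_camera + difference
print(f"Intuitive: camera ${float(intuitive_camera):.2f}, "
      f"but then the total is ${float(intuitive_camera + intuitive_potato):.2f}")
```

Checking the intuitive answer against both stated conditions is exactly the deliberate, System 2 step that the test is designed to provoke.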

The researchers administered these tasks to a range of OpenAI’s generative pre-trained transformer models, spanning from early versions like GPT-1 and GPT-2 to the more advanced ChatGPT-3.5 and ChatGPT-4. Each model was tested under consistent conditions: the ‘temperature’ parameter was set to 0 to minimize response variability, and prompts were prefixed and suffixed with standard phrases to ensure uniformity. The responses of the models were manually reviewed and scored based on accuracy and the reasoning process employed.
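In most language-model samplers, temperature divides the model’s raw next-token scores (logits) before a softmax turns them into probabilities. Lower temperatures concentrate probability on the top-scoring token, and a temperature of 0 is conventionally treated as greedy decoding, which always picks that token. The Python sketch below illustrates this general mechanism under that assumption; it is not OpenAI’s implementation, and the logits are made-up values.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores into next-token probabilities at a given temperature.

    temperature == 0 is handled as greedy decoding: all probability goes to
    the highest-scoring token (the first one, in case of a tie).
    """
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    norm = sum(exps)
    return [e / norm for e in exps]


# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.5, 0.5]
for t in (1.0, 0.5, 0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature falls, the distribution sharpens until, at 0, the same prompt always yields the same top token, which is why the setting reduces response variability.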

For comparison, the same set of tasks was given to 500 human participants recruited through Prolific.io, a platform for sourcing research participants. Each participant received a random selection of the tasks, plus a control question asking whether they had used external aids, such as language models, during the test. Participants who admitted to using such aids were excluded from the analysis to maintain the integrity of the results.

The researchers observed that as the models evolved from earlier versions like GPT-1 and GPT-2 to the more advanced ChatGPT-3.5 and ChatGPT-4, their performance on tasks designed to provoke intuitive yet incorrect responses improved markedly.

Early versions of the models, such as GPT-1 and GPT-2, displayed a strong tendency toward intuitive, System 1 thinking. These models frequently provided incorrect answers to the cognitive reflection tests and semantic illusions, mirroring the type of rapid, heuristic-based thinking that often leads humans to errors. For example, when presented with a question that intuitively seemed straightforward but required deeper analysis to answer correctly, these models often failed, similar to how many humans would respond.

In contrast, the ChatGPT-3.5 and ChatGPT-4 models demonstrated a significant shift in their problem-solving approach. These more advanced models were capable of employing chain-of-thought reasoning, which involves breaking down problems into smaller, manageable steps and considering each step sequentially.

This type of reasoning is akin to human System 2 thinking, which is more analytical and deliberate. As a result, these models were able to avoid many of the intuitive errors that earlier models and humans commonly made. When instructed to use step-by-step reasoning explicitly, the accuracy of ChatGPT-3.5 and ChatGPT-4 increased dramatically, showcasing their ability to handle complex reasoning tasks more effectively.

Interestingly, the researchers found that even when the ChatGPT models were prevented from engaging in chain-of-thought reasoning, they still outperformed humans and earlier models in accuracy. This suggests that the models’ basic next-word prediction process, the counterpart of System 1 thinking, has become considerably more reliable.

For instance, when the models were given cognitive reflection tests without additional reasoning prompts, they still provided correct answers more frequently than human participants. This suggests that the intuitions of these advanced models are better calibrated than those of earlier versions and humans.
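The article notes that prompts were wrapped in standard phrases and that the ChatGPT models were tested both with explicit step-by-step instructions and with reasoning suppressed. A minimal Python sketch of how such conditions could be generated from a single task follows. All wording in it (the `Question: ` prefix, the commonly used `Let's think step by step.` trigger, and the direct-answer instruction) is hypothetical; the study’s actual phrases, and the way it prevented chain-of-thought reasoning, are not described in this article.

```python
# Hypothetical instruction wording; not the phrases used in the study.
STEP_BY_STEP = "Let's think step by step."
ANSWER_DIRECTLY = "Give only the final answer, without explanation."


def build_prompt(task, condition):
    """Wrap a task in a fixed prefix and a condition-specific suffix."""
    prefix = "Question: "
    suffixes = {
        "baseline": "\nAnswer:",
        "chain_of_thought": "\n" + STEP_BY_STEP + "\nAnswer:",
        "no_reasoning": "\n" + ANSWER_DIRECTLY + "\nAnswer:",
    }
    if condition not in suffixes:
        raise ValueError(f"unknown condition: {condition!r}")
    return prefix + task + suffixes[condition]


task = ("A potato and a camera together cost $1.40. The potato costs $1 more "
        "than the camera. How much does the camera cost?")
for condition in ("baseline", "chain_of_thought", "no_reasoning"):
    print(build_prompt(task, condition))
    print("---")
```

Keeping the task text identical and varying only the suffix means any difference in accuracy between conditions can be attributed to the instruction rather than to the wording of the problem.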

The findings provide important insights into the ability of artificial intelligence models to engage in complex reasoning processes. However, there is an important caveat to consider. It is possible that some of the models, particularly the more advanced ones like ChatGPT-3.5 and ChatGPT-4, had already encountered examples of cognitive reflection tests during their training. As a result, these models might have been able to solve the tasks ‘from memory’ rather than through genuine reasoning or problem-solving processes.

“The progress in [large language models (LLMs) such as ChatGPT] not only increased their capabilities, but also reduced our ability to anticipate their properties and behavior,” the researchers concluded. “It is increasingly difficult to study LLMs through the lenses of their architecture and hyperparameters. Instead, as we show in this work, LLMs can be studied using methods designed to investigate another capable and opaque structure, namely the human mind. Our approach falls within a quickly growing category of studies employing classic psychological tests and experiments to probe LLM ‘psychological’ processes, such as judgment, decision-making and cognitive biases.”

The study, “Human-like intuitive behavior and reasoning biases emerged in large language models but disappeared in ChatGPT,” was authored by Thilo Hagendorff, Sarah Fabi, and Michal Kosinski.

(c) PsyPost Media Inc
