PsyPost
The latest psychology and neuroscience discoveries.

Early AI models exhibit human-like errors but ChatGPT-4 outperforms humans in cognitive reflection tests

by Eric W. Dolan
May 19, 2024
in Artificial Intelligence
(Photo credit: Adobe Stock)


Researchers have discovered that OpenAI’s latest generative pre-trained transformer models, commonly known as ChatGPT, can outperform humans in reasoning tasks. Published in Nature Computational Science, the study found that while early versions of these models exhibit intuitive but incorrect responses, similar to humans, ChatGPT-3.5 and ChatGPT-4 demonstrate a significant improvement in accuracy.

The primary aim of the study was to explore whether artificial intelligence models could mimic human cognitive processes, specifically the quick, intuitive decisions known as System 1 thinking, and the slower, more deliberate decisions known as System 2 thinking.

System 1 processes are often prone to errors because they rely on heuristics, or mental shortcuts, whereas System 2 processes involve a more analytical approach, reducing the likelihood of mistakes. By applying psychological methodologies traditionally used to study human reasoning, the researchers hoped to uncover new insights into how these models operate and evolve.

To investigate this, the researchers administered a series of tasks aimed at eliciting intuitive yet erroneous responses to both humans and artificial intelligence systems. These tasks included semantic illusions and various types of cognitive reflection tests. Semantic illusions involve questions that contain misleading information, prompting intuitive but incorrect answers. Cognitive reflection tests require participants to override their initial, intuitive responses to arrive at the correct answer through more deliberate reasoning.

The tasks included problems like:

A potato and a camera together cost $1.40. The potato costs $1 more than the camera. How much does the camera cost? (The correct answer is 20 cents, but an intuitive answer might be 40 cents.)

Where on their bodies do whales have their gills? (The correct answer is that whales do not have gills, but those who fail to reflect on the question often answer “on the sides of their heads.”)
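The potato-and-camera item is the classic “bat and ball” problem in disguise: the correct answer follows from one line of algebra that the intuitive response skips. A quick check in Python:

```python
# Let camera be the camera's price. The problem states:
#   camera + potato = 1.40  and  potato = camera + 1.00
# Substituting: camera + (camera + 1.00) = 1.40, so camera = 0.20.
camera = (1.40 - 1.00) / 2
potato = camera + 1.00

assert abs(camera + potato - 1.40) < 1e-9  # together they cost $1.40
assert abs(potato - camera - 1.00) < 1e-9  # potato costs $1 more

# The intuitive answer (40 cents) satisfies the total but violates the
# second constraint: the potato would then cost only $0.60 more.
intuitive = 0.40
assert abs((1.40 - intuitive) - intuitive - 1.00) > 1e-9

print(f"camera = ${camera:.2f}")  # camera = $0.20
```

The intuitive error comes from treating “$1 more” as if it meant “the potato costs $1,” which is exactly the heuristic shortcut these tests are designed to catch.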

The researchers administered these tasks to a range of OpenAI’s generative pre-trained transformer models, spanning from early versions like GPT-1 and GPT-2 to the more advanced ChatGPT-3.5 and ChatGPT-4. Each model was tested under consistent conditions: the ‘temperature’ parameter was set to 0 to minimize response variability, and prompts were prefixed and suffixed with standard phrases to ensure uniformity. The responses of the models were manually reviewed and scored based on accuracy and the reasoning process employed.
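The testing conditions described above can be sketched as simple request construction. Note the sketch is hypothetical: only the temperature-0 setting comes from the article, while the prefix and suffix strings and the `build_request` helper are illustrative placeholders, not the study's actual template.

```python
# Hypothetical sketch of a uniform querying scheme: every task gets the
# same prefix and suffix, and temperature is set to 0 so that decoding
# is (near-)deterministic across runs. Template strings are placeholders.
PREFIX = "Please answer the following question.\n\n"
SUFFIX = "\n\nAnswer:"

def build_request(task: str, model: str) -> dict:
    """Assemble the parameters for one deterministic chat-completion call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": f"{PREFIX}{task}{SUFFIX}"}],
        "temperature": 0,  # minimize response variability
    }

req = build_request(
    "Where on their bodies do whales have their gills?", model="gpt-4"
)
```

Holding the template and temperature fixed across models is what lets differences in the scored responses be attributed to the models themselves rather than to prompt variation.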


For comparison, the same set of tasks was given to 500 human participants recruited through Prolific.io, a platform for sourcing research participants. These human subjects were presented with a random selection of tasks and a control question to ensure they did not use external aids like language models during the test. Any participants who admitted to using such aids were excluded from the analysis to maintain the integrity of the results.

The researchers observed that as the models evolved from earlier versions like GPT-1 and GPT-2 to the more advanced ChatGPT-3.5 and ChatGPT-4, their performance on tasks designed to provoke intuitive yet incorrect responses improved markedly.

Early versions of the models, such as GPT-1 and GPT-2, displayed a strong tendency toward intuitive, System 1 thinking. These models frequently provided incorrect answers to the cognitive reflection tests and semantic illusions, mirroring the type of rapid, heuristic-based thinking that often leads humans to errors. For example, when presented with a question that intuitively seemed straightforward but required deeper analysis to answer correctly, these models often failed, similar to how many humans would respond.

In contrast, the ChatGPT-3.5 and ChatGPT-4 models demonstrated a significant shift in their problem-solving approach. These more advanced models were capable of employing chain-of-thought reasoning, which involves breaking down problems into smaller, manageable steps and considering each step sequentially.

This type of reasoning is akin to human System 2 thinking, which is more analytical and deliberate. As a result, these models were able to avoid many of the intuitive errors that earlier models and humans commonly made. When instructed to use step-by-step reasoning explicitly, the accuracy of ChatGPT-3.5 and ChatGPT-4 increased dramatically, showcasing their ability to handle complex reasoning tasks more effectively.
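The two prompting conditions can be illustrated with a minimal sketch; the step-by-step cue below uses the widely known “Let's think step by step” phrasing as an assumed example, not the study's exact wording:

```python
TASK = ("A potato and a camera together cost $1.40. The potato costs $1 "
        "more than the camera. How much does the camera cost?")

# Plain condition: the model answers directly, relying on its
# System 1-like next-word intuitions.
plain_prompt = TASK

# Chain-of-thought condition: an explicit cue to reason in steps before
# answering (System 2-like). The exact wording is illustrative only.
cot_prompt = f"{TASK}\nLet's think step by step."
```

The only difference between the conditions is the appended instruction, so any accuracy gain in the second condition can be attributed to the elicited step-by-step reasoning.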

Interestingly, the researchers found that even when the ChatGPT models were prevented from engaging in chain-of-thought reasoning, they still outperformed humans and earlier models in terms of accuracy. This indicates that the basic next-word prediction process (System 1-like) of these advanced models has become significantly more reliable.

For instance, when the models were given cognitive reflection tests without additional reasoning prompts, they still provided correct answers more frequently than human participants. This suggests that the intuitions of these advanced models are better calibrated than those of earlier versions and humans.

The findings provide important insights into the ability of artificial intelligence models to engage in complex reasoning processes. However, there is an important caveat to consider. It is possible that some of the models, particularly the more advanced ones like ChatGPT-3.5 and ChatGPT-4, had already encountered examples of cognitive reflection tests during their training. As a result, these models might have been able to solve the tasks ‘from memory’ rather than through genuine reasoning or problem-solving processes.

“The progress in [large language models (LLMs) such as ChatGPT] not only increased their capabilities, but also reduced our ability to anticipate their properties and behavior,” the researchers concluded. “It is increasingly difficult to study LLMs through the lenses of their architecture and hyperparameters. Instead, as we show in this work, LLMs can be studied using methods designed to investigate another capable and opaque structure, namely the human mind. Our approach falls within a quickly growing category of studies employing classic psychological tests and experiments to probe LLM ‘psychological’ processes, such as judgment, decision-making and cognitive biases.”

The study, “Human-like intuitive behavior and reasoning biases emerged in large language models but disappeared in ChatGPT,” was authored by Thilo Hagendorff, Sarah Fabi, and Michal Kosinski.
