PsyPost

Users of generative AI struggle to accurately assess their own competence

by Eric W. Dolan
December 29, 2025
in Artificial Intelligence, Cognitive Science

New research provides evidence that using artificial intelligence to complete tasks can improve a person’s performance while simultaneously distorting their ability to assess that performance accurately. The findings indicate that while users of AI tools like ChatGPT achieve higher scores on logical reasoning tests compared to those working alone, they consistently overestimate their success by a significant margin.

This pattern suggests that AI assistance may disconnect a user’s perceived competence from their actual results, leading to a state of inflated confidence. The study was published in the scientific journal Computers in Human Behavior.

Researchers have increasingly focused on how human cognition changes when it is augmented by technology. As generative AI systems become common in professional and educational settings, it is essential to understand how these tools influence metacognition, the ability to monitor and regulate one's own thinking processes. Metacognition allows people to know when they are likely correct and when they might be making an error.

Previous psychological inquiries have established that humans generally struggle with self-assessment. A well-known phenomenon called the Dunning-Kruger effect describes how individuals with lower skills tend to overestimate their competence, while highly skilled individuals often underestimate their abilities. The authors of the current paper sought to determine if this pattern persists when humans collaborate with AI. They aimed to understand if AI acts as an equalizer that fixes these biases or if it introduces new complications to how people evaluate their work.

To investigate these questions, the research team designed two distinct studies centered on logical reasoning tasks. In the first study, they recruited 246 participants from the United States. These individuals were asked to complete 20 logical reasoning problems taken from the Law School Admission Test (LSAT). The researchers provided participants with a specialized web interface. This interface displayed the questions on one side and a ChatGPT interaction window on the other.

Participants were required to interact with the AI at least once for each question. They could ask the AI to solve the problem or explain the logic. After submitting their answers, participants estimated how many of the 20 questions they believed they had answered correctly. They also rated their confidence on a specific scale for each individual decision.

The results of this first study showed a clear improvement in objective performance. On average, participants using ChatGPT scored approximately three points higher than a historical control group of people who took the same test without AI assistance. The AI helped users solve problems that they likely would have missed on their own.

Despite this improvement in scores, the participants engaged in significant overestimation. On average, the group estimated they had answered about 17 out of 20 questions correctly. In reality, their average score was closer to 13. This represents a four-point gap between perception and reality. The data suggests that the seamless assistance provided by the AI created an illusion of competence.

The study also analyzed the relationship between a participant’s knowledge of AI and their self-assessment. The researchers measured “AI literacy” using a tool called the Scale for the Assessment of Non-Experts’ AI Literacy. One might expect that understanding how AI works would make a user more skeptical or accurate in their judgment. The findings indicated the opposite. Participants with higher technical understanding of AI tended to be more confident in their answers but less accurate in judging their actual performance.

A significant theoretical contribution of this research involves the Dunning-Kruger effect. In typical scenarios without AI, the data show a steep slope: low performers vastly overestimate themselves while high performers do not. When participants used AI, this pattern vanished. The technology's leveling influence made overestimation uniform across the board, with low performers and high performers alike inflating their scores by similar amounts.

The researchers observed that the combined performance of the human and the AI did not exceed the performance of the AI alone. The AI system, when running the test by itself, achieved a higher average score than the humans using the AI. This suggests a failure of synergy. Humans occasionally accepted incorrect advice from the AI or overrode correct advice, dragging the overall performance down below the machine’s maximum potential.

To ensure these findings were robust, the researchers conducted a second study. This replication involved 452 participants. The researchers split this sample into two distinct groups. One group performed the task with AI assistance, while the other group worked without any technological aid.

In this second experiment, the researchers introduced a monetary incentive to encourage accuracy. Participants were told they would receive a financial bonus if their estimate of their score matched their actual score. The goal was to rule out the possibility that participants were simply not trying hard enough to be self-aware.

The results of the second study mirrored the first. The monetary incentive did not correct the overestimation bias. The group using AI continued to perform better than the unaided group but persisted in overestimating their scores. The unaided group showed the classic Dunning-Kruger pattern, where the least skilled participants showed the most bias. The AI group again showed a uniform bias, confirming that the technology fundamentally shifts how users perceive their competence.

The study also used a metric called the Area Under the Curve (AUC) to judge metacognitive sensitivity. This metric captures whether a person is more confident when they are right than when they are wrong. Ideally, a person should feel unsure when they make a mistake. The data showed that participants had low metacognitive sensitivity: their confidence levels were high regardless of whether they were right or wrong on a specific question.
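To make the metric concrete, the idea behind this kind of AUC can be sketched in a few lines of Python. The sketch below is illustrative only, not the authors' analysis code: the function name and the toy data are invented for the example. It treats the AUC as the probability that a randomly chosen correct answer received a higher confidence rating than a randomly chosen incorrect one, so 0.5 means confidence carries no information about accuracy.

```python
from itertools import product

def metacognitive_auc(correct, confidence):
    """Probability that a randomly chosen correct answer got a higher
    confidence rating than a randomly chosen incorrect one.
    Ties count as half; 0.5 = confidence is uninformative."""
    hits = [c for ok, c in zip(correct, confidence) if ok]
    misses = [c for ok, c in zip(correct, confidence) if not ok]
    if not hits or not misses:
        raise ValueError("need at least one correct and one incorrect answer")
    score = 0.0
    for h, m in product(hits, misses):
        if h > m:
            score += 1.0
        elif h == m:
            score += 0.5
    return score / (len(hits) * len(misses))

# Hypothetical responses: 1 = correct, 0 = incorrect, confidence on a 1-7 scale.
# Confidence is high even on the wrong answers, as the study describes.
correct    = [1, 1, 0, 1, 0, 1]
confidence = [6, 7, 6, 5, 7, 6]
print(metacognitive_auc(correct, confidence))  # → 0.3125, well below 0.5
```

With this toy data the result falls below 0.5 because the participant was, if anything, slightly more confident on the questions they got wrong, which is the kind of insensitivity the study reports.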

Qualitative data collected from chat logs offered additional context. The researchers noted that most participants acted as passive recipients of information. They frequently copied and pasted questions into the chat and accepted the AI’s output without significant challenge or verification. Only a small fraction of users treated the AI as a collaborative partner or a tool for double-checking their own logic.

The researchers discussed several potential reasons for these outcomes. One possibility is the “illusion of explanatory depth.” When an AI provides a fluent, articulate, and instant explanation, it can trick the brain into thinking the information is processed and understood more deeply than it actually is. The ease of obtaining the answer reduces the cognitive struggle usually required to solve logic puzzles, which in turn dulls the internal signals that warn a person they might be wrong.

As with all research, there are caveats to consider. The first study used a historical comparison group rather than a simultaneous control group, though the second study corrected this. Additionally, the task was limited to LSAT logical reasoning questions. It is possible that different types of tasks, such as creative writing or coding, might yield different metacognitive patterns.

The study also relied on a specific version of ChatGPT. As these models evolve and become more accurate, the dynamic between human and machine could shift. The researchers also noted that the participants were required to use the AI, which might differ from a real-world scenario where a user chooses when to consult the tool.

Future research directions were suggested to address these gaps. The researchers recommend investigating design changes that could force users to engage more critically. For example, an interface might require a user to explain the AI’s logic back to the system before accepting an answer. Long-term studies are also needed to see if this overconfidence fades as users become more experienced with the limitations of large language models.

The study, “AI makes you smarter but none the wiser: The disconnect between performance and metacognition,” was authored by Daniela Fernandes, Steeven Villa, Salla Nicholls, Otso Haavisto, Daniel Buschek, Albrecht Schmidt, Thomas Kosch, Chenxinran Shen, and Robin Welsch.
