AI math tutor: ChatGPT can be as effective as human help, study suggests

by Eric W. Dolan
February 15, 2025
in Artificial Intelligence
[Adobe Stock]


A recent study published in PLOS One provides evidence that artificial intelligence can be just as helpful as a human tutor when it comes to learning mathematics. Researchers discovered that students using hints generated by ChatGPT, a popular artificial intelligence chatbot, showed similar learning improvements in algebra and statistics as those receiving guidance from human-authored hints.

Educational technology is increasingly looking towards advanced artificial intelligence tools like ChatGPT to enhance learning experiences. The chatbot’s ability to generate human-like text has sparked interest in its potential for tutoring and providing educational support. Many believe this technology could make personalized learning more accessible and efficient. However, there has been limited research to understand just how effective and reliable these artificial intelligence systems are in actual learning scenarios, particularly in academic subjects like mathematics.

Creating helpful learning materials for online education, such as hints and worked examples, is a time-consuming and expensive process. Traditionally, educators and subject matter experts must manually develop, refine, and check these resources. This often involves many rounds of revisions and quality control. If artificial intelligence like ChatGPT could automatically generate high-quality and effective learning support, it could dramatically reduce the effort and cost involved in developing educational tools. This could pave the way for wider access to tutoring systems and more personalized learning experiences across various subjects and educational levels.

“As a researcher in the space of AI in education, there were a lot of burning questions that the introduction of ChatGPT provoked that were not yet answered,” said study author Zachary A. Pardos, an associate professor at UC Berkeley School of Education.

“While OpenAI provided some report cards on performance, hallucination rates at the granularity of specific academic subjects were not well established. The essential questions being asked were how often does this technology make mistakes in key STEM areas and can its outputs lead to learning.”

“Also shaping these questions for us was our development of an open source adaptive tutoring system (oatutor.io) and curation of content for that system. We, a research lab, were basically a small publisher and content production was time consuming. From an efficiency and scaling perspective, the role of AI, ChatGPT in particular, to help our team produce materials more quickly without measurable decrease in quality was an important question.”

The researchers conducted an online study involving 274 participants recruited through Amazon Mechanical Turk, a platform for online tasks. All participants had at least a high school diploma and held a designation on the platform indicating a history of successful task completion. This ensured they possessed the basic math skills necessary to potentially benefit from the study and that they were reliable online participants.

The study used a carefully designed experiment where participants were randomly assigned to one of three conditions: a control group with no hints, a group receiving hints created by human tutors, and a group receiving hints generated by ChatGPT. Within each of these hint conditions, participants were further randomly assigned to work on problems from one of four mathematics subjects: Elementary Algebra, Intermediate Algebra, College Algebra, or Statistics. The math problems were taken from freely available online textbooks.

The researchers used an open-source online tutoring system as the platform for the study. This system delivered math problems and, depending on the assigned condition, provided hints. For the human tutor hint condition, the system used pre-existing hints that had been developed by undergraduate students with prior math tutoring experience. These human-created hints were designed to guide students step-by-step through the problem-solving process. For the ChatGPT hint condition, the researchers generated new hints specifically for this study. They prompted ChatGPT with each math problem and used its text-based output as the hint.

Before starting the problem-solving section, all participants completed a short pre-test consisting of three questions to assess their initial knowledge of the assigned math topic. Following the pre-test, participants worked through five practice problems in their assigned subject. In the hint conditions, students could request hints while working on these problems. After the practice problems, participants took a post-test, which used the exact same questions as the pre-test, to measure any learning gains. The control group received correctness feedback during the practice problems but no additional hints. They could, however, request a “bottom-out hint” which simply gave them the answer to the problem so they could move forward. Participants in the hint conditions had access to full worked solution hints in addition to this bottom-out option. The time participants spent on the task was also recorded.

To ensure the quality of the ChatGPT-generated hints, the researchers performed quality checks. They evaluated whether the hints provided the correct answer, showed correct steps, and contained appropriate language. Initially, they found that ChatGPT-generated hints contained errors in about 32% of the problems. To reduce these errors, they used a technique called “self-consistency.” This involved asking ChatGPT to generate ten different hints for each problem and then selecting the hint that contained the most common answer among the ten responses. This method significantly reduced the error rate, particularly for algebra problems, bringing it down to near zero for algebra and to about 13% for statistics problems.
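The self-consistency technique described above amounts to majority voting over multiple sampled outputs: generate several candidate hints, extract the final answer from each, and keep a hint whose answer is the most common. The sketch below is a minimal, hypothetical illustration of that idea; the `sample_hint` and `extract_answer` callables stand in for the actual chatbot call and answer-parsing logic, which the study does not specify.

```python
from collections import Counter
import itertools

def self_consistent_hint(sample_hint, extract_answer, n=10):
    """Generate n candidate hints and keep one whose final answer
    is the most common among the candidates (majority vote)."""
    hints = [sample_hint() for _ in range(n)]
    answers = [extract_answer(h) for h in hints]
    majority_answer, _ = Counter(answers).most_common(1)[0]
    # Return the first hint that arrives at the majority answer
    return next(h for h in hints if extract_answer(h) == majority_answer)

# Illustrative stand-in for a chatbot call: two of every three
# sampled hints agree on the answer "x = 4" for 2x + 3 = 11.
fake_outputs = itertools.cycle([
    "Step 1: subtract 3 from both sides. Step 2: divide by 2. Answer: x = 4",
    "Step 1: subtract 3. Step 2: divide by 2. Answer: x = 4",
    "Step 1: divide by 2 first. Answer: x = 2.5",
])
sample = lambda: next(fake_outputs)
answer = lambda h: h.split("Answer:")[-1].strip()

best = self_consistent_hint(sample, answer, n=9)
print(answer(best))  # "x = 4" wins the vote 6 to 3
```

The vote filters out occasional hallucinated solutions because independent samples that reach the same final answer are more likely to be correct, which matches the error-rate reduction the researchers report.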

“The high hallucination rate of ChatGPT in the subject areas we tested was surprising and so too was the ability to reduce that to near 0% with a rather simple hallucination mitigation technique,” Pardos told PsyPost.

The researchers found that ChatGPT-generated hints were indeed effective in promoting learning. Participants who received ChatGPT hints showed a statistically significant improvement in their scores from the pre-test to the post-test, indicating they had learned from the hints.

Second, the learning gains achieved by students using ChatGPT hints were comparable to those who received human-authored hints. There was no statistically significant difference in learning improvement between these two groups. Both the ChatGPT hint group and the human tutor hint group showed significantly greater learning gains than the control group, which received no hints. Interestingly, while both hint conditions resulted in similar learning, participants in both hint conditions spent more time on the task compared to the control group. However, there was no significant difference in time spent between the ChatGPT hint group and the human tutor hint group.

“ChatGPT used for math educational content production is effective for learning and speeds up the content authoring process by 20-fold,” Pardos said.

But the researchers acknowledged some limitations to their study. One limitation was that, due to the artificial intelligence model’s limitations at the time, they could only use math problems that did not include images or figures. Future research could explore newer versions of these models that can handle visual information. Another point is that the study used Mechanical Turk workers, not students in actual classroom settings. While this allowed for faster data collection and experimentation, future studies should ideally be conducted with students in schools to confirm these findings in real educational environments.

The researchers also pointed out that they used a specific, closed-source artificial intelligence model (ChatGPT 3.5). Future research could investigate the effectiveness of more openly accessible artificial intelligence models. Finally, the study focused on a particular type of learning support – worked example hints. Future studies could explore how artificial intelligence can be used to generate other types of pedagogical strategies and more complex tutoring interactions.

In addition, it remains uncertain whether ChatGPT and other artificial intelligence models can effectively tutor academic subjects beyond mathematics. “This pedagogical approach of tutoring by showing examples of how to solve a problem, generated by AI, may not lend itself to domains that are less procedural in nature (e.g., creative writing),” Pardos noted.

Looking ahead, this study suggests that artificial intelligence has the potential to revolutionize the creation of educational resources and tutoring systems. The fact that ChatGPT can generate math help that is as effective as human-created help, and do so much more quickly, opens exciting possibilities for making high-quality education more accessible and scalable.

“One-on-one human tutoring is very expensive and very effective,” Pardos said. “Incidentally, one-on-one computer tutoring is also expensive to produce. We’re interested in exploring how GenAI-assisted tutor production can change the cost structure and accessibility of tutoring and potentially increase its efficacy through greater personalization than is reasonably achievable with legacy computational approaches.”

“We’ve recently published a study evaluating how well ChatGPT (and other models) can produce questions of appropriate difficulty, compared to textbook questions. Placing teachers in the driver’s seat of GenAI is also a research thread we’re making progress on. That emerging research, accepted at the ACM Conference on Human Factors in Computing Systems (CHI), and other threads can be found on our website: https://www.oatutor.io/resources#research-paper.”

The study, “ChatGPT-generated help produces learning gains equivalent to human tutor-authored help on mathematics skills,” was authored by Zachary A. Pardos and Shreya Bhandari.
