AI math tutor: ChatGPT can be as effective as human help, study suggests

by Eric W. Dolan
February 15, 2025
in Artificial Intelligence
[Adobe Stock]

A recent study published in PLOS One provides evidence that artificial intelligence can be just as helpful as a human tutor when it comes to learning mathematics. Researchers discovered that students using hints generated by ChatGPT, a popular artificial intelligence chatbot, showed similar learning improvements in algebra and statistics as those receiving guidance from human-authored hints.

Educational technology is increasingly looking towards advanced artificial intelligence tools like ChatGPT to enhance learning experiences. The chatbot’s ability to generate human-like text has sparked interest in its potential for tutoring and providing educational support. Many believe this technology could make personalized learning more accessible and efficient. However, there has been limited research to understand just how effective and reliable these artificial intelligence systems are in actual learning scenarios, particularly in academic subjects like mathematics.

Creating helpful learning materials for online education, such as hints and worked examples, is a time-consuming and expensive process. Traditionally, educators and subject matter experts must manually develop, refine, and check these resources. This often involves many rounds of revisions and quality control. If artificial intelligence like ChatGPT could automatically generate high-quality and effective learning support, it could dramatically reduce the effort and cost involved in developing educational tools. This could pave the way for wider access to tutoring systems and more personalized learning experiences across various subjects and educational levels.

“As a researcher in the space of AI in education, there were a lot of burning questions that the introduction of ChatGPT provoked that were not yet answered,” said study author Zachary A. Pardos, an associate professor at UC Berkeley School of Education.

“While OpenAI provided some report cards on performance, hallucination rates at the granularity of specific academic subjects were not well established. The essential questions being asked were: how often does this technology make mistakes in key STEM areas, and can its outputs lead to learning?”

“Also shaping these questions for us was our development of an open source adaptive tutoring system (oatutor.io) and curation of content for that system. We, a research lab, were basically a small publisher and content production was time consuming. From an efficiency and scaling perspective, the role of AI, ChatGPT in particular, to help our team produce materials more quickly without measurable decrease in quality was an important question.”

The researchers conducted an online study involving 274 participants recruited through Amazon Mechanical Turk, a platform for online tasks. All participants had at least a high school diploma and held a designation on the platform indicating a history of successful task completion. This ensured they possessed the basic math skills necessary to benefit from the study and that they were reliable online participants.

The study used a carefully designed experiment where participants were randomly assigned to one of three conditions: a control group with no hints, a group receiving hints created by human tutors, and a group receiving hints generated by ChatGPT. Within each of these hint conditions, participants were further randomly assigned to work on problems from one of four mathematics subjects: Elementary Algebra, Intermediate Algebra, College Algebra, or Statistics. The math problems were taken from freely available online textbooks.

The researchers used an open-source online tutoring system as the platform for the study. This system delivered math problems and, depending on the assigned condition, provided hints. For the human tutor hint condition, the system used pre-existing hints that had been developed by undergraduate students with prior math tutoring experience. These human-created hints were designed to guide students step-by-step through the problem-solving process. For the ChatGPT hint condition, the researchers generated new hints specifically for this study. They prompted ChatGPT with each math problem and used its text-based output as the hint.

Before starting the problem-solving section, all participants completed a short pre-test consisting of three questions to assess their initial knowledge of the assigned math topic. Following the pre-test, participants worked through five practice problems in their assigned subject. In the hint conditions, students could request hints while working on these problems. After the practice problems, participants took a post-test, which used the exact same questions as the pre-test, to measure any learning gains. The control group received correctness feedback during the practice problems but no additional hints. They could, however, request a “bottom-out hint” which simply gave them the answer to the problem so they could move forward. Participants in the hint conditions had access to full worked solution hints in addition to this bottom-out option. The time participants spent on the task was also recorded.

To ensure the quality of the ChatGPT-generated hints, the researchers performed quality checks. They evaluated whether the hints provided the correct answer, showed correct steps, and contained appropriate language. Initially, they found that ChatGPT-generated hints contained errors in about 32% of the problems. To reduce these errors, they used a technique called “self-consistency.” This involved asking ChatGPT to generate ten different hints for each problem and then selecting the hint that contained the most common answer among the ten responses. This method significantly reduced the error rate, particularly for algebra problems, bringing it down to near zero for algebra and to about 13% for statistics problems.
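The selection step described above amounts to a majority vote over the final answers of the sampled hints. A minimal sketch in Python (the hint format and the answer-extraction function are hypothetical; the paper does not detail how answers were parsed from ChatGPT's output):

```python
from collections import Counter

def select_self_consistent_hint(hints, extract_answer):
    """Return the first hint whose final answer matches the most
    common answer across all sampled hints (self-consistency)."""
    answers = [extract_answer(h) for h in hints]
    majority_answer, _ = Counter(answers).most_common(1)[0]
    for hint, answer in zip(hints, answers):
        if answer == majority_answer:
            return hint

# Hypothetical sampled hints, each ending in "Answer: <value>"
samples = [
    "Distribute, then combine like terms ... Answer: 12",
    "Distribute, then combine like terms ... Answer: 12",
    "Distribute, then combine like terms ... Answer: 15",
    "Distribute, then combine like terms ... Answer: 12",
]
best = select_self_consistent_hint(
    samples, lambda h: h.rsplit("Answer:", 1)[-1].strip()
)  # picks a hint ending in "Answer: 12"
```

Because only the final answer is voted on, a selected hint can still contain flawed intermediate steps, which may help explain why the residual error rate stayed higher for statistics than for algebra.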

“The high hallucination rate of ChatGPT in the subject areas we tested was surprising and so too was the ability to reduce that to near 0% with a rather simple hallucination mitigation technique,” Pardos told PsyPost.

First, the researchers found that ChatGPT-generated hints were effective in promoting learning. Participants who received ChatGPT hints showed a statistically significant improvement in their scores from the pre-test to the post-test, indicating they had learned from the hints.

Second, the learning gains achieved by students using ChatGPT hints were comparable to those achieved with human-authored hints; there was no statistically significant difference in learning improvement between the two groups. Both hint groups showed significantly greater learning gains than the no-hint control group. Interestingly, participants in both hint conditions spent more time on the task than the control group, but there was no significant difference in time spent between the ChatGPT and human tutor hint groups.

“ChatGPT used for math educational content production is effective for learning and speeds up the content authoring process by 20-fold,” Pardos said.

But the researchers acknowledged some limitations to their study. Because the model could not process visual input at the time, they could only use math problems that did not include images or figures; future research could explore newer models that can handle visual information. Another caveat is that the study used Mechanical Turk workers rather than students in actual classroom settings. While this allowed for faster data collection and experimentation, future studies should ideally be conducted with students in schools to confirm these findings in real educational environments.

The researchers also pointed out that they used a specific, closed-source artificial intelligence model (ChatGPT 3.5). Future research could investigate the effectiveness of more openly accessible artificial intelligence models. Finally, the study focused on a particular type of learning support – worked example hints. Future studies could explore how artificial intelligence can be used to generate other types of pedagogical strategies and more complex tutoring interactions.

In addition, it remains uncertain whether ChatGPT and other artificial intelligence models can effectively tutor academic subjects beyond mathematics. “This pedagogical approach of tutoring by showing examples of how to solve a problem, generated by AI, may not lend itself to domains that are less procedural in nature (e.g., creative writing),” Pardos noted.

Looking ahead, this study suggests that artificial intelligence has the potential to revolutionize the creation of educational resources and tutoring systems. The fact that ChatGPT can generate math help that is as effective as human-created help, and do so much more quickly, opens exciting possibilities for making high-quality education more accessible and scalable.

“One-on-one human tutoring is very expensive and very effective,” Pardos said. “Incidentally, one-on-one computer tutoring is also expensive to produce. We’re interested in exploring how GenAI-assisted tutor production can change the cost structure and accessibility of tutoring and potentially increase its efficacy through greater personalization than is reasonably achievable with legacy computational approaches.”

“We’ve recently published a study evaluating how well ChatGPT (and other models) can produce questions of appropriate difficulty, compared to textbook questions. Placing teachers in the driver’s seat of GenAI is also a research thread we’re making progress on. That emerging research, accepted at the Human Factors in Computing Systems conference (CHI), and other threads can be found on our website: https://www.oatutor.io/resources#research-paper.”

The study, “ChatGPT-generated help produces learning gains equivalent to human tutor-authored help on mathematics skills,” was authored by Zachary A. Pardos and Shreya Bhandari.
