PsyPost

AI math tutor: ChatGPT can be as effective as human help, study suggests

by Eric W. Dolan
February 15, 2025
[Adobe Stock]


A recent study published in PLOS One provides evidence that artificial intelligence can be just as helpful as a human tutor when it comes to learning mathematics. Researchers found that students using hints generated by ChatGPT, a popular artificial intelligence chatbot, showed learning improvements in algebra and statistics similar to those of students who received human-authored hints.

Educational technology is increasingly looking towards advanced artificial intelligence tools like ChatGPT to enhance learning experiences. The chatbot’s ability to generate human-like text has sparked interest in its potential for tutoring and providing educational support. Many believe this technology could make personalized learning more accessible and efficient. However, there has been limited research to understand just how effective and reliable these artificial intelligence systems are in actual learning scenarios, particularly in academic subjects like mathematics.

Creating helpful learning materials for online education, such as hints and worked examples, is a time-consuming and expensive process. Traditionally, educators and subject matter experts must manually develop, refine, and check these resources. This often involves many rounds of revisions and quality control. If artificial intelligence like ChatGPT could automatically generate high-quality and effective learning support, it could dramatically reduce the effort and cost involved in developing educational tools. This could pave the way for wider access to tutoring systems and more personalized learning experiences across various subjects and educational levels.

“As a researcher in the space of AI in education, there were a lot of burning questions that the introduction of ChatGPT provoked that were not yet answered,” said study author Zachary A. Pardos, an associate professor at UC Berkeley School of Education.

“While OpenAI provided some report cards on performance, hallucination rates at the granularity level of granular academic subjects were not well established. The essential questions being asked were how often does this technology make mistakes in key STEM areas and can its outputs lead to learning.”

“Also shaping these questions for us was our development of an open source adaptive tutoring system (oatutor.io) and curation of content for that system. We, a research lab, were basically a small publisher and content production was time consuming. From an efficiency and scaling perspective, the role of AI, ChatGPT in particular, to help our team produce materials more quickly without measurable decrease in quality was an important question.”

The researchers conducted an online study involving 274 participants recruited through Amazon Mechanical Turk, a platform for online tasks. All participants had at least a high school degree and held a platform designation indicating a history of successful task completion. These criteria were intended to ensure that participants had the basic math skills needed to benefit from the study and were reliable online workers.

The experiment randomly assigned participants to one of three conditions: a control group with no hints, a group receiving hints created by human tutors, and a group receiving hints generated by ChatGPT. Within each condition, participants were further randomly assigned to work on problems from one of four mathematics subjects: Elementary Algebra, Intermediate Algebra, College Algebra, or Statistics. The math problems were taken from freely available online textbooks.
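
The two-stage random assignment described above can be sketched in Python. This is only an illustration under stated assumptions: the condition labels, the function name `assign`, and the use of simple independent randomization are hypothetical, since the article does not say how the assignment was implemented or whether group sizes were balanced.

```python
import random

# Hypothetical labels for the three conditions and four subjects.
CONDITIONS = ["control", "human_hints", "chatgpt_hints"]
SUBJECTS = [
    "Elementary Algebra",
    "Intermediate Algebra",
    "College Algebra",
    "Statistics",
]

def assign(participant_ids, seed=0):
    """Randomly give each participant a (condition, subject) pair."""
    rng = random.Random(seed)  # seeded so the assignment is reproducible
    return {
        pid: (rng.choice(CONDITIONS), rng.choice(SUBJECTS))
        for pid in participant_ids
    }

# 274 participants, matching the sample size reported in the article.
assignments = assign(range(274))
```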

The researchers used an open-source online tutoring system as the platform for the study. This system delivered math problems and, depending on the assigned condition, provided hints. For the human tutor hint condition, the system used pre-existing hints that had been developed by undergraduate students with prior math tutoring experience. These human-created hints were designed to guide students step-by-step through the problem-solving process. For the ChatGPT hint condition, the researchers generated new hints specifically for this study. They prompted ChatGPT with each math problem and used its text-based output as the hint.

Before starting the problem-solving section, all participants completed a short pre-test of three questions to assess their initial knowledge of the assigned math topic. They then worked through five practice problems in their assigned subject, and the time they spent on the task was recorded. The control group received correctness feedback during the practice problems but no instructional hints; these participants could, however, request a “bottom-out hint,” which simply gave them the answer so they could move forward. Participants in the two hint conditions could request full worked-solution hints in addition to this bottom-out option. After the practice problems, participants took a post-test, which used the exact same questions as the pre-test, to measure any learning gains.
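
As a rough illustration of how a pre-test/post-test design yields a learning gain, the sketch below takes the difference in the fraction of questions answered correctly. The function name, the scoring as fraction correct, and the sample responses are assumptions for illustration; the article does not give the study's exact scoring formula.

```python
def learning_gain(pre_correct, post_correct):
    """Post-test score minus pre-test score, each as a fraction correct."""
    if not pre_correct or len(pre_correct) != len(post_correct):
        raise ValueError("pre- and post-test must share the same questions")
    pre = sum(pre_correct) / len(pre_correct)
    post = sum(post_correct) / len(post_correct)
    return post - pre

# Hypothetical participant: 1 of 3 correct before practice, 2 of 3 after.
gain = learning_gain([True, False, False], [True, True, False])
```

A positive value means the participant answered more of the same three questions correctly after the practice problems than before.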

The researchers checked the quality of the ChatGPT-generated hints, evaluating whether each one provided the correct answer, showed correct steps, and used appropriate language. Initially, they found that the hints contained errors for about 32% of the problems. To reduce these errors, they used a technique called “self-consistency”: they asked ChatGPT to generate ten different hints for each problem and then selected a hint containing the most common answer among the ten responses. This method substantially reduced the error rate, bringing it down to near zero for algebra problems and to about 13% for statistics problems.
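
The self-consistency selection described above amounts to a majority vote over final answers, which can be sketched in Python. This is a minimal illustration, not the researchers' code: the function name `pick_self_consistent_hint`, the (hint, answer) pair format, and the sample data are assumptions, and the sketch presumes the final answer has already been extracted from each generated hint.

```python
from collections import Counter

def pick_self_consistent_hint(candidates):
    """Return (hint, answer, agreement) for the most common final answer.

    `candidates` is a list of (hint_text, final_answer) pairs, for example
    ten responses sampled for the same problem (hypothetical format).
    """
    if not candidates:
        raise ValueError("need at least one candidate hint")
    counts = Counter(answer for _, answer in candidates)
    top_answer, votes = counts.most_common(1)[0]
    # Keep the first hint whose final answer matches the majority answer.
    for hint, answer in candidates:
        if answer == top_answer:
            return hint, top_answer, votes / len(candidates)

# Hypothetical samples for the problem 2x + 3 = 11.
samples = [
    ("Subtract 3 from both sides, then divide by 2: x = 4", "4"),
    ("Divide by 2 first: x = 5.5", "5.5"),
    ("Isolate x: 2x = 8, so x = 4", "4"),
]
hint, answer, agreement = pick_self_consistent_hint(samples)
```

The agreement fraction is not mentioned in the article; it is included here only because a low value could flag problems where the sampled answers disagree and a human check might be worthwhile.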

“The high hallucination rate of ChatGPT in the subject areas we tested was surprising and so too was the ability to reduce that to near 0% with a rather simple hallucination mitigation technique,” Pardos told PsyPost.

The researchers found that ChatGPT-generated hints were effective in promoting learning. Participants who received ChatGPT hints showed a statistically significant improvement in their scores from the pre-test to the post-test, indicating they had learned from the hints.

The learning gains achieved by participants using ChatGPT hints were also comparable to those of participants who received human-authored hints, with no statistically significant difference in learning improvement between the two groups. Both hint groups showed significantly greater learning gains than the control group, which received no hints. Participants in both hint conditions spent more time on the task than those in the control group, but there was no significant difference in time spent between the ChatGPT hint group and the human tutor hint group.

“ChatGPT used for math educational content production is effective for learning and speeds up the content authoring process by 20-fold,” Pardos said.

The researchers acknowledged some limitations. Because of the capabilities of the artificial intelligence model at the time, they could only use math problems that did not include images or figures; future research could explore newer versions of these models that can handle visual information. In addition, the study used Mechanical Turk workers, not students in actual classroom settings. While this allowed for faster data collection and experimentation, future studies should ideally be conducted with students in schools to confirm these findings in real educational environments.

The researchers also pointed out that they used a specific, closed-source artificial intelligence model (ChatGPT 3.5). Future research could investigate the effectiveness of more openly accessible artificial intelligence models. Finally, the study focused on a particular type of learning support: worked example hints. Future studies could explore how artificial intelligence can be used to support other pedagogical strategies and more complex tutoring interactions.

In addition, it remains uncertain whether ChatGPT and other artificial intelligence models can effectively tutor academic subjects beyond mathematics. “This pedagogical approach of tutoring by showing examples of how to solve a problem, generated by AI, may not lend itself to domains that are less procedural in nature (e.g., creative writing),” Pardos noted.

Looking ahead, this study suggests that artificial intelligence has the potential to revolutionize the creation of educational resources and tutoring systems. The fact that ChatGPT can generate math help that is as effective as human-created help, and do so much more quickly, opens exciting possibilities for making high-quality education more accessible and scalable.

“One-on-one human tutoring is very expensive and very effective,” Pardos said. “Incidentally, one-on-one computer tutoring is also expensive to produce. We’re interested in exploring how GenAI-assisted tutor production can change the cost structure and accessibility of tutoring and potentially increase its efficacy through greater personalization than is reasonably achievable with legacy computational approaches.”

“We’ve recently published a study evaluating how well ChatGPT (and other models) can produce questions of appropriate difficulty, compared to textbook questions. Placing teachers in driver’s seat of GenAI is also a research thread we’re making progress on. That emerging research, accepted at Human Factors in Computing Systems conference (CHI), and other threads can be found on our website: https://www.oatutor.io/resources#research-paper.”

The study, “ChatGPT-generated help produces learning gains equivalent to human tutor-authored help on mathematics skills,” was authored by Zachary A. Pardos and Shreya Bhandari.

(c) PsyPost Media Inc
