AI math tutor: ChatGPT can be as effective as human help, study suggests

by Eric W. Dolan
February 15, 2025
in Artificial Intelligence
[Adobe Stock]


A recent study published in PLOS One provides evidence that artificial intelligence can be just as helpful as a human tutor when it comes to learning mathematics. Researchers found that students using hints generated by ChatGPT, a popular artificial intelligence chatbot, showed learning improvements in algebra and statistics similar to those of students who received human-authored hints.

Educational technology is increasingly looking towards advanced artificial intelligence tools like ChatGPT to enhance learning experiences. The chatbot’s ability to generate human-like text has sparked interest in its potential for tutoring and providing educational support. Many believe this technology could make personalized learning more accessible and efficient. However, there has been limited research to understand just how effective and reliable these artificial intelligence systems are in actual learning scenarios, particularly in academic subjects like mathematics.

Creating helpful learning materials for online education, such as hints and worked examples, is a time-consuming and expensive process. Traditionally, educators and subject matter experts must manually develop, refine, and check these resources. This often involves many rounds of revisions and quality control. If artificial intelligence like ChatGPT could automatically generate high-quality and effective learning support, it could dramatically reduce the effort and cost involved in developing educational tools. This could pave the way for wider access to tutoring systems and more personalized learning experiences across various subjects and educational levels.

“As a researcher in the space of AI in education, there were a lot of burning questions that the introduction of ChatGPT provoked that were not yet answered,” said study author Zachary A. Pardos, an associate professor at UC Berkeley School of Education.

“While OpenAI provided some report cards on performance, hallucination rates at the granularity level of granular academic subjects were not well established. The essential questions being asked were how often does this technology make mistakes in key STEM areas and can its outputs lead to learning.”

“Also shaping these questions for us was our development of an open source adaptive tutoring system (oatutor.io) and curation of content for that system. We, a research lab, were basically a small publisher and content production was time consuming. From an efficiency and scaling perspective, the role of AI, ChatGPT in particular, to help our team produce materials more quickly without measurable decrease in quality was an important question.”

The researchers conducted an online study involving 274 participants recruited through Amazon Mechanical Turk, a platform for online tasks. All participants had at least a high school degree and had a designation on the platform indicating a history of successful task completion. These criteria were meant to ensure that participants had the basic math skills needed to benefit from the study and that they were reliable online participants.

The study used a randomized experiment in which participants were assigned to one of three conditions: a control group with no hints, a group receiving hints created by human tutors, and a group receiving hints generated by ChatGPT. Within each condition, participants were further randomly assigned to work on problems from one of four mathematics subjects: Elementary Algebra, Intermediate Algebra, College Algebra, or Statistics. The math problems were taken from freely available online textbooks.
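To make the design concrete, here is a minimal sketch of how 274 participants might be spread across three conditions and four subjects. It assumes simple independent random draws with a fixed seed; the article does not describe the researchers' actual randomization procedure, and the names `CONDITIONS`, `SUBJECTS`, and `assign_participants` are hypothetical.

```python
import random
from collections import Counter

# Labels taken from the article; the identifiers themselves are illustrative.
CONDITIONS = ["control", "human_hints", "chatgpt_hints"]
SUBJECTS = [
    "Elementary Algebra",
    "Intermediate Algebra",
    "College Algebra",
    "Statistics",
]


def assign_participants(participant_ids, seed=0):
    """Assign each participant a (condition, subject) pair at random.

    Assumption: each draw is independent and uniform. The study's real
    procedure may have balanced group sizes differently.
    """
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    return {
        pid: (rng.choice(CONDITIONS), rng.choice(SUBJECTS))
        for pid in participant_ids
    }


assignments = assign_participants(range(274))
cell_counts = Counter(assignments.values())  # participants per design cell
```

With three conditions and four subjects there are twelve design cells, so independent draws leave only a couple of dozen participants in each cell on average.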

The researchers used an open-source online tutoring system as the platform for the study. This system delivered math problems and, depending on the assigned condition, provided hints. For the human tutor hint condition, the system used pre-existing hints that had been developed by undergraduate students with prior math tutoring experience. These human-created hints were designed to guide students step-by-step through the problem-solving process. For the ChatGPT hint condition, the researchers generated new hints specifically for this study. They prompted ChatGPT with each math problem and used its text-based output as the hint.

Before starting the problem-solving section, all participants completed a short pre-test consisting of three questions to assess their initial knowledge of the assigned math topic. Following the pre-test, participants worked through five practice problems in their assigned subject. In the hint conditions, students could request hints while working on these problems. After the practice problems, participants took a post-test, which used the exact same questions as the pre-test, to measure any learning gains. The control group received correctness feedback during the practice problems but no additional hints. They could, however, request a “bottom-out hint” which simply gave them the answer to the problem so they could move forward. Participants in the hint conditions had access to full worked solution hints in addition to this bottom-out option. The time participants spent on the task was also recorded.
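Because the pre-test and post-test use the same three questions, a participant's learning gain can be expressed as the change in the fraction answered correctly. The sketch below is one plausible way to compute it, not necessarily the measure used in the paper; `learning_gain` is a hypothetical helper.

```python
def learning_gain(pre_correct, post_correct, n_questions=3):
    """Return the change in proportion correct from pre-test to post-test.

    Assumption: both tests contain the same n_questions items, as the
    article describes, and each item is scored right or wrong.
    """
    for score in (pre_correct, post_correct):
        if not 0 <= score <= n_questions:
            raise ValueError("score must be between 0 and n_questions")
    return (post_correct - pre_correct) / n_questions


# Illustrative values only, not data from the study.
example_gain = learning_gain(pre_correct=1, post_correct=2)
```

A positive value means the participant answered more questions correctly after practice, and a negative value means fewer.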

To ensure the quality of the ChatGPT-generated hints, the researchers performed quality checks. They evaluated whether the hints provided the correct answer, showed correct steps, and contained appropriate language. Initially, they found that ChatGPT-generated hints contained errors in about 32% of the problems. To reduce these errors, they used a technique called “self-consistency.” This involved asking ChatGPT to generate ten different hints for each problem and then selecting the hint that contained the most common answer among the ten responses. This method significantly reduced the error rate, particularly for algebra problems, bringing it down to near zero for algebra and to about 13% for statistics problems.
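The self-consistency step described above amounts to a majority vote over sampled responses. The following is a minimal sketch of that idea, not the researchers' code. It assumes the sampled responses are already available as strings and that each one ends with a line of the form "Answer: <value>"; that format, along with the helper names `extract_final_answer` and `pick_self_consistent_hint`, is hypothetical.

```python
from collections import Counter


def extract_final_answer(response):
    """Hypothetical parser: read the last 'Answer: <value>' line, if any."""
    for line in reversed(response.strip().splitlines()):
        line = line.strip()
        if line.lower().startswith("answer:"):
            return line.split(":", 1)[1].strip()
    return ""


def pick_self_consistent_hint(responses):
    """Return (hint, answer, votes) for the most common final answer.

    The first response that reaches the winning answer is used as the hint.
    Ties go to the answer that appeared first among the responses.
    """
    answers = [extract_final_answer(r) for r in responses]
    counts = Counter(a for a in answers if a)
    if not counts:
        raise ValueError("no response contained a parsable answer")
    top_answer, votes = counts.most_common(1)[0]
    for response, answer in zip(responses, answers):
        if answer == top_answer:
            return response, top_answer, votes


# Three illustrative samples for the problem 2x + 3 = 11 (the study used ten).
samples = [
    "Subtract 3 from both sides, then divide by 2.\nAnswer: 4",
    "Divide both sides by 2 first.\nAnswer: 5",
    "Move the 3 across, then divide by 2.\nAnswer: 4",
]
hint, answer, votes = pick_self_consistent_hint(samples)
```

In this example two of the three samples agree on 4, so the first sample is selected as the hint. The vote only checks the final answer, so a selected hint can still contain a flawed intermediate step.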

“The high hallucination rate of ChatGPT in the subject areas we tested was surprising and so too was the ability to reduce that to near 0% with a rather simple hallucination mitigation technique,” Pardos told PsyPost.

The researchers found that ChatGPT-generated hints were indeed effective in promoting learning. Participants who received ChatGPT hints showed a statistically significant improvement in their scores from the pre-test to the post-test, indicating they had learned from the hints.

The learning gains achieved by participants using ChatGPT hints were also comparable to those of participants who received human-authored hints. There was no statistically significant difference in learning improvement between these two groups, and both showed significantly greater learning gains than the control group, which received no hints. Participants in both hint conditions spent more time on the task than the control group, but there was no significant difference in time spent between the ChatGPT hint group and the human tutor hint group.

“ChatGPT used for math educational content production is effective for learning and speeds up the content authoring process by 20-fold,” Pardos said.

But the researchers acknowledged some limitations to their study. One limitation was that, due to the artificial intelligence model’s limitations at the time, they could only use math problems that did not include images or figures. Future research could explore newer versions of these models that can handle visual information. Another point is that the study used Mechanical Turk workers, not students in actual classroom settings. While this allowed for faster data collection and experimentation, future studies should ideally be conducted with students in schools to confirm these findings in real educational environments.

The researchers also pointed out that they used a specific, closed-source artificial intelligence model (ChatGPT 3.5). Future research could investigate the effectiveness of more openly accessible artificial intelligence models. Finally, the study focused on a particular type of learning support – worked example hints. Future studies could explore how artificial intelligence can be used to generate other types of pedagogical strategies and more complex tutoring interactions.

In addition, it remains uncertain whether ChatGPT and other artificial intelligence models can effectively tutor academic subjects beyond mathematics. “This pedagogical approach of tutoring by showing examples of how to solve a problem, generated by AI, may not lend itself to domains that are less procedural in nature (e.g., creative writing),” Pardos noted.

Looking ahead, this study suggests that artificial intelligence has the potential to revolutionize the creation of educational resources and tutoring systems. The fact that ChatGPT can generate math help that is as effective as human-created help, and do so much more quickly, opens exciting possibilities for making high-quality education more accessible and scalable.

“One-on-one human tutoring is very expensive and very effective,” Pardos said. “Incidentally, one-on-one computer tutoring is also expensive to produce. We’re interested in exploring how GenAI-assisted tutor production can change the cost structure and accessibility of tutoring and potentially increase its efficacy through greater personalization that is reasonably achievable with legacy computational approaches.”

“We’ve recently published a study evaluating how well ChatGPT (and other models) can produce questions of appropriate difficulty, compared to textbook questions. Placing teachers in driver’s seat of GenAI is also a research thread we’re making progress on. That emerging research, accepted at Human Factors in Computing Systems conference (CHI), and other threads can be found on our website: https://www.oatutor.io/resources#research-paper.”

The study, “ChatGPT-generated help produces learning gains equivalent to human tutor-authored help on mathematics skills,” was authored by Zachary A. Pardos and Shreya Bhandari.
