A new study has found that university students in a programming course who used artificial intelligence chatbots more frequently tended to earn lower academic scores. The research, published in Computers in Human Behavior Reports, offers a detailed look at how students engage with these tools and suggests a complex relationship between their use and learning outcomes.
An AI chatbot is a computer program designed to simulate human conversation through text. With the public release of advanced models like ChatGPT in 2022, these tools have become widely accessible and have significantly influenced many fields, including education. In computer science, chatbots that can generate and explain code present both new learning opportunities and potential pitfalls for students.
Despite their growing use, there is little agreement on how students are actually incorporating these tools into their studies or how that usage affects their performance. Researchers conducted this study to fill that gap, aiming to understand how students in a foundational “Object-Oriented Programming” course perceive and use chatbots and to investigate the connection between that use and their grades.
The investigation centered on 231 students enrolled in an “Object-Oriented Programming” course at the University of Tartu, a mandatory part of the first-year computer science curriculum. During the eighth week of the semester, students were invited to complete a detailed survey about their experiences in the course, including a section dedicated to their use of AI assistants. Participation was voluntary, though students received a small bonus point for completing the survey. Researchers then connected the survey responses to the students’ academic records, which included scores from two major programming tests, a final exam, and total points for the course.
The analysis of the survey data produced a comprehensive picture of student behavior. A large majority of students, nearly 80 percent, reported having used an AI assistant at least once for the course. However, usage patterns varied. About half of these users engaged with the tools infrequently, just once or a couple of times. The other half used them more regularly, from every other week to almost every week. Only a small fraction, about five percent of users, reported using a chatbot on a weekly basis, indicating that heavy reliance was not common.
For the roughly 20 percent of students who did not use these tools, the primary reason given was a desire to learn on their own and to solve problems independently. Other reasons, such as a fear of being accused of plagiarism or a concern that using the tools would prevent them from learning the material, were cited less often. This suggests that for many non-users, the choice was driven by a commitment to a traditional learning process rather than by fear or lack of awareness.
Among the students who did use chatbots, the most common application was for help with programming tasks. They frequently used the assistants to help solve homework problems and to understand code examples provided in the course materials. The specific ways they used the tools aligned with these tasks. The most popular functions were finding errors in their own code and getting explanations for the logic of existing code snippets. Generating entire code solutions from scratch or getting answers to theoretical questions were less frequent uses.
When asked what they liked about the chatbots, students overwhelmingly pointed to their speed and constant availability. They found it faster to get an immediate response from an AI assistant than to search online or wait for help from a teaching assistant. Many also appreciated the ability of the tools to provide clear, simple explanations and to quickly identify small errors or typos in their code.
On the other hand, students also expressed frustrations. The most common complaints concerned the unreliability of the chatbots. Students reported that the tools sometimes produced incorrect answers or “hallucinated” solutions that did not work. They also found that the assistants sometimes offered overly complex solutions using advanced concepts not yet covered in the course. Another point of friction was the chatbots’ occasional failure to understand the student’s question, leading to irrelevant answers.
A central finding of the research was a negative relationship between the frequency of chatbot use and academic performance. Students who reported using chatbots more often tended to earn lower scores on the first programming test, the second programming test, and the final exam, as well as lower total course points. The connection was strongest for the first programming test.
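The article does not specify the statistical method behind this finding. As a rough illustration only, the sketch below computes a rank correlation (Spearman’s rho) between self-reported usage frequency and a test score; the data, variable names, and the choice of Spearman’s method are all assumptions for demonstration, not the study’s actual analysis.

```python
# Illustrative sketch only: hypothetical data, not the study's dataset.
# Assumes ordinal self-reported usage (0 = never ... 4 = weekly) and a
# rank correlation (Spearman's rho); the paper's exact method may differ.
from scipy.stats import spearmanr

# Hypothetical paired records: reported usage frequency and first test score
usage = [0, 1, 1, 2, 3, 4, 2, 0, 3, 1]
test1 = [88, 82, 90, 75, 64, 58, 70, 95, 61, 85]

rho, p_value = spearmanr(usage, test1)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
# A negative rho would mirror the reported pattern: more frequent
# chatbot use accompanying lower test scores.
```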
At the same time, there was no significant link between a student’s performance and how helpful they perceived the chatbots to be. This finding could suggest that students who find the course material more difficult are the ones who turn to chatbots most often for assistance. It may also indicate that over-reliance on these tools could hinder the development of fundamental programming skills needed to succeed on assessments.
The researchers also identified connections between usage frequency and student attitudes. More frequent users were more likely to report that they struggled less with homework and that the availability of chatbots motivated them to solve more tasks. Yet, these same students were also more likely to say they tried fewer different solution variants for problems and that they asked teaching assistants for help less often.
The researchers acknowledge some limitations to their study. The findings are based on a single course at one university, so the results may not be applicable to all educational contexts. The data was also based on students’ self-reports, which can be subject to memory errors or the desire to present oneself in a positive light. Since the survey was not fully anonymous, some students may have been hesitant to honestly report their usage patterns.
Future research could build on these findings by studying students across different courses and universities to see if these patterns hold. It would also be beneficial to use methods beyond surveys, such as analyzing actual chatbot usage logs, to get a more direct measure of student behavior.
Further investigation is needed to understand the cause of the negative link between chatbot use and performance. Determining whether struggling students are simply more likely to use these tools or if the use of the tools itself contributes to poorer outcomes is an important question for educators navigating the integration of AI in the classroom.
The study, “Does generative AI help in learning programming: Students’ perceptions, reported use and relation to performance,” was authored by Marina Lepp and Joosep Kaimre.