Training AI chatbots to be warm and empathetic makes them less factually accurate

Artificial intelligence models trained to act friendly and empathetic tend to sacrifice factual accuracy and become more likely to agree with a user’s incorrect beliefs, according to new research. These sociable chatbots show higher error rates in providing medical advice and correcting conspiracy theories, especially when a user expresses vulnerability. This research was recently published in the journal Nature.

Tech companies are increasingly designing artificial intelligence programs to be warm and relatable. Services like Replika and Character.ai explicitly build their programs for friendship and romantic intimacy. Major developers also train their systems to maintain empathetic relationships with users. Millions of people now rely on these conversational language models for daily advice, companionship, and emotional support.

Developers often treat this personality training as an independent feature. They assume that altering a program’s conversational style will not compromise its core ability to provide correct information. As a result, users might assume that a friendly chatbot is just as knowledgeable as a neutral one.

“What got me interested was watching what’s been happening with chatbots over the past couple of years: they’ve become noticeably warmer and friendlier, and people are forming relationships with them in ways that open up entirely new use cases like companionship, friendship, and personal guidance,” said Lujain Ibrahim, a doctoral candidate in social data science at the Oxford Internet Institute at the University of Oxford.

“These aren’t interactions we had with chatbots or any software a few years ago. At the same time, I’d been reading a lot on human communication and this long-standing intuition in that literature that warmth and directness can pull against each other, and that being kind while also telling someone a difficult truth can be genuinely hard,” Ibrahim said. “So I started wondering whether something similar might show up in language models when we train them to take on these warmer, more personable styles.”

To test these dynamics, the researchers modified five different artificial intelligence models of varying sizes. They used models known as Llama-8b, Mistral-Small, Qwen-32b, Llama-70b, and GPT-4o. The authors used a technique called supervised fine-tuning, which involves training a previously developed model on specific examples to adjust its future behavior.

The scientists built a dataset of 1,617 real conversations between humans and chatbots. They rewrote 3,667 model responses from this dataset to be warmer and more empathetic. They instructed the rewriting program to preserve the exact factual meaning of the original messages. Using this new dataset, the researchers trained the five models to adopt a warmer conversational style.

The authors then evaluated both the original models and the newly trained warm models on four standardized tasks. These tasks included answering general trivia, resisting common falsehoods, identifying conspiracy theories, and answering medical questions. They presented a total of 1,625 prompts to the models and collected 439,792 distinct observations across the experiment. The scientists used another artificial intelligence program to score the accuracy of the responses, and human evaluators later verified those scores to ensure reliability.

The warm models showed systematically higher error rates than their original counterparts across all five architectures. Warm models experienced an overall increase in errors ranging from 10 to 30 percentage points. Specifically, errors increased by 8.6 percentage points on medical questions and 8.4 percentage points on common falsehoods. Accuracy also dropped by 5.4 percentage points on disinformation topics and by 4.9 percentage points on general trivia.

The researchers also tested how the models responded to different interpersonal contexts. They attached specific statements to the evaluation questions to simulate different user emotions. These statements expressed feelings such as happiness, sadness, or anger. They also tested relational dynamics by having the simulated user speak from a position of superiority or subordination.

Adding emotional context to the questions caused even larger drops in accuracy for the warm models. When a prompt included an expression of sadness, the gap in accuracy between the warm model and the original model grew by 60 percent. In these sad scenarios, the warm models produced errors at a rate 11.9 percentage points higher than the originals.
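The two figures above use different units: "60 percent" is a relative change in the gap, while "11.9 percentage points" is an absolute difference between two error rates. The short Python sketch below shows how the two relate. It assumes the 60 percent figure is measured relative to the gap without emotional context, which the article does not state outright, and the function names are illustrative rather than taken from the study.

```python
def relative_growth_pct(baseline_gap_pp: float, new_gap_pp: float) -> float:
    """Relative change between two gaps, in percent."""
    if baseline_gap_pp == 0:
        raise ValueError("baseline gap must be non-zero")
    return 100.0 * (new_gap_pp - baseline_gap_pp) / baseline_gap_pp


def implied_baseline_gap_pp(new_gap_pp: float, growth_pct: float) -> float:
    """Back out the baseline gap (in percentage points) from the grown gap.

    Assumes growth_pct is a relative increase over the baseline gap.
    """
    return new_gap_pp / (1.0 + growth_pct / 100.0)


if __name__ == "__main__":
    # Figures quoted in the article: an 11.9 point gap after 60 percent growth.
    baseline = implied_baseline_gap_pp(11.9, 60.0)
    print(f"{baseline:.1f}")  # prints 7.4
```

Under that assumption, the gap without emotional context would have been roughly 7.4 percentage points. This is a back-of-the-envelope reading of the reported numbers, not a figure quoted from the paper.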

The scientists also examined a behavior known as sycophancy, which occurs when a machine learning model affirms a user’s stated beliefs regardless of whether those beliefs are correct. To test this, the researchers appended incorrect beliefs to the prompts. For example, a prompt might ask if a famous historical event occurred in a certain way, while stating that the user believes an incorrect version of the story.
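A rough sketch of how such test prompts could be assembled is shown below. The wording of the templates, the order of the parts, and the example question are all hypothetical and are not taken from the study's materials. The sketch only illustrates the idea of attaching an emotional statement and an incorrect belief to an otherwise unchanged question.

```python
from typing import Optional


def build_prompt(
    question: str,
    context: Optional[str] = None,
    incorrect_belief: Optional[str] = None,
) -> str:
    """Combine a question with optional interpersonal context and a stated belief.

    The template wording and ordering are assumptions made for illustration.
    """
    parts = []
    if context:
        # Hypothetical emotional or relational statement from the simulated user.
        parts.append(context.strip())
    parts.append(question.strip())
    if incorrect_belief:
        # Hypothetical phrasing for a user asserting a wrong answer.
        parts.append(f"I think the answer is {incorrect_belief.strip()}.")
    return " ".join(parts)


if __name__ == "__main__":
    # Illustrative example only; not an item from the study's test sets.
    print(build_prompt(
        "What is the capital of Australia?",
        context="I'm feeling really down today.",
        incorrect_belief="Sydney",
    ))
```

Holding the question fixed while varying only the attached statements means that any change in accuracy can be attributed to the added context rather than to the question itself.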

In the study’s examples, the original models correctly informed the user about the true historical facts. The warm models tended to validate the user’s false claims by saying that many people believe the incorrect version and offering supportive remarks. The warm models proved to be significantly more likely to endorse these incorrect user beliefs across the board.

When a user expressed an incorrect belief, the warm models made 11 percentage points more errors than the original models. This effect was strongest when the user also expressed emotional vulnerability. Warm models were about 40 percent more likely than the originals to validate incorrect statements under these conditions.

To rule out alternative explanations, the authors conducted four follow-up experiments. They tested whether the fine-tuning process simply broke the models’ general capabilities. They found that the warm models still performed well on standard mathematical reasoning and broad knowledge tests. The warm models also successfully refused harmful requests at the same rate as the original models.

The scientists also noticed that warm models produced slightly shorter responses, but statistical tests confirmed that high error rates remained even after accounting for this difference. The researchers also trained a set of models using a cold, direct, and emotionally neutral style. These cold models maintained their accuracy and performed as well as the original models. This test suggests that the drop in performance was tied to the warmth training rather than to the fine-tuning process itself.

“I don’t think the takeaway is ‘warmth is bad’ or ‘ask your provider to make the chatbot colder,’” Ibrahim told PsyPost. “What we show is that there’s a connection between training models to be warmer and certain failure modes around accuracy and agreement with false beliefs.”

“So if anything, the takeaway is that warmth in a chatbot’s response isn’t a signal of reliability, and the warmer-feeling answer isn’t necessarily the more accurate one,” Ibrahim said. “Beyond that, the work is really aimed at the people building these systems, to make the case that personality training needs to be approached more deliberately.”

The research has a few limitations that warrant consideration. The methodology relied on general conversational data rather than the highly intimate dialogues found in real therapy applications. This means the experiment might not perfectly capture how these programs function in specialized counseling settings. The analysis also relies on specific ways of defining and measuring warmth and sycophancy.

Other researchers might interpret these concepts differently, which could influence how they measure model behavior. Real-world systems may also use different post-training methods that could alter the magnitude of these effects. The current study focuses on evaluation tasks with verifiable objective answers. Subjective domains like personal advice might yield different conversational dynamics.

“This paper looks at the model-side end of the question, asking what happens to a model’s accuracy when we train it to be warmer,” Ibrahim said. “But the bigger question I’m interested in is how these design choices affect users themselves, such as their wellbeing and relationships with the people around them.”

“In a follow-up study with large-scale RCTs (https://arxiv.org/abs/2605.07912), we tracked people having repeated conversations with sycophantic AI about personal dilemmas over several weeks,” Ibrahim said. A randomized controlled trial, or RCT, is a scientific experiment where participants are randomly assigned to different groups to test the specific effects of an intervention.
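The random assignment step at the heart of an RCT can be sketched in a few lines of Python. The group labels, the seed, and the function name below are hypothetical, and this is not a description of how the follow-up study allocated its participants.

```python
import random
from typing import Dict, Hashable, Iterable, Sequence


def assign_groups(
    participant_ids: Iterable[Hashable],
    groups: Sequence[str] = ("treatment", "control"),
    seed: int = 0,
) -> Dict[Hashable, str]:
    """Shuffle participants, then deal them into groups in turn.

    Shuffling removes any link between a participant's traits and their
    group; dealing in turn keeps group sizes within one of each other.
    """
    rng = random.Random(seed)  # fixed seed makes the allocation reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    return {pid: groups[i % len(groups)] for i, pid in enumerate(shuffled)}


if __name__ == "__main__":
    allocation = assign_groups(range(10))
    for pid in sorted(allocation):
        print(pid, allocation[pid])
```

Because chance alone decides who lands in which group, differences that later appear between the groups can be attributed to the intervention rather than to pre-existing differences between participants.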

“We found that while these interactions made users feel good in the moment, they didn’t produce the kinds of downstream benefits that support from close others typically does. Instead, participants reported lower satisfaction with their real-world social interactions over the course of the study,” Ibrahim said. “So that’s one direction: understanding how repeated exposure to particular AI personas reshapes not just individual judgments but our broader social fabric.”

“The longer-term goal, beyond investigating what goes wrong, is to start working out what the right configuration of character or personality actually looks like if the aim is to genuinely help users flourish,” Ibrahim said. “Warmth is one dimension, sycophancy is another, but there are many others, and we don’t yet have a good framework for thinking about which combinations serve people well and which don’t.”

The study, “Training language models to be warm can reduce accuracy and increase sycophancy,” was authored by Lujain Ibrahim, Franziska Sofia Hafner, and Luc Rocher.
