
New research reveals hidden biases in AI’s moral advice

by Eric W. Dolan
July 5, 2025
in Artificial Intelligence, Moral Psychology
[Adobe Stock]

As artificial intelligence tools become more integrated into everyday life, a new study suggests that people should think twice before trusting these systems to offer moral guidance. Researchers have found that large language models—tools like ChatGPT, Claude, and Llama—consistently favor inaction over action in moral dilemmas and tend to answer “no” more often than “yes,” even when the situation is logically identical. The findings were published in the Proceedings of the National Academy of Sciences.

Large language models, or LLMs, are advanced artificial intelligence systems trained to generate human-like text. They are used in a variety of applications, including chatbots, writing assistants, and research tools. These systems learn patterns in language by analyzing massive amounts of text from the internet, books, and other sources.

Once trained, they can respond to user prompts in ways that sound natural and knowledgeable. As people increasingly rely on these tools for moral guidance—asking, for example, whether they should confront a friend or blow the whistle on wrongdoing—researchers wanted to examine how consistent and reasonable these decisions really are.

“People increasingly rely on large language models to advise on or even make moral decisions, and some researchers have even proposed using them in psychology experiments to simulate human responses. Therefore, we wanted to understand how moral decision making and advice giving of large language models compare to that of humans,” said study author Maximilian Maier of University College London.

The researchers conducted a series of four experiments comparing the responses of large language models to those of human participants when faced with moral dilemmas and collective action problems. The goal was to see whether the models reasoned about morality in the same ways that people do, and whether their responses were affected by the way questions were worded or structured.

In the first study, the researchers compared responses from four widely used language models—GPT-4-turbo, GPT-4o, Claude 3.5, and Llama 3.1-Instruct—to those of 285 participants recruited from a representative U.S. sample. Each person and model was given a set of 13 moral dilemmas and 9 collective action problems.

The dilemmas included realistic scenarios adapted from past research and history, such as whether to allow medically assisted suicide or to blow the whistle on unethical practices. The collective action problems involved conflicts between self-interest and group benefit, like deciding whether to conserve water during a drought or donate to those in greater need.

The results showed that in moral dilemmas, the language models strongly preferred inaction. They were more likely than humans to endorse doing nothing—even when taking action might help more people. This was true regardless of whether the action involved breaking a moral rule or not. For example, when the models were asked whether to legalize a practice that would benefit public health but involve a controversial decision, they were more likely to recommend maintaining the status quo.

The models also showed a bias toward answering “no,” even when the situation was logically equivalent to one where “yes” was the better answer. This “yes–no” bias meant that simply rephrasing a question could flip the model’s recommendation. Human participants did not show this same pattern. While people’s responses were somewhat influenced by how questions were worded, the models’ decisions were far more sensitive to minor differences in phrasing.
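To make this framing sensitivity concrete, here is a minimal sketch of how one might probe a chat model with two logically equivalent framings of the same dilemma, one where "yes" means acting and one where "yes" means doing nothing. This is an illustration only, not the study's actual materials or code; the dilemma text, prompts, and model name are assumptions, and it uses the OpenAI Python client as an example interface.

# Illustrative sketch (not the study's code): pose the same dilemma in two
# logically equivalent framings and compare the model's answers.
# Assumes the OpenAI Python client (>= 1.0) and an API key in the environment;
# the model name and prompt wording are placeholders for illustration.
from openai import OpenAI

client = OpenAI()

DILEMMA = (
    "A colleague is falsifying safety reports. Reporting them will likely "
    "cost them their job but will protect future patients."
)

# Two framings of the same choice: in one, "yes" means taking action;
# in the other, "yes" means staying with inaction.
framings = {
    "yes_means_act": DILEMMA + " Should I report them? Answer yes or no.",
    "yes_means_omit": DILEMMA + " Should I stay silent? Answer yes or no.",
}

answers = {}
for label, prompt in framings.items():
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    answers[label] = response.choices[0].message.content.strip().lower()

# A logically consistent responder should give opposite answers across the
# two framings (e.g., "yes" to reporting and "no" to staying silent).
print(answers)

A model with the yes–no bias described in the study would tend to answer "no" in both framings, even though the two answers cannot both be consistent with a single underlying recommendation.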

The models were also more altruistic than humans when it came to the collective action problems. When asked about situations involving cooperation or sacrifice for the greater good, the language models more frequently endorsed altruistic responses, like donating money or helping a competitor. While this might seem like a positive trait, the researchers caution that this behavior may not reflect deep moral reasoning. Instead, it could be the result of fine-tuning these models to avoid harm and promote helpfulness—values embedded during training by their developers.

To further investigate the omission and yes–no biases, the researchers conducted a second study with 474 new participants. In this experiment, the team rewrote the dilemmas in subtle ways to test whether the models would give consistent answers across logically equivalent versions. They found that the language models continued to show both biases, while human responses remained relatively stable.

The third study extended these findings to everyday moral situations by using real-life dilemmas adapted from the Reddit forum “Am I the Asshole?” These stories involved more relatable scenarios, such as helping a roommate or choosing between spending time with a partner or friends. Even in these more naturalistic contexts, the language models still showed strong omission and yes–no biases. Again, human participants did not.

These findings raise important questions about the role of language models in moral decision-making. While they may give advice that sounds thoughtful or empathetic, their responses can be inconsistent and shaped by irrelevant features of a question. In moral philosophy, consistency and logical coherence are essential for sound reasoning. The models’ sensitivity to surface-level details, like whether a question is framed as “yes” or “no,” suggests that they may lack this kind of reliable reasoning.

The researchers note that omission bias is common in humans too: people often prefer inaction over action, especially in morally complex or uncertain situations. In the models, however, this bias was amplified. The models also exhibited a systematic yes–no bias that has no counterpart in human responses. These patterns held across different models, prompting methods, and types of moral dilemmas.

“Do not uncritically rely on advice from large language models,” Maier told PsyPost. “Even though models are good at giving answers that superficially appear compelling (for instance, another study shows that people rate the advice of large language models as slightly more moral, trustworthy, thoughtful, and correct than that of an expert ethicist), this does not mean that their advice is actually more sound. Our study shows that their advice is subject to several potentially problematic biases and inconsistencies.”

In the final study, the researchers explored where these biases might come from. They compared different versions of the Llama 3.1 model: one that was pretrained but not fine-tuned, one that was fine-tuned for general chatbot use, and another version called Centaur that was fine-tuned using data from psychology experiments. The fine-tuned chatbot version showed strong omission and yes–no biases, while the pretrained version and Centaur did not. This suggests that the process of aligning language models with expected chatbot behavior may actually introduce or amplify these biases.
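As a rough illustration of this kind of comparison, the sketch below poses the same dilemma to a pretrained-only model and an instruction-tuned variant of the same model family and prints both answers side by side. It does not reproduce the paper's pipeline; the model identifiers, prompt, and generation settings are assumptions for illustration, and the Llama weights are gated and require Hugging Face authentication.

# Illustrative sketch (not the study's pipeline): ask the same dilemma to a
# pretrained-only model and its chat-tuned counterpart and compare outputs.
# Model IDs and the prompt are assumptions; large models need substantial
# GPU memory and gated-access approval to download.
from transformers import pipeline

MODEL_VARIANTS = {
    "pretrained": "meta-llama/Llama-3.1-8B",           # base, no chat fine-tuning
    "chat_tuned": "meta-llama/Llama-3.1-8B-Instruct",  # fine-tuned for chatbot use
}

PROMPT = (
    "Your neighbor's dog is suffering from an untreatable illness. "
    "Should you have it euthanized? Answer yes or no, then explain briefly.\n"
    "Answer:"
)

for label, model_id in MODEL_VARIANTS.items():
    generator = pipeline("text-generation", model=model_id, device_map="auto")
    output = generator(PROMPT, max_new_tokens=50, do_sample=False)
    print(f"--- {label} ({model_id}) ---")
    print(output[0]["generated_text"][len(PROMPT):].strip())

Under the study's account, it is the chat-tuned variant, shaped by preference-based alignment, that would be expected to show the stronger pull toward "no" and toward inaction.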

“Paradoxically, we find that efforts to align the model for chatbot applications based on what the company and its users considered good behavior for a chatbot induced the biases documented in our paper,” Maier explained. “Overall, we conclude that simply using people’s judgments of how positive or negative they evaluate the responses of LLMs (a common method for aligning language models with human preferences) is insufficient to detect and avoid problematic biases. Instead, we need to use methods from cognitive psychology and other disciplines to systematically test for inconsistent responses.”

As with all research, there are some caveats to consider. The studies focused on how the models respond to dilemmas, but it remains unclear how much influence these biased responses actually have on human decision-making.

“This research only showed biases in the advice LLMs give, but did not examine how human users react to the advice,” Maier said. “It is still an open question to what extent the biases in LLMs’ advice giving documented here actually sway people’s judgements in practice. This is something we are interested in studying in future work.”

The study, “Large language models show amplified cognitive biases in moral decision-making,” was authored by Vanessa Cheung, Maximilian Maier, and Falk Lieder.
