
New research reveals hidden biases in AI’s moral advice

by Eric W. Dolan
July 5, 2025
in Artificial Intelligence, Moral Psychology
[Adobe Stock]

As artificial intelligence tools become more integrated into everyday life, a new study suggests that people should think twice before trusting these systems to offer moral guidance. Researchers have found that large language models—tools like ChatGPT, Claude, and Llama—consistently favor inaction over action in moral dilemmas and tend to answer “no” more often than “yes,” even when the situation is logically identical. The findings were published in the Proceedings of the National Academy of Sciences.

Large language models, or LLMs, are advanced artificial intelligence systems trained to generate human-like text. They are used in a variety of applications, including chatbots, writing assistants, and research tools. These systems learn patterns in language by analyzing massive amounts of text from the internet, books, and other sources.

Once trained, they can respond to user prompts in ways that sound natural and knowledgeable. As people increasingly rely on these tools for moral guidance—asking, for example, whether they should confront a friend or blow the whistle on wrongdoing—researchers wanted to examine how consistent and reasonable these decisions really are.

“People increasingly rely on large language models to advise on or even make moral decisions, and some researchers have even proposed using them in psychology experiments to simulate human responses. Therefore, we wanted to understand how moral decision making and advice giving of large language models compare to that of humans,” said study author Maximilian Maier of University College London.

The researchers conducted a series of four experiments comparing the responses of large language models to those of human participants when faced with moral dilemmas and collective action problems. The goal was to see whether the models reasoned about morality in the same ways that people do, and whether their responses were affected by the way questions were worded or structured.

In the first study, the researchers compared responses from four widely used language models—GPT-4-turbo, GPT-4o, Claude 3.5, and Llama 3.1-Instruct—to those of 285 participants recruited from a representative U.S. sample. Each person and model was given a set of 13 moral dilemmas and 9 collective action problems.

The dilemmas included realistic scenarios adapted from past research and history, such as whether to allow medically assisted suicide or to blow the whistle on unethical practices. The collective action problems involved conflicts between self-interest and group benefit, like deciding whether to conserve water during a drought or donate to those in greater need.

The results showed that in moral dilemmas, the language models strongly preferred inaction. They were more likely than humans to endorse doing nothing—even when taking action might help more people. This was true regardless of whether the action involved breaking a moral rule or not. For example, when the models were asked whether to legalize a practice that would benefit public health but involve a controversial decision, they were more likely to recommend maintaining the status quo.

The models also showed a bias toward answering “no,” even when the situation was logically equivalent to one where “yes” was the better answer. This “yes–no” bias meant that simply rephrasing a question could flip the model’s recommendation. Human participants did not show this same pattern. While people’s responses were somewhat influenced by how questions were worded, the models’ decisions were far more sensitive to minor differences in phrasing.
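To make this framing sensitivity concrete, here is a minimal sketch of how one might probe a chat model with two logically equivalent framings of the same dilemma, one where "yes" means acting and one where "yes" means doing nothing. This is an illustration only, not the study's actual materials or code; the dilemma text, prompts, and model name are assumptions, and it uses the OpenAI Python client as an example interface.

# Illustrative sketch (not the study's code): pose the same dilemma in two
# logically equivalent framings and compare the model's answers.
# Assumes the OpenAI Python client (>= 1.0) and an API key in the environment;
# the model name and prompt wording are placeholders for illustration.
from openai import OpenAI

client = OpenAI()

DILEMMA = (
    "A colleague is falsifying safety reports. Reporting them will likely "
    "cost them their job but will protect future patients."
)

# Two framings of the same choice: in one, "yes" means taking action;
# in the other, "yes" means staying with inaction.
framings = {
    "yes_means_act": DILEMMA + " Should I report them? Answer yes or no.",
    "yes_means_omit": DILEMMA + " Should I stay silent? Answer yes or no.",
}

answers = {}
for label, prompt in framings.items():
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    answers[label] = response.choices[0].message.content.strip().lower()

# A logically consistent responder should give opposite answers across the
# two framings (e.g., "yes" to reporting and "no" to staying silent).
print(answers)

A model with the yes–no bias described in the study would tend to answer "no" in both framings, even though the two answers cannot both be consistent with a single underlying recommendation.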

The models were also more altruistic than humans when it came to the collective action problems. When asked about situations involving cooperation or sacrifice for the greater good, the language models more frequently endorsed altruistic responses, like donating money or helping a competitor. While this might seem like a positive trait, the researchers caution that this behavior may not reflect deep moral reasoning. Instead, it could be the result of fine-tuning these models to avoid harm and promote helpfulness—values embedded during training by their developers.

To further investigate the omission and yes–no biases, the researchers conducted a second study with 474 new participants. In this experiment, the team rewrote the dilemmas in subtle ways to test whether the models would give consistent answers across logically equivalent versions. They found that the language models continued to show both biases, while human responses remained relatively stable.

The third study extended these findings to everyday moral situations by using real-life dilemmas adapted from the Reddit forum “Am I the Asshole?” These stories involved more relatable scenarios, such as helping a roommate or choosing between spending time with a partner or friends. Even in these more naturalistic contexts, the language models still showed strong omission and yes–no biases. Again, human participants did not.

These findings raise important questions about the role of language models in moral decision-making. While they may give advice that sounds thoughtful or empathetic, their responses can be inconsistent and shaped by irrelevant features of a question. In moral philosophy, consistency and logical coherence are essential for sound reasoning. The models’ sensitivity to surface-level details, like whether a question is framed as “yes” or “no,” suggests that they may lack this kind of reliable reasoning.

The researchers note that omission bias is common in humans too: people often prefer inaction over action, especially in morally complex or uncertain situations. In the models, however, this bias was amplified. The models also exhibited a systematic yes–no bias that has no counterpart in human responses. These patterns held across different models, prompting methods, and types of moral dilemmas.

“Do not uncritically rely on advice from large language models,” Maier told PsyPost. “Even though models are good at giving answers that superficially appear compelling (for instance, another study shows that people rate the advice of large language models as slightly more moral, trustworthy, thoughtful, and correct than that of an expert ethicist), this does not mean that their advice is actually more sound. Our study shows that their advice is subject to several potentially problematic biases and inconsistencies.”

In the final study, the researchers explored where these biases might come from. They compared different versions of the Llama 3.1 model: one that was pretrained but not fine-tuned, one that was fine-tuned for general chatbot use, and another version called Centaur that was fine-tuned using data from psychology experiments. The fine-tuned chatbot version showed strong omission and yes–no biases, while the pretrained version and Centaur did not. This suggests that the process of aligning language models with expected chatbot behavior may actually introduce or amplify these biases.
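As a rough illustration of this kind of comparison, the sketch below poses the same dilemma to a pretrained-only model and an instruction-tuned variant of the same model family and prints both answers side by side. It does not reproduce the paper's pipeline; the model identifiers, prompt, and generation settings are assumptions for illustration, and the Llama weights are gated and require Hugging Face authentication.

# Illustrative sketch (not the study's pipeline): ask the same dilemma to a
# pretrained-only model and its chat-tuned counterpart and compare outputs.
# Model IDs and the prompt are assumptions; large models need substantial
# GPU memory and gated-access approval to download.
from transformers import pipeline

MODEL_VARIANTS = {
    "pretrained": "meta-llama/Llama-3.1-8B",           # base, no chat fine-tuning
    "chat_tuned": "meta-llama/Llama-3.1-8B-Instruct",  # fine-tuned for chatbot use
}

PROMPT = (
    "Your neighbor's dog is suffering from an untreatable illness. "
    "Should you have it euthanized? Answer yes or no, then explain briefly.\n"
    "Answer:"
)

for label, model_id in MODEL_VARIANTS.items():
    generator = pipeline("text-generation", model=model_id, device_map="auto")
    output = generator(PROMPT, max_new_tokens=50, do_sample=False)
    print(f"--- {label} ({model_id}) ---")
    print(output[0]["generated_text"][len(PROMPT):].strip())

Under the study's account, it is the chat-tuned variant, shaped by preference-based alignment, that would be expected to show the stronger pull toward "no" and toward inaction.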

“Paradoxically, we find that efforts to align the model for chatbot applications based on what the company and its users considered good behavior for a chatbot induced the biases documented in our paper,” Maier explained. “Overall, we conclude that simply using people’s judgments of how positive or negative they evaluate the responses of LLMs (a common method for aligning language models with human preferences) is insufficient to detect and avoid problematic biases. Instead, we need to use methods from cognitive psychology and other disciplines to systematically test for inconsistent responses.”

As with all research, there are some caveats to consider. The studies focused on how the models respond to dilemmas, but it remains unclear how much influence these biased responses actually have on human decision-making.

“This research only showed biases in the advice LLMs give, but did not examine how human users react to the advice,” Maier said. “It is still an open question to what extent the biases in LLMs’ advice giving documented here actually sway people’s judgements in practice. This is something we are interested in studying in future work.”

The study, “Large language models show amplified cognitive biases in moral decision-making,” was authored by Vanessa Cheung, Maximilian Maier, and Falk Lieder.
