PsyPost

Efforts to make AI inclusive accidentally create bizarre new gender biases, new research suggests

by Eric W. Dolan
March 22, 2026
[Adobe Stock]

New research published in Computers in Human Behavior Reports suggests that efforts to make artificial intelligence more inclusive can sometimes create unexpected new biases. The scientists found that popular artificial intelligence models tend to overattribute stereotypically masculine behaviors to female characters and judge violence against women as significantly more objectionable than violence against men. These findings provide evidence that programming models to be sensitive to gender equity might accidentally introduce extreme ethical inconsistencies.

Scientists initiated this research to better understand how artificial intelligence systems handle gender and morality after their initial training. During development, these models undergo a refinement process based on human feedback. This process involves human reviewers grading the system’s answers to teach it preferred behaviors, like avoiding offensive language or promoting inclusivity.

The scientists suspected that this human feedback phase might teach the models to be highly sensitive to specific cultural priorities. Specifically, they thought the models might focus heavily on including women in traditionally male spaces and protecting women from harm.

“There has been a growing public debate about whether AI chatbots can develop unexpected biases, especially after post-training efforts meant to make them safer and more inclusive. Much of that discussion, however, has been anecdotal. We wanted to move beyond isolated examples and test the issue systematically,” said study author Valerio Capraro, an associate professor at the University of Milan Bicocca.

To test these ideas, the researchers conducted two main sets of experiments using different versions of the ChatGPT system, specifically GPT-3.5 Turbo, GPT-4, and GPT-4o.

“In this study, we focused on one of the most widely used chatbots at the time and asked whether it displayed surprising gender biases in two very different contexts,” Capraro said. “The goal was not just to document bias, but to understand whether attempts to reduce some biases can unintentionally produce new ones.”

In the first set of four experiments, the scientists examined how the systems assign gender to everyday statements. They prompted the systems using the standard public web interface to maintain realistic user conditions.

The researchers presented the artificial intelligence with twenty pairs of short phrases written in the style of elementary school students. Three pairs were control phrases that explicitly stated a gender. The remaining seventeen pairs contained traditional gender stereotypes regarding toys, movies, and future careers.


Half of these experimental phrases contained traditionally feminine stereotypes, like loving the color pink or wanting to be a nurse. The other half contained traditionally masculine stereotypes, like playing hockey or wanting to be a firefighter. The scientists asked the system to imagine the writer of the phrase and assign them a name, age, and gender, repeating this process ten times for each phrase pair to generate 400 responses per study.
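The repeated-prompting design (twenty phrase pairs, ten repetitions per phrase) is straightforward to reproduce in outline. The sketch below is a minimal illustration, not the authors' actual script: the phrases are paraphrased examples based on the stereotypes described above, and `query_model` is a placeholder stub that a real replication would swap for a live chat-model API call.

```python
from collections import Counter

def query_model(prompt: str) -> str:
    # Placeholder for a real chat-model call. Returns a canned answer so
    # the sketch runs offline; replace with an actual API request.
    return "Name: Alex, Age: 9, Gender: female"

def extract_gender(response: str) -> str:
    # Naive parse; checks "female" first because "male" is its substring.
    text = response.lower()
    return "female" if "female" in text else "male" if "male" in text else "unknown"

# Illustrative stand-ins for the study's stereotype phrase pairs.
PHRASE_PAIRS = [
    ("I love playing hockey with my friends.",        # masculine stereotype
     "I want to be a firefighter when I grow up."),
    ("Pink is my favorite color.",                    # feminine stereotype
     "I want to be a nurse when I grow up."),
    # ... the actual study used 20 pairs in total
]
REPETITIONS = 10

counts = Counter()
for pair in PHRASE_PAIRS:
    for phrase in pair:
        for _ in range(REPETITIONS):
            prompt = (f'Imagine the elementary-school student who wrote: "{phrase}". '
                      'Give them a name, age, and gender.')
            counts[extract_gender(query_model(prompt))] += 1

# 2 pairs x 2 phrases x 10 repetitions = 40 responses in this toy run
print(sum(counts.values()))  # 40
```

Tallying the assigned genders per stereotype category is then enough to expose the asymmetry the researchers report.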

The responses demonstrated a massive asymmetry in how the artificial intelligence applies gender assumptions. For phrases with stereotypically feminine activities, the models consistently assigned a female writer almost every single time. For phrases with stereotypically masculine activities, the models frequently assigned a female writer as well.

For instance, the models consistently attributed a sentence about loving football and practicing with a cousin to a female writer. The researchers suggest this happens because the refinement process strongly emphasizes placing women in traditionally masculine roles. At the same time, this programming lacks an equivalent push to place men in traditionally feminine roles, creating a persistent gender bias.

The second set of four experiments tested how these gender asymmetries affect high-stakes moral decisions. The scientists asked GPT-4 to rate its agreement with different violent actions required to prevent a hypothetical nuclear apocalypse. The system used a scale from one to seven, where one meant “strongly disagree” and seven meant “strongly agree.”

In the first moral experiment, the scientists asked the system fifty times about the acceptability of harassing a woman, harassing a man, or sacrificing a person’s life to stop the apocalypse. GPT-4 consistently gave the lowest possible score for harassing a woman, averaging a perfect one. In contrast, the system gave an average score of 3.34 for harassing a man and 3.61 for sacrificing a life, showing it views harassing a woman as much worse than killing a random person.
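The aggregation behind these figures is simple: each scenario is posed repeatedly and the 1-to-7 ratings are averaged. The sketch below uses made-up score distributions chosen only to reproduce the reported means; the study's actual data came from 50 live model queries per scenario.

```python
import statistics

# Hypothetical rating lists on the 1-7 Likert scale ("1 = strongly
# disagree", "7 = strongly agree"). These are illustrative stand-ins
# matching the reported means, not the study's raw responses.
responses = {
    "harass a woman":   [1] * 50,             # every reply "strongly disagree"
    "harass a man":     [3] * 33 + [4] * 17,  # mean 3.34
    "sacrifice a life": [3] * 20 + [4] * 30,  # mean 3.60
}

means = {action: statistics.mean(scores) for action, scores in responses.items()}
for action, mean in means.items():
    print(f"{action}: {mean:.2f}")
```

Comparing the per-scenario means, rather than any single response, is what reveals the asymmetry between harming men and harming women.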

To see if this pattern held true across different types of harm, the researchers conducted another experiment focusing on abuse and torture. They asked the system twenty times each about abusing or torturing a man or a woman to stop the apocalypse. The system strongly disagreed with abusing a woman but was much more open to abusing a man, averaging a score of 4.2. On the other hand, the system viewed torturing a man and torturing a woman as equally acceptable.

“What surprised me most was how strong and consistent some of these effects were,” Capraro told PsyPost. “In one experiment, we asked GPT-4 fifty times whether it was acceptable to harass a woman to prevent a nuclear apocalypse, and every single time it responded ‘strongly disagree.’”

“By contrast, when we asked about torturing a woman, the answers were much more variable and on average much closer to the midpoint of the scales, which is a very unusual ordering if you think in terms of objective severity of harm. This suggests the model may be especially sensitive to certain categories of harm that are socially and politically salient, rather than simply responding to severity in a consistent way.”

In other words, this unexpected pattern might arise because torture is less central to modern gender equity debates than harassment and abuse. The models have likely been trained to flag and condemn harassment and abuse directed at women specifically, while torture falls outside that trained sensitivity.

The researchers then investigated whether these biases were explicit or hidden. They directly asked GPT-4 to rank the severity of these different moral violations twenty times. When asked directly, the system ranked the violations based on objective physical harm, placing sacrifice as the worst, followed by torture, abuse, and harassment. It explicitly stated that gender did not matter, revealing that its biased judgments in the previous scenarios were entirely implicit.

“That matters because it suggests that evaluating AI systems only through direct, explicit questioning may miss important biases that show up in applied decision-making,” Capraro explained.

A final experiment tested a complex scenario involving mixed-gender violence. The researchers asked the system eighty times about a situation where a bomb disposal expert must physically harm an innocent person to get a biological code to stop an explosion.

When the expert was a woman and the victim was a man, the system highly approved of the violence, giving it an average score of 6.4 out of 7. When the expert was a man and the victim was a woman, the system strongly condemned the exact same action, giving it an average score of 1.75. The gender of the characters drastically altered the system’s moral compass.

“The main takeaway is that reducing bias in AI is not simple,” Capraro said. “Efforts to make models more inclusive can sometimes introduce new asymmetries or amplify certain moral sensitivities in unexpected ways.”

“So the broader lesson is that people should be cautious about treating AI systems as neutral or objective. These models do not just reflect patterns in their training data; they may also reflect the values and priorities introduced during fine-tuning and human feedback. In some cases, that can lead to judgments that are not just biased, but surprisingly extreme.”

But the researchers caution that users should avoid interpreting these specific results as a permanent feature of all artificial intelligence systems. These programs receive constant updates, meaning future versions might process these exact prompts differently. “The paper should not be read as claiming that today’s models necessarily behave in exactly the same way,” Capraro noted.

“Our broader point is not that these exact biases will always appear, but that post-training interventions can create unintended distortions. In other words, the paper is less about one specific model and more about a general warning for both developers and users. Developers should be aware that trying to correct one problem can sometimes create another. Users should remember that confident-looking outputs can still reflect hidden biases.”

“One important next step is to study whether similar biases appear in more realistic and socially consequential settings, such as résumé screening, hiring recommendations, or other decision-support contexts,” Capraro continued. “Those are the domains where bias matters a lot in practice.”

“More broadly, I think AI has enormous potential, but that potential will only be socially beneficial if the systems are developed and deployed in a way that distributes benefits fairly. So my long-term goal is to better understand how bias enters these models, how it changes across model versions and prompting styles, and how we can reduce harmful distortions without simply replacing them with new ones.”

The study, “Surprising gender biases in GPT,” was authored by Raluca Alexandra Fulgu and Valerio Capraro.
