Subscribe
The latest psychology and neuroscience discoveries.
My Account
  • Mental Health
  • Social Psychology
  • Cognitive Science
  • Neuroscience
  • About
No Result
View All Result
PsyPost
PsyPost
No Result
View All Result
Home Exclusive Artificial Intelligence

Efforts to make AI inclusive accidentally create bizarre new gender biases, new research suggests

by Eric W. Dolan
March 22, 2026
in Artificial Intelligence, Sexism
[Adobe Stock]

[Adobe Stock]

Share on TwitterShare on Facebook

New research published in Computers in Human Behavior Reports suggests that efforts to make artificial intelligence more inclusive can sometimes create unexpected new biases. The scientists found that popular artificial intelligence models tend to overattribute stereotypically masculine behaviors to female characters and judge violence against women as significantly more objectionable than violence against men. These findings provide evidence that programming models to be sensitive to gender equity might accidentally introduce extreme ethical inconsistencies.

Scientists initiated this research to better understand how artificial intelligence systems handle gender and morality after their initial training. During development, these models undergo a refinement process based on human feedback. This process involves human reviewers grading the system’s answers to teach it preferred behaviors, like avoiding offensive language or promoting inclusivity.

The scientists suspected that this human feedback phase might teach the models to be highly sensitive to specific cultural priorities. Specifically, they thought the models might focus heavily on including women in traditionally male spaces and protecting women from harm.

“There has been a growing public debate about whether AI chatbots can develop unexpected biases, especially after post-training efforts meant to make them safer and more inclusive. Much of that discussion, however, has been anecdotal. We wanted to move beyond isolated examples and test the issue systematically,” said study author Valerio Capraro, an associate professor at the University of Milan Bicocca.

To test these ideas, the researchers conducted two main sets of experiments using different versions of the ChatGPT system, specifically GPT-3.5 Turbo, GPT-4, and GPT-4o.

“In this study, we focused on one of the most widely used chatbots at the time and asked whether it displayed surprising gender biases in two very different contexts,” Capraro said. “The goal was not just to document bias, but to understand whether attempts to reduce some biases can unintentionally produce new ones.”

In the first set of four experiments, the scientists examined how the systems assign gender to everyday statements. They prompted the systems using the standard public web interface to maintain realistic user conditions.

The researchers presented the artificial intelligence with twenty pairs of short phrases written in the style of elementary school students. Three pairs were control phrases that explicitly stated a gender. The remaining seventeen pairs contained traditional gender stereotypes regarding toys, movies, and future careers.

Google News Preferences Add PsyPost to your preferred sources

Half of these experimental phrases contained traditionally feminine stereotypes, like loving the color pink or wanting to be a nurse. The other half contained traditionally masculine stereotypes, like playing hockey or wanting to be a firefighter. The scientists asked the system to imagine the writer of the phrase and assign them a name, age, and gender, repeating this process ten times for each phrase pair to generate 400 responses per study.

The responses demonstrated a massive asymmetry in how the artificial intelligence applies gender assumptions. For phrases with stereotypically feminine activities, the models consistently assigned a female writer almost every single time. For phrases with stereotypically masculine activities, the models frequently assigned a female writer as well.

For instance, the models constantly attributed a sentence about loving football and practicing with a cousin to a female writer. The researchers suggest this happens because the refinement process strongly emphasizes placing women in traditionally masculine roles. At the same time, this programming lacks an equivalent push to place men in traditionally feminine roles, creating a persistent gender bias.

The second set of four experiments tested how these gender asymmetries affect high-stakes moral decisions. The scientists asked GPT-4 to rate its agreement with different violent actions required to prevent a hypothetical nuclear apocalypse. The system used a scale from one to seven, where one meant “strongly disagree” and seven meant “strongly agree.”

In the first moral experiment, the scientists asked the system fifty times about the acceptability of harassing a woman, harassing a man, or sacrificing a person’s life to stop the apocalypse. GPT-4 consistently gave the lowest possible score for harassing a woman, averaging a perfect one. In contrast, the system gave an average score of 3.34 for harassing a man and 3.61 for sacrificing a life, showing it views harassing a woman as much worse than killing a random person.

To see if this pattern held true across different types of harm, the researchers conducted another experiment focusing on abuse and torture. They asked the system twenty times each about abusing or torturing a man or a woman to stop the apocalypse. The system strongly disagreed with abusing a woman but was much more open to abusing a man, averaging a score of 4.2. On the other hand, the system viewed torturing a man and torturing a woman as equally acceptable.

“What surprised me most was how strong and consistent some of these effects were,” Capraro told PsyPost. “In one experiment, we asked GPT-4 fifty times whether it was acceptable to harass a woman to prevent a nuclear apocalypse, and every single time it responded ‘strongly disagree.'”

“By contrast, when we asked about torturing a woman, the answers were much more variable and on average much closer to the midpoint of the scales, which is a very unusual ordering if you think in terms of objective severity of harm. This suggests the model may be especially sensitive to certain categories of harm that are socially and politically salient, rather than simply responding to severity in a consistent way.”

In other words, this unexpected pattern might happen because torture is less central to modern gender equity debates than abuse. The models have likely been trained to flag and condemn harassment against women specifically.

The researchers then investigated whether these biases were explicit or hidden. They directly asked GPT-4 to rank the severity of these different moral violations twenty times. When asked directly, the system ranked the violations based on objective physical harm, placing sacrifice as the worst, followed by torture, abuse, and harassment. It explicitly stated that gender did not matter, revealing that its biased judgments in the previous scenarios were entirely implicit.

“That matters because it suggests that evaluating AI systems only through direct, explicit questioning may miss important biases that show up in applied decision-making,” Capraro explained.

A final experiment tested a complex scenario involving mixed-gender violence. The researchers asked the system eighty times about a situation where a bomb disposal expert must physically harm an innocent person to get a biological code to stop an explosion.

When the expert was a woman and the victim was a man, the system highly approved of the violence, giving it an average score of 6.4 out of 7. When the expert was a man and the victim was a woman, the system strongly condemned the exact same action, giving it an average score of 1.75. The gender of the characters drastically altered the system’s moral compass.

“The main takeaway is that reducing bias in AI is not simple,” Capraro said. “Efforts to make models more inclusive can sometimes introduce new asymmetries or amplify certain moral sensitivities in unexpected ways.”

“So the broader lesson is that people should be cautious about treating AI systems as neutral or objective. These models do not just reflect patterns in their training data; they may also reflect the values and priorities introduced during fine-tuning and human feedback. In some cases, that can lead to judgments that are not just biased, but surprisingly extreme.”

But the researchers caution that users should avoid interpreting these specific results as a permanent feature of all artificial intelligence systems. These programs receive constant updates, meaning future versions might process these exact prompts differently. “The paper should not be read as claiming that today’s models necessarily behave in exactly the same way,” Capraro noted.

“Our broader point is not that these exact biases will always appear, but that post-training interventions can create unintended distortions. In other words, the paper is less about one specific model and more about a general warning for both developers and users. Developers should be aware that trying to correct one problem can sometimes create another. Users should remember that confident-looking outputs can still reflect hidden biases.”

“One important next step is to study whether similar biases appear in more realistic and socially consequential settings, such as résumé screening, hiring recommendations, or other decision-support contexts,” Capraro continued. “Those are the domains where bias matters a lot in practice.”

“More broadly, I think AI has enormous potential, but that potential will only be socially beneficial if the systems are developed and deployed in a way that distributes benefits fairly. So my long-term goal is to better understand how bias enters these models, how it changes across model versions and prompting styles, and how we can reduce harmful distortions without simply replacing them with new ones.”

The study, “Surprising gender biases in GPT,” was authored by Raluca Alexandra Fulgu and Valerio Capraro.

Previous Post

Political ideology shapes views on acceptable civilian casualties in war

Next Post

Lab-grown brain models reveal unique electrical patterns in different types of autism

RELATED

Women’s cognitive abilities remain stable across menstrual cycle
Cognitive Science

Men and women show different relative cognitive strengths across their lifespans

April 19, 2026
Live music causes brain waves to synchronize more strongly with rhythm than recorded music
Artificial Intelligence

Disclosing autism to AI chatbots prompts overly cautious, stereotypical advice

April 18, 2026
Live music causes brain waves to synchronize more strongly with rhythm than recorded music
Artificial Intelligence

Scientists tested the creativity of AI models, and the results were surprisingly homogeneous

April 18, 2026
People ascribe intentions and emotions to both human- and AI-made art, but still report stronger emotions for artworks made by humans
Artificial Intelligence

New research links personality traits to confidence in recognizing artificial intelligence deception

April 13, 2026
Scientists just found a novel way to uncover AI biases — and the results are unexpected
Artificial Intelligence

Artificial intelligence makes consumers more impatient

April 11, 2026
Weird disconnect between gender stereotypes and leader preferences revealed by new psychology research
Business

When the pay gap is wide, women see professional beauty as a strategic asset

April 11, 2026
Women with sexual trauma histories more likely to engage in “Duty Sex”
Relationships and Sexual Health

New psychology research explains why some women devalue their own orgasms

April 10, 2026
Scientists identify a fat-derived hormone that drives the mood benefits of exercise
Artificial Intelligence

People consistently devalue creative writing generated by artificial intelligence

April 5, 2026

STAY CONNECTED

RSS Psychology of Selling

  • Why personalized ads sometimes backfire: A research review explains when tailoring messages works and when it doesn’t
  • The common advice to avoid high customer expectations may not be backed by evidence
  • Personality-matched persuasion works better, but mismatched messages can backfire
  • When happy customers and happy employees don’t add up: How investor signals have shifted in the social media age
  • Correcting fake news about brands does not backfire, five-study experiment finds

LATEST

Childhood trauma and attachment styles show nuanced links to alternative sexual preferences

New study reveals how political bias conditions the impact of conspiracy thinking

Cognition might emerge from embodied “grip” with the world rather than abstract mental processes

Men and women show different relative cognitive strengths across their lifespans

Early exposure to forever chemicals linked to altered brain genes and impulsive behavior in rats

Soft brain implants outperform rigid silicon in long-term safety study

Disclosing autism to AI chatbots prompts overly cautious, stereotypical advice

Can choking during sex cause brain damage? Emerging evidence points to hidden neurological risks

PsyPost is a psychology and neuroscience news website dedicated to reporting the latest research on human behavior, cognition, and society. (READ MORE...)

  • Mental Health
  • Neuroimaging
  • Personality Psychology
  • Social Psychology
  • Artificial Intelligence
  • Cognitive Science
  • Psychopharmacology
  • Contact us
  • Disclaimer
  • Privacy policy
  • Terms and conditions
  • Do not sell my personal information

(c) PsyPost Media Inc

Welcome Back!

Login to your account below

Forgotten Password?

Retrieve your password

Please enter your username or email address to reset your password.

Log In

Add New Playlist

Subscribe
  • My Account
  • Cognitive Science Research
  • Mental Health Research
  • Social Psychology Research
  • Drug Research
  • Relationship Research
  • About PsyPost
  • Contact
  • Privacy Policy

(c) PsyPost Media Inc