Scientists shocked to find AI’s social desirability bias “exceeds typical human standards”

by Eric W. Dolan
February 5, 2025
in Artificial Intelligence
(Photo credit: Adobe Stock)


A new study published in PNAS Nexus reveals that large language models, which are advanced artificial intelligence systems, demonstrate a tendency to present themselves in a favorable light when taking personality tests. This “social desirability bias” leads these models to score higher on traits generally seen as positive, such as extraversion and conscientiousness, and lower on traits often viewed negatively, like neuroticism.

The models seem to “know” when they are being tested and adjust their answers to appear more favorable than they otherwise would. This bias is consistent across various models, including GPT-4, Claude 3, Llama 3, and PaLM-2, with more recent and larger models showing an even stronger inclination towards socially desirable responses.

Large language models are increasingly used to simulate human behavior in research settings. They offer a potentially cost-effective and efficient way to collect data that would otherwise require human participants. Since these models are trained on vast amounts of text data generated by humans, they can often mimic human language and behavior with surprising accuracy. Understanding the potential biases of large language models is therefore important for researchers who are using or planning to use them in their studies.

Personality traits, particularly the “Big Five” (extraversion, openness to experience, conscientiousness, agreeableness, and neuroticism), are a common focus of psychological research. While the Big Five model was designed to be neutral, most people tend to favor higher scores on extraversion, openness, conscientiousness, and agreeableness, and lower scores on neuroticism.

Given the prevalence of personality research and the potential for large language models to be used in this field, the researchers sought to determine whether these models exhibit biases when completing personality tests. Specifically, they wanted to investigate whether large language models are susceptible to social desirability bias, a well-documented phenomenon in human psychology where individuals tend to answer questions in a way that portrays them positively.

“Our lab works at the intersection of psychology and AI,” said study authors Johannes Eichstaedt (an assistant professor and Shriram Faculty Fellow at Stanford University’s Institute for Human-Centered Artificial Intelligence) and Aadesh Salecha (a master’s student at Stanford University and a staff data scientist at the Computational Psychology and Well-Being Lab).

“We’ve been fascinated by using our understanding of human behavior (and the methods from cognitive science) and applying it to intelligent machines. As LLMs are used more and more to simulate human behavior in psychological experiments, we wanted to explore whether they reflect biases similar to those we see in humans. During our explorations with giving different psychological tests to LLMs, we came across this robust social desirability bias.”

To examine potential response biases in large language models, the researchers conducted a series of experiments using a standardized 100-item Big Five personality questionnaire. This questionnaire is based on a well-established model of personality and is widely used in psychological research. The researchers administered the questionnaire to a variety of large language models, including those developed by OpenAI, Anthropic, Google, and Meta. These models were chosen to ensure that the findings would be broadly applicable across different types of large language models.

The core of the study involved varying the number of questions presented to the models in each “batch.” The researchers tested batches ranging from a single question to 20 questions at a time. Each batch was presented in a new “session” to prevent the model from having access to previous questions and answers. The models were instructed to respond to each question using a 5-point scale, ranging from “Very Inaccurate” to “Very Accurate,” similar to how humans would complete the questionnaire.
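As a rough illustration of that batching protocol, a minimal sketch in Python might look like the following. The ask_model function, the prompt wording, and the example items are placeholders rather than the authors' actual code or prompts; only the varying batch sizes, the fresh session per batch, and the 5-point response scale follow the description above.

    # Minimal sketch of the batched survey protocol described above.
    # ask_model stands in for one fresh chat session with the model under test.

    SCALE = ("1 = Very Inaccurate, 2 = Moderately Inaccurate, 3 = Neither, "
             "4 = Moderately Accurate, 5 = Very Accurate")

    ITEMS = [
        "I am the life of the party.",   # example extraversion item
        "I get stressed out easily.",    # example neuroticism item
        # ... the remaining questionnaire items would go here
    ]

    def ask_model(prompt):
        # Placeholder for a call to the LLM under test (one new session per call).
        # Here it simply answers "3" (neutral) to every numbered item so the sketch runs.
        n_items = sum(1 for line in prompt.splitlines() if line[:1].isdigit())
        return "\n".join("3" for _ in range(n_items))

    def administer(items, batch_size):
        # Present the questionnaire in batches of batch_size, one fresh session per batch.
        responses = []
        for start in range(0, len(items), batch_size):
            batch = items[start:start + batch_size]
            numbered = "\n".join(f"{i + 1}. {text}" for i, text in enumerate(batch))
            prompt = (f"Rate how accurately each statement describes you "
                      f"({SCALE}). Reply with one number per line.\n{numbered}")
            reply = ask_model(prompt)
            responses += [int(line.strip()) for line in reply.splitlines() if line.strip()]
        return responses

    # The study compared batch sizes ranging from 1 to 20 questions per session.
    for batch_size in (1, 5, 20):
        print(batch_size, administer(ITEMS, batch_size))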

The researchers also took steps to ensure the integrity of their findings. They tested the impact of randomness in the models’ responses by adjusting a setting called “temperature,” which controls how variable the output is. They also created paraphrased versions of the survey questions to rule out the possibility that the models were simply recalling memorized responses from their training data.

Additionally, they randomized the order of the questions to eliminate any potential effects of question order. Finally, they tested both positively coded and reverse-coded versions of the questions (e.g., “I am the life of the party” vs. “I don’t talk a lot”) to assess the potential influence of acquiescence bias, which is the tendency to agree with statements regardless of their content.
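In scoring terms, reverse-coded items on a 1-to-5 scale are flipped (a response r becomes 6 - r) before items are averaged into a trait score. The two example items below come from the article; the item-to-trait mapping and the scoring function are an illustrative sketch, not the authors' code.

    def score_trait(responses, reverse_coded):
        # Average 1-5 responses for one trait, flipping reverse-coded items (r -> 6 - r).
        scored = [(6 - r) if rev else r for r, rev in zip(responses, reverse_coded)]
        return sum(scored) / len(scored)

    # Example: "I am the life of the party" (positively coded) answered 4,
    # "I don't talk a lot" (reverse-coded) answered 2.
    print(score_trait([4, 2], [False, True]))  # (4 + (6 - 2)) / 2 = 4.0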

The study’s results clearly demonstrated that large language models exhibit a social desirability bias when completing the Big Five personality test. Across all tested models, scores were skewed towards the desirable ends of the trait dimensions. For instance, as the number of questions presented in a batch increased, the models’ scores on extraversion, conscientiousness, openness, and agreeableness tended to increase, while their scores on neuroticism decreased. This pattern was particularly pronounced in the GPT-4 model.

“Our study shows that LLMs behave differently when they infer that they are being evaluated,” Eichstaedt and Salecha told PsyPost. “Specifically, when they recognize that they are being subjected to a Big Five personality test they ‘try’ to appear more likable, just like people do. This has big implications for how we use LLMs in research and real-world applications because it suggests they might subtly change their behavior based on how they interpret a situation. It also shows just how much they are trained to ‘be liked’ — which is something that can backfire in subtle ways — some models are leaning towards being sycophantic.”

The researchers found that the difference in scores between administering the survey one question at a time versus 20 questions at a time was equivalent to more than one standard deviation in human terms, which is a very large effect. In other words, the change in the model’s responses due to social desirability bias was comparable to a significant shift in a person’s personality. The bias was not limited to GPT-4; it was observed across all the large language models tested, with larger and more recent models generally exhibiting a stronger bias.
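To make that comparison concrete, the arithmetic looks roughly like this. The numbers below are invented purely to show how a score shift is expressed in human standard-deviation units; they are not values from the study.

    # Hypothetical numbers, only to show how a shift is expressed in human SD units.
    human_sd = 0.6                             # made-up human population SD for one trait
    score_1_item, score_20_items = 3.5, 4.3    # made-up model scores in the two conditions

    effect_size = (score_20_items - score_1_item) / human_sd
    print(f"{effect_size:.2f} human standard deviations")  # 1.33 with these numbers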

“The size of the effect was shocking. In humans, social desirability bias exists but in LLMs, we saw effect sizes that exceed typical human standards — like an average person suddenly becoming extremely extroverted (90th percentile),” the researchers said.

Further analysis suggested that the models were able to infer when they were being evaluated for personality, even with a small number of questions. The accuracy with which the models could identify that they were taking a personality test was associated with the strength of the social desirability bias. For example, GPT-4, Claude 3, and Llama 3 could correctly identify the nature of the questions with over 90% accuracy when presented with just five questions, while PaLM-2 and GPT-3.5 were less accurate.
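A minimal version of that check might look like the sketch below: show the model a few items, ask what they measure, and count how often the answer mentions personality. The prompt and the matching rule are illustrative assumptions, not the authors' procedure.

    def identifies_personality_test(items, ask_model):
        # Ask the model what a handful of survey items measure and check its answer.
        prompt = ("Here are some survey questions:\n"
                  + "\n".join(f"- {q}" for q in items)
                  + "\nIn one short phrase, what do these questions measure?")
        return "personality" in ask_model(prompt).lower()

    # Example with a stubbed reply standing in for the model:
    print(identifies_personality_test(
        ["I am the life of the party.", "I get stressed out easily."],
        ask_model=lambda prompt: "These look like Big Five personality items.",
    ))  # True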

When the models were explicitly told that they were completing a personality test, their responses were even more skewed towards social desirability, even when presented with only a single question. This finding suggests that the models are adjusting their responses based on their perception of being evaluated. While reverse-coding the questions reduced the magnitude of the bias, it did not eliminate it entirely. This indicates that the observed effects are not solely due to acquiescence bias. The researchers also confirmed that the bias persisted even when the questions were paraphrased and when the order of questions was randomized, further supporting the robustness of their findings.

The researchers acknowledge that their study primarily focused on the Big Five personality traits, which are widely represented in the training data of large language models. It is possible that the same response biases might not occur with less common or less socially evaluative psychological constructs.

Future research should explore the prevalence of social desirability bias across different types of surveys and measurement methods. Another area for further investigation is the role of training data and model development processes in the emergence of these biases. Understanding how these biases are formed and whether they can be mitigated during the training process is essential for ensuring the responsible use of large language models in research and other applications.

Despite these limitations, the study’s findings have significant implications for the use of large language models as proxies for human participants in research. The presence of social desirability bias suggests that results obtained from these models may not always accurately reflect human responses, particularly in the context of personality assessment and other socially sensitive topics.

“As we integrate AI into more parts of our lives, understanding these subtle behaviors and biases becomes crucial,” Eichstaedt and Salecha said. “There needs to be more research into understanding at which stage of the LLM development (pre-training, preference tuning, etc) these biases are being amplified and how to mitigate them without hampering the performance of these models. Whether we’re using LLMs to support research, write content, or even assist in mental health settings, we need to be aware of how these models might unconsciously mimic human flaws—and how that might affect outcomes.”

The study, “Large language models display human-like social desirability biases in Big Five personality surveys,” was authored by Aadesh Salecha, Molly E. Ireland, Shashanka Subrahmanya, João Sedoc, Lyle H. Ungar, and Johannes C. Eichstaedt.
