ChatGPT’s social trait judgments align with human impressions, study finds

by Eric W. Dolan
November 13, 2025
in Artificial Intelligence, Attractiveness
[Image credit: Adobe Stock]
A new study published in Computers in Human Behavior: Artificial Humans provides evidence that ChatGPT’s judgments of facial traits such as attractiveness, dominance, and trustworthiness tend to align with those made by humans. Across multiple experiments, researcher Robin S.S. Kramer of the University of Lincoln found that the AI’s evaluations of faces generally reflected average human opinions. The chatbot was also prone to the same “attractiveness halo effect” seen in human judgments.

First impressions formed from faces are known to influence how people treat others. Traits such as trustworthiness, dominance, and attractiveness are rapidly inferred from facial features and can affect real-life decisions, including hiring and criminal sentencing. While these judgments are shaped by personal experience and culture, past research has shown that there is often a surprising degree of agreement across individuals.

ChatGPT, although not originally designed for visual analysis, now includes functionality that allows it to interpret images by converting them into text-like representations. Because it was trained on large amounts of human-created image-text pairs, it is plausible that it may have developed internal associations between facial features and social traits.

“Since the release of ChatGPT, I’ve been really interested in the capabilities of this new wave of AI tools. Once users were able to upload images to it and interrogate what ChatGPT could ‘see,’ I was fascinated to understand its perceptions of face photos,” explained Kramer, a senior lecturer at the University of Lincoln.

“Since the chatbot has been trained on a vast amount of images and text from the internet (presumably including lots of faces), it was logical to predict that its judgements would, at least to some extent, align with our own. Even so, this needed to be tested rather than assumed.”

To evaluate ChatGPT’s ability to interpret social traits from faces, Kramer conducted a series of studies using a well-established set of face photographs from the Chicago Face Database. This database contains images of people with neutral expressions and is accompanied by human ratings on traits like attractiveness, dominance, and trustworthiness. Importantly, the image files themselves are not publicly accessible online, making it unlikely that these specific faces were part of ChatGPT’s training data.

In the first study, the researcher paired faces that had been rated by humans as either very high or very low in one of the three social traits. ChatGPT was shown these pairs and asked to choose which person looked more attractive, dominant, or trustworthy. Across all 360 image pairs, the chatbot’s choices agreed with human ratings more than 85% of the time. Agreement was especially high for attractiveness judgments, where the AI selected the human-rated “high” model nearly every time.

However, some variation existed across traits and demographic groups. Agreement was slightly lower for trustworthiness and dominance, especially when the human-rated difference between paired faces was small. This suggests that ChatGPT’s judgments may be more reliable when the contrast is clearer, which mirrors human perception.
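The forced-choice agreement measure described above reduces to a simple calculation: for each pair, check whether the model picked the face humans rated higher, then report the percentage of matches. A minimal sketch, using invented choices rather than the study's data:

```python
# Hypothetical illustration of the paired-comparison agreement metric:
# each pair contains one face humans rated high and one rated low on a
# trait; we check whether the model's pick matches the human-preferred
# face. All data below are invented for illustration.

def agreement_rate(model_choices, human_high):
    """Fraction of pairs where the model picked the human-rated-high face."""
    matches = sum(m == h for m, h in zip(model_choices, human_high))
    return matches / len(human_high)

# "A"/"B" marks which face in each pair was chosen.
human_high    = ["A", "B", "A", "A", "B", "A", "B", "B"]  # human-preferred face
model_choices = ["A", "B", "A", "B", "B", "A", "B", "A"]  # model's pick

print(f"Agreement: {agreement_rate(model_choices, human_high):.0%}")  # prints 75%
```

In the study itself, this figure exceeded 85% across 360 pairs; the sketch only shows the form of the metric, not its inputs.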


In a follow-up experiment, the researcher examined how ChatGPT’s ratings of individual faces compared with those of human participants. In this case, both ChatGPT and 63 human participants rated the same 40 White faces for attractiveness on a 1–7 scale. ChatGPT repeated the task twice to assess its consistency.

ChatGPT’s ratings showed a moderate correlation with average human judgments (around 0.52) and were somewhat less aligned with individual raters (around 0.36 on average). Its internal consistency, measured by how similar its first and second ratings were, was fairly strong at 0.64, close to the human average of 0.74 for test-retest reliability.

These findings suggest that ChatGPT behaves similarly to an average human rater: not identical to any one person, but generally in line with group-level judgments. Moreover, the variation in its responses across sessions mirrors the variability seen in human behavior, although this is partially due to the way ChatGPT generates outputs with some randomness.

“ChatGPT’s perceptions of human faces align with our own judgements,” Kramer told PsyPost. “In other words, faces that we see as attractive or trustworthy, for instance, are also ‘seen’ that way by the chatbot.”

Given longstanding concerns about racial bias in AI systems, another goal of the research was to assess whether ChatGPT showed any preference for one racial group over another when evaluating faces. To do this, the researcher created pairs of images where the most highly rated faces from one racial group were matched against the least highly rated faces from another group. If ChatGPT had consistently favored a particular race, even when human ratings suggested otherwise, it would have indicated a potential bias.

However, ChatGPT chose the higher-rated model in 58 out of 60 such comparisons. This pattern was consistent across traits and genders, suggesting that the chatbot’s judgments were not strongly influenced by race, at least in these clearly defined cases.

That said, this method could only detect overt bias. It could not capture more subtle forms, such as associations with skin tone or facial features that may influence perception in less obvious ways. As a result, the absence of explicit racial bias in this study does not rule out the possibility of more nuanced biases, which the author notes as a direction for future research.

The study also examined whether ChatGPT is prone to the same type of “halo effect” often seen in human judgments, where individuals seen as attractive are also assumed to possess other positive traits. Using image pairs where ChatGPT had already judged one face as more attractive, the researcher asked the AI to evaluate which person looked more intelligent, sociable, or confident. In 92.5% of these comparisons, ChatGPT selected the more attractive face for at least one of these additional traits.

“While I suspected that humans and ChatGPT would agree in terms of which faces were more attractive, I was somewhat surprised by ChatGPT’s demonstrating a halo effect,” Kramer said. “The tool considered more attractive faces to also be more confident, intelligent, and sociable (as humans do), which was presumably due to information present but more implicit in the training data. I suppose that the text accompanying images online consistently labelled or described more attractive faces in these ways, resulting in the bias present here.”

The findings provide support for the idea that ChatGPT’s facial trait judgments resemble those of humans. But as with all research, there are some caveats.

“One caveat is that I utilized a constrained set of face images in my work,” Kramer noted. “All identities showed a neutral expression, were forward-facing, and wore a grey t-shirt in front of a white background. As such, I avoided the possible influence of variation that comes with unconstrained, real-world images. This was an initial investigation and I think it would be really interesting to explore how ChatGPT might be affected by things like facial expressions, clothing color, and so on, and whether these influences mirror how humans are affected by such changes.”

Another limitation is the challenge of measuring alignment between AI and human judgments. Because people often disagree with each other, it can be difficult to establish a clear benchmark for comparison. The researcher addressed this by using both average human ratings and analyses of consistency, but further work is needed to refine these methods.

“While agreement between humans and ChatGPT seemed to be fairly large, I focussed here on relatively clear cut comparisons (e.g., pairing two faces that were rated highest and lowest for a particular trait like attractiveness),” Kramer explained. “As such, further work might explore more nuanced judgements to better quantify this agreement. Problematically, of course, humans don’t even perfectly agree with each other when making such judgements, so any approach also needs to take this issue into account.”

Additionally, because ChatGPT’s responses are non-deterministic, identical prompts can lead to slightly different outputs. This randomness is intended to make interactions feel more natural, but it complicates efforts to measure reliability. Even so, the study found that ChatGPT’s repeated judgments were fairly stable and similar to patterns seen in human test-retest data.

In future research, Kramer plans to explore how ChatGPT’s internal models of facial traits might influence the images it generates. Since ChatGPT is now capable of producing synthetic faces, it will be important to see whether its concept of an “attractive” or “trustworthy” face shapes how those faces appear. This could have implications for how AI-generated content is interpreted and used in real-world applications.

“One thing I hadn’t realized until after completing this research was that uploading images to ChatGPT (and other such tools) can be considered sharing with a third party, which may go against the terms of use for some image sets,” Kramer added. “As such, I urge researchers, and indeed any users of AI tools, to make sure that they are aware of what can and cannot be uploaded when interacting with these chatbots.”

The study, “Comparing ChatGPT with human judgements of social traits from face photographs,” was authored by Robin S.S. Kramer.

(c) PsyPost Media Inc