PsyPost

AI chatbots outperform humans in evaluating social situations, study finds

by Eric W. Dolan
December 3, 2024

Recent research published in Scientific Reports has found that certain advanced AI chatbots are more adept than humans at making judgments in challenging social situations. Using a well-established psychological tool known as a Situational Judgment Test, researchers found that three chatbots—Claude, Microsoft Copilot, and you.com’s smart assistant—outperformed human participants in selecting the most effective behavioral responses.

The ability of AI to assist in social interactions is becoming increasingly relevant, with applications ranging from customer service to mental health support. Large language models, such as the chatbots tested in this study, are designed to process language, understand context, and provide helpful responses. While previous studies have demonstrated their capabilities in academic reasoning and verbal tasks, their effectiveness in navigating complex social dynamics has remained underexplored.

Large language models are advanced artificial intelligence systems designed to understand and generate human-like text. These models are trained on vast amounts of data—books, articles, websites, and other textual sources—allowing them to learn patterns in language, context, and meaning.

This training enables these models to perform a variety of tasks, from answering questions and translating languages to composing essays and engaging in detailed conversations. Unlike earlier AI systems, large language models take the surrounding context into account, so their responses often feel conversational and relevant to the user’s input.

“As researchers, we are interested in the diagnostics of social competence and interpersonal skills,” said study author Justin M. Mittelstädt of the Institute of Aerospace Medicine.

“At the German Aerospace Center, we apply methods for diagnosing these skills, for example, to find suitable pilots and astronauts. As we are exploring new technologies for future human-machine interaction, we were curious to find out how the emerging large language models perform in these areas that are considered to be profoundly human.”

To evaluate AI performance, the researchers used a Situational Judgment Test, a tool widely used in psychology and personnel assessment to measure social competence. The test presented 12 scenarios, each with four possible courses of action. For each scenario, participants had to identify the best and the worst response; the correct answers were the options rated best and worst by a panel of 109 human experts.

The study compared the performance of five AI chatbots—Claude, Microsoft Copilot, ChatGPT, Google Gemini, and you.com’s smart assistant—with a sample of 276 human participants. These human participants were pilot applicants selected for their high educational qualifications and motivation. Their performance provided a rigorous benchmark for the AI systems.

Each chatbot completed the Situational Judgment Test ten times, with the presentation order randomized so that the results did not hinge on a single run or ordering. The responses were then scored based on how well they aligned with the expert-identified best and worst options. In addition to choosing responses, the chatbots were asked to rate the effectiveness of each action in the scenarios, providing further data for comparison with expert evaluations.
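
The scoring idea described above can be illustrated with a minimal Python sketch. It is not the researchers' actual scoring procedure: the rule of one point per match with the expert-identified best or worst option, the function name score_run, and the three-scenario answer key are all assumptions made up for illustration.

```python
from statistics import mean


def score_run(answers, expert_key):
    """Score one test run against the expert key.

    Both arguments map a scenario id to a (best, worst) pair of option
    labels. Assumed rule (hypothetical, not taken from the study): one
    point for matching the expert best option and one point for matching
    the expert worst option.
    """
    total = 0
    for scenario, (expert_best, expert_worst) in expert_key.items():
        picked_best, picked_worst = answers[scenario]
        total += int(picked_best == expert_best)
        total += int(picked_worst == expert_worst)
    return total


# Hypothetical answer key for three scenarios (the real test had 12).
expert_key = {"s1": ("A", "D"), "s2": ("C", "B"), "s3": ("B", "A")}

# Two hypothetical runs by the same chatbot (the study used ten runs each).
runs = [
    {"s1": ("A", "D"), "s2": ("C", "A"), "s3": ("B", "A")},
    {"s1": ("B", "D"), "s2": ("C", "B"), "s3": ("B", "C")},
]

scores = [score_run(run, expert_key) for run in runs]
average = mean(scores)
print(scores, average)  # [5, 4] 4.5
```

Averaging over repeated runs, as in the last lines, is one simple way to compare a chatbot's typical score with the human sample, and the spread of the per-run scores gives a rough picture of how consistent the chatbot is.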

The researchers found that all the tested AI chatbots performed at least as well as the human participants, with some outperforming them. Among the chatbots, Claude achieved the highest average score, followed by Microsoft Copilot and you.com’s smart assistant. These three systems consistently selected the most effective responses in the Situational Judgment Test scenarios, aligning closely with expert evaluations.

Interestingly, when chatbots failed to select the best response, they most often chose the second-most effective option, mirroring the decision-making patterns of human participants. This suggests that AI systems, while not perfect, are capable of nuanced judgment and probabilistic reasoning that closely resembles human thought processes.

“We have seen that these models are good at answering knowledge questions, writing code, solving logic problems, and the like,” Mittelstädt told PsyPost. “But we were surprised to find that some of the models were also, on average, better at judging the nuances of social situations than humans, even though they had not been explicitly trained for use in social settings. This showed us that social conventions and the way we interact as humans are encoded as readable patterns in the textual sources on which these models are trained.”

The study also highlighted differences in reliability among the AI systems. Claude showed the highest consistency across multiple test iterations, while Google Gemini exhibited occasional contradictions, such as rating an action as both the best and worst in different runs. Despite these inconsistencies, the overall performance of all tested AI systems surpassed expectations, demonstrating their potential to provide socially competent advice.

“Many people already use chatbots for a variety of everyday tasks,” Mittelstädt explained. “Our results suggest that chatbots may be quite good at giving advice on how to behave in tricky social situations and that people, especially those who are insecure in social interactions, may benefit from this. However, we do not recommend blindly trusting chatbots, as we also saw evidence of hallucinations and contradictory statements, as is often reported in the context of large language models.”

It is important to note that the study focused on simulated scenarios rather than real-world interactions, leaving questions about how AI systems might perform in dynamic, high-stakes social settings.

“To facilitate a quantifiable comparison between large language models and humans, we selected a multiple-choice test that demonstrates prognostic validity in humans for real-world behavior,” Mittelstädt noted. “However, performance on such a test does not yet guarantee that large language models will respond in a socially competent manner in real and more complex scenarios.”

Nevertheless, the findings suggest that AI systems are increasingly able to emulate human social judgment. These advancements open doors to practical applications, including personalized guidance in social and professional settings, as well as potential use in mental health support.

“Given the demonstrated ability of large language models to judge social situations effectively in a psychometric test, our objective is to assess their social competence in real-world interactions with people and the conditions under which people benefit from social advice provided by a large language model,” Mittelstädt told PsyPost.

“Furthermore, the response behavior in Situational Judgment Tests is highly culture-dependent. The effectiveness of a response in a specific situation may vary considerably from one culture to another. The good performance of large language models in our study demonstrates that they align closely with the judgments prevalent in Western cultures. It would be interesting to see how large language models perform in tests from other cultural contexts and whether their evaluation would change if they were trained with more data from a different culture.”

“Even though large language models may produce impressive performances in social tasks, they do not possess emotions, which would be a prerequisite for genuine social behavior,” Mittelstädt added. “We should keep in mind that large language models only imitate social responses that they have extracted from patterns in their training dataset. Despite this, there are promising applications, such as assisting individuals with social skills development.”

The study, “Large language models can outperform humans in social situational judgments,” was authored by Justin M. Mittelstädt, Julia Maier, Panja Goerke, Frank Zinn, and Michael Hermes.

(c) PsyPost Media Inc
