
AI chatbots outperform humans in evaluating social situations, study finds

by Eric W. Dolan
December 3, 2024
in Artificial Intelligence
[Adobe Stock]


Recent research published in Scientific Reports has found that certain advanced AI chatbots are more adept than humans at making judgments in challenging social situations. Using a well-established psychological tool known as a Situational Judgment Test, researchers found that three chatbots—Claude, Microsoft Copilot, and you.com’s smart assistant—outperformed human participants in selecting the most effective behavioral responses.

The ability of AI to assist in social interactions is becoming increasingly relevant, with applications ranging from customer service to mental health support. Large language models, such as the chatbots tested in this study, are designed to process language, understand context, and provide helpful responses. While previous studies have demonstrated their capabilities in academic reasoning and verbal tasks, their effectiveness in navigating complex social dynamics has remained underexplored.

Large language models are advanced artificial intelligence systems designed to understand and generate human-like text. These models are trained on vast amounts of data—books, articles, websites, and other textual sources—allowing them to learn patterns in language, context, and meaning.

This training enables these models to perform a variety of tasks, from answering questions and translating languages to composing essays and engaging in detailed conversations. Unlike earlier AI models, large language models rely on their ability to process context and generate responses that often feel conversational and relevant to the user’s input.

“As researchers, we are interested in the diagnostics of social competence and interpersonal skills,” said study author Justin M. Mittelstädt of the Institute of Aerospace Medicine.

“At the German Aerospace Center, we apply methods for diagnosing these skills, for example, to find suitable pilots and astronauts. As we are exploring new technologies for future human-machine interaction, we were curious to find out how the emerging large language models perform in these areas that are considered to be profoundly human.”

To evaluate AI performance, the researchers used a Situational Judgment Test, a tool widely used in psychology and personnel assessment to measure social competence. The test presented 12 scenarios requiring participants to evaluate four potential courses of action. For each scenario, participants were tasked with identifying the best and worst responses, as rated by a panel of 109 human experts.

The study compared the performance of five AI chatbots—Claude, Microsoft Copilot, ChatGPT, Google Gemini, and you.com’s smart assistant—with a sample of 276 human participants. These human participants were pilot applicants selected for their high educational qualifications and motivation. Their performance provided a rigorous benchmark for the AI systems.


Each chatbot completed the Situational Judgment Test ten times, with the scenarios presented in randomized order to test the consistency of its responses. The responses were then scored by how closely they aligned with the expert-identified best and worst options. In addition to choosing responses, the chatbots were asked to rate the effectiveness of each action in the scenarios, providing further data for comparison with the expert evaluations.
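The article does not spell out the study's exact scoring formula. As a purely hypothetical illustration, one simple scoring rule consistent with the description above — a point for matching the experts' "best" pick and a point for matching their "worst" pick, averaged over scenarios and repeated runs — could be sketched like this (the function names and scoring weights are assumptions, not the study's actual method):

```python
# Hypothetical scoring sketch for a Situational Judgment Test (SJT).
# Assumption: one point for matching the experts' "best" pick and one
# point for matching their "worst" pick, averaged over scenarios.

def score_run(choices, expert_key):
    """choices / expert_key: scenario id -> (best option, worst option)."""
    points = 0
    for scenario, (best, worst) in expert_key.items():
        picked_best, picked_worst = choices[scenario]
        points += (picked_best == best) + (picked_worst == worst)
    return points / (2 * len(expert_key))  # fraction of maximum score

def average_score(runs, expert_key):
    """Average a chatbot's score across repeated test administrations."""
    return sum(score_run(r, expert_key) for r in runs) / len(runs)

# Toy example with two scenarios (options labeled A-D):
expert_key = {1: ("A", "D"), 2: ("C", "B")}
runs = [
    {1: ("A", "D"), 2: ("C", "B")},  # perfect run
    {1: ("A", "C"), 2: ("C", "B")},  # missed one "worst" pick
]
print(average_score(runs, expert_key))  # 0.875
```

Averaging over repeated randomized runs, as the researchers did, also exposes the consistency differences noted below, since an unreliable model's per-run scores would fluctuate.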

The researchers found that all the tested AI chatbots performed at least as well as the human participants, with some outperforming them. Among the chatbots, Claude achieved the highest average score, followed by Microsoft Copilot and you.com’s smart assistant. These three systems consistently selected the most effective responses in the Situational Judgment Test scenarios, aligning closely with expert evaluations.

Interestingly, when chatbots failed to select the best response, they most often chose the second-most effective option, mirroring the decision-making patterns of human participants. This suggests that, while not perfect, the AI systems' judgments track the same gradient of response effectiveness that humans perceive: their errors, like humans', cluster on the next-best option rather than on clearly ineffective ones.

“We have seen that these models are good at answering knowledge questions, writing code, solving logic problems, and the like,” Mittelstädt told PsyPost. “But we were surprised to find that some of the models were also, on average, better at judging the nuances of social situations than humans, even though they had not been explicitly trained for use in social settings. This showed us that social conventions and the way we interact as humans are encoded as readable patterns in the textual sources on which these models are trained.”

The study also highlighted differences in reliability among the AI systems. Claude showed the highest consistency across multiple test iterations, while Google Gemini exhibited occasional contradictions, such as rating an action as both the best and worst in different runs. Despite these inconsistencies, the overall performance of all tested AI systems surpassed expectations, demonstrating their potential to provide socially competent advice.

“Many people already use chatbots for a variety of everyday tasks,” Mittelstädt explained. “Our results suggest that chatbots may be quite good at giving advice on how to behave in tricky social situations and that people, especially those who are insecure in social interactions, may benefit from this. However, we do not recommend blindly trusting chatbots, as we also saw evidence of hallucinations and contradictory statements, as is often reported in the context of large language models.”

It is important to note that the study focused on simulated scenarios rather than real-world interactions, leaving questions about how AI systems might perform in dynamic, high-stakes social settings.

“To facilitate a quantifiable comparison between large language models and humans, we selected a multiple-choice test that demonstrates prognostic validity in humans for real-world behavior,” Mittelstädt noted. “However, performance on such a test does not yet guarantee that large language models will respond in a socially competent manner in real and more complex scenarios.”

Nevertheless, the findings suggest that AI systems are increasingly able to emulate human social judgment. These advancements open doors to practical applications, including personalized guidance in social and professional settings, as well as potential use in mental health support.

“Given the demonstrated ability of large language models to judge social situations effectively in a psychometric test, our objective is to assess their social competence in real-world interactions with people and the conditions under which people benefit from social advice provided by a large language model,” Mittelstädt told PsyPost.

“Furthermore, the response behavior in Situational Judgment Tests is highly culture-dependent. The effectiveness of a response in a specific situation may vary considerably from one culture to another. The good performance of large language models in our study demonstrates that they align closely with the judgments prevalent in Western cultures. It would be interesting to see how large language models perform in tests from other cultural contexts and whether their evaluation would change if they were trained with more data from a different culture.”

“Even though large language models may produce impressive performances in social tasks, they do not possess emotions, which would be a prerequisite for genuine social behavior,” Mittelstädt added. “We should keep in mind that large language models only imitate social responses that they have extracted from patterns in their training dataset. Despite this, there are promising applications, such as assisting individuals with social skills development.”

The study, “Large language models can outperform humans in social situational judgments,” was authored by Justin M. Mittelstädt, Julia Maier, Panja Goerke, Frank Zinn, and Michael Hermes.


© PsyPost Media Inc
