PsyPost
AI vision: GPT-4V shows human-like ability to interpret social scenes, study finds

by Eric W. Dolan
September 5, 2025
in Artificial Intelligence
[Adobe Stock]


A new study published in Imaging Neuroscience has found that large language models with visual processing abilities, such as GPT-4V, can evaluate and describe social interactions in images and short videos in a way that closely matches human perception. The research suggests that artificial intelligence can not only identify individual social cues, but also capture the underlying structure of how humans perceive social information.

Large language models (LLMs) are advanced machine learning systems that can generate human-like responses to text inputs. Over the past few years, LLMs have become capable of passing professional exams, emulating personality traits, and simulating theory of mind. More recently, models such as GPT-4V have gained the ability to process visual inputs, making it possible for them to “see” and describe scenes, objects, and people.

This leap in visual capability opens new possibilities for psychological research. Human social perception depends heavily on our ability to make quick inferences from visual input—interpreting facial expressions, body posture, and interactions between people.

If AI models can match or approximate these human judgments, they may offer scalable tools for behavioral science and cognitive neuroscience. But the key question remains: How well can AI interpret the nuanced, often ambiguous social signals that humans rely on?

To explore this question, researchers at the University of Turku used OpenAI’s GPT-4V to evaluate a set of 468 static images and 234 short video clips, all depicting scenes with rich social content drawn from Hollywood films. The goal was to see whether GPT-4V could detect the presence of 138 different social features—ranging from concrete behaviors like “laughing” or “touching someone” to abstract traits like “dominant” or “empathetic.”

These same images and videos had previously been annotated by a large group of human participants. In total, over 2,200 individuals contributed more than 980,000 perceptual judgments using a sliding scale from “not at all” to “very much” to rate each feature. The human evaluations were used as a reference point to assess how closely GPT-4V’s ratings aligned with the consensus of real observers.

For each image or video, the researchers prompted GPT-4V to generate numerical ratings for the full set of social features. They repeated this process five times to account for the model’s variability, then averaged the results. In the case of video clips, since GPT-4V cannot yet directly process motion, the researchers extracted eight representative frames and added the transcribed dialogue from the clip.
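The repeat-and-average step described above is straightforward to sketch. The snippet below is a minimal illustration, not the study's actual pipeline: the feature names and the 0–100 rating scale are assumptions for the example.

```python
import statistics

def average_repeated_ratings(runs):
    """Average repeated model ratings feature by feature.

    `runs` is a list of dicts, one per repeated prompt, each mapping a
    social-feature name to a numeric rating. Returns the per-feature mean.
    """
    features = runs[0].keys()
    return {f: statistics.mean(run[f] for run in runs) for f in features}

# Five hypothetical runs rating two of the 138 features on a 0-100 scale.
runs = [
    {"laughing": 80, "dominant": 20},
    {"laughing": 75, "dominant": 25},
    {"laughing": 85, "dominant": 15},
    {"laughing": 78, "dominant": 22},
    {"laughing": 82, "dominant": 18},
]
averaged = average_repeated_ratings(runs)
print(averaged)
```

Averaging across repeated prompts smooths out the run-to-run variability inherent in a probabilistic model.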

The results showed a high level of agreement between GPT-4V and human observers. The correlation between AI and human ratings was 0.79 for both images and videos—a level that approaches the reliability seen between individual human participants. In fact, GPT-4V outperformed single human raters for 95% of the social features in images and 85% in videos.
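Agreement of this kind is typically quantified with a Pearson correlation between the two sets of ratings across stimuli. A rough sketch, using made-up rating vectors for a single feature (the numbers are illustrative, not the study's data):

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length rating vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical ratings of one feature across six stimuli.
human_mean = [10, 35, 50, 70, 80, 95]   # average of human raters
model      = [15, 30, 55, 65, 85, 90]   # the model's averaged ratings
r = pearson(human_mean, model)
print(round(r, 3))
```

A value near 1.0 means the model ranks and spaces the stimuli much as the human consensus does, which is what the reported r = 0.79 indicates at scale.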

However, GPT-4V’s ratings did not always match group-level consensus. When compared to the average of five human raters, the AI’s agreement was slightly lower, particularly for video clips. This suggests that while GPT-4V provides a strong approximation of human perception, its reliability may not yet match the collective judgment of multiple human observers working together.

The study also examined whether GPT-4V captured the deeper structure of how humans organize social information. Using statistical techniques such as principal coordinate analysis, the researchers found that the dimensions GPT-4V used to represent the social world—such as dominant vs. empathetic or playful vs. sexual—were strikingly similar to those found in human data.
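Principal coordinate analysis (classical multidimensional scaling) embeds items into a low-dimensional space from their pairwise dissimilarities. The sketch below shows the standard double-centering construction on a toy dissimilarity matrix; the four "features" and their distances are invented for illustration, not taken from the study.

```python
import numpy as np

def pcoa(dist, n_dims=2):
    """Classical MDS / principal coordinate analysis.

    Embeds items into `n_dims` dimensions from a square matrix of
    pairwise dissimilarities, via eigendecomposition of the
    double-centered squared-distance matrix.
    """
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ d2 @ J                    # double centering
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1]           # largest eigenvalues first
    vals, vecs = vals[order], vecs[:, order]
    pos = np.clip(vals[:n_dims], 0, None)    # guard against tiny negatives
    return vecs[:, :n_dims] * np.sqrt(pos)

# Toy dissimilarities among four hypothetical social features,
# forming two clusters: {0, 1} vs {2, 3}.
D = np.array([
    [0.0, 0.2, 0.9, 0.8],
    [0.2, 0.0, 0.8, 0.9],
    [0.9, 0.8, 0.0, 0.3],
    [0.8, 0.9, 0.3, 0.0],
])
coords = pcoa(D)
print(coords.shape)
```

Comparing the axes recovered from AI-derived and human-derived dissimilarities is one way to test whether the two represent the social world along similar dimensions.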

This suggests that the model is not only mimicking surface-level judgments but may be tapping into similar patterns of representation that humans use to make sense of social interactions.

To take the comparison one step further, the researchers used GPT-4V’s social feature annotations as predictors in a functional MRI (fMRI) study. Ninety-seven participants had previously watched a medley of 96 short, socially rich video clips while undergoing brain scans. By linking the social features present in each video to patterns of brain activity, the researchers could map which areas of the brain respond to which types of social information.
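The basic logic of such a stimulus model can be sketched as a regression of a region's response on the per-clip feature annotations. Everything below is simulated for illustration (the feature count, weights, and noise level are assumptions, and real fMRI analyses involve far more preprocessing):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design: 96 clips rated on 5 social features (columns of X),
# and one brain region's evoked response per clip (y).
n_clips, n_features = 96, 5
X = rng.random((n_clips, n_features))            # feature annotations
true_w = np.array([1.5, 0.0, -0.8, 0.3, 0.0])    # simulated sensitivities
y = X @ true_w + 0.1 * rng.standard_normal(n_clips)

# Ordinary least squares: which features predict this region's activity?
X1 = np.column_stack([np.ones(n_clips), X])      # add intercept column
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
print(np.round(beta[1:], 2))                     # recovered feature weights
```

Fitting the same model twice, once with human annotations and once with AI annotations as `X`, and correlating the resulting weight maps across the brain is the kind of comparison that yielded the r = 0.95 agreement reported below.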

Remarkably, GPT-4V-based stimulus models produced brain activation maps nearly identical to those generated using human annotations. The correlation between the two sets of maps was extremely high (r = 0.95), and both identified a similar network of regions—such as the superior temporal sulcus, temporoparietal junction, and fusiform gyrus—as being involved in processing social cues.

This finding provides evidence that GPT-4V’s judgments can be used to model how the brain perceives and organizes social information. It also suggests that AI models could assist in designing and interpreting future neuroimaging experiments, especially in cases where manual annotation would be time-consuming or expensive.

These findings open several possible directions for future research and real-world applications. In neuroscience, LLMs like GPT-4V could help generate high-dimensional annotations of complex stimuli, allowing researchers to reanalyze existing brain data or design new experiments with greater precision. In behavioral science, AI could serve as a scalable tool for labeling emotional and social content in large datasets.

Outside the lab, this technology could support mental health care by identifying signs of distress in patient interactions, or improve customer service by analyzing emotional cues in video calls. It could also be used in surveillance systems to detect potential conflicts or identify unusual social behaviors in real-time settings.

At the same time, the study’s authors caution that these models are not perfect replacements for human judgment. GPT-4V performed worse on some social features that involve more subjective or ambiguous judgments, such as “ignoring someone” or “harassing someone.” These types of evaluations may require contextual understanding that AI systems still lack, or may be influenced by training data biases or content moderation filters.

The model also tended to rate low-level features more conservatively than humans—possibly due to its probabilistic nature or its safeguards against generating controversial outputs. In some cases, the AI refused to evaluate scenes containing sexual or violent content, highlighting the constraints imposed by platform-level safety policies.

While the results are promising, some limitations should be noted. The AI ratings were compared against a relatively small number of human raters per stimulus, and larger datasets could provide a more robust benchmark. The model was also tested on short, scripted film clips rather than real-world or live interactions, so its performance in more natural settings remains an open question.

Future work could explore whether tailoring LLMs to specific demographic perspectives improves their alignment with particular groups. Researchers might also investigate how AI models form these judgments—what internal processes or representations they use—and whether these resemble the mechanisms underlying human social cognition.

The study, “GPT-4V shows human-like social perceptual capabilities at phenomenological and neural levels,” was authored by Severi Santavirta, Yuhang Wu, Lauri Suominen, and Lauri Nummenmaa.
