PsyPost

AI vision: GPT-4V shows human-like ability to interpret social scenes, study finds

by Eric W. Dolan
September 5, 2025

A new study published in Imaging Neuroscience has found that large language models with visual processing abilities, such as GPT-4V, can evaluate and describe social interactions in images and short videos in a way that closely matches human perception. The research suggests that artificial intelligence can not only identify individual social cues, but also capture the underlying structure of how humans perceive social information.

Large language models (LLMs) are advanced machine learning systems that can generate human-like responses to text inputs. Over the past few years, LLMs have become capable of passing professional exams, emulating personality traits, and simulating theory of mind. More recently, models such as GPT-4V have gained the ability to process visual inputs, making it possible for them to “see” and describe scenes, objects, and people.

This leap in visual capability opens new possibilities for psychological research. Human social perception depends heavily on our ability to make quick inferences from visual input—interpreting facial expressions, body posture, and interactions between people.

If AI models can match or approximate these human judgments, they may offer scalable tools for behavioral science and cognitive neuroscience. But the key question remains: How well can AI interpret the nuanced, often ambiguous social signals that humans rely on?

To explore this question, researchers at the University of Turku used OpenAI’s GPT-4V to evaluate a set of 468 static images and 234 short video clips, all depicting scenes with rich social content drawn from Hollywood films. The goal was to see whether GPT-4V could detect the presence of 138 different social features—ranging from concrete behaviors like “laughing” or “touching someone” to abstract traits like “dominant” or “empathetic.”

These same images and videos had previously been annotated by a large group of human participants. In total, over 2,200 individuals contributed more than 980,000 perceptual judgments using a sliding scale from “not at all” to “very much” to rate each feature. The human evaluations were used as a reference point to assess how closely GPT-4V’s ratings aligned with the consensus of real observers.

For each image or video, the researchers prompted GPT-4V to generate numerical ratings for the full set of social features. They repeated this process five times to account for the model’s variability, then averaged the results. In the case of video clips, since GPT-4V cannot yet directly process motion, the researchers extracted eight representative frames and added the transcribed dialogue from the clip.
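
The repeat-and-average procedure described above can be sketched in a few lines. This is a minimal illustration, not the researchers' code: `rate_once` is a hypothetical stand-in for whatever call returns one set of feature ratings from the model, and the evenly spaced frame selection is an assumption, since the article says only that eight "representative" frames were extracted.

```python
from statistics import mean
from typing import Callable, Sequence


def average_ratings(rate_once: Callable[[], Sequence[float]], n_runs: int = 5) -> list[float]:
    """Call a rating function n_runs times and average each feature's rating.

    rate_once is a hypothetical callable that returns one rating per social
    feature, always in the same feature order.
    """
    runs = [list(rate_once()) for _ in range(n_runs)]
    # zip(*runs) regroups the ratings feature by feature across the runs.
    return [mean(feature_values) for feature_values in zip(*runs)]


def pick_frame_indices(n_frames: int, n_keep: int = 8) -> list[int]:
    """Return n_keep evenly spaced frame indices (midpoints of equal segments).

    Even spacing is an assumption made for this sketch.
    """
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    n_keep = min(n_keep, n_frames)
    segment = n_frames / n_keep
    return [int(segment * (i + 0.5)) for i in range(n_keep)]
```

Averaging over several runs smooths out the run-to-run variability in the model's answers, so that a single unusual response has less influence on the final rating.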

The results showed a high level of agreement between GPT-4V and human observers. The correlation between AI and human ratings was 0.79 for both images and videos—a level that approaches the reliability seen between individual human participants. In fact, GPT-4V outperformed single human raters for 95% of the social features in images and 85% in videos.
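
For readers unfamiliar with the statistic, the sketch below shows how a Pearson correlation between two sets of ratings is computed. The ratings in the example are made-up toy values, not data from the study, and the article does not describe the exact pipeline the authors used.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Covariance and variances, left unnormalized because the factors cancel.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy values for illustration only (not from the study).
model_ratings = [10.0, 40.0, 35.0, 80.0]
human_mean_ratings = [15.0, 30.0, 45.0, 75.0]
print(round(pearson_r(model_ratings, human_mean_ratings), 2))
```

A value near 1 means the two sets of ratings rise and fall together, a value near 0 means there is no linear relationship, and a value near -1 means they move in opposite directions.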

However, GPT-4V’s ratings did not always match group-level consensus. When compared to the average of five human raters, the AI’s agreement was slightly lower, particularly for video clips. This suggests that while GPT-4V provides a strong approximation of human perception, its reliability may not yet match the pooled judgment of multiple human observers.

The study also examined whether GPT-4V captured the deeper structure of how humans organize social information. Using statistical techniques such as principal coordinate analysis, the researchers found that the dimensions GPT-4V used to represent the social world—such as dominant vs. empathetic or playful vs. sexual—were strikingly similar to those found in human data.
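
As a rough illustration of the technique named above, principal coordinate analysis (also called classical multidimensional scaling) turns a matrix of pairwise distances into coordinates along a few main axes. The sketch below recovers only the first axis and uses power iteration to stay within the standard library. It is a generic textbook version under stated assumptions, not the authors' analysis: the function name is hypothetical, and the distances are assumed to be symmetric and roughly Euclidean.

```python
import math


def first_principal_coordinate(dist: list[list[float]], n_iter: int = 200) -> list[float]:
    """First principal coordinate of a symmetric distance matrix.

    Assumes roughly Euclidean distances, so that the leading eigenvalue of
    the double-centered matrix is non-negative.
    """
    n = len(dist)
    # Step 1: square the distances.
    sq = [[d * d for d in row] for row in dist]
    # Step 2: double centering, B = -1/2 * J * D^2 * J.
    # Column means equal row means because the matrix is symmetric.
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    # Step 3: power iteration for the leading eigenvector of B.
    v = [float(i + 1) for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(n_iter):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [0.0] * n  # all points coincide
        v = [x / norm for x in w]
    # Step 4: the Rayleigh quotient gives the eigenvalue; scale by its root.
    bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * bv[i] for i in range(n))
    scale = math.sqrt(max(eigenvalue, 0.0))
    return [scale * x for x in v]
```

For points that lie on a single line, the first coordinate reproduces their original spacing up to a shift and a possible sign flip. In practice, a numerical library's eigendecomposition would normally replace the hand-written power iteration.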

This suggests that the model is not only mimicking surface-level judgments but may be tapping into similar patterns of representation that humans use to make sense of social interactions.

To take the comparison one step further, the researchers used GPT-4V’s social feature annotations as predictors in a functional MRI (fMRI) study. Ninety-seven participants had previously watched a medley of 96 short, socially rich video clips while undergoing brain scans. By linking the social features present in each video to patterns of brain activity, the researchers could map which areas of the brain respond to which types of social information.

Remarkably, GPT-4V-based stimulus models produced brain activation maps nearly identical to those generated using human annotations. The correlation between the two sets of maps was extremely high (r = 0.95), and both identified a similar network of regions, including the superior temporal sulcus, temporoparietal junction, and fusiform gyrus, as being involved in processing social cues.

This finding provides evidence that GPT-4V’s judgments can be used to model how the brain perceives and organizes social information. It also suggests that AI models could assist in designing and interpreting future neuroimaging experiments, especially in cases where manual annotation would be time-consuming or expensive.

These findings open several possible directions for future research and real-world applications. In neuroscience, LLMs like GPT-4V could help generate high-dimensional annotations of complex stimuli, allowing researchers to reanalyze existing brain data or design new experiments with greater precision. In behavioral science, AI could serve as a scalable tool for labeling emotional and social content in large datasets.

Outside the lab, this technology could support mental health care by identifying signs of distress in patient interactions, or improve customer service by analyzing emotional cues in video calls. It could also be used in surveillance systems to detect potential conflicts or identify unusual social behaviors in real-time settings.

At the same time, the study’s authors caution that these models are not perfect replacements for human judgment. GPT-4V performed worse on some social features that involve more subjective or ambiguous judgments, such as “ignoring someone” or “harassing someone.” These types of evaluations may require contextual understanding that AI systems still lack, or may be influenced by training data biases or content moderation filters.

The model also tended to rate low-level features more conservatively than humans—possibly due to its probabilistic nature or its safeguards against generating controversial outputs. In some cases, the AI refused to evaluate scenes containing sexual or violent content, highlighting the constraints imposed by platform-level safety policies.

While the results are promising, some limitations should be noted. The AI ratings were compared against a relatively small number of human raters per stimulus, and larger datasets could provide a more robust benchmark. The model was also tested on short, scripted film clips rather than real-world or live interactions, so its performance in more natural settings remains an open question.

Future work could explore whether tailoring LLMs to specific demographic perspectives improves their alignment with particular groups. Researchers might also investigate how AI models form these judgments—what internal processes or representations they use—and whether these resemble the mechanisms underlying human social cognition.

The study, “GPT-4V shows human-like social perceptual capabilities at phenomenological and neural levels,” was authored by Severi Santavirta, Yuhang Wu, Lauri Suominen, and Lauri Nummenmaa.

(c) PsyPost Media Inc
