PsyPost

Social reasoning in AI traced to an extremely small set of parameters

by Karina Petrova
November 18, 2025
in Artificial Intelligence

A new study reveals that the capacity for social reasoning in large language models, a trait similar to the human “theory of mind,” originates from an exceptionally small and specialized subset of the model’s internal parameters. Researchers found that these few parameters are deeply connected to the mechanisms that allow a model to understand word order and context. The work, published in npj Artificial Intelligence, provides a look into how complex cognitive-like abilities can emerge from the architecture of artificial intelligence.

Theory of mind is the ability to attribute mental states like beliefs, desires, and intentions to oneself and to others. It is what allows a person to understand that someone else might hold a false belief, for example, believing an object is in a box when it has been secretly moved to a drawer. This type of social reasoning is fundamental to human interaction.

In recent years, large language models have demonstrated an apparent ability to solve tasks designed to test this capacity, but the internal processes giving rise to this skill have remained largely opaque. Understanding these mechanics is a key goal for researchers working on making artificial intelligence more transparent and predictable.

This investigation was conducted by a team of researchers from Stanford University, Princeton University, the University of Minnesota, the University of Illinois Urbana-Champaign, and the Stevens Institute of Technology. Their work aimed to move beyond simply testing a model’s performance on social reasoning tasks.

Instead, they sought to identify the specific internal components responsible for this behavior, effectively looking under the hood to see how the machine performs its reasoning. The central questions were to pinpoint which of the billions of parameters in a model are most sensitive to theory-of-mind tasks and to determine how these parameters influence the model’s computational workflow.

To identify the parameters responsible for theory of mind, the researchers developed a novel method based on sensitivity analysis, which measures how much the model’s performance changes when a specific parameter is slightly altered. They first calculated this sensitivity for each parameter while the model performed theory-of-mind tasks, specifically “false-belief” scenarios.

These tasks test if a model can recognize that an agent’s belief about the world is different from reality. For instance, a model would be presented with a story where a character places an item in one location, and then another character moves it without the first one’s knowledge. The model must correctly predict that the first character will look for the item in its original location.

This initial process identified a set of parameters sensitive to these social reasoning puzzles. However, the team recognized that some of these parameters might also be essential for general language processing. To isolate the ones specifically related to theory of mind, they performed a second sensitivity analysis on a general language modeling task and created a map of parameters vital for basic language functions. By subtracting this general language map from the theory-of-mind map, they were left with a very small, specialized set of parameters primarily dedicated to social reasoning.
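The subtraction step can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the four-parameter toy "model," the two loss functions, the finite-difference sensitivity estimate, and the top-k cutoff are stand-ins, not the study's actual procedure.

```python
def sensitivity(loss, params, eps=1e-4):
    """Finite-difference estimate of how much `loss` changes per parameter."""
    base = loss(params)
    scores = []
    for i in range(len(params)):
        nudged = list(params)
        nudged[i] += eps  # slightly alter one parameter at a time
        scores.append(abs(loss(nudged) - base) / eps)
    return scores


def top_k(scores, k):
    """Return the indices of the k most sensitive parameters as a set."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


# Hypothetical toy losses over four parameters (not taken from the study).
def tom_loss(p):
    # The theory-of-mind task leans mostly on parameters 0 and 2.
    return 5.0 * p[0] ** 2 + 0.1 * p[1] ** 2 + 4.0 * p[2] ** 2 + 0.01 * p[3] ** 2


def general_loss(p):
    # General language modeling leans mostly on parameters 0 and 1.
    return 6.0 * p[0] ** 2 + 3.0 * p[1] ** 2 + 0.1 * p[2] ** 2 + 0.01 * p[3] ** 2


params = [1.0, 1.0, 1.0, 1.0]
tom_mask = top_k(sensitivity(tom_loss, params), k=2)
general_mask = top_k(sensitivity(general_loss, params), k=2)

# Remove parameters that also matter for general language processing.
tom_specific = tom_mask - general_mask
print(tom_specific)
```

In this toy setup, parameter 0 matters for both tasks and is therefore discarded, leaving only the parameter that is sensitive to the theory-of-mind loss alone.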

With these “ToM-sensitive” parameters identified, the team conducted a perturbation experiment. They altered the values of this tiny group of parameters, which constituted as little as 0.001% of the model’s total parameters. The effect on the model’s performance was significant.

Across several different language models, this small change caused a substantial drop in their ability to correctly answer theory-of-mind questions. As a control, the researchers also perturbed a randomly selected group of parameters of the same size. This random alteration had almost no effect on performance, indicating that the identified ToM-sensitive parameters have a specialized function.
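The logic of comparing a targeted perturbation against a size-matched random control can be sketched in a few lines of Python. This is a toy under stated assumptions: the model size, the "importance" values, and the `accuracy` function are all hypothetical and were chosen only to show the shape of the experiment, not to reproduce the study's results.

```python
import random

N_PARAMS = 100_000          # hypothetical toy model size
K = N_PARAMS // 100_000     # 0.001% of the parameters, i.e. one parameter here

rng = random.Random(0)

# Stand-in for the set of ToM-sensitive parameters found by sensitivity analysis.
sensitive = set(rng.sample(range(N_PARAMS), K))


def accuracy(perturbed):
    """Toy accuracy that falls as more important parameters are perturbed."""
    damage = sum(100.0 if i in sensitive else 1e-4 for i in perturbed)
    return 1.0 / (1.0 + damage)


# Targeted perturbation: alter exactly the sensitive parameters.
targeted_acc = accuracy(sensitive)

# Control: alter K randomly chosen parameters, repeated over many draws.
random_accs = [accuracy(rng.sample(range(N_PARAMS), K)) for _ in range(200)]
mean_random_acc = sum(random_accs) / len(random_accs)

print(f"targeted: {targeted_acc:.4f}  random control: {mean_random_acc:.4f}")
```

Averaging the control over many random draws keeps a single unlucky draw from distorting the comparison.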

The researchers discovered that this performance degradation was not limited to social reasoning. The models also became worse at tasks requiring contextual localization, which is the ability to understand where a piece of information is located within a longer text. This suggested a link between the model’s ability to reason about mental states and its more fundamental ability to track the position of words and concepts in a sequence. The findings pointed toward the model’s positional encoding system, the architectural component that gives it a sense of word order.

The investigation then turned to how these sensitive parameters interact with the model’s core architecture. Many modern language models use a technique called Rotary Position Embedding, or RoPE, to understand word order. This method encodes the position of a word by applying a rotation to its numerical representation, with different dimensions of the representation rotating at different frequencies.
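How a rotation can encode position is easier to see in a small example. The sketch below follows the commonly used RoPE formulation, assuming a base of 10000 and adjacent dimensions paired together; the function names and the example vectors are hypothetical, and real models apply the same idea to much larger vectors inside each attention head.

```python
import math


def rope(x, pos, base=10000.0):
    """Rotate each adjacent pair of dimensions by an angle that grows with position."""
    d = len(x)
    out = []
    for i in range(d // 2):
        # Each pair gets its own frequency: early pairs rotate fast, later ones slowly.
        theta = pos * base ** (-2 * i / d)
        a, b = x[2 * i], x[2 * i + 1]
        out.append(a * math.cos(theta) - b * math.sin(theta))
        out.append(a * math.sin(theta) + b * math.cos(theta))
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


q = [1.0, 0.5, -0.3, 2.0]   # toy query vector
k = [0.2, -1.0, 0.7, 0.4]   # toy key vector

# Same relative distance (2 tokens apart) at two different absolute positions.
score_near_start = dot(rope(q, 5), rope(k, 3))
score_further_on = dot(rope(q, 12), rope(k, 10))
print(score_near_start, score_further_on)
```

The two scores agree up to floating-point error, because the dot product of two rotated vectors depends only on how far apart their positions are. That is the property that lets a model use these rotations to track word order.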

The analysis showed that the identified ToM-sensitive parameters were not random; they were precisely aligned with what are known as dominant frequency activations. These are the specific frequencies that the model relies on most heavily to process positional information.

When the ToM-sensitive parameters were perturbed, these dominant frequency patterns were disrupted. This effectively damaged the model’s internal map of the text, explaining why its ability for contextual localization diminished. The effect was specific to models that use the RoPE system.

In a model from a different family, which uses an alternative method for positional encoding, the same kind of sparse, sensitive parameter pattern was not found. This architectural contrast supports the conclusion that the social reasoning ability in RoPE-based models is tightly coupled with this particular mechanism for handling word order.

The final piece of the puzzle was to trace how this disruption in positional encoding affects the model’s attention mechanism. The attention mechanism is what allows a model to weigh the importance of different words in a text when making a prediction. Many models exhibit a phenomenon known as an “attention sink,” where a significant amount of attention is consistently directed toward the very first token in a sequence. This first token acts as a stable anchor, helping the model organize its processing of the rest of the text.

The researchers found that the ToM-sensitive parameters play a role in maintaining the geometric relationship between the vector for the current word being processed and the vector for the first, anchor token. Perturbing these parameters altered the angle between these two vectors, making them more orthogonal, or perpendicular.

This change destabilized the attention sink. As a result, the model’s attention, no longer properly anchored, began to scatter to irrelevant parts of the text, such as punctuation. This breakdown in the model’s focus directly impaired its ability to form a coherent understanding of the language, leading to the observed failures in both social reasoning and general comprehension.
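A toy calculation shows why the angle matters. The Python sketch below uses standard scaled dot-product attention with a softmax; the two-dimensional vectors, the token roles, and the "perturbed" query are hypothetical values chosen for illustration, not measurements from the study.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine of the angle between two vectors (0 means orthogonal)."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def attention_weights(query, keys):
    """Scaled dot-product attention weights over the keys."""
    scale = math.sqrt(len(query))
    scores = [dot(query, key) / scale for key in keys]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # subtract the max for stability
    total = sum(exps)
    return [e / total for e in exps]


keys = [
    [4.0, 0.0],  # first token: the anchor, or "sink"
    [0.0, 1.0],  # a low-content token such as punctuation
    [0.5, 0.5],  # an ordinary content token
]

query_intact = [3.0, 0.5]      # closely aligned with the sink key
query_perturbed = [0.5, 3.0]   # rotated to be nearly orthogonal to the sink key

intact = attention_weights(query_intact, keys)
perturbed = attention_weights(query_perturbed, keys)

print("cosine to sink:", cosine(query_intact, keys[0]), cosine(query_perturbed, keys[0]))
print("sink weight:   ", intact[0], perturbed[0])
```

With the intact query, almost all attention lands on the first token. Once the query is nearly orthogonal to the sink key, the sink loses most of its weight and the largest share moves to the punctuation-like token.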

While this work provides a mechanistic explanation for theory-of-mind-like abilities in some models, the researchers note certain limitations. The analysis was primarily focused on specific types of false-belief tasks, and future work could explore whether similar parameter patterns govern more nuanced social skills like detecting irony or social faux pas. The findings also suggest that what appears to be a sophisticated cognitive skill may emerge from more fundamental mechanisms related to language structure and context.

The identification of such a localized set of parameters opens up new directions for research. It could lead to more efficient ways to align model behavior with human values or ethical norms. At the same time, it highlights potential vulnerabilities; if social reasoning is concentrated in such a small area, it could be a target for adversarial attacks designed to manipulate a model’s behavior. Understanding these structural underpinnings is a step toward developing artificial intelligence systems that are more transparent, reliable, and better aligned with human social cognition.

The study, “How large language models encode theory-of-mind: a study on sparse parameter patterns,” was authored by Yuheng Wu, Wentao Guo, Zirui Liu, Heng Ji, Zhaozhuo Xu & Denghui Zhang.
