Mind captioning: This scientist just used AI to translate brain activity into text

by Eric W. Dolan
November 10, 2025
in Artificial Intelligence, Cognitive Science, Neuroimaging

A new study published in Science Advances presents a method that converts human brain activity into coherent, descriptive text—even when the brain is not actively processing language. Instead of decoding words or sentences directly, the method interprets the nonverbal representations that occur before thoughts are put into words.

The study suggests that even when individuals are only watching or recalling silent video clips, their brain activity contains enough structured information to generate accurate descriptions of the scenes. Using functional MRI and advanced language models, Tomoyasu Horikawa, a distinguished researcher at NTT’s Communication Science Laboratories in Japan, was able to produce natural-language captions that closely matched both the objective content of the videos and the participants’ subjective recollections.

The motivation behind this work stemmed from a long-standing challenge in neuroscience: how to decode and interpret the rich, internal content of the human mind. While previous studies have shown some success in mapping brain activity to language, these efforts often rely on participants actively thinking in words, such as by speaking, reading, or listening. Such approaches limit the scope of decoding because not all mental experiences are verbal, and not all individuals have equal access to language, particularly those with conditions like aphasia.

Human thoughts often involve visual scenes, events, and abstract concepts that are not immediately translated into words. These mental representations can be detailed and structured, incorporating relationships between objects, actions, and environments. However, most decoding methods fall short of capturing this complexity, especially when relying on models that either imitate existing language structures or depend on hand-crafted databases of descriptions.

The researcher aimed to bridge this gap by developing a method that could interpret nonverbal mental representations—those formed during perception or memory—into coherent and meaningful text. The goal was not to read minds in the traditional sense, but to provide an interpretive interface that reflects what the brain is representing during an experience.

“I’ve long been fascinated by how the brain generates and represents content associated with our subjective conscious experiences, such as mental imagery and dreaming,” Horikawa told PsyPost. “I believe that brain decoding technology can help us investigate these questions while providing clear and intuitive interpretations of the information encoded in the brain.”

“Developing more sophisticated decoding methods could therefore advance our understanding of the neural bases of conscious experience — and, in the long run, help people whose difficulties might be relieved or overcome through direct information readout from the brain. The idea of mind captioning grew out of this effort — to better understand how such internal representations can be translated into language and shared meaningfully.”

Horikawa designed a decoding method called “mind captioning.” The approach involves two main steps: first, translating brain activity into semantic features using a deep language model; and second, generating natural language descriptions that align with those semantic features.

The study involved six adult participants, all native Japanese speakers with varying levels of English proficiency. They were shown thousands of short video clips depicting a wide range of visual content, including objects, actions, and social interactions. These videos were silent and shown without accompanying language. Functional MRI scans captured the participants’ brain activity during both the viewing of the videos and subsequent mental recall of the same clips.

The researcher trained a set of linear decoding models to map patterns of brain activity to semantic features extracted from captions written about each video. These semantic features were derived using a language model known as DeBERTa, which is designed to represent the meaning of text in a high-dimensional space.
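The decoding step can be sketched as a ridge regression from voxel patterns to caption-embedding features. Everything below is synthetic: the dimensions, the data, and the regularization value are illustrative stand-ins, not the study's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_voxels, n_features = 400, 100, 16

# Ground-truth linear relationship used only to simulate data
W_true = rng.normal(size=(n_voxels, n_features))
X = rng.normal(size=(n_trials, n_voxels))                        # voxel patterns
Y = X @ W_true + 0.1 * rng.normal(size=(n_trials, n_features))   # caption features

# Closed-form ridge regression: W = (X'X + alpha*I)^-1 X'Y
alpha = 1.0
W = np.linalg.solve(X.T @ X + alpha * np.eye(n_voxels), X.T @ Y)

# Decode semantic features for a new, unseen activity pattern
x_new = rng.normal(size=(1, n_voxels))
y_hat = x_new @ W
print(y_hat.shape)  # → (1, 16)
```

In the toy case the fitted weights recover the simulated mapping closely; the study fit one such linear decoder per semantic feature dimension.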

After learning this mapping, the decoder was applied to new brain activity from both perception and recall conditions. The resulting semantic features were then used to generate text using another language model (RoBERTa) optimized for filling in missing words in a sentence. Through an iterative process of guessing, testing, and replacing words, the system gradually produced full sentences that reflected the brain’s decoded representations.
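The guess-test-replace loop can be illustrated with a toy hill-climbing sketch. Here the masked language model and the DeBERTa feature space are both replaced by random word embeddings and a bag-of-words average, so this mirrors only the shape of the procedure, not the actual models: each pass swaps every word for whichever vocabulary item moves the candidate sentence's features closest to the decoded target.

```python
import numpy as np

rng = np.random.default_rng(1)
vocab = ["a", "dog", "cat", "runs", "sleeps", "park", "sofa", "in", "the", "on"]
emb = {w: rng.normal(size=16) for w in vocab}  # toy word embeddings

def features(words):
    # Bag-of-words average embedding (stand-in for a language model's features)
    return np.mean([emb[w] for w in words], axis=0)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Target semantic features, as if decoded from brain activity
target = features(["the", "dog", "runs", "in", "the", "park"])

candidate = ["a", "cat", "sleeps", "on", "the", "sofa"]
for _ in range(5):  # a few sweeps of guess-test-replace
    for i in range(len(candidate)):
        candidate[i] = max(
            vocab,
            key=lambda w: cosine(features(candidate[:i] + [w] + candidate[i + 1:]), target),
        )

print(" ".join(candidate))
```

Because the current word is always among the candidates tested at each position, the similarity to the target can only stay equal or improve with every swap, which is what lets the sentence "build up" gradually.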

These generated sentences were evaluated in several ways. First, they were compared to human-written captions for accuracy and similarity using standard natural language evaluation metrics like BLEU, ROUGE, and BERTScore. The results showed that the machine-generated descriptions were highly discriminative: they could distinguish between different videos with strong reliability, even among 100 options.

The decoding method, when applied to participants’ brain activity, could identify the correct video with nearly 50% accuracy—a substantial improvement over the 1% expected by chance.
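The identification analysis can be approximated as nearest-neighbor matching in a shared feature space: score the decoded description against each candidate video's reference caption and pick the best match, with chance at 1 in N. The vectors below are synthetic, and cosine similarity stands in for the study's text-comparison metrics.

```python
import numpy as np

rng = np.random.default_rng(2)
n_videos, dim = 100, 32
refs = rng.normal(size=(n_videos, dim))   # features of 100 reference captions

true_idx = 7
# Decoded description: the true caption's features plus noise
generated = refs[true_idx] + 0.5 * rng.normal(size=dim)

# Cosine similarity between the generated description and every candidate
sims = refs @ generated / (np.linalg.norm(refs, axis=1) * np.linalg.norm(generated))
predicted = int(np.argmax(sims))
print(predicted)  # identification succeeds when this equals true_idx
```

As long as the decoded features resemble the correct caption more than 99 unrelated ones, the top-ranked candidate is the true video, which is why chance here is 1%.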

Notably, the method also generated accurate descriptions from brain activity during the recall phase, though performance was not as high as during direct viewing. This indicates that the method could verbalize remembered experiences without requiring external stimuli. In some cases, the decoder performed well even on single instances of mental imagery.

“When I first tested the text generation algorithm after coming up with the approach, I was genuinely surprised to see how the original text corresponding to the extracted semantic features was progressively built up — step by step — into a coherent structure,” Horikawa said. “It felt as if I were hearing the faint voice of the brain seeping through the noise of the data, which made me confident that the approach could work.”

One of the key findings is that these descriptions included more than just lists of objects. They captured interactions and relationships, such as who did what to whom, or how different elements were arranged in space. When the word order of the generated sentences was shuffled, their similarity to the reference captions dropped sharply, showing that the original structure conveyed relational meaning, not just vocabulary.
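The word-shuffle control can be illustrated with any order-sensitive text metric; the sketch below uses simple bigram overlap (the study used standard captioning metrics) on made-up sentences, with the shuffled version written out by hand so the comparison is deterministic.

```python
def bigrams(words):
    # Set of adjacent word pairs; captures local word order
    return set(zip(words, words[1:]))

def overlap(a, b):
    # Fraction of a's bigrams that also appear in b
    return len(bigrams(a) & bigrams(b)) / max(len(bigrams(a)), 1)

ref = "a person jumps over a deep waterfall".split()
gen = "a person jumps over a waterfall".split()
shuffled = ["waterfall", "a", "over", "person", "jumps", "a"]  # same words, scrambled

print(overlap(gen, ref), overlap(shuffled, ref))  # → 0.8 0.2
```

The shuffled sentence contains exactly the same vocabulary, yet its score collapses, which is the logic behind the control: a drop after shuffling shows the original word order was carrying relational meaning.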

“Another impressive finding came from the neuroscientific analysis shown in Figure 4E, where we examined how perception-trained decoders generalized to mental imagery using different types of feature representations (visual, visuo-semantic, and semantic),” Horikawa told PsyPost. “Although this trend was conceptually expected, we observed a remarkably clear gradient of generalizability across these levels, with semantic representations showing the strongest ability to bridge neural patterns between perception and recall.”

The study also found that descriptions could be generated without relying on activity in the brain’s traditional language areas. Even when these regions were excluded from analysis, the system still produced intelligible and structured descriptions. This suggests that meaningful semantic information is distributed across brain regions that process visual and contextual information, not just language.

“The study shows that it’s possible to generate coherent, meaningful text from brain activity — not by decoding language itself, but by interpreting the nonverbal representations that come before language,” Horikawa explained. “This may suggest that our thoughts are organized in a way that already carries structural information even before we put them into words, offering a new window into how the brain transforms experience into expression.”

“In the future, if we can learn to express ourselves more freely or interact with machines directly through our own brain activity, as in brain–machine interfaces, we may be able to unlock more of the brain’s potential.”

Although the study presents a promising approach, it comes with several limitations. The sample size was small, involving only six participants, all of whom underwent extensive training and scanning. However, each subject contributed many hours of data, which helped improve the decoding model’s reliability.

“Although our study included a relatively small number of participants, each contributed a substantial amount of data (about 17 hours of brain scanning), which allowed us to establish strong and reliable effects within individuals,” Horikawa said. “For example, the model achieved around 50% accuracy in a 100-alternative video identification task for each participant (see supplementary) — highly reliable performance given the difficulty of the problem (chance = 1%).”

“Importantly, these robust within-subject effects were consistently observed across all six participants, suggesting that the findings are practically significant despite the limited number of participants.”

Another limitation lies in the nature of the stimuli. The videos used in the study reflected common, real-world scenarios. It’s unclear whether the method would work as well for abstract concepts, atypical scenes, or highly personal mental content like dreams.

“As our method generates text from brain activity, it may be misinterpreted as a form of language decoding or reconstruction,” Horikawa noted. “However, this is not actually decoding of language information in the brain, but rather a linguistic interpretation of non-linguistic mental representations. Our method leverages the universal and versatile nature of natural language to provide intelligible interpretations of the information represented in the brain.”

There are also concerns about privacy. The idea of interpreting mental content raises ethical questions about autonomy and consent. While the current method requires large amounts of data from cooperative individuals, future advances may reduce this barrier.

“Some people may worry that this technology poses risks to mental privacy,” Horikawa told PsyPost. “In reality, the current approach cannot easily read a person’s private thoughts — it requires substantial data collection from highly cooperative participants, and its accuracy remains limited, with outputs affected by bias and noise. At present, the risks appear to be not high, though the ethical and social implications should continue to be discussed carefully as the technology develops.”

“What is important is not only to develop these technologies responsibly, but also to reflect on how we handle the information decoded from brain activity. We should avoid immediately treating the outputs as someone’s ‘true thoughts,’ and instead ensure that individuals retain autonomy in deciding whether and how to regard or present such outputs as their own intentions.”

Looking ahead, the approach could be extended to other types of mental content, such as auditory experiences, emotions, or internal narratives. It may also help in designing communication systems for individuals who cannot use speech or writing. By treating language as a bridge rather than the source, the method opens new possibilities for exploring how the brain generates and organizes meaning before it is expressed.

“My long-term goal is to understand the neural mechanisms underlying our subjective conscious experiences, and to help humans more fully realize the potential of the brain through scientific and technological advances,” Horikawa explained. “We plan to continue improving brain decoding approaches to access the information encoded in the brain more accurately and in greater detail, while ensuring that these technologies remain both scientifically valuable for understanding the brain and beneficial for people.”

The study, “Mind captioning: Evolving descriptive text of mental content from human brain activity,” was authored by Tomoyasu Horikawa.
