AI learns language through the experience of a single child in groundbreaking study

by Eric W. Dolan
February 1, 2024
in Artificial Intelligence, Cognitive Science
An 18-month-old baby wearing a head-mounted camera. (Photo by Wai Keen Vong)

In a groundbreaking study published in the journal Science, researchers have developed a machine learning model that mimics the way children learn language, offering new insights into early language acquisition. Using video and audio recordings from a young child’s perspective, the model successfully learned to associate words with visual objects, a feat that sheds light on the mysterious process of how children begin to understand and use language.

Understanding how children learn language has long been a fascinating subject for scientists and educators alike. At the heart of this is the phenomenon of connecting words to their meanings – a process seemingly simple yet incredibly complex. This study sought to demystify this process using the latest advancements in artificial intelligence.

The motivation behind this research lies in the need for a deeper understanding of early language acquisition. Traditionally, studies in this field have been conducted in controlled laboratory settings, which may not accurately reflect the natural environment in which children learn language.

Furthermore, there is a growing interest in developing artificial intelligence systems that can learn language in human-like ways. By uncovering the mechanisms behind how children link words to their visual counterparts, researchers hoped to not only enrich cognitive science but also guide the development of more advanced AI systems.

“I’ve been doing research on concept and language acquisition from the beginning of my research career, as I think there are a lot of interesting questions behind how humans and machines can learn and use concepts and language. Working with the dataset that was used in this paper (the SAYCam-S dataset) provided a unique opportunity to study these kinds of questions, and seeing if models could learn anything from naturalistic slices from a single child’s input,” explained study author Wai Keen Vong, a research scientist at the Center for Data Science at New York University.

The SAYCam-S dataset was gathered using a head-mounted camera worn by a single child, capturing video and audio recordings from the age of 6 to 25 months. The dataset included 600,000 video frames paired with 37,500 transcribed utterances, derived from 61 hours of video. This approach aimed to mirror the natural learning environment of a child, contrasting with the more controlled settings of traditional laboratory studies.

Vong and his colleagues created a machine learning model, the Child’s View for Contrastive Learning (CVCL) model, which was fed video frames representing what the child saw and transcribed utterances representing what the child heard.

The CVCL model was designed to learn multimodal representations – a combination of visual and linguistic elements – and associate them with each other. The training of CVCL was self-supervised, meaning it did not rely on external labeling of data. Instead, the model learned by associating temporally co-occurring video frames and utterances as matching pairs, and treating non-co-occurring pairs as mismatches.
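
To make this training signal concrete, here is a minimal sketch of an InfoNCE-style contrastive objective over co-occurring frame-utterance pairs, written in PyTorch. It illustrates the general technique only; the temperature value and other details are illustrative assumptions, not the published CVCL configuration.

    import torch
    import torch.nn.functional as F

    def contrastive_loss(frame_emb, utterance_emb, temperature=0.07):
        # frame_emb, utterance_emb: (batch, dim) embeddings of video frames
        # and their temporally co-occurring utterances. Row i of each tensor
        # forms a matching pair; every other pairing in the batch is a mismatch.
        frame_emb = F.normalize(frame_emb, dim=-1)
        utterance_emb = F.normalize(utterance_emb, dim=-1)

        # Cosine similarity of every frame to every utterance in the batch.
        logits = frame_emb @ utterance_emb.t() / temperature
        targets = torch.arange(logits.size(0), device=logits.device)

        # Symmetric cross-entropy: match frames to utterances and vice versa.
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))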

This contrastive learning approach aimed to mimic the way children learn language – by associating words they hear with objects and events they see in their environment. During training, the model randomly sampled video frames associated with each utterance and applied data augmentation to these images for robust learning.
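
That sampling-and-augmentation step might look roughly like the following sketch; the specific image transforms are common defaults chosen for illustration, not taken from the paper.

    import random
    from torchvision import transforms

    # Assumed augmentation pipeline; the paper's exact transforms may differ.
    augment = transforms.Compose([
        transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
        transforms.RandomHorizontalFlip(),
        transforms.ColorJitter(brightness=0.2, contrast=0.2),
        transforms.ToTensor(),
    ])

    def sample_training_pair(utterance, cooccurring_frames):
        # cooccurring_frames: PIL images captured while the utterance was
        # heard; one is drawn at random and augmented on each pass.
        frame = random.choice(cooccurring_frames)
        return augment(frame), utterance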

(Video from the study: https://www.psypost.org/wp-content/uploads/2024/01/Vong-adi1374-video-6.mp4)

The model’s performance was evaluated against a range of everyday words and their corresponding visual referents in categorization tasks. It was also tested on its ability to generalize to novel visual exemplars not seen during training and to align visual and linguistic conceptual systems broadly.
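
Conceptually, this categorization test works like zero-shot classification in CLIP-style models: each test frame is assigned the concept whose word embedding lies closest to the frame's image embedding. A minimal sketch, assuming hypothetical encode_image and encode_text methods on the trained model:

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def classify_frames(model, frames, concept_words):
        # frames: (N, C, H, W) image batch; concept_words: labels such as
        # ["ball", "car", "cat"]. encode_image/encode_text are stand-ins
        # for the model's two encoders, not a published API.
        image_emb = F.normalize(model.encode_image(frames), dim=-1)
        word_emb = F.normalize(model.encode_text(concept_words), dim=-1)
        sims = image_emb @ word_emb.t()  # (N, num_concepts)
        return [concept_words[i] for i in sims.argmax(dim=-1)]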

“By using AI models to study the real language-learning problem faced by children, we can address classic debates about what ingredients children need to learn words—whether they need language-specific biases, innate knowledge, or just associative learning to get going,” explained co-author Brenden Lake, an assistant professor at New York University.

The model achieved a classification accuracy of 61.6% on a dataset of frames annotated with 22 visual concepts, showing that it could effectively match words with visual objects. In comparison tests, CVCL performed close to CLIP, an image-text contrastive neural network trained on vastly more data. The model also demonstrated modest knowledge of additional visual concepts when tested on novel stimuli, achieving an accuracy of 34.7%, which suggests an ability to generalize beyond its training data.

The findings from this study have broad implications for both cognitive science and the development of AI systems. The success of CVCL in mimicking a child’s language learning process challenges traditional theories that suggest more complex cognitive capacities are necessary for language acquisition. It demonstrates that simple associative learning mechanisms, coupled with multimodal representation learning, can be a solid foundation for understanding and replicating the early stages of word learning.

“Today’s state-of-the-art AI systems are trained using astronomical amounts of data (often billions/trillions of words), and yet humans manage to learn and use language with far less data (hundreds of millions of words), so the connection between these advances in machine learning to human language acquisition is not clear,” Vong explained to PsyPost. “To bridge that gap, in our work, we trained a multimodal neural network on 61 hours of visual and linguistic input from one child, and examined how much the model could learn, particularly in connecting words to their visual counterparts (e.g. connecting the word ‘ball’ to images of balls).”

“Surprisingly, the model acquired most (but not all) of the concepts present in its everyday experience, and could generalize this to visual instances of those words it hadn’t encountered either. These results suggest that the kinds of generic, associative learning mechanisms found in neural networks are sufficient for breaking into early word learning, without the need to posit additional constraints or inductive biases like other researchers have previously argued were necessary for language acquisition.”

However, the study is not without limitations. The data used was from a single child’s perspective, which may not represent the diversity of experiences across different children. Furthermore, the model’s ability to generalize to a broader range of linguistic and visual contexts remains to be tested. The CVCL model also does not account for the active, embodied nature of a child’s learning process and learns from static frames rather than temporally extended episodes.

“One caveat is that the language input to the model is text, not the underlying speech signal that children receive,” Vong said. “When learning from raw speech, children also need to learn how to segment the speech signal into individual words, which is not needed in our model. While this was a small limitation with the current study, we are confident that many of the aspects of our model could be left intact while incorporating the raw speech in future work.”

“We would also like to train our model on data from the additional children from the SAYCam dataset (there is video collected from two additional babies that we did not use in the current study), to see if our results are consistent and generalizable.”

Looking ahead, the research opens several avenues for future exploration. Incorporating more cognitively plausible assumptions into the model, such as the role of active learning, could bring the learning process in models closer to that in children. Additionally, extending the model to handle more complex aspects of language acquisition and testing it with data from multiple children are crucial steps forward.

The study, “Grounded language acquisition through the eyes and ears of a single child,” was authored by Wai Keen Vong, Wentao Wang, A. Emin Orhan, and Brenden M. Lake.
