In a groundbreaking study published in the Proceedings of the National Academy of Sciences, researchers led by Elika Bergelson from Harvard University employed machine learning to dissect the process of language development in infants and toddlers across the globe. Their findings challenge long-held beliefs about language development, highlighting the importance of adult speech in a child’s environment over factors like gender, multilingualism, and socioeconomic status.
Previous research on language development has often focused on individual predictors of language skills, such as the differences between boys and girls or the impact of a child’s socioeconomic background. However, these studies have typically relied on samples from Western societies and may not reflect the full spectrum of human language learning. Furthermore, they often did not consider the relative impact of these variables or the everyday language behavior of children.
This gap in the literature prompted Bergelson and her team to adopt a more comprehensive and global approach, leveraging advances in technology and machine learning to analyze over 40,000 hours of audio recordings from children across 12 countries and 43 languages. This broad sample was crucial for understanding language development in a variety of cultural and linguistic contexts, moving away from the Western-centric focus that has characterized much of the previous research in this field.
“My colleagues and I are all focused on early language development,” explained Bergelson, an associate professor of psychology. “In particular, many of us had been working with these long-form audio-recordings and realizing that to answer some of our burning questions, we’d need to pull together a big dataset to consider how different factors might link to early language experiences and production.”
“This was a pretty gargantuan effort but I think provides an interesting lens on early speech that complements work using more detailed manually-transcribed approaches; I think both will be powerful and crucial to the full story of how babies learn language.”
The researchers combined wearable recording technology with a vast international dataset. The core of their data collection was the LENA™ system, a small wearable device designed to capture the entire auditory environment of the child wearing it.
This technology allowed for day-long recordings of the natural interactions and vocalizations the children were exposed to and produced, providing a rich and ecologically valid dataset that went beyond the limitations of traditional lab-based studies or parent-reported measures. The scope of the research was unprecedented in its reach, incorporating audio recordings from 1,001 children aged two to 48 months.
Once the data were collected, the team employed machine learning algorithms to analyze the recordings. These algorithms were tasked with identifying and quantifying two main types of vocalizations: those from adults (adult talk) and those from children (child speech). The distinction was important for evaluating the amount of linguistic input from adults and the corresponding output from children in their natural settings.
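The counting step can be pictured with a minimal Python sketch. It assumes a classifier has already labeled each audio segment by speaker type; the `Segment` type, the label strings `"adult"` and `"child"`, and the function `vocalizations_per_hour` are hypothetical names chosen for illustration and are not taken from the study or from the LENA software.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str    # hypothetical label: "adult" or "child"
    start_s: float  # segment start, in seconds
    end_s: float    # segment end, in seconds


def vocalizations_per_hour(segments, recording_hours):
    """Count labeled segments per speaker type, normalized by recording length."""
    if recording_hours <= 0:
        raise ValueError("recording_hours must be positive")
    counts = Counter(seg.speaker for seg in segments)
    return {speaker: n / recording_hours for speaker, n in counts.items()}


# Made-up example data: five labeled segments from a half-hour recording.
segments = [
    Segment("adult", 0.0, 1.5),
    Segment("child", 2.0, 2.6),
    Segment("adult", 3.0, 4.2),
    Segment("child", 5.0, 5.4),
    Segment("adult", 6.0, 7.1),
]
rates = vocalizations_per_hour(segments, recording_hours=0.5)
print(rates)
```

Normalizing by recording length puts recordings of different durations on the same per-hour scale, which is the unit the article uses when reporting results.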
The analysis revealed that the most significant predictors of language development are the child’s age, clinical factors, and the amount of speech they are exposed to from adults. Age emerged as a fundamental factor, aligning with the intuitive understanding that language skills expand as children grow older. Children who were born prematurely, or who had or were at risk for conditions such as dyslexia, showed different patterns of language development compared to their peers.
A standout finding from the study is the strong link between adult speech and a child’s own speech production. The researchers found a positive association between the amount of adult vocalizations a child is exposed to and the quantity of vocalizations the child produces. Specifically, for every 100 adult vocalizations heard per hour, children produced 27 more vocalizations themselves. This association grew stronger with the child’s age, highlighting the importance of interactive speech environments in fostering language skills.
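The reported rate can be read as a simple linear rule of thumb, sketched below in Python. The constant and function names are hypothetical, and the sketch deliberately ignores the age effect mentioned above. It describes an association in the data, not a guaranteed outcome for any individual child.

```python
# Reported association: about 27 additional child vocalizations per hour
# for every 100 adult vocalizations heard per hour.
EXTRA_CHILD_PER_100_ADULT = 27  # hypothetical constant name


def expected_extra_child_vocalizations(extra_adult_per_hour):
    """Linear rule of thumb; does not model the change with age."""
    return extra_adult_per_hour * EXTRA_CHILD_PER_100_ADULT / 100


# 300 more adult vocalizations per hour -> 81.0 more child vocalizations per hour
print(expected_extra_child_vocalizations(300))
```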
“The main take-home message would be: in addition to age and (risk of) language-relevant clinical diagnoses, how much speech young children hear from adults influences how much they themselves produce,” Bergelson told PsyPost. “Other factors like child gender, maternal education level, and multilingual background, as we measured them here in day-to-day interactions among kids and caregivers, did not contribute to how much speech 0-to-4-year-olds produced.”
Contrary to many previous studies, Bergelson and her colleagues found no significant effects of gender or socioeconomic status (as measured by maternal education) on the speech production of children. This finding challenges the widely held belief that socioeconomic factors and the educational level of caregivers are key determinants of a child’s language capabilities.
Despite a common narrative that children from lower socioeconomic backgrounds receive less linguistic input, leading to poorer language outcomes, the study found no evidence supporting this claim.
“There’s been much debate and discussion in the literature in recent years about how socioeconomic status does or doesn’t link to language input and language output,” noted Bergelson. “We looked in many, many, many different ways … In no form did we ever find evidence that moms with more education had kids who produced more speech in these tens of thousands of hours of recordings from daily life.”
Similarly, the study’s results suggest that being raised in a multilingual environment or the child’s gender does not significantly alter the basic trajectory of language development, at least in terms of the quantity of speech produced by the child.
However, the study is not without limitations. The researchers note that their approach, while expansive, does not allow for detailed analysis of cultural or language-specific differences in language development. Additionally, the findings are based on correlational data, which means that causality cannot be inferred.
Future research directions include further exploration of the causal pathways between adult speech and child language production, as well as the development of machine learning tools that can differentiate between child-directed and overheard speech.
Nevertheless, the study represents a significant step forward in our understanding of language development, emphasizing the universal importance of adult speech over other demographic factors. It opens new avenues for research and potential interventions aimed at supporting language acquisition in young children, highlighting the role of adult interactions in fostering language skills across diverse cultural and linguistic contexts.
The study, “Everyday language input and production in 1,001 children from six continents,” was authored by Elika Bergelson, Melanie Soderstrom, Iris-Corinna Schwarz, Caroline F. Rowland, Nairán Ramírez-Esparza, Lisa R. Hamrick, Ellen Marklund, Marina Kalashnikova, Ava Guez, Marisa Casillas, Lucia Benetti, Petra van Alphen, and Alejandrina Cristia.