2018 Sat Poster 6573

Saturday, November 3, 2018 | Poster Session II, Metcalf Small | 3:15pm

How different is speech input and output across subgroups? First results from >12,000 Hours of Naturalistic Recordings
E. Bergelson, A. Weisleder, J. Bunce, C. Rowland, M. Casillas, A. Cristia

Studies within primarily WEIRD populations (Western, Educated, Industrialized, Rich, and Democratic; [HH10]) offer compelling evidence that language input varies between children, and that this variation predicts outcomes (e.g., [HM08]). Variability is partly attributable to differences between subgroups, i.e., groups that diverge in language, culture, and/or SES ([HR95]; cf. [SS18]). Collapsing across previous work [HR95;WF13;W81], one finds a 5–10-fold difference in reported input amounts. Such vast differences may be attributable to data collected with diverging methodologies, but even within a single study, a 3-fold difference across subgroups can be found [HR95]. Here we compare input, output, and their correlation across subgroups using ecologically valid and comparable data, in an effort to more closely equate methods and to consider greater between- and within-group variability.

Daylong audio recordings (collected with a small recorder worn by the child) are likely to include a wider range of activities and interactions than the brief sessions common in the literature. Families also have more time to ‘acclimate’ to being recorded, which may lead to more ecologically valid data. Using a set of such recordings with standardized hardware and software (LENA™), we pooled 6 datasets sampling from different WEIRD subgroups (962 days of data; see Table), including some from HomeBank [VW16].

We used LENA’s talker-classification timestamps to derive total input as the sum of all vocalizations attributed to adult females, adult males, and other children (i.e., not the key child) relative to total recording time; we summed the key child’s productions separately as output (see Table and Figure). A mixed model predicting total input revealed little cross-dataset difference (see Table): only one dataset differed significantly from the others, and the maximum slope was .33 (i.e., showing only a 1.5-fold difference in input quantity). The same analysis for children’s output revealed striking similarity across these corpora, with only two datasets diverging significantly (at most consistent with a 1.5-fold difference), as well as an unexpected age effect, with lower output as age increased (β=-.21, SE=.09, t=2.26). Finally, a model predicting children’s output from input quantity in interaction with dataset (controlling for age) confirmed the main effect of input quantity, with no significant interactions with dataset.
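As a rough illustration of the input and output measures described above, the sketch below sums talker-labeled segment durations and divides by total recording time. It is not the authors' pipeline: the `Segment` structure, the function name, and the choice to sum durations (rather than counts) are assumptions made for illustration, and the talker codes are modeled on LENA-style labels.

```python
from dataclasses import dataclass

# Assumed talker codes, modeled on LENA-style segment labels:
# FAN = adult female, MAN = adult male, CXN = other child, CHN = key child.
INPUT_TALKERS = {"FAN", "MAN", "CXN"}
KEY_CHILD = "CHN"


@dataclass
class Segment:
    """One talker-classified stretch of audio (hypothetical structure)."""
    talker: str
    onset: float   # seconds from start of recording
    offset: float  # seconds from start of recording


def input_output_proportions(segments, recording_seconds):
    """Return (input, output) as proportions of total recording time.

    Input sums segments from adult females, adult males, and other
    children; output sums the key child's segments. Segments with any
    other label (e.g. electronic noise) are ignored.
    """
    if recording_seconds <= 0:
        raise ValueError("recording_seconds must be positive")
    input_time = sum(s.offset - s.onset for s in segments
                     if s.talker in INPUT_TALKERS)
    output_time = sum(s.offset - s.onset for s in segments
                      if s.talker == KEY_CHILD)
    return input_time / recording_seconds, output_time / recording_seconds


# Toy example with made-up timestamps (not real data).
example = [
    Segment("FAN", 0.0, 2.0),
    Segment("CHN", 2.0, 3.5),
    Segment("MAN", 5.0, 6.0),
    Segment("CXN", 10.0, 11.0),
    Segment("TVN", 20.0, 30.0),  # non-speech label, ignored
]
input_prop, output_prop = input_output_proportions(example, 100.0)
```

Normalizing by recording time keeps recordings of different lengths comparable, which matters when pooling datasets collected under slightly different protocols.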

Thus, differences across subgroups were relatively rare and far smaller than previous estimates of ≥3-fold. In addition, subgroup did not affect input-output correlations. These results suggest that when ecologically valid methods are used across many participants, between-group variance is smaller than previously thought. That said, we acknowledge that data collection was not completely standardized and that LENA software was not designed to accommodate linguistic or cultural variation, does not distinguish between child-directed and overheard speech, and may be unevenly accurate (for input cf. [XY09;WF13;CL16]; for output quantities there is little validation). Therefore, we are currently building open-source language-processing tools, alongside manual annotations, which may yield more accurate measures. Nevertheless, the present analyses represent an important next step in understanding young children’s language input and output at more representative breadth, depth, and scale.

References

B16 Bergelson, E. (2016). Bergelson Seedlings HomeBank Corpus. doi:10.21415/T5PK6D

CL16 Canault, M., Le Normand, M. T., Foudil, S., Loundon, N., & Thai-Van, H. (2016). Reliability of the Language ENvironment Analysis system (LENA™) in European French. Behavior Research Methods, 48(3), 1109–1124.

HH10 Henrich, J., Heine, S. J., & Norenzayan, A. (2010). Most people are not WEIRD. Nature, 466(7302), 29.

HM08 Hurtado, N., Marchman, V. A., & Fernald, A. (2008). Does input influence uptake? Links between maternal talk, processing speed and vocabulary size in Spanish-learning children. Developmental Science, 11(6), F31–F39.

HR95 Hart, B., & Risley, T. R. (1995). Meaningful differences in the everyday experience of young American children. Paul H Brookes Publishing.

RD17 Rowland, C. F., Bidgood, A., Durrant, S., Peter, M. & Pine, J. M. (2017). The Language 0–5 Project Corpus. https://nyu.databrary.org/volume/389

SS18 Sperry, D. E., Sperry, L. L., & Miller, P. J. (2018). Reexamining the Verbal Environments of Children From Different Socioeconomic Backgrounds. Child Development, XX.

VJ16 VanDam, M., Jessup, C., & Tully, T. (2016, November). Fathers’ and mothers’ differential talk to sons and daughters with hearing loss. Poster presented at the 172nd Meeting of the Acoustical Society of America and the 5th Joint meeting of the Acoustical Society of America and the Acoustical Society of Japan, Honolulu, HI.

VW16 VanDam, M., Warlaumont, A. S., Bergelson, E., Cristia, A., Soderstrom, M., De Palma, P., & MacWhinney, B. (2016, May). HomeBank: An online repository of daylong child-centered audio recordings. In Seminars in Speech and Language (Vol. 37, No. 2, p. 128).

WM17 Weisleder, A. & Mendelsohn, A. (2017). Weisleder NY Corpus.

W81 Wells, C. G. (1981). Learning through interaction: The study of language development. Cambridge, UK: Cambridge University Press.

WF13 Weisleder, A., & Fernald, A. (2013). Talking to children matters: Early language experience strengthens processing and builds vocabulary. Psychological Science, 24(11), 2143–2152.

WP16 Warlaumont, A. S., Pretzer, G. M., Mendoza, S. & Walle, E. A. (2016). Warlaumont HomeBank Corpus. doi:10.21415/T54S3C

XY09 Xu, D., Yapanel, U., & Gray, S. (2009). Reliability of the LENA Language Environment Analysis System in young children’s natural home environment. LENA Foundation.