2018 Sat Poster 6766

Saturday, November 3, 2018 | Poster Session II, Metcalf Small | 3:15pm

Variation Sets in Maximally Diverse Languages
S. Moran, S. Stoll

Child-directed speech (CDS) has been shown to facilitate language learning through various structural features (e.g. Mintz, 2003; Fernald & Hurtado, 2006; Lew-Williams et al., 2011). Here we focus on so-called variation sets (e.g. Küntay & Slobin, 1996, 2002; Waterfall, 2006; Onnis et al., 2008) and test whether they are a cross-linguistically ubiquitous pattern. Variation sets are repeated instantiations of individual lexemes in close proximity. The definition of proximity and the types of lexical items studied vary across languages. The goals of the present study are to (i) operationalize variation sets and present several methods to extract them from CDS, (ii) test whether variation sets can be found in typologically radically different languages, and (iii) test whether they are a function of age, as suggested by Waterfall et al. (2010) and Wirén et al. (2016).

We test these questions with a cross-linguistic database of longitudinal corpora from nine typologically maximally diverse languages with very different morphological systems: Chintang, Cree, Indonesian, Inuktitut, Japanese, Russian, Sesotho, Turkish and Yucatec (Stoll & Bickel, 2013, Moran et al., 2016). We extract variation sets from the input of caregivers to 40 target children between the ages of 1 and 6.

Because variation sets have been operationalized computationally in different ways (cf. Brodsky et al., 2007; Wirén et al., 2016; Grigonytė & Björkenstam, 2016), we compare and contrast the results from these different approaches and apply them at both the word and morpheme levels in the corpora. We apply two main approaches to nouns and verbs. The first is an anchor technique, which consists of pairwise evaluations of utterances for similar components in relation to an anchor point (e.g. utterances 1-2, 1-3, 1-4). We test this technique with windows of between 2 and 10 utterances. The second is an incremental approach, i.e. a stepwise comparison of successive utterances (e.g. 1-2, 2-3, 3-4).
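The two comparison schemes can be illustrated with a minimal Python sketch. This is not the extraction code used in the study: the function names, the representation of an utterance as a set of noun and verb lemmas, and the matching criterion (at least one shared lemma) are all assumptions made for illustration only.

```python
from typing import List, Set, Tuple

# Hypothetical representation: each utterance is reduced to the set of
# noun and verb lemmas it contains.
Utterance = Set[str]
Pair = Tuple[int, int]


def anchor_pairs(n: int, window: int) -> List[Pair]:
    """Anchor scheme: pair each utterance (the anchor) with every later
    utterance inside a window of `window` utterances, e.g. 1-2, 1-3, 1-4."""
    return [(i, j) for i in range(n) for j in range(i + 1, min(i + window, n))]


def incremental_pairs(n: int) -> List[Pair]:
    """Incremental scheme: pair each utterance only with its immediate
    successor, e.g. 1-2, 2-3, 3-4."""
    return [(i, i + 1) for i in range(n - 1)]


def matching_pairs(utterances: List[Utterance], pairs: List[Pair]) -> List[Pair]:
    """Keep the pairs whose utterances share at least one lemma
    (an assumed, deliberately simple similarity criterion)."""
    return [(i, j) for i, j in pairs if utterances[i] & utterances[j]]


# Toy input (invented, not corpus data); indices are 0-based.
utterances = [{"ball", "see"}, {"ball", "take"}, {"look"}, {"ball"}]

anchor_hits = matching_pairs(utterances, anchor_pairs(len(utterances), window=3))
incremental_hits = matching_pairs(utterances, incremental_pairs(len(utterances)))
```

In this sketch, a window of 2 makes the anchor scheme produce exactly the same pairs as the incremental scheme. Larger windows let a repetition be detected across intervening utterances, as with the second and fourth utterances of the toy input.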

With both methods, we find that variation sets are ubiquitous in CDS, with nouns and verbs as their main exponents, as stated in Waterfall (2006). In contrast to other studies (Waterfall et al., 2010; Wirén et al., 2016), we do not find that the proportion of variation sets in CDS uniformly decreases as a function of age across our sample of languages. For example, in Russian the decrease is clear, whereas in Chintang, a polysynthetic language of Nepal, we find a general increase in the number of variation sets until the age of 5 (see Figures below). These results are the foundation for future research on how a number of potential variables influence the use of variation sets, for instance the grammatical structure of a language or culturally specific or SES-specific interaction patterns, as found in Tal & Arnon (2017).

References

Aylin C. Küntay and Dan I. Slobin. 1996. Listening to a Turkish mother: Some puzzles for acquisition. In Social Interaction, Social Context, and Language. Essays in the Honor of Susan Ervin-Tripp, pages 265–286. Lawrence Erlbaum, Mahwah, NJ.

Aylin C. Küntay and Dan I. Slobin. 2002. Putting interaction back into child language: Examples from Turkish. Psychology of Language and Communication, 6:5–14.

Heidi R. Waterfall. 2006. A Little Change is a Good Thing: Feature Theory, Language Acquisition and Variation Sets. Ph.D. thesis, Department of Linguistics, University of Chicago.

Onnis, L., Waterfall, H. R., & Edelman, S. (2008). Learn locally, act globally: Learning language from variation set cues. Cognition, 109(3), 423–430.

Schwab, J. F., & Lew-Williams, C. (2016). Repetition across successive sentences facilitates young children’s word learning. Developmental Psychology, 52(6), 879.

Lew-Williams, C., Pelucchi, B., & Saffran, J. R. (2011). Isolated words enhance statistical language learning in infancy. Developmental Science, 14(6), 1323-1329.

Fernald, A., & Hurtado, N. (2006). Names in frames: Infants interpret words in sentence frames faster than words in isolation. Developmental Science, 9(3).

Heidi R. Waterfall, Ben Sandbank, Luca Onnis, and Shimon Edelman. 2010. An empirical generative framework for computational modeling of language acquisition. Journal of Child Language, 37:671–703.

Peter Brodsky, Heidi R. Waterfall, and Shimon Edelman. 2007. Characterizing motherese: On the computational structure of child-directed language. In Proc. 29th Cognitive Science Society Conference, Nashville, TN.

Grigonytė, G., & Björkenstam, K. N. (2016). Language-independent exploration of repetition and variation in longitudinal child-directed speech: a tool and resources. In Proceedings of the joint workshop on NLP for Computer Assisted Language Learning and NLP for Language Acquisition (pp. 41-50).

Stoll, S. and Bickel, B. (2013). Capturing diversity in language acquisition research. Language Typology and Historical Contingency: In Honor of Johanna Nichols. Amsterdam: John Benjamins, pages 195–216.

Moran, S., Schikowski, R., Pajović, D., Hysi, C., and Stoll, S. (2016). The ACQDIV database: Min(d)ing the ambient language. In Nicoletta Calzolari (Conference Chair), et al., editors, Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Paris, France. European Language Resources Association (ELRA).

Shira Tal & Inbal Arnon. 2017. SES Differences in the Structure of Child-directed Speech. Paper presented at BUCLD 2017. Boston, Mass.