CHILDES lab
This lab is due September. To do this lab you will need access to the CHILDES data and the CLAN analysis program. There are two ways to use the analysis programs. One is to download the program (available for Mac, Windows, and some flavors of Unix) and the relevant databases to your own computer; the other is to use the web interface (WebCLAN). See "Picking a version of CLAN to use" for comments on the options. In the text below, I will mark things that apply only to CLAN installed on your own machine with [C] and things that apply only to WebCLAN with [W]. On this page, you will find information on:
(The task you'll be doing in this lab assignment was originally formulated by Martha McGinnis, U. Calgary.)
Picking a version of CLAN to use
You need to choose between a) downloading and installing CLAN on your computer, along with the corpus we'll be working with, and b) using WebCLAN online. The CLAN program (and WebCLAN) is really a collection of several different commands that you can execute to analyze your data. If you install CLAN on your computer, you type these commands entirely within its command box. If you are using WebCLAN, you pick the command you want to run from a popup menu, type the arguments in the blank next to the popup menu, and press "Run".
There are advantages to both. WebCLAN makes labs like this much easier to manage for a course, because downloading and installing things always brings headaches of one kind or another. A big advantage of WebCLAN is that all you need is an internet connection and a browser, and for the purposes of this lab it can do everything you need it to do. However, it does rely on your being connected to the internet (and on CMU's web site being up). It also does not automatically save your output anywhere, so you have to save it yourself.
On the other hand, installing CLAN yourself makes sense if you are going to be doing more than just this lab with it, for example if you are going to do research with CHILDES corpora for your final project. This is primarily because the version of CLAN that you install can do slightly more. The biggest thing it allows you to do is provide your search criteria in an external file (rather than typing them in on the command line), which can be very helpful for complex searches. The ability of CLAN to recall previously used commands easily is also quite helpful (although you can get this in a limited way in WebCLAN by using the back button in your browser).
What I would probably recommend is: use WebCLAN for this lab, but once you have an idea of how it works, try to install CLAN on your own computer if you want to continue working with CHILDES. If you want to install CLAN on your computer, go to the CHILDES page, click on "CLAN programs", and choose the version for your computer. I will assume you basically know how to install things, although if you are trying to do this and having problems, I might be able to help in a limited way.
Locating Nina's transcripts with WebCLAN:
[W] If you are using WebCLAN, you find Nina's transcripts as follows.
Downloading Nina's transcripts (this is the data we will be working with):
[C] If you opt to install CLAN, you will also need to download the data that we will be working with. Here is where you find it:
Starting CLAN on your own machine:
[C] If you have installed CLAN on your own machine, start the program (double-click on the CLAN icon). To use it, you will need to tell it a couple of things about where to find certain files. You do this with the "Working", "Output", "Lib", and "Mor lib" buttons on the main CLAN window. When you press each of these buttons, you will be asked to find a folder on your computer. You should be able to leave "Lib" and "Mor lib" alone, but they should point to the folder in which the CLAN application resides. The "Working" folder is where CLAN looks for the input data; point this to the "suppes" folder you downloaded earlier, which contains the Nina transcripts. The "Output" folder is where CLAN will store any output files you ask for. You probably want to point this to a new folder you create for this purpose.
The structure of a (Web)CLAN command:
CLAN:
WebCLAN:
The lab assignment comes in six parts. I will mark things to hand in with a [H] symbol.
Part 1: Use CLAN to determine MLU.
Use the mlu command to determine the MLU for Nina's transcripts. You can do this with the following CLAN command, discussed in the instructions above. If you use CLAN, the results will be in a file called "mlu-nina.txt" in the Output directory. If you use WebCLAN, the results will be displayed in your browser, and you should save them yourself as "mlu-nina.txt".
mlu +t*CHI nina* > mlu-nina.txt
Part 2: Record Nina's age and MLU for each file 01-19.
The file "mlu-nina.txt" should now contain the MLU for Nina's utterances in each transcript ("ratio of morphemes over utterances"). Open each transcript file from 01 through 19. (Note: there is no file nina08.cha.) Observe that at the top of each file, Nina's age in that file is recorded.
[H] To hand in: A list containing, for each file from nina01.cha to nina19.cha, Nina's age in the transcript, and the MLU you computed for the transcript using the mlu command from part 1.
Part 3: Use CLAN to determine word frequencies
For two representative samples, we will use CLAN to determine the frequency with which each word in the transcript appears. To do this, we use the freq command. It works very much like the mlu command described above. We will run freq on nina10.cha and nina19.cha, and you can use the following commands to do this.
freq +t*CHI nina10.cha > freq-nina10.txt
freq +t*CHI nina19.cha > freq-nina19.txt
After having done this, you will have two lists of words and numbers (one from file 10, one from file 19). We will look at each, and pick a regular verb that occurs the most often from each file. I found that eat and see seemed to be equally popular verbs in the nina10.cha file. Somewhat arbitrarily, we'll look at eat (see is complicated by the fact that it often occurs as "See?", which properly lacks a subject). I discounted have because it can be an auxiliary (and auxiliaries behave differently), which is also an unnecessary complication for what we are trying to do. In the nina19.cha file, I picked get as the verb to look at. It's a common verb, not as popular as irregular go, but go, like have, is involved in some auxiliary uses. Want would be a reasonable verb to pick as well, but it isn't even as interesting to look at as get.
Part 4: Use CLAN to look at subject drop in a small sample of two files
Part 4a. Search the transcripts for the examples.
Having picked a common verb from each file, what we're going to do is look at each time the verb is used in the transcript and count how often it appears with a subject. To do this, use the following CLAN commands.
combo +t*CHI -w2 +s"eat*" nina10.cha > selected-nina10.txt
combo +t*CHI -w2 +s"get*" nina19.cha > selected-nina19.txt
Make sure you know why it does what it does; read the combo notes at the end of this web page. This will give you two files (selected-nina10.txt and selected-nina19.txt), which contain the child utterances containing the verbs you've picked and the two lines preceding each.
Part 4b. Count up the totals.
Now, go through each example and decide which of the following categories it falls under. Be sure to read the "exclusion" criteria carefully. You may find it helpful to print this out and do it with a pencil.
Part 4c. Describe what you found
[H] Create a 2 x 3 table of results (2 rows and 3 columns) like the one below. Fill in the overt and null subject numbers for each file. In the third column, compute the percentage of included utterances for each file that have overt subjects (divide the number of overt subjects by the sum of both overt and missing subjects, and then multiply by 100).
[H] Write a sentence that describes the results (i.e. does the percentage of dropped subjects decrease as Nina gets older?).
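If you would rather not do the percentage arithmetic by hand, here is a minimal Python sketch of the third-column calculation from Part 4c. The function name percent_overt and the counts in the example are made up for illustration; they are not Nina's real numbers, and you still have to count the overt and null subjects yourself.

```python
def percent_overt(overt: int, null: int) -> float:
    """Percentage of included utterances that have an overt subject.

    Implements: 100 * overt / (overt + null).
    """
    total = overt + null
    if total == 0:
        # No included utterances, so a percentage is undefined.
        raise ValueError("no included utterances to compute a percentage from")
    return 100 * overt / total


# Made-up placeholder counts, NOT real results from the transcripts.
example_counts = {
    "earlier sample": (6, 4),   # (overt subjects, null subjects)
    "later sample": (9, 1),
}

for name, (overt, null) in example_counts.items():
    pct = percent_overt(overt, null)
    print(f"{name}: overt={overt}  null={null}  % overt={pct:.1f}")
```

Excluded utterances never enter the calculation: only the overt and null counts from the included categories go into the denominator.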
Part 5: Use CLAN to study Nina's use of subject drop in wh-questions
Search Nina's transcripts 01 through 19 for occurrences of the following wh-words: who, what, where, when, how, why, whose, which. You should create two output files, one for transcripts 01 through 09, and one for transcripts 10 through 19.
Go through your two output files in detail. For each output file, tally up and record how many utterances fall into each of the following four classes:
A note on how to count: You will find that there are a lot of utterances like "What's that?" or "Where's another boy with a valentine?". Although one could potentially look at these a couple of different ways, I would count these in class
[H] Create a 2 x 3 table of results (2 rows and 3 columns) like the one below. Let the first row represent Nina's early transcripts (01-09) and the second row represent her later transcripts (10-19). This works just like the table from before. Let the first column represent the number of utterances in class C for each set of transcripts, and the second column represent the number of utterances in class D for each set of transcripts.
For the third column of your table, calculate the percentage of these (non-subject wh-word) utterances that have an overt subject, by adding the class C and class D amounts for each set of transcripts together, then dividing the class D amount by the result and multiplying by 100 (that is, 100 * D / (C+D)). Put the resulting percentage of overt subjects for each set of transcripts in the third column.
[H] Describe what you see (e.g., does the percentage of overt subjects increase as Nina gets older?).
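One way to keep the Part 5 tally honest is to write down a class letter for every utterance as you read through an output file, and then let a short script do the counting and the 100 * D / (C+D) step. The sketch below is only an illustration: the function name overt_subject_rate is hypothetical, it assumes you label the four classes with single letters (the text above only names classes C and D, so the other two letters are my assumption), and the labels in the example are invented.

```python
from collections import Counter


def overt_subject_rate(labels):
    """Tally hand-assigned class letters and compute 100 * D / (C + D).

    `labels` is any iterable of single-letter class labels, one per
    utterance. Only classes C and D enter the percentage; any other
    labels are tallied but otherwise ignored.
    """
    tally = Counter(labels)
    c, d = tally["C"], tally["D"]
    if c + d == 0:
        raise ValueError("no class C or class D utterances were labelled")
    return tally, 100 * d / (c + d)


# Invented labels for illustration only, NOT real results.
hand_labels = ["A", "C", "D", "D", "B", "D"]
tally, pct = overt_subject_rate(hand_labels)
print(f"C={tally['C']}  D={tally['D']}  % overt={pct:.1f}")
```

Run it once per output file (early and late transcripts) to fill in the two rows of the table.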
Part 6: Discuss the comparison with Valian's (1991) results.
Consider the tables below, from O'Grady (1997), based on data from Valian (1991). They show overall percentages of dropped subjects in general, not just in (non-subject) wh-questions.
[H] Describe how your results on subject omission for eat and get (from part 4) compare with what Valian found. Mention things like whether you found more or less omission than Valian found, and pay particular attention to the groups of children whose age and/or MLU match the transcript you are looking at.
[H] Describe how your results on subject omission in wh-questions (from part 5) compare to the overall rate of subject omission. Mention things like whether subjects are dropped more often or less often in wh-questions.
[H] Consider your results in light of the hypothesis that "topic drop" accounts for some of the cases of subject omission in Child English (cf. comments about Bromberg & Wexler 1995 from the class handouts). Do your results support this hypothesis? Briefly explain why or why not.
O'Grady, William (1997). Syntactic Development. Chicago: University of Chicago Press.
Valian, Virginia (1991). Syntactic subjects in the early speech of American and Italian children. Cognition 40:21-81.
Comments on combo:
CLAN includes a relatively powerful searching tool called combo. I will outline a couple of points here, although you should probably refer to the CLAN manual for more information.
[W] NOTE: Although in CLAN you should surround your search string with quotation marks (as shown below), in WebCLAN you should not put quotation marks around the search string. It's weird, but that's what I found. So, where it says +s"what^my" below in the CLAN command, you should just type +swhat^my if you are using WebCLAN.
An example of the combo command is given below:
combo +t*CHI +w2 -w2 +s"what^my" nina* > whatmy.txt
This command says:
This will look for "what" immediately followed by "my" in any of the nina files, returning something like this:
*** File "Moxie:CLAN:suppes:nina19.cha": line 254.
*CHI: I want to play with you here .
*CHI: look what my got .
*CHI: look (1)what (1)my got .
*MOT: I see what you got .
*MOT: what did you get ?
You can see that we used the "^" character in the search string. This character means "immediately followed by", so what we searched for was "what" immediately followed by "my". In these search strings there are several other special characters that you can use.
You can combine these in various ways to get useful effects. A couple of common things you might use are:
Some example combo commands are:
[C] Instead of typing in the thing you are searching for each time, you can also use a "search" file (but this function is not available in WebCLAN). The "search" file is a text file that contains the things you want to search for. An example search file might look like this (searching for first person pronouns).
If you save this file as "search-1pron.txt" in your Working directory, then you could do the search with the following combo command, where the @ tells combo to look in your file for the list of things to search for.
combo +t*CHI +w2 -w2 +s@search-1pron.txt nina* > pron1-nina.txt
[W] Because you cannot use a search file with WebCLAN, you have to enter everything as part of the parameters. In order to do the search described above, you would need to provide this as the argument for combo instead:
+t*CHI +w2 -w2 +sI+I'*+me+me'*+my+my'* nina*
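To get a feel for what a search string like I+I'*+me+me'*+my+my'* selects, here is a rough Python imitation. It is not how combo is implemented, only a sketch under some assumptions: "+" is treated as "or", "*" as "any run of characters", matching is done word by word, and case is significant (real combo may handle case and punctuation differently). The function name matches_any is hypothetical.

```python
from fnmatch import fnmatchcase

# The WebCLAN search string from above, split on "+" (read as "or").
PATTERNS = "I+I'*+me+me'*+my+my'*".split("+")


def matches_any(utterance: str, patterns=PATTERNS) -> bool:
    """True if any whitespace-separated word matches any pattern.

    "*" in a pattern stands for any run of characters, so "I'*"
    matches "I'm" and "I'll" but not plain "I".
    """
    words = utterance.split()
    return any(fnmatchcase(word, pat) for word in words for pat in patterns)


# Utterances taken from the sample combo output earlier on this page.
for line in ["look what my got .", "I see what you got .", "what did you get ?"]:
    print(f"{matches_any(line)!s:5}  {line}")
```

The first two utterances contain a first person pronoun ("my" and "I") and the third does not, which is the distinction the search file is meant to draw.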