American Sign Language Lexicon Video Dataset (ASLLVD)

In conjunction with NSF grant #0705749, "HCC: Large Lexicon Gesture Representation, Recognition, and Retrieval" (Stan Sclaroff, Carol Neidle, and Vassilis Athitsos -- with invaluable contributions from PhD students Ashwin Thangali and Joan Nash, among other students assisting with the project), video examples have been collected at Boston University, from up to 6 native ASL signers, for lexical items most of which are contained within the Gallaudet Dictionary of American Sign Language.

Video stimuli were presented to signers, who were asked to produce the sign they saw as they would naturally produce it. In cases where the signer reported that he or she does not normally use that sign, we did not elicit the sign from this signer. The video stimuli for elicitation were supplemented to include additional signs that were not in the dictionary. Videos were captured using four synchronized cameras, providing: a side view of the signer, a close-up of the head region, a half-speed high resolution front view, and a full resolution front view.

Linguistic annotations include unique gloss labels, start/end time codes for each sign, labels for start and end handshapes of both hands, morphological classifications of sign type (lexical, number, fingerspelled, loan, classifier, compound), and articulatory classifications (1- vs. 2-handed, same/different handshapes on the 2 hands, same/different handshapes for sign start and end on each hand, etc.). For compound signs, the dataset includes annotations as above for each morpheme. To facilitate computer vision based sign language recognition, the dataset also includes numeric ID labels for variants of a sign, video sequences in uncompressed-raw format, camera calibration sequences, and software for skin region extraction.
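As a rough illustration of how the annotation fields listed above fit together, here is a minimal Python sketch of one annotated sign production. All class and field names are hypothetical and chosen for illustration only; they do not reflect the dataset's actual file format, and the handshape values in the usage note are placeholders.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SignType(Enum):
    """Morphological sign types named in the text above."""
    LEXICAL = "lexical"
    NUMBER = "number"
    FINGERSPELLED = "fingerspelled"
    LOAN = "loan"
    CLASSIFIER = "classifier"
    COMPOUND = "compound"


@dataclass(frozen=True)
class SignAnnotation:
    """One annotated production of a sign (hypothetical field names)."""
    gloss: str                  # unique gloss label
    variant_id: int             # numeric ID label for the sign variant
    sign_type: SignType
    start_frame: int
    end_frame: int
    dominant_start: str         # handshape label at sign start
    dominant_end: str           # handshape label at sign end
    nondominant_start: Optional[str] = None   # None for 1-handed signs
    nondominant_end: Optional[str] = None

    @property
    def two_handed(self) -> bool:
        # A sign counts as 2-handed if the non-dominant hand is annotated.
        return self.nondominant_start is not None or self.nondominant_end is not None

    @property
    def same_handshape_both_hands(self) -> Optional[bool]:
        # Only meaningful for 2-handed signs; None otherwise.
        if not self.two_handed:
            return None
        return (self.dominant_start == self.nondominant_start
                and self.dominant_end == self.nondominant_end)

    @property
    def dominant_handshape_changes(self) -> bool:
        # True when start and end handshapes differ on the dominant hand.
        return self.dominant_start != self.dominant_end
```

For example, SignAnnotation("(5)ACCIDENT", 1, SignType.LEXICAL, 10, 42, "5", "A", "5", "A") would describe a 2-handed production with the same handshapes on both hands and a handshape change between start and end; the frame numbers and handshape values here are made up. The articulatory classifications are derived as properties rather than stored, so they cannot drift out of sync with the handshape labels.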

Signers did not always produce the same sign that was shown in the prompt. In cases where a signer recognized and understood that sign but used a different sign or a different version of the same sign, divergences showed up in the data set. A given stimulus therefore resulted in productions that may have varied in any of several ways: production of a totally different but synonymous sign; production of a lexical variant of the same sign; or production of essentially the same sign but differing in subtle ways with respect to the articulation (as a result of regular phonological processes).

We have developed a framework for annotation that lets us carefully delineate the variants attested in this dataset. We annotate distinct signs with distinct gloss labels (consistent with the gloss labels in use for our other data sets, cf. http://secrets.rutgers.edu/dai/queryPages/ ). Linguistic annotation has been carried out using a beta version of SignStream® version 3 (a Java reimplementation of SignStream version 2.2.2, which runs as a Macintosh Classic application) that provides capabilities for phonological annotation. Start and end frames were identified for each sign, and the handshapes used for each of the hands at the start and end of the sign were also annotated, using labels from this set: http://www.bu.edu/asllrp/cslgr/pages/handshape-palette.html.

As of February 2012, we are in the final stages of linguistic annotation and verification of the annotations, which is a time-intensive challenge facilitated by a tool developed for this purpose by Ashwin Thangali: the Lexicon Viewer and Verification Tool (LVVT). As soon as the verifications are complete, this annotated data set, complete with unique gloss labels and start and end handshapes for all signs, will be shared publicly.

Although the counts may change slightly as verifications occur, this is the current overview of the data that have been annotated to date:

stats

This chart should be read as follows. A total of 2,484 monomorphemic lexical signs were collected. For some signs, there is more than one variant, so the total number of distinct sign variants is greater: 2,793. For 621 of those sign variants, we have examples from a single signer; for 989 of them, we have examples from 2 signers, etc.; and for 141 of those sign variants, we have examples from all 6 of our native signers. Since we have more than one example from a given signer in some cases, the total number of tokens per sign may be greater than the total number of signers whose productions of that sign are included in our data set. In fact, for 175 of the signs, we have more than 6 tokens. (For 2 of the signs, we have as many as 19 tokens.)
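The two kinds of counts described above (number of distinct signers per variant, and number of tokens per variant) can be sketched as follows. This is only an illustration under assumed inputs: the (variant, signer) tuple layout, the function name, and the toy data are hypothetical and are not the project's actual tooling or data.

```python
from collections import Counter, defaultdict


def summarize(tokens):
    """Summarize (variant_label, signer_id) token records.

    Returns a pair:
      - Counter mapping "number of distinct signers" -> number of variants
      - Counter mapping variant_label -> number of tokens
    """
    signers_per_variant = defaultdict(set)
    tokens_per_variant = Counter()
    for variant, signer in tokens:
        signers_per_variant[variant].add(signer)   # sets ignore repeat signers
        tokens_per_variant[variant] += 1           # every production is a token
    variants_by_signer_count = Counter(
        len(signers) for signers in signers_per_variant.values()
    )
    return variants_by_signer_count, tokens_per_variant


# Toy data (made up): one signer produced VARIANT-A twice.
tokens = [
    ("VARIANT-A", "signer1"),
    ("VARIANT-A", "signer1"),
    ("VARIANT-A", "signer2"),
    ("VARIANT-B", "signer3"),
]
by_signers, per_variant = summarize(tokens)
```

In this toy data, VARIANT-A has 3 tokens but only 2 distinct signers, which is the same situation that lets the token count for a sign exceed the number of signers in the real data set.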

The data that will be made available will include a list of the variants for each sign in the set, including the start and end handshapes -- on both the dominant and non-dominant hands -- for each production, with links to the movie files, which will also be available for download. This is illustrated here:

examples of variants

 

In this case, the sign for 'accident' has three lexical variants, which are distinguished by handshape but otherwise share the same basic movement. These are considered to be lexical variants and they have distinct glosses, in this case with the distinguishing handshape noted as part of the gloss label (although that is not necessarily the case for lexical variant glosses; general glossing conventions are documented in reports 11 and 13, with an update now in progress).

See illustration of start and end handshapes for these three variants.

In some cases, the alternation in handshape, e.g., between the A and S handshapes shown for the end handshapes of (5)ACCIDENT (see this chart for explanation of the handshape labels: http://www.bu.edu/asllrp/cslgr/pages/handshape-palette.html), is quite productive under appropriate phonological conditions and is not a property associated specifically with this lexical item.

The availability of such data for our 9,000+ tokens will provide extremely valuable material for study of the statistical distribution of handshapes, the types and frequencies of variations that occur, and the dependencies between the handshapes on the two hands and for start and end handshapes.
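One simple way to study the dependency between start and end handshapes is to estimate, from annotated tokens, how often each end handshape follows a given start handshape. The sketch below assumes the annotations have been reduced to (start_handshape, end_handshape) pairs; the function name and the sample pairs are hypothetical and are not drawn from the actual data.

```python
from collections import Counter


def conditional_end_distribution(pairs):
    """Estimate P(end handshape | start handshape) from (start, end) pairs.

    Returns a dict mapping (start, end) -> relative frequency of `end`
    among all tokens that begin with `start`.
    """
    pairs = list(pairs)                      # allow any iterable, read twice
    pair_counts = Counter(pairs)
    start_totals = Counter(start for start, _ in pairs)
    return {
        (start, end): count / start_totals[start]
        for (start, end), count in pair_counts.items()
    }


# Made-up sample: three tokens starting with "5", one starting with "B".
sample = [("5", "A"), ("5", "S"), ("5", "A"), ("B", "B")]
dist = conditional_end_distribution(sample)
```

Here dist[("5", "A")] is 2/3 and dist[("5", "S")] is 1/3, since two of the three tokens starting with "5" end in "A". The same counting applies unchanged to pairs of (dominant, non-dominant) handshapes for studying dependencies between the two hands.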

Further information about the release of this data set will be provided in the near future.

In the meantime, a preliminary set of video files collected at the outset of this project, identified only by the stimulus video that was used to obtain the sign (i.e., without unique gloss labels and without annotations of handshapes), is available from this page, where further information about the video file formats will also be found:

http://csr.bu.edu/asl/asllvd/annotate/index-cvpr4hb08-dataset.html

 

Reports related to this project

V. Athitsos, C. Neidle, S. Sclaroff, J. Nash, A. Stefan, Q. Yuan and A. Thangali (2008) "The ASL Lexicon Video Dataset", CVPR 2008 Workshop on Human Communicative Behaviour Analysis (CVPR4HB'08).

H. Wang, A. Stefan, S. Moradi, V. Athitsos, C. Neidle, and F. Kamanga (2010) "A System for Large Vocabulary Sign Search," Proceedings of the Workshop on Sign, Gesture and Activity (SGA), September 2010.

A. Thangali, J.P. Nash, S. Sclaroff and C. Neidle (2011) "Exploiting Phonological Constraints for Handshape Inference in ASL Video," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vol., 2011.
 

Related resources

Contacts

For queries related to ASL data collection and linguistic annotations:

carol AT bu.edu

For questions regarding data capture and video file formats:

athitsos AT uta.edu & sclaroff AT cs.bu.edu