The data sets collected at Boston University as part of the National Center for Sign Language and Gesture Resources are now accessible through this site:
Video files are available in multiple formats (compressed and uncompressed), and the annotations, carried out using SignStream®, are available in XML format.
Documentation of the ASLLRP annotation conventions is available in reports 11 and 13, linked from this page: http://www.bu.edu/asllrp/reports.html.
Updated documentation is currently in preparation.
Information is also available about those involved in the data collection and annotation, and in the development of the Web interface.
Release of data (August 2007), including compressed video files showing multiple views of the signing, and SignStream® annotation files.
This project makes available several types of experimental resources and analyzed data to facilitate linguistic and computational research on signed languages and on the gestural components of spoken languages.
Two dedicated facilities for the collection of video-based language data were established, one at Boston University and one at Rutgers University, each equipped with multiple synchronized digital cameras to capture different views of the subject.
A substantial corpus of American Sign Language (ASL) video data from native signers is being collected and made available. Data collection began in December 1999.
The video data are being made available in both uncompressed and compressed formats.
Significant portions of the collected data are also being linguistically annotated using SignStream®. The SignStream® databases are being made publicly available, as will be the SignStream® application itself. (Although SignStream® is a MacOS Classic application, the data can be exported in text format for use on other platforms. A new Java reimplementation is currently under development.)
The video data are also being analyzed by various computer algorithms. The SignStream® annotations of the data provide "ground truth" for evaluating such algorithms.
The collected data and the analysis results are being distributed over the Internet and on CD-ROM.
Thus, this project makes available sophisticated facilities for data collection, a standardized protocol for such collection, and large amounts of language data.