Large Lexicon Gesture Representation, Recognition, and Retrieval
This project involves research on computer-based recognition of ASL signs. One goal is to develop a "look-up" capability for use as part of an interface to a multi-media sign language dictionary. Although printed dictionaries exist for ASL, they are generally organized according to the closest English translation of each sign, since there is no written form for ASL. This causes obvious problems, because there is no one-to-one correspondence between English words and ASL signs (imagine if you could only get information about French words, or words in any other spoken language, by looking them up under their English translations). It also poses an insurmountable difficulty for language learners, a kind of Catch-22: you can only look up a sign you don’t know in the dictionary if you already know what it means.
The proposed system will enable a signer either to select a video clip corresponding to an unknown sign, or to produce a sign in front of a camera, for look-up. The computer will then find the best match(es) from its inventory of thousands of ASL signs. Knowledge about linguistic constraints of sign production will be used to improve recognition. Fundamental theoretical challenges include the large scale of the learning task (thousands of different sign classes), the availability of very few training examples per class, and the need for efficient retrieval of gesture/motion patterns in a large database.
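As a rough illustration of the look-up step described above, the sketch below ranks sign classes by their closest stored example, which is one simple way to cope with having only a few training examples per class. It is not the project's actual method: the feature vectors, the Euclidean distance, the sign labels, and the `retrieve` function are all assumptions made for this example.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query, database, k=3, distance=euclidean):
    """Return the k best-matching (label, distance) pairs, best first.

    database is a list of (label, feature_vector) pairs. A label may
    appear more than once, but typically only a handful of times.
    """
    # Keep only the closest example of each sign class.
    best = {}
    for label, features in database:
        d = distance(query, features)
        if label not in best or d < best[label]:
            best[label] = d
    # Rank sign classes by the distance to their closest example.
    return heapq.nsmallest(k, best.items(), key=lambda item: item[1])


# Toy, made-up data: 2-D vectors standing in for real
# handshape/motion features (hypothetical, for illustration only).
database = [
    ("BOOK", [0.0, 0.0]),
    ("BOOK", [0.2, 0.1]),
    ("HOUSE", [1.0, 1.0]),
    ("TREE", [3.0, 0.5]),
]

matches = retrieve([0.9, 0.8], database, k=2)
print([label for label, _ in matches])  # prints ['HOUSE', 'BOOK']
```

This sketch scans every stored example, so its cost grows linearly with the size of the database. The efficient-retrieval challenge mentioned above is exactly what such a linear scan does not address; a real system would presumably need indexing or approximate search, and a distance measure suited to motion patterns.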
In addition to use with multi-media dictionaries, this technology will have many other applications, e.g., for computer-based automatic translation. Future aspirations include development of a “sloogle” for Google-like searches through streams of ASL video.
Collaborative research by Stan Sclaroff and Carol Neidle (Boston University), and Vassilis Athitsos (University of Texas at Arlington), supported by funding from the National Science Foundation.
BU project participants also include: Jaimee DiMarco, Joan Nash, Alexandra Stefan, Jon Suen, Ashwin Thangali, and Quan Yuan.
A data collection session for the ASL Linguistic Video Dataset, with (from right to left) linguistic consultant Elizabeth Cassidy, doctoral students Joan Nash and Quan Yuan, and Research Engineer Gary Wong.
Photo by Devin Hahn, BU Productions.