Exploiting phonological constraints for handshape recognition in sign language video: Ashwin Thangali, PhD Defense

2:00 pm to 4:00 pm on Wednesday, April 3, 2013
MCS 148
Abstract: The ability to recognize handshapes in sign language video is essential in algorithms for sign recognition and retrieval. Handshape recognition from isolated images is, however, an insufficiently constrained problem. Many handshapes share similar 3D configurations and are indistinguishable for some hand orientations in 2D image projections. Additionally, significant differences in handshape appearance are induced by the articulated structure of the hand and by variants produced by different signers. Linguistic rules involved in the production of signs impose strong constraints on the articulations of the hands, yet little attention has been paid to exploiting these constraints in previous work on sign recognition. This research focuses on American Sign Language (ASL), although the same approach could be applied to other signed languages. Among the different classes of signs in ASL, so-called "lexical signs" constitute the prevalent class. Morphemes (i.e., meaningful units) for signs in this class involve a combination of particular handshapes, palm orientations, locations for articulation, and movement types. These are analyzed by many sign linguists as analogues of phonemes in spoken languages. As in spoken languages, phonological constraints govern the ways in which phonemes combine in signed languages; utilizing these constraints for handshape recognition in ASL is the focus of this thesis. Handshapes in monomorphemic lexical signs are specified at the start and end of the sign. Handshape transitions within a sign are generally constrained to involve either closing or opening of the hand (i.e., folding or unfolding of the palm and one or more fingers). Akin to allophonic variations in spoken languages, both inter- and intra-signer variations in the production of specific handshapes are observed.
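As a toy illustration of how a start/end co-occurrence constraint can sharpen handshape recognition, the sketch below combines a joint prior over (start, end) handshape pairs with per-handshape observation likelihoods and picks the most probable pair. It is not the model developed in the thesis: the three handshape labels, the prior table, the likelihood values, and the function name map_start_end are all assumptions made up for this example.

```python
# Toy sketch: joint inference over (start, end) handshape pairs.
# All labels and numbers below are hypothetical, chosen only for illustration.

HANDSHAPES = ["S", "5", "B"]

# PRIOR[i][j] = assumed P(start = i, end = j); entries sum to 1.
# Pairs that could plausibly correspond to opening or closing the hand
# get more mass here.
PRIOR = [
    [0.20, 0.25, 0.01],
    [0.25, 0.10, 0.02],
    [0.01, 0.02, 0.14],
]

# Assumed observation likelihoods P(image | handshape) for the first
# and last frames of one sign.
LIK_START = [0.6, 0.3, 0.1]
LIK_END = [0.1, 0.4, 0.5]


def map_start_end(prior, lik_start, lik_end):
    """Return the most probable (start, end) index pair and the posterior table."""
    n, m = len(lik_start), len(lik_end)
    # Unnormalized posterior: prior times both observation likelihoods.
    joint = [[prior[i][j] * lik_start[i] * lik_end[j] for j in range(m)]
             for i in range(n)]
    total = sum(sum(row) for row in joint)
    posterior = [[v / total for v in row] for row in joint]
    best = max(((i, j) for i in range(n) for j in range(m)),
               key=lambda ij: posterior[ij[0]][ij[1]])
    return best, posterior


# Independent decisions look at each frame's likelihood in isolation.
indep_start = max(range(len(LIK_START)), key=lambda i: LIK_START[i])
indep_end = max(range(len(LIK_END)), key=lambda j: LIK_END[j])

best, posterior = map_start_end(PRIOR, LIK_START, LIK_END)

print("independent:", HANDSHAPES[indep_start], HANDSHAPES[indep_end])
print("joint MAP:  ", HANDSHAPES[best[0]], HANDSHAPES[best[1]])
```

With these made-up numbers, the frame-by-frame decision picks the pair (S, B), which the assumed prior treats as rare, while the joint decision picks (S, 5). This is the kind of correction a co-occurrence constraint is meant to provide when the image evidence for one frame is ambiguous.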
We propose a Bayesian network formulation to exploit handshape co-occurrence constraints, also utilizing information about allophonic variations to aid in handshape recognition. We propose a fast non-rigid image alignment method to gain improved robustness to handshape appearance variations during computation of observation likelihoods in the Bayesian network. We evaluate our handshape recognition approach on a large dataset of monomorphemic lexical signs. We demonstrate that leveraging linguistic constraints on handshapes results in improved handshape recognition accuracy. As part of the overall project, a large corpus is being prepared for dissemination: video for three thousand signs, each from up to six native signers of ASL, annotated with linguistic information such as glosses and morpho-phonological properties and variations, including the start/end handshapes associated with each ASL sign production.

Thesis Committee: Stan Sclaroff, Margrit Betke, Carol Neidle, Erik Sudderth, George Kollios (chair)