Facial Analytics for Communication & Education

Profs. Dimitris Metaxas and Mariapaola D’Imperio (Rutgers University), Matt Huenerfauth (Rochester Institute of Technology), and Carol Neidle (Boston University) received an NSF grant through the Convergence Accelerator program in September 2020 for development of:

“Data & AI Methods for Modeling Facial Expressions in Language with Applications to Privacy for the Deaf, American Sign Language (ASL) Education & Linguistic Research”

Partners from Gallaudet University (Ben Bahan and Patrick Boudreault) have recently joined the team, which includes linguists, computer scientists, deaf and hearing ASL experts, and industry partners.

This project involves developing sustainable, robust AI methods for facial analytics. These methods are potentially applicable across domains, but here they target new applications that address important problems related to the use of facial expressions and head gestures in natural language. In sign language, critical linguistic information of many kinds is conveyed exclusively by facial expressions and head gestures.

[Figure: images of facial expressions with grammatical meanings]

The fact that the face carries critical linguistic information poses major challenges for Deaf signers and for students of ASL as a non-native language.

Problem #1: The more than 500,000 ASL signers in the US have no way to communicate anonymously through video in their native language, e.g., about sensitive topics such as medical issues. The Deaf community perceives this as a significant problem. It also means, for example, that signed submissions to scholarly journals cannot be reviewed anonymously.

Problem #2: Second-language learners of ASL (the third most studied “foreign” language, with more than 107,000 US college enrollments as of 2016) have difficulty learning to produce these essential expressions, in part because they cannot see their own face while signing.

In spoken language, these expressions also play an important role, but they function differently.

Problem #3: Speech scientists studying these co-speech facial expressions and head gestures lack tools for analyzing them in video.

To address these problems, we are creating AI tools (1) to enable ASL signers to share videos anonymously by disguising their face without loss of linguistic information; (2) to help ASL learners produce these expressions correctly; and (3) to help speech scientists study co-speech gestures.

We have developed proof-of-concept prototypes for these three applications through NSF Phase I funding (grant #2040638).

These applications build on new AI methods for continuous, multi-frame video analysis, designed to ensure real-time, robust, and fair algorithm performance. User studies also guide the design of the applications.

1) The Privacy Tool will allow signers to anonymize their own videos, replacing their face while retaining all the essential linguistic information.

2) The Educational Application will help learners produce nonmanual expressions and assess their progress by letting them record themselves signing along with target videos that incorporate grammatical markings. Feedback will be generated automatically from computational analysis of each student’s production in relation to the target.

3) The Research Toolkit includes: (a) a Web-based tool that provides computer-based 3D analyses of nonmanual expressions from videos uploaded by the user; and (b) extensions to SignStream® (our tool for semi-automated annotation of ASL videos, which incorporates computer-generated analyses of nonmanual gestures) to accommodate speech, along with extensions to our Web platform for sharing files in the new format. These software and data resources will facilitate research in many areas, advancing the field in ways that will have important societal and scientific impact.

Differentiators

(1) The proposed AI approach to analyzing facial expressions and head gestures, which combines 3D modeling, machine learning, and linguistic knowledge derived from our annotated video corpora, overcomes limitations of prior research. It is distinctive in its ability to capture subtle facial expressions even with significant head rotations, occlusions, and blurring. The approach will have other applications as well: e.g., sanitizing other video data involving human faces, medicine, security, driving safety, and the arts.

(2) The applications themselves are distinctive: nothing like our proposed deliverables exists.

The NSF Convergence Accelerator issued its first set of awards in 2019. This new NSF program accelerates use-inspired convergence research in areas of national importance. The Convergence Accelerator seeks to build partnerships that bring together people from across disciplines and industry to address societal challenges and deliver practical solutions.

